Automating EC2 With Python

Aug 9, 2019

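In this tutorial, we’ll take a look at using Python scripts to interact with the infrastructure provided by Amazon Web Services (AWS). You’ll learn to configure a workstation with Python and the Boto3 library. Then, you’ll learn how to programmatically create and manipulate EC2 instances, Amazon Machine Image (AMI) backups, and resource tags, and finally how to schedule backup and cleanup jobs with cron or AWS Lambda.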

Dependencies and Environment Setup

To start, I need to create a user in my AWS account that has programmatic access to the REST APIs. I will be granting this user admin rights, but please note that this is only to keep the tutorial simple. If you are following along, you should consult your organization’s IT security policies before using this user in a production environment.

Step 1: In my AWS console, I go to the IAM section under the Services menu, click the Users link, and finally click the Add user button. On that screen, I give the user the name “boto3-user” and check the box for Programmatic access before clicking the next button.

Step 2: On the permissions screen, I choose Attach existing policies directly, check the AdministratorAccess policy, and click next.

Step 3: Click through to next since I am not adding any optional tags.

Step 4: I review the user about to be created and then click Create user.

Step 5: Finally, I download credentials as a CSV file and save them.

Next up I need to install the necessary Python 3 libraries locally within a virtual environment, like so:

$ python3 -m venv venv
$ source venv/bin/activate
(venv)$ pip install boto3 awscli

Lastly, I configure the credentials for the boto3 library using the AWS CLI, making sure to enter the Access Key ID and Secret Access Key I downloaded in step 5 above.

$ aws configure
AWS Access Key ID [****************3XRQ]: **************
AWS Secret Access Key [****************UKjF]: ****************
Default region name [None]:
Default output format [None]:

Creating an EC2 Instance to Work On

In the following sections, I am going to go over how to create an AWS region-specific boto3 session as well as instantiate an EC2 client using the active session object. Then, using that EC2 boto3 client, I will interact with that region’s EC2 instances, managing startup, shutdown, and termination.

To create an EC2 instance for this article I take the following steps:

Step 1: I click the EC2 link within the Services menu to open the EC2 Dashboard and then click the Launch Instance button in the middle of the screen.

Step 2: In the Choose Amazon Machine Image (AMI) page I click the Select button next to the Amazon Linux AMI.

Step 3: Accept the default t2.micro instance type and click the Review and Launch button.

Step 4: On the review page, I expand the Tags section and click Edit tags to add tags for Name (with a value of demo-instance) and BackUp (left empty), then click Review and Launch to return to the review page before finally clicking the Launch button to launch the instance.

I now have a running EC2 instance.

Boto3 Session and Client

At last, I can get into writing some code! I begin by creating an empty file, a Python module, called awsutils.py and at the top, I import the library boto3 then define a function that will create a region-specific Session object.

# awsutils.py
import boto3


def get_session(region):
    '''Return a boto3 Session bound to the given AWS region.'''
    return boto3.session.Session(region_name=region)

If I fire up my Python interpreter and import the module just created above I can use the new get_session function to create a session in the same region as my EC2 instance, then instantiate an EC2.Client object from it, like so:

>>> import awsutils
>>> session = awsutils.get_session('us-east-1')
>>> client = session.client('ec2')

I can then use this EC2 client object to get a detailed description of the instance by calling describe_instances, using pprint to make the output a little easier to read.

>>> import pprint
>>> pprint.pprint(client.describe_instances())
...

I am omitting the output as it is quite verbose, but know that it is a dictionary with a Reservations entry, a list of reservations that each hold an Instances list describing the EC2 instances in that region, and a ResponseMetadata entry with details about the request that was just made to the AWS REST API.
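Because instances are nested two levels deep in that response, I find it handy to flatten it before doing anything else. Below is a minimal sketch; summarize_instances is a hypothetical helper of my own, not part of boto3, and it assumes the standard response shape in which each instance carries InstanceId, State, and an optional Tags list.

```python
def summarize_instances(response):
    """Flatten a describe_instances response into (id, state, Name tag) tuples.

    Hypothetical helper, not part of boto3. Instances without a Name tag
    get None for the name.
    """
    summary = []
    for reservation in response.get('Reservations', []):
        for instance in reservation.get('Instances', []):
            # The Tags key is absent entirely when an instance has no tags.
            tags = {tag['Key']: tag['Value'] for tag in instance.get('Tags', [])}
            summary.append((instance['InstanceId'],
                            instance['State']['Name'],
                            tags.get('Name')))
    return summary
```

In the interpreter session above, this would be called as summarize_instances(client.describe_instances()).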

Retrieving EC2 Instance Details

I can also use this same describe_instances method along with the Filters parameter to filter the selection by tag values. For example, to get my recently created instance whose Name tag has a value of 'demo-instance', the call looks like this:

>>> demo = client.describe_instances(Filters=[{'Name': 'tag:Name', 'Values': ['demo-instance']}])
>>> pprint.pprint(demo)
...

There are many ways to filter the output of describe_instances and I refer you to the official docs for the details.
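Writing the Filters list by hand gets repetitive, so here is a small sketch of two hypothetical helpers (make_filters and first_instance_id are my own names, not boto3 APIs). As I understand the EC2 filtering rules, separate filters are combined with AND while multiple values inside one filter are combined with OR, so the usage below matches a demo-instance that is either running or stopped.

```python
def make_filters(criteria):
    """Turn {'filter-name': value or [values]} into the Filters list EC2 expects.

    Hypothetical convenience helper; the 'Name'/'Values' structure is what
    describe_instances and describe_images accept.
    """
    filters = []
    for name, values in criteria.items():
        if isinstance(values, str):
            values = [values]  # a single value still has to be wrapped in a list
        filters.append({'Name': name, 'Values': list(values)})
    return filters


def first_instance_id(response):
    """Return the InstanceId of the first instance in the response, or None."""
    for reservation in response.get('Reservations', []):
        for instance in reservation.get('Instances', []):
            return instance['InstanceId']
    return None
```

Usage, assuming the client from earlier: demo = client.describe_instances(Filters=make_filters({'tag:Name': 'demo-instance', 'instance-state-name': ['running', 'stopped']})), then instance_id = first_instance_id(demo).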

Starting and Stopping an EC2 Instance

To stop the demo-instance, I first pull its instance ID out of the describe_instances response and then call the stop_instances method of the client object I previously instantiated, supplying the ID as a single-entry list to the InstanceIds argument, as shown below (the ResponseMetadata entry is abbreviated):

>>> demo = client.describe_instances(Filters=[{'Name': 'tag:Name', 'Values': ['demo-instance']}])
>>> instance_id = demo['Reservations'][0]['Instances'][0]['InstanceId']
>>> pprint.pprint(client.stop_instances(InstanceIds=[instance_id]))
{'ResponseMetadata': {...},
 'StoppingInstances': [{'CurrentState': {'Code': 64, 'Name': 'stopping'},
                        'InstanceId': 'i-0c462c48bc396bdbb',
                        'PreviousState': {'Code': 16, 'Name': 'running'}}]}

The output from the last command shows the instance moving from running to stopping. After waiting a minute or so, I re-retrieve the demo-instance and print its State, which now shows it is stopped.

>>> demo = client.describe_instances(Filters=[{'Name': 'tag:Name', 'Values': ['demo-instance']}])
>>> demo['Reservations'][0]['Instances'][0]['State']
{'Code': 80, 'Name': 'stopped'}
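Rather than waiting and re-running describe_instances by hand, boto3 ships waiters that poll until an instance reaches a given state. Here is a minimal sketch, assuming client is the EC2 client created earlier; get_waiter('instance_stopped') and the WaiterConfig argument come from boto3, while stop_and_wait and state_changes are hypothetical helper names of mine.

```python
def state_changes(response):
    """Map each instance in a stop/start/terminate response to (previous, current).

    Handles whichever of StoppingInstances, StartingInstances, or
    TerminatingInstances the response contains.
    """
    changes = {}
    for key in ('StoppingInstances', 'StartingInstances', 'TerminatingInstances'):
        for item in response.get(key, []):
            changes[item['InstanceId']] = (item['PreviousState']['Name'],
                                           item['CurrentState']['Name'])
    return changes


def stop_and_wait(client, instance_ids, delay=15, max_attempts=40):
    """Stop the instances, block until EC2 reports them stopped, and return
    the transitions reported by the original stop call."""
    response = client.stop_instances(InstanceIds=instance_ids)
    waiter = client.get_waiter('instance_stopped')
    # Polls every `delay` seconds, up to `max_attempts` times; raises
    # botocore.exceptions.WaiterError if the state is never reached.
    waiter.wait(InstanceIds=instance_ids,
                WaiterConfig={'Delay': delay, 'MaxAttempts': max_attempts})
    return state_changes(response)
```

Starting works the same way with start_instances and the 'instance_running' waiter.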

To start the same instance back up, there is a complementary method called start_instances that works just like the stop_instances method, as I demonstrate next.

>>> pprint.pprint(client.start_instances(InstanceIds=[instance_id]))
{'ResponseMetadata': {'HTTPHeaders': {'content-length': '579',
                                      'content-type': 'text/xml;charset=UTF-8',
                                      'date': 'Fri, 09 Aug 2019 14:10:20 GMT',
                                      'server': 'AmazonEC2'},
                      'HTTPStatusCode': 200,
                      'RequestId': '21c65902-6665-4137-9023-43ac89f731d9',
                      'RetryAttempts': 0},
 'StartingInstances': [{'CurrentState': {'Code': 0, 'Name': 'pending'},
                        'InstanceId': 'i-0c462c48bc396bdbb',
                        'PreviousState': {'Code': 80, 'Name': 'stopped'}}]}

The immediate output of the command shows that the instance is pending startup. After a short wait, when I refresh the instance and print its state, it shows that it is running again.

>>> demo = client.describe_instances(Filters=[{'Name': 'tag:Name', 'Values': ['demo-instance']}])
>>> demo['Reservations'][0]['Instances'][0]['State']
{'Code': 16, 'Name': 'running'}

Alternative Approach to Fetching, Starting, and Stopping

In addition to the EC2.Client class that I've been working with thus far, there is also an EC2.Instance class that is useful in cases such as this one, where I only need to be concerned with one instance at a time.

Below I use the previously created session object to get an EC2 resource object, which I can then use to retrieve and instantiate an Instance object for my demo-instance.

>>> ec2 = session.resource('ec2')
>>> instance = ec2.Instance(instance_id)

In my opinion, a major benefit of using the Instance class is that you work with an object that has its own attributes and methods instead of a raw dictionary. Keep in mind, though, that its attributes are loaded once and then cached, so you call reload() to see changes made since. On the other hand, you lose the ability the EC2.Client class provides to perform actions on multiple instances in a single call.

For example, to see the state of the demo-instance I just instantiated above, it is as simple as this:

>>> instance.state
{'Code': 16, 'Name': 'running'}

The Instance class has many useful methods, two of which are start and stop, which I will use to stop and start my instance, like so:

>>> pprint.pprint(instance.stop())
{'ResponseMetadata': {'HTTPHeaders': {'content-length': '579',
                                      'content-type': 'text/xml;charset=UTF-8',
                                      'date': 'Fri, 09 Aug 2019 14:40:20 GMT',
                                      'server': 'AmazonEC2'},
                      'HTTPStatusCode': 200,
                      'RequestId': 'a2f76028-cbd2-4727-be3e-ae832b12e1ff',
                      'RetryAttempts': 0},
 'StoppingInstances': [{'CurrentState': {'Code': 64, 'Name': 'stopping'},
                        'InstanceId': 'i-0c462c48bc396bdbb',
                        'PreviousState': {'Code': 16, 'Name': 'running'}}]}

After waiting about a minute for it to fully stop, I reload the instance's cached attributes and check the state again:

>>> instance.reload()
>>> instance.state
{'Code': 80, 'Name': 'stopped'}
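Instead of guessing how long a stop takes, the Instance resource also exposes waiter methods such as wait_until_stopped and wait_until_running. A minimal sketch, assuming instance is the EC2.Instance created above; stop_and_confirm and start_and_confirm are hypothetical names, and reload() is called before reading state because resource attributes are cached.

```python
def stop_and_confirm(instance):
    """Stop an EC2.Instance resource, wait for it, and return the fresh state name."""
    instance.stop()
    instance.wait_until_stopped()  # polls until EC2 reports 'stopped'
    instance.reload()              # refresh cached attributes, including state
    return instance.state['Name']


def start_and_confirm(instance):
    """Start an EC2.Instance resource, wait for it, and return the fresh state name."""
    instance.start()
    instance.wait_until_running()  # polls until EC2 reports 'running'
    instance.reload()
    return instance.state['Name']
```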

Now I can start it up again.

>>> pprint.pprint(instance.start())
{'ResponseMetadata': {'HTTPHeaders': {'content-length': '579',
                                      'content-type': 'text/xml;charset=UTF-8',
                                      'date': 'Fri, 09 Aug 2019 14:50:20 GMT',
                                      'server': 'AmazonEC2'},
                      'HTTPStatusCode': 200,
                      'RequestId': '3cfc6061-5d64-4e52-9961-5eb2fefab2d8',
                      'RetryAttempts': 0},
 'StartingInstances': [{'CurrentState': {'Code': 0, 'Name': 'pending'},
                        'InstanceId': 'i-0c462c48bc396bdbb',
                        'PreviousState': {'Code': 80, 'Name': 'stopped'}}]}

Then, after a short while, I reload and check the state again:

>>> instance.reload()
>>> instance.state
{'Code': 16, 'Name': 'running'}

Creating a Backup Image of an EC2.Instance

An important topic in server management is creating backups to fall back on in the event a server becomes corrupted. In this section, I am going to demonstrate how to create an Amazon Machine Image (AMI) backup of my demo-instance, which AWS will then store in its Simple Storage Service (S3). I can later use this image to recreate that EC2 instance, just as I used the initial AMI to create the demo-instance.

To start, I will show how to use the EC2.Client class and its create_image method to create an AMI of demo-instance by providing the instance ID and a descriptive name for the image.

>>> import datetime
>>> date = datetime.datetime.utcnow().strftime('%Y%m%d')
>>> date
'20190809'
>>> name = f"InstanceID_{instance_id}_Backup_Image_{date}"
>>> name
'InstanceID_i-0c462c48bc396bdbb_Backup_Image_20190809'
>>> pprint.pprint(client.create_image(InstanceId=instance_id, Name=name))
{'ImageId': 'ami-00d7c04e2b3b28e2d',
 'ResponseMetadata': {'HTTPHeaders': {'content-length': '242',
                                      'content-type': 'text/xml;charset=UTF-8',
                                      'date': 'Fri, 09 Aug 2019 15:00:20 GMT',
                                      'server': 'AmazonEC2'},
                      'HTTPStatusCode': 200,
                      'RequestId': '7ccccb1e-91ff-4753-8fc4-b27cf43bb8cf',
                      'RetryAttempts': 0}}
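The naming and imaging steps wrap naturally into a small function. This is a sketch under assumptions: backup_image_name and create_backup are hypothetical helpers, while NoReboot is a real create_image parameter. By default EC2 reboots the instance so it can snapshot the volumes in a consistent state; passing no_reboot=True avoids that downtime at the cost of file-system consistency.

```python
from datetime import datetime, timezone


def backup_image_name(instance_id, when=None):
    """Build a name like InstanceID_<id>_Backup_Image_YYYYMMDD using the UTC date."""
    when = when or datetime.now(timezone.utc)
    return f"InstanceID_{instance_id}_Backup_Image_{when.strftime('%Y%m%d')}"


def create_backup(client, instance_id, no_reboot=False, when=None):
    """Create an AMI of the instance and return the new ImageId.

    no_reboot maps onto create_image's NoReboot parameter, which skips the
    reboot EC2 performs by default before snapshotting the volumes.
    """
    response = client.create_image(InstanceId=instance_id,
                                   Name=backup_image_name(instance_id, when),
                                   NoReboot=no_reboot)
    return response['ImageId']
```

Note that AMI names must be unique within a region for an account, so imaging the same instance twice on the same day with this naming scheme would be rejected.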

Similarly, I can use the Instance class's create_image method to accomplish the same task; it returns an instance of the EC2.Image class, which is similar to the EC2.Instance class.

>>> image = instance.create_image(Name=name + '_2')
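A freshly created AMI starts in the pending state and can take several minutes to become usable. A minimal sketch for blocking until it is ready, assuming client is the EC2 client from earlier; image_available is one of the EC2 waiters boto3 provides, while image_states and wait_for_image are hypothetical names. For the resource-based image above, image.id supplies the ID.

```python
def image_states(response):
    """Map ImageId -> State ('pending', 'available', 'failed', ...) from describe_images."""
    return {image['ImageId']: image['State'] for image in response.get('Images', [])}


def wait_for_image(client, image_id):
    """Block until the AMI is available, then return its current state."""
    client.get_waiter('image_available').wait(ImageIds=[image_id])
    return image_states(client.describe_images(ImageIds=[image_id]))[image_id]
```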

Tagging Images and EC2 Instances

A very powerful, yet extremely simple, feature of EC2 instances and AMI images is the ability to add custom tags. You can add tags via the AWS management console, as I did when creating the demo-instance with the Name and BackUp tags, or programmatically with boto3 and the AWS REST API.

Since I have an EC2.Instance object still floating around in memory in my Python interpreter I will use that to display the demo-instance tags.

>>> instance.tags
[{'Key': 'BackUp', 'Value': ''}, {'Key': 'Name', 'Value': 'demo-instance'}]

Both the EC2.Instance and the EC2.Image classes have an identically functioning create_tags method for adding tags to the resources they represent. Below I demonstrate adding a RemoveOn tag to the image created previously, paired with the date on which it should be removed. The date format used is "YYYYMMDD".

>>> remove_on = '20190809'
>>> image.create_tags(Tags=[{'Key': 'RemoveOn', 'Value': remove_on}])
[ec2.Tag(resource_id='ami-081c72fa60c8e2d58', key='RemoveOn', value='20190809')]

Again, the same can be accomplished with the EC2.Client class by providing a list of resource IDs, but with the client you can tag images and EC2 instances at the same time, if you desire, by listing their IDs in the Resources parameter of the create_tags method, like so:

>>> pprint.pprint(client.create_tags(Resources=['ami-00d7c04e2b3b28e2d'], Tags=[{'Key': 'RemoveOn', 'Value': remove_on}]))
{'ResponseMetadata': {'HTTPHeaders': {'content-length': '221',
                                      'content-type': 'text/xml;charset=UTF-8',
                                      'date': 'Fri, 09 Aug 2019 15:10:20 GMT',
                                      'server': 'AmazonEC2'},
                      'HTTPStatusCode': 200,
                      'RequestId': '645b733a-138c-42a1-9966-5c2eb0ca3ba3',
                      'RetryAttempts': 0}}
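Tags come back as lists of Key/Value dictionaries, and the RemoveOn value has to be computed from today's date. A few hypothetical helpers (remove_on_date, tags_to_dict, and to_tag_list are my own names) sketch both conversions, assuming UTC dates in the YYYYMMDD format used above.

```python
from datetime import datetime, timedelta, timezone


def remove_on_date(days, now=None):
    """Return the YYYYMMDD string for `days` days after now (UTC by default)."""
    now = now or datetime.now(timezone.utc)
    return (now + timedelta(days=days)).strftime('%Y%m%d')


def tags_to_dict(tags):
    """Convert a [{'Key': ..., 'Value': ...}] tag list (or None) into a plain dict."""
    return {tag['Key']: tag['Value'] for tag in tags or []}


def to_tag_list(mapping):
    """Convert a plain dict back into the Tags list that create_tags expects."""
    return [{'Key': key, 'Value': value} for key, value in mapping.items()]
```

For example, client.create_tags(Resources=[image_id, instance_id], Tags=to_tag_list({'RemoveOn': remove_on_date(3)})) would tag an image and an instance in one call, and tags_to_dict(instance.tags).get('Name') reads the Name tag.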

Creating an EC2 Instance from a Backup Image

I would like to start this section by giving you something to think about. Put yourself in the uncomfortable mindset of a system administrator, or even worse a developer pretending to be a sysadmin because the product they are working on doesn’t have one (admission… that’s me), and one of your EC2 servers has become corrupted.

Eeek! It’s scramble time… you now need to figure out what OS type, size, and services were running on the down server… fumble through setup and installation of the base server, plus any apps that belong on it, and pray everything comes up correctly.

Whew! Take a breath and chill because I’m about to show you how to quickly get back up and running, plus… spoiler alert… I am going to pull these one-off Python interpreter commands into a workable set of scripts at the end for you to further modify and put to use.

Ok, with that mental exercise out of the way, let me get back to work. To create an EC2 instance from an image ID, I use the EC2.Client class's run_instances method and specify the number of instances to kick off and the type of instance to run.

>>> pprint.pprint(client.run_instances(ImageId='ami-081c72fa60c8e2d58', MinCount=1, MaxCount=1, InstanceType='t2.micro'))
...

I am omitting the output again due to its verbosity. Please have a look at the official docs for the run_instances method, as there are a lot of parameters to choose from to customize exactly how to run the instance.
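One run_instances parameter worth knowing here is TagSpecifications, which tags the new instance (and its volumes) at launch instead of in a separate create_tags call. A sketch, assuming client is the EC2 client from earlier; tag_specifications and launch_from_image are hypothetical helper names.

```python
def tag_specifications(tags, resource_types=('instance', 'volume')):
    """Build the TagSpecifications argument for run_instances from a plain dict."""
    tag_list = [{'Key': key, 'Value': value} for key, value in tags.items()]
    return [{'ResourceType': rtype, 'Tags': tag_list} for rtype in resource_types]


def launch_from_image(client, image_id, tags, instance_type='t2.micro'):
    """Launch one instance from an AMI, tag it at creation, and return its InstanceId."""
    response = client.run_instances(ImageId=image_id,
                                    MinCount=1,
                                    MaxCount=1,
                                    InstanceType=instance_type,
                                    TagSpecifications=tag_specifications(tags))
    # run_instances lists the launched instances under the 'Instances' key.
    return response['Instances'][0]['InstanceId']
```

For instance, launch_from_image(client, 'ami-081c72fa60c8e2d58', {'Name': 'demo-instance-restored', 'BackUp': ''}) would restore from the backup image; the name demo-instance-restored is just an illustrative placeholder.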

Removing Backup Images

Ideally, I would be making backup images at a fairly frequent interval (i.e., at least daily), and along with all these backups come three things, one of which is quite good and the other two somewhat problematic. On the good side, I am making snapshots of known states of my EC2 server, which gives me a point in time to fall back to if things go bad. On the bad side, I am cluttering my account with images and racking up storage charges with each additional backup I keep.

A way to mitigate the downsides of clutter and rising storage charges is to remove backup images after a predetermined period has elapsed, and that is where the tags I created earlier are going to save me. I can query my EC2 backup images, locate the ones that have a particular RemoveOn tag, and then remove them.

I can begin by using the describe_images method on the EC2.Client class instance along with a filter for the 'RemoveOn' tag to get all images that I tagged to remove on a given date.

>>> remove_on = '20190809'
>>> images = client.describe_images(Filters=[{'Name': 'tag:RemoveOn', 'Values': [remove_on]}])

Next up I iterate over all the images and call the client method deregister_image passing it the iterated image ID and voila - no more image.

>>> for img in images['Images']:
...     client.deregister_image(ImageId=img['ImageId'])
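One caveat: deregistering an AMI does not delete the EBS snapshots that back it, and those snapshots keep incurring storage charges. A sketch of a fuller cleanup, assuming each image dict comes from describe_images as above; snapshot_ids and deregister_and_delete_snapshots are hypothetical names, while deregister_image and delete_snapshot are EC2 client methods.

```python
def snapshot_ids(image):
    """Collect the EBS snapshot IDs referenced by one describe_images entry."""
    return [mapping['Ebs']['SnapshotId']
            for mapping in image.get('BlockDeviceMappings', [])
            if 'Ebs' in mapping and 'SnapshotId' in mapping['Ebs']]


def deregister_and_delete_snapshots(client, image):
    """Deregister the AMI first, then delete its snapshots.

    The order matters: EC2 refuses to delete a snapshot that a registered
    AMI still uses.
    """
    client.deregister_image(ImageId=image['ImageId'])
    for snapshot_id in snapshot_ids(image):
        client.delete_snapshot(SnapshotId=snapshot_id)
```

In the loop above, client.deregister_image(ImageId=img['ImageId']) would become deregister_and_delete_snapshots(client, img).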

Terminating an EC2 Instance

Well, having covered starting, stopping, creating and removing backup images, and launching an EC2 instance from a backup image, I am nearing the end of this tutorial. All that is left to do is clean up my demo instance by calling the EC2.Client class's terminate_instances method and passing in the instance IDs to terminate. Again, I will use describe_instances with a filter for the name demo-instance to fetch its details and grab its instance ID. I can then pass that ID to terminate_instances to get rid of it forever.

Note: Yes, this is a forever thing so be very careful with this method.

>>> demo = client.describe_instances(Filters=[{'Name': 'tag:Name', 'Values': ['demo-instance']}])
>>> instance_id = demo['Reservations'][0]['Instances'][0]['InstanceId']
>>> pprint.pprint(client.terminate_instances(InstanceIds=[instance_id]))
{'ResponseMetadata': {'HTTPHeaders': {'content-type': 'text/xml;charset=UTF-8',
                                      'date': 'Fri, 09 Aug 2019 15:55:20 GMT',
                                      'server': 'AmazonEC2',
                                      'transfer-encoding': 'chunked',
                                      'vary': 'Accept-Encoding'},
                      'HTTPStatusCode': 200,
                      'RequestId': '78881a08-0240-47df-b502-61a706bfb3ab',
                      'RetryAttempts': 0},
 'TerminatingInstances': [{'CurrentState': {'Code': 32,
                                            'Name': 'shutting-down'},
                           'InstanceId': 'i-0c462c48bc396bdbb',
                           'PreviousState': {'Code': 16, 'Name': 'running'}}]}

Pulling Things Together for an Automation Script

Now that I have walked through these features by issuing commands one by one in the Python shell (which I highly recommend readers try at least once on their own to experiment with things), I will pull everything together into two separate scripts called ec2backup.py and amicleanup.py.

The ec2backup.py script will simply query all available EC2 instances that have the tag BackUp, then create a backup AMI image for each one while tagging it with a RemoveOn tag whose value is 3 days in the future.

# ec2backup.py
from datetime import datetime, timedelta

import awsutils


def backup(region_id='us-east-1'):
    '''Search for all EC2 instances with a BackUp tag, create a backup
       image of each one, then tag the images with a RemoveOn tag whose
       YYYYMMDD value is three UTC days from now.
    '''
    created_on = datetime.utcnow().strftime('%Y%m%d')
    remove_on = (datetime.utcnow() + timedelta(days=3)).strftime('%Y%m%d')
    session = awsutils.get_session(region_id)
    client = session.client('ec2')
    resource = session.resource('ec2')
    # Only running or stopped instances can be imaged; recently terminated
    # instances still show up in describe_instances for a while.
    reservations = client.describe_instances(Filters=[
        {'Name': 'tag-key', 'Values': ['BackUp']},
        {'Name': 'instance-state-name', 'Values': ['running', 'stopped']},
    ])
    for reservation in reservations['Reservations']:
        for instance_description in reservation['Instances']:
            instance_id = instance_description['InstanceId']
            name = f"InstanceId({instance_id})_CreatedOn({created_on})_RemoveOn({remove_on})"
            print(f"Creating Backup: {name}")
            image_description = client.create_image(InstanceId=instance_id, Name=name)
            image = resource.Image(image_description['ImageId'])
            image.create_tags(Tags=[{'Key': 'RemoveOn', 'Value': remove_on},
                                    {'Key': 'Name', 'Value': name}])


if __name__ == '__main__':
    backup()
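describe_instances can return its results split across pages, so with many instances a single call may not cover everything. A sketch, assuming a client created as in the script above: boto3's get_paginator('describe_instances') follows the page tokens for you, and backup_targets and instance_ids_from_pages are hypothetical helper names.

```python
def instance_ids_from_pages(pages):
    """Collect InstanceIds from an iterable of describe_instances response pages."""
    return [instance['InstanceId']
            for page in pages
            for reservation in page.get('Reservations', [])
            for instance in reservation.get('Instances', [])]


def backup_targets(client):
    """Return IDs of running or stopped instances with a BackUp tag, across all pages."""
    paginator = client.get_paginator('describe_instances')
    pages = paginator.paginate(Filters=[
        {'Name': 'tag-key', 'Values': ['BackUp']},
        {'Name': 'instance-state-name', 'Values': ['running', 'stopped']},
    ])
    return instance_ids_from_pages(pages)
```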

Next up is the amicleanup.py script which queries all AMI images that have a RemoveOn tag equal to the day’s date it was run on in the form “YYYYMMDD” and removes them.

# amicleanup.py
from datetime import datetime

import awsutils


def cleanup(region_id='us-east-1'):
    '''Search for all AMI images with a RemoveOn tag whose YYYYMMDD value
       is the day the script runs on, then deregister them.
    '''
    today = datetime.utcnow().strftime('%Y%m%d')
    session = awsutils.get_session(region_id)
    client = session.client('ec2')
    resource = session.resource('ec2')
    # Owners=['self'] restricts the search to images owned by this account.
    images = client.describe_images(Owners=['self'],
                                    Filters=[{'Name': 'tag:RemoveOn', 'Values': [today]}])
    for image_data in images['Images']:
        image = resource.Image(image_data['ImageId'])
        # Print the Name tag, when present, so the log shows what was removed.
        name_tag = [tag['Value'] for tag in image.tags or [] if tag['Key'] == 'Name']
        if name_tag:
            print(f"Deregistering {name_tag[0]}")
        image.deregister()


if __name__ == '__main__':
    cleanup()
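As written, amicleanup.py only matches images whose RemoveOn tag equals today's date, so a skipped run (during a server outage, say) leaves those images behind indefinitely. Because zero-padded YYYYMMDD strings sort in date order, one alternative, sketched here under that assumption, is to fetch every image that carries a RemoveOn tag and remove any whose date is today or earlier. expired_image_ids is a hypothetical name.

```python
from datetime import datetime, timezone


def expired_image_ids(response, today=None):
    """Return IDs of images whose RemoveOn tag (YYYYMMDD) is today or earlier.

    Zero-padded YYYYMMDD strings compare the same way the dates do, so a
    plain string comparison is enough.
    """
    today = today or datetime.now(timezone.utc).strftime('%Y%m%d')
    expired = []
    for image in response.get('Images', []):
        tags = {tag['Key']: tag['Value'] for tag in image.get('Tags', [])}
        remove_on = tags.get('RemoveOn', '')
        if remove_on and remove_on <= today:
            expired.append(image['ImageId'])
    return expired
```

The input could come from client.describe_images(Owners=['self'], Filters=[{'Name': 'tag-key', 'Values': ['RemoveOn']}]).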

Cron Implementation

A relatively simple way to put these two scripts to work is to schedule two cron tasks on a Linux server to run them. In the example below, I have configured one cron task to execute the ec2backup.py script every day at 11 PM and another to execute the amicleanup.py script at 11:30 PM.

0 23 * * * /path/to/venv/bin/python /path/to/ec2backup.py
30 23 * * * /path/to/venv/bin/python /path/to/amicleanup.py

AWS Lambda Implementation

A more elegant solution is to use AWS Lambda to run the two as a set of functions. There are many benefits to using AWS Lambda to run code, but for this use-case of running a couple of Python functions to create and remove backup images the most pertinent are high availability and avoidance of paying for idle resources. Both of these benefits are best realized when you compare using Lambda against running the two cron jobs described in the last section.

If I were to configure my two cron jobs to run on an existing server, what happens if that server goes down? Not only do I have the headache of bringing that server back up, but I also risk missing a scheduled run of the cron jobs that control the EC2 server backup and cleanup process. This is much less of a concern with AWS Lambda, which runs functions across multiple Availability Zones rather than on a single server I have to keep alive.

The other main benefit of not having to pay for idle resources is best understood in an example where I may have spun up an instance just to manage these two scripts running once a day. Not only does this method fall under the potential availability flaw of the last item, but an entire virtual machine has now been provisioned to run two scripts once a day constituting a very small amount of computing time and lots of wasted resources sitting idle. This is a prime case for using AWS Lambda to improve operational efficiency.

Another operational efficiency resulting from using Lambda is not having to spend time maintaining a dedicated server.

To create an AWS Lambda function for the EC2 instance image backups, follow these steps:

Step 1. Under the Services menu, click Lambda within the Compute section.

Step 2. Click the Create function button.

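Step 3. Select the Author from scratch option, type “ec2backup” as the function name, and select Python 3.6 from the runtime options. For the execution role, choose to create a new role with basic Lambda permissions (an execution role is an IAM role, not a user; the function below authenticates with the boto3-user's access keys supplied as environment variables), then click Create function.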

Step 4. In the designer, add a CloudWatch Events trigger with the schedule expression cron(0 23 * * ? *), which runs the function every day at 11 PM UTC.

Step 5. In the code editor add the following code:

import os
from datetime import datetime, timedelta

import boto3


def get_session(region, access_id, secret_key):
    # Build a session from the credentials stored in the function's environment variables.
    return boto3.session.Session(region_name=region,
                                 aws_access_key_id=access_id,
                                 aws_secret_access_key=secret_key)


def lambda_handler(event, context):
    '''Search for all EC2 instances with a BackUp tag, create a backup
       image of each one, then tag the images with a RemoveOn tag whose
       YYYYMMDD value is three UTC days from now.
    '''
    created_on = datetime.utcnow().strftime('%Y%m%d')
    remove_on = (datetime.utcnow() + timedelta(days=3)).strftime('%Y%m%d')
    session = get_session(os.getenv('REGION'),
                          os.getenv('ACCESS_KEY_ID'),
                          os.getenv('SECRET_KEY'))
    client = session.client('ec2')
    resource = session.resource('ec2')
    # Skip terminated instances, which create_image cannot image.
    reservations = client.describe_instances(Filters=[
        {'Name': 'tag-key', 'Values': ['BackUp']},
        {'Name': 'instance-state-name', 'Values': ['running', 'stopped']},
    ])
    for reservation in reservations['Reservations']:
        for instance_description in reservation['Instances']:
            instance_id = instance_description['InstanceId']
            name = f"InstanceId({instance_id})_CreatedOn({created_on})_RemoveOn({remove_on})"
            print(f"Creating Backup: {name}")
            image_description = client.create_image(InstanceId=instance_id, Name=name)
            image = resource.Image(image_description['ImageId'])
            image.create_tags(Tags=[{'Key': 'RemoveOn', 'Value': remove_on},
                                    {'Key': 'Name', 'Value': name}])

Step 6. In the Environment variables section below the code editor, add the following variables:

REGION with a value of the region of the EC2 instances to back up, which is us-east-1 in this example
ACCESS_KEY_ID with the value of the access key from the section where the boto3-user was set up
SECRET_KEY with the value of the secret key from the section where the boto3-user was set up
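A missing or misspelled variable would otherwise surface only as a confusing region or authentication error at run time, so the handlers could validate their settings up front. A sketch; read_settings is a hypothetical helper that reads the REGION, ACCESS_KEY_ID, and SECRET_KEY names used above from any mapping, such as os.environ. It treats the keys as optional because, when both are absent, boto3 falls back to its default credential chain, which inside Lambda is the execution role.

```python
import os

REQUIRED = ('REGION',)
OPTIONAL = ('ACCESS_KEY_ID', 'SECRET_KEY')


def read_settings(environ=os.environ):
    """Return the function's settings as a dict, failing fast on bad configuration."""
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    settings = {name: environ[name] for name in REQUIRED}
    settings.update({name: environ.get(name) for name in OPTIONAL})
    # Supplying only one half of a key pair is almost certainly a mistake.
    if bool(settings['ACCESS_KEY_ID']) != bool(settings['SECRET_KEY']):
        raise RuntimeError('Set both ACCESS_KEY_ID and SECRET_KEY, or neither')
    return settings
```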

Step 7. Click the Save button at the top of the page.

For the image cleanup functionality, follow the same steps with the following changes.

Step 3. I give it the name of “amicleanup”

Step 4. I use a slightly different schedule expression, cron(30 23 * * ? *), to run at 11:30 PM UTC.

Step 5. Use the following cleanup function:

import os
from datetime import datetime

import boto3


def get_session(region, access_id, secret_key):
    # Build a session from the credentials stored in the function's environment variables.
    return boto3.session.Session(region_name=region,
                                 aws_access_key_id=access_id,
                                 aws_secret_access_key=secret_key)


def lambda_handler(event, context):
    '''Search for all AMI images with a RemoveOn tag whose YYYYMMDD value
       is the day the function runs on, then deregister them.
    '''
    today = datetime.utcnow().strftime('%Y%m%d')
    session = get_session(os.getenv('REGION'),
                          os.getenv('ACCESS_KEY_ID'),
                          os.getenv('SECRET_KEY'))
    client = session.client('ec2')
    resource = session.resource('ec2')
    # Owners=['self'] restricts the search to images owned by this account.
    images = client.describe_images(Owners=['self'],
                                    Filters=[{'Name': 'tag:RemoveOn', 'Values': [today]}])
    for image_data in images['Images']:
        image = resource.Image(image_data['ImageId'])
        name_tag = [tag['Value'] for tag in image.tags or [] if tag['Key'] == 'Name']
        if name_tag:
            print(f"Deregistering {name_tag[0]}")
        image.deregister()

Conclusion

In this article, I have covered how to use the AWS Python SDK library Boto3 to interact with EC2 resources. I demonstrated how to automate the operational tasks of creating AMI backups of EC2 instances and later cleaning up those backups, using scheduled cron jobs on a dedicated server or AWS Lambda.