Why an AWS EC2 Instance May Slow Down over Time, Part 1: Storage Types
Most of us have probably seen someone create a new EC2 instance in the Amazon Web Services (AWS) cloud, only to find that an instance that performed really well at first gets slower over time. There are any number of reasons why this can happen, but if you cannot attribute the slowdown to an increased workload, then the problem is likely tied to the instance's storage.
As much as I would like to jump right into troubleshooting EC2 instances that get slower over time, I need to give you some background information first. The problem is very often connected to the type of storage that is being used. As such, I want to start out by describing some of the storage types that EC2 instances use, and then in Part 2 I will show you how this information plays into the troubleshooting process.
Any time that you create a new EC2 instance, you are required to provision storage that the instance can use. If you look at Figure 1, for example, you can see that the instance that I am creating is being configured to use 30 GB of GP2 storage as its root volume. These are the default settings for a Windows Server instance.
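For readers who script their deployments rather than click through the console, the same root volume settings can be written down as a block device mapping. The sketch below only builds the dictionary in Python and does not call AWS. The `DeviceName`, `Ebs`, `VolumeSize`, and `VolumeType` keys follow the shape that the EC2 RunInstances API accepts for `BlockDeviceMappings`. The helper name `root_volume_mapping` and the device name `/dev/sda1` are assumptions made for illustration, since the actual root device name depends on the AMI.

```python
def root_volume_mapping(size_gb=30, volume_type="gp2", device_name="/dev/sda1"):
    """Build one block device mapping entry for an instance's root volume.

    Hypothetical helper: the defaults mirror the 30 GB GP2 root volume
    described above. device_name is an assumption and varies by AMI.
    """
    if size_gb < 1:
        raise ValueError("volume size must be at least 1 GB")
    return {
        "DeviceName": device_name,
        "Ebs": {
            "VolumeSize": size_gb,      # volume size
            "VolumeType": volume_type,  # e.g. "gp2", "gp3", "io1", "io2"
        },
    }


mapping = root_volume_mapping()
print(mapping)
```

Changing the size or the storage type then amounts to passing different arguments, for example `root_volume_mapping(size_gb=100, volume_type="gp3")`.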
As you look at the figure above, you will notice that there are two storage settings that you can change (not counting adding extra volumes): the volume size and the type of storage that is being used. GP2 storage, which is what is being used in this case, is general-purpose storage backed by SSD media. While GP2 storage is suitable for many workloads, it is not the only type of storage that is available. As you can see in Figure 2, there are seven different types of storage that you can choose from.
As you can see in the figure, there are actually two different general-purpose storage options (GP2 and GP3). The GP2 and GP3 storage options are very similar to one another. Both have the same level of durability (99.8 percent to 99.9 percent). They also both support volumes ranging in size from 1 GB to 16 TB, and under the right circumstances they can deliver up to 16,000 IOPS. The main difference between GP2 and GP3 storage is that GP3 offers a greater throughput per volume. GP2 has a maximum throughput of 250 MiB/s whereas GP3 storage can deliver up to 1000 MiB/s of throughput.
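To make the comparison concrete, here is a minimal Python sketch that records the ceilings quoted above and reports which general-purpose type could cover a given request. The names `VolumeLimits`, `GENERAL_PURPOSE_LIMITS`, and `types_that_fit` are hypothetical, and 16 TB is treated as 16 * 1024 GB as an assumption. Keep in mind that these are maximums only. The sketch says nothing about what a particular volume will actually deliver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VolumeLimits:
    min_size_gb: int
    max_size_gb: int
    max_iops: int
    max_throughput_mib_s: int


# Ceilings quoted in the article; 16 TB is treated as 16 * 1024 GB here.
GENERAL_PURPOSE_LIMITS = {
    "gp2": VolumeLimits(1, 16 * 1024, 16_000, 250),
    "gp3": VolumeLimits(1, 16 * 1024, 16_000, 1_000),
}


def types_that_fit(size_gb, iops, throughput_mib_s):
    """Return the general-purpose types whose ceilings cover the request."""
    return [
        name
        for name, lim in GENERAL_PURPOSE_LIMITS.items()
        if lim.min_size_gb <= size_gb <= lim.max_size_gb
        and iops <= lim.max_iops
        and throughput_mib_s <= lim.max_throughput_mib_s
    ]


print(types_that_fit(500, 10_000, 200))  # within both types' ceilings
print(types_that_fit(500, 10_000, 600))  # exceeds the GP2 throughput ceiling
```

The second call illustrates the main difference between the two types: a request for 600 MiB/s is above the GP2 maximum but within the GP3 maximum.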
Although general-purpose storage can handle a significant number of IOPS, organizations that need to run IOPS-intensive workloads often opt for provisioned IOPS volumes (Types IO1 and IO2). As you can see in Figure 3, a Provisioned IOPS volume allows you to specify the number of IOPS that the volume must be able to handle. An IO1 volume can be configured to provide between 100 and 64,000 IOPS, while an IO2 volume can deliver between 100 and 100,000 IOPS.
One thing to keep in mind about provisioned IOPS volumes is that in order to achieve such high IOPS levels, Amazon stripes data across multiple disks. I mention this because the use of multiple disks means that there is a direct relationship between IOPS and capacity. You will notice in the figure above, for example, that even though an IO2 volume can achieve up to 100,000 IOPS, that number is capped at 1,000 IOPS per GB of storage. Hence, a 10 GB volume would only be able to deliver a maximum of 10,000 IOPS as opposed to 100,000 IOPS.
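The relationship between capacity and IOPS is simple arithmetic, and a short Python sketch makes it easy to try different volume sizes. It uses only the two IO2 figures given above (a 100,000 IOPS maximum and a cap of 1,000 IOPS per GB). The constant and function names are hypothetical.

```python
IO2_MAX_IOPS = 100_000   # per-volume maximum quoted above
IO2_IOPS_PER_GB = 1_000  # cap on IOPS per GB of storage quoted above


def max_io2_iops(size_gb):
    """Highest IOPS figure an IO2 volume of the given size can be set to."""
    if size_gb <= 0:
        raise ValueError("volume size must be positive")
    return min(IO2_MAX_IOPS, size_gb * IO2_IOPS_PER_GB)


for size_gb in (10, 50, 100, 500):
    print(f"{size_gb:>4} GB -> up to {max_io2_iops(size_gb):,} IOPS")
```

Under these assumptions, a 10 GB volume tops out at 10,000 IOPS, and 100 GB is the smallest size at which the full 100,000 IOPS becomes available. Anything larger is limited by the per-volume maximum rather than by capacity.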
The third type of volume that I need to discuss (although there are some additional volume types that are beyond the scope of this article) is a Throughput Optimized HDD. A Throughput Optimized HDD volume (ST1) consists of HDD storage that is optimized for throughput rather than IOPS. It's a good choice for workloads that tend to perform large sequential read or write operations. You cannot use an ST1 volume as a boot volume.
So now that I have outlined some of the more common volume types, I want to turn my attention to how the volume type selection can cause an EC2 instance to initially perform well and then begin to slow down for no apparent reason. I will explain everything in Part 2 of this series.
Brien Posey is a 21-time Microsoft MVP with decades of IT experience. As a freelance writer, Posey has written thousands of articles and contributed to several dozen books on a wide variety of IT topics. Prior to going freelance, Posey was a CIO for a national chain of hospitals and health care facilities. He has also served as a network administrator for some of the country's largest insurance companies and for the Department of Defense at Fort Knox. In addition to his continued work in IT, Posey has spent the last several years actively training as a commercial scientist-astronaut candidate in preparation to fly on a mission to study polar mesospheric clouds from space. You can follow his spaceflight training on his Web site.