It's easy to spend too much on AWS. The promise is "pay for what you need" - but in reality, it's "pay for everything you spin up and forget about".
But don't worry!
It's also easy to optimize your AWS costs. Follow our detailed guide and we guarantee you'll reduce your bill.
Who are we to say this?
Stax is a tool that helps companies with AWS cost management. We've analysed data from 500+ big and small AWS customers. We'll tell you exactly what to do.
Step 1: Work Out What to Focus On (3 minutes)
Don't waste time looking at services which aren't costing you much.
Everyone uses different AWS services depending on their architecture. Have a look at your most recent bill and you'll see what you're using most.
Or even quicker - we can tell you what to care about. Our data shows similar patterns for most customers. As a simple rule of thumb:
- Concentrate on EC2, RDS, EBS, ELB, NAT Gateway, S3, and CloudWatch.
- EC2 is the biggest cost for 54% of our customers and in the top ten for 100%.
- RDS is in the top two for 40% of our customers and in the top ten for 91%.
- EBS is in the top two for 33% of our customers and in the top ten for 98%.
- ELB rarely ranks near the top but is in the top ten for 85%.
- S3 is in the top ten for 70%.
- CloudWatch is in the top ten for 67%.
- NAT Gateway is in the top ten for 65%.
Next on the list are ElastiCache, Elasticsearch, Elastic IP, Route 53, and then the others. But the iron-clad Law of Diminishing Returns kicks in here. You might end up spending lots of time to save little. Look at these in detail only if you know you're spending a lot on them.
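A quick way to apply this rule of thumb to your own numbers is to rank services by cost and stop once you've covered most of the bill. The sketch below assumes you've already exported last month's cost per service (for example from Cost Explorer, grouped by service) into a plain dict; the 80% cut-off and the sample figures are hypothetical.

```python
def focus_services(costs_by_service, coverage=0.8):
    """Return the most expensive services that together make up `coverage` of the bill."""
    total = sum(costs_by_service.values())
    ranked = sorted(costs_by_service.items(), key=lambda item: item[1], reverse=True)
    focus, covered = [], 0.0
    for service, cost in ranked:
        if covered >= coverage * total:
            break  # everything after this point is diminishing-returns territory
        focus.append(service)
        covered += cost
    return focus


# Hypothetical monthly costs in USD, not real customer data.
monthly = {"EC2": 500, "RDS": 200, "EBS": 100, "S3": 50, "CloudWatch": 30, "Route 53": 5}
print(focus_services(monthly))  # ['EC2', 'RDS', 'EBS']
```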
(If you use Stax it takes two seconds to see where your money is going.)
Step 2: Check for Unused Resources (1 to 8 hours)
The most effective way to waste money on AWS is to pay for infra you're not using.
For each of the most important services in your bill, go to the console and look at each resource. There's always a "list" page which shows them all for a given region. Click into each one and check for yourself that it's in use:
- Is it something you recognise?
- Would you expect it to be still in use?
These checks are slightly different for each resource type:
- EC2 instances: look at their CloudWatch metrics (CPU utilisation, network traffic) to see if they're in use.
- RDS instances: look at CloudWatch (for example the DatabaseConnections metric) to see if they're in use.
- EBS volumes: check that they're attached to an instance.
- EBS snapshots: check that they're not too old.
- ELB load balancers: check that they've got instances registered with them.
- S3 buckets: check that you think they should still be in use.
- NAT gateways: look at CloudWatch to make sure they have traffic.
- Elastic IPs: make sure they're associated with a running instance.
Don't go into too much detail at this step. You're probably going to find a bunch of resources you can get rid of without thinking too much. No need to look at individual S3 objects or CloudWatch metrics yet.
Remember to check every region in every account. AWS makes it easy to spin resources up in the wrong place by accident.
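Some of these checks are mechanical enough to script. The sketch below works on response dicts shaped like boto3's EC2 `describe_volumes`, `describe_addresses`, and `describe_snapshots` output, so it can be tried offline with hand-written samples; the 90-day snapshot cut-off is a hypothetical choice, and it ignores pagination. You'd still repeat it for every region and account.

```python
from datetime import datetime, timedelta, timezone


def unattached_volumes(volumes):
    """EBS volumes in the 'available' state are not attached to any instance."""
    return [v["VolumeId"] for v in volumes if v["State"] == "available"]


def unassociated_eips(addresses):
    """Elastic IPs with no AssociationId aren't attached to an instance or network interface."""
    return [a["PublicIp"] for a in addresses if "AssociationId" not in a]


def old_snapshots(snapshots, now, max_age_days=90):
    """Snapshots older than max_age_days (a hypothetical cut-off) are candidates for review."""
    cutoff = now - timedelta(days=max_age_days)
    return [s["SnapshotId"] for s in snapshots if s["StartTime"] < cutoff]


# With boto3 you would feed these from each region in turn, e.g.:
#   ec2 = boto3.client("ec2", region_name=region)
#   ec2.describe_volumes()["Volumes"]
#   ec2.describe_addresses()["Addresses"]
#   ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]
# Here we use small hand-written samples instead of live API calls.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
volumes = [
    {"VolumeId": "vol-aaa", "State": "in-use"},
    {"VolumeId": "vol-bbb", "State": "available"},
]
addresses = [
    {"PublicIp": "203.0.113.10", "AssociationId": "eipassoc-123"},
    {"PublicIp": "203.0.113.11"},
]
snapshots = [
    {"SnapshotId": "snap-new", "StartTime": datetime(2024, 5, 20, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-old", "StartTime": datetime(2023, 1, 1, tzinfo=timezone.utc)},
]
print(unattached_volumes(volumes))    # ['vol-bbb']
print(unassociated_eips(addresses))   # ['203.0.113.11']
print(old_snapshots(snapshots, now))  # ['snap-old']
```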
There are potentially big gains to be had here. On average Stax clients save about 5% to 15% of their AWS spend on unused resources when they first use the tool!
(If you use Stax, you get a continuous, automated check for unused resources - across all regions.)
Step 3: Update Obsolete Instances (1 to 5 days)
Save by using the newest, cheapest EC2 and RDS instances.
Every year or two, AWS releases new "generations" of instance type "families" for EC2 and RDS. These are usually cheaper, faster, or both, compared with their predecessors. If you keep on using the old instance types, you're paying a tax straight to Bezos's pocket.
This step is straightforward, but the details can get tricky.
You need to look at each of your EC2 and RDS instances. (Also ElastiCache, Elasticsearch and Redshift if you're using them enough.)
- Go through each instance.
- Work out its instance type.
- Now check the AWS pricing page for the service to see if there's a newer generation.
- (Newer generations have the same first letters, but a higher number. For example, t1 went to t2 then to t3.)
- If there is, work out the price difference. There are about 730 hours in a month, so multiply the hourly price difference by 730 to get a monthly figure.
- If the price difference is big enough to justify your time, then work out how to upgrade the instance.
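The checklist above can be sketched in a few lines. This assumes you have a list of instance types currently offered in your region (for example copied from the pricing page) and hourly prices you've looked up yourself; the prices and the $50 threshold below are hypothetical. It only suggests an upgrade when the family letters, attribute letters, and size all match, so it won't jump from m4 to m6i, and it won't suggest Graviton ("g") types, which need ARM builds of your software. Check the pricing page for those by hand.

```python
import re

HOURS_PER_MONTH = 730  # the approximation used in the text

# Optional "db." prefix (RDS), family letters, generation number, attribute letters, size.
_INSTANCE_TYPE = re.compile(r"^(db\.)?([a-z]+)(\d+)([a-z]*)\.([\w-]+)$")


def parse_instance_type(instance_type):
    match = _INSTANCE_TYPE.match(instance_type)
    if match is None:
        raise ValueError(f"unrecognised instance type: {instance_type}")
    prefix, family, generation, attributes, size = match.groups()
    return prefix or "", family, int(generation), attributes, size


def newer_generation(current, available_types):
    """Return the newest type with the same family, attributes, and size, or None."""
    prefix, family, gen, attrs, size = parse_instance_type(current)
    candidates = []
    for candidate in available_types:
        c_prefix, c_family, c_gen, c_attrs, c_size = parse_instance_type(candidate)
        same_shape = (c_prefix, c_family, c_attrs, c_size) == (prefix, family, attrs, size)
        if same_shape and c_gen > gen:
            candidates.append((c_gen, candidate))
    return max(candidates)[1] if candidates else None


def monthly_saving(old_hourly, new_hourly):
    return (old_hourly - new_hourly) * HOURS_PER_MONTH


def worth_changing(old_hourly, new_hourly, min_monthly_saving=50.0):
    """min_monthly_saving is a hypothetical threshold; pick one that matches your time cost."""
    return monthly_saving(old_hourly, new_hourly) >= min_monthly_saving


available = ["m4.large", "m5.large", "m5.xlarge", "m6i.large", "db.m4.large", "db.m5.large"]
print(newer_generation("m4.large", available))     # m5.large
print(newer_generation("db.m4.large", available))  # db.m5.large
# Hypothetical hourly prices, not real AWS rates:
print(round(monthly_saving(0.20, 0.17), 2))        # 21.9
print(worth_changing(0.20, 0.17))                  # False
```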
This last step is easy if you're fully automated - "cattle, not pets", as the saying goes. Change the type in your infrastructure code and run your scripts or pipeline. This will kill the old instance and spin up the new one. No mess, no fuss.
What if you're not automated but are using auto-scaling groups? Change the instance type in the launch template (or launch configuration). Then start an instance refresh, or terminate the old instances, and the ASG will replace them with instances of the new type.
No automation? Don't worry, it should still be easy enough, with a small outage. These pages from the AWS site tell you how to change instance types:
There are big gains to be had here too. New Stax clients find they can save 3% to 10% of their AWS bill by modernising their instance types.
(Stax users get notified immediately when cheaper instance types become available.)
Step 4: Rightsize Your Resources (1 to 5 days)
Reduce your AWS bill by scaling down your resources to the right size.
It's easy to end up with the wrong resource sizes. When an app or service is first implemented, no-one is sure what its load profile will be. The developers choose an arbitrary size. If that size is too small, alerts go off and it gets increased. But if it's too large, there's no alert telling anyone to decrease it.
Rightsizing in AWS takes some manual effort. Follow these steps:
- Look at each sizable resource.
  - Do this across every account, every region, and every service.
- For each of them, click through to the CloudWatch monitoring.
- Look at the appropriate load metrics to see if it's underutilised.
  - EC2 instances: CPU and memory metrics.
  - RDS instances: CPU, memory, and IOPS metrics.
  - Other instances: usually CPU metrics.
  - io1 EBS volumes: IOPS metrics.
(Memory metrics for EC2 instances: memory isn’t published to CloudWatch by default. This is because AWS only sees the instance from the hypervisor, not from inside the operating system where memory usage is tracked. Adding the metrics isn't hard but requires some work:
- On Linux: Monitoring Memory and Disk Metrics for Amazon EC2 Linux Instances
- On Windows: Sending Logs, Events, and Performance Counters to Amazon CloudWatch)
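On Linux, one option is the CloudWatch agent, which is driven by a JSON config file. The sketch below builds a minimal config that collects only `mem_used_percent`, tagged with the instance ID; it's written in Python to keep to one language, and you'd save the printed JSON wherever your agent reads its config. The 60-second interval is an assumption. By default the agent publishes these metrics under the `CWAgent` namespace.

```python
import json


def memory_agent_config(interval_seconds=60):
    """Minimal CloudWatch agent config that publishes memory usage per instance (Linux)."""
    return {
        "metrics": {
            # Tag each datapoint with the instance ID so you can match it to the instance.
            "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
            "metrics_collected": {
                "mem": {
                    "measurement": ["mem_used_percent"],
                    "metrics_collection_interval": interval_seconds,
                }
            },
        }
    }


print(json.dumps(memory_agent_config(), indent=2))
```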
Now make a list of your underutilised resources. You now need to work out what size to scale them down to. This is reasonably easy, with some tricks:
- EC2 instances go down in size, for example from c5.xlarge to c5.large. Have a look on the AWS website (https://aws.amazon.com/ec2/pricing/on-demand/) to get the possible instance types.
- In some cases you can't go down further in the same family. For instance, m5.large is the smallest m5. In this case you might scale down to a t3.large. It's a similar size, but cheaper. (This is because it's "burstable", so its sustained CPU performance is more limited.)
- You also need to consider the goal of the instance type. An r5 is a "memory-optimized" instance. If it's using minimal memory, you might want to go to a cheaper m5 of the same size.
- RDS or other instances are the same as EC2 instances. There are fewer options here, so less to consider.
- io1 EBS volumes are for high-IO workloads. They have "provisioned IOPS", which means a certain number of guaranteed IO operations. You can reduce this number without an outage. This is important because provisioned IOPS are super-expensive.
Never change down by more than one step. No-one can fully model how a workload will perform on a smaller instance size. Halving an instance's size rarely means its utilisation metrics simply double. If you change by more than one step, you put your workloads in danger of an outage.
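The one-step rule can be encoded as a simple check. This sketch uses AWS's instance size normalization factors as relative capacity and naively assumes CPU percentage scales inversely with size, which, as noted above, is never exact; that's why it keeps a headroom threshold (60% here, a hypothetical value). You supply the sizes that actually exist for the family.

```python
# Relative capacity of each size within a family (AWS's instance size normalization factors).
NORMALIZATION = {
    "nano": 0.25, "micro": 0.5, "small": 1, "medium": 2, "large": 4,
    "xlarge": 8, "2xlarge": 16, "4xlarge": 32, "8xlarge": 64,
    "12xlarge": 96, "16xlarge": 128, "24xlarge": 192,
}


def one_step_down(instance_type, p95_cpu, family_sizes, max_cpu_after=60.0):
    """Suggest the next size down, or None if there isn't one or it would run too hot.

    Naively projects CPU by capacity ratio (halving the size doubles the percentage).
    max_cpu_after is a hypothetical headroom threshold.
    """
    family, size = instance_type.rsplit(".", 1)
    ladder = sorted(family_sizes, key=NORMALIZATION.__getitem__)
    position = ladder.index(size)
    if position == 0:
        return None  # already the smallest size in this family
    smaller = ladder[position - 1]
    projected = p95_cpu * NORMALIZATION[size] / NORMALIZATION[smaller]
    return f"{family}.{smaller}" if projected <= max_cpu_after else None


c5_sizes = ["large", "xlarge", "2xlarge", "4xlarge"]
print(one_step_down("c5.xlarge", 20.0, c5_sizes))            # c5.large
print(one_step_down("c5.xlarge", 35.0, c5_sizes))            # None
print(one_step_down("m5.large", 10.0, ["large", "xlarge"]))  # None
```

When it returns None because you're already at the smallest size, that's where the cross-family moves described above (m5 to t3, r5 to m5) come in; the sketch deliberately doesn't attempt those.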
- Now you have the list of changes to make. Take the time to work out the cost difference between the old instance type and the new one. (Multiply the hourly price difference by 730.)
- Ignore everything where the monthly saving is too small to justify your time. Every change has a risk. Don't take that risk for $2 - but for $200 or $2000, it might make sense.
- Now make the changes. See the previous section (on obsolete instances) for an explanation of how.
This is high effort but there is a lot of money to save. New Stax clients find they can save 5% to 15% of their AWS bill by rightsizing their resources.
(Stax clients get all this rightsizing modelling done automatically by the tool. It's updated continuously using the current state of your metrics. It reduces the effort a lot - this is why we built Stax.)
Step 5: Check Your S3 Storage Types (2 to 8 hours)
If you use S3 heavily, there's almost always room to make it more cost-efficient.
S3 is a great file store. It's cheap until you're using terabytes of storage. Skip this section if it's not costing you enough to bother with.
But if you've got significant data in S3 then you need to understand storage types:
- Standard: for files that are active - cheap storage, cheap access.
- Infrequent Access: for files that are inactive - cheaper storage, more expensive access.
- Glacier: for backup copies - much cheaper storage, much more expensive access.
We'll ignore Glacier. It has a different access pattern and requires changes to your application to use. We'll also ignore One-Zone Infrequent Access. It's like Infrequent Access but a little cheaper, because it stores data in a single Availability Zone, which means a little more risk of data loss. Don't use it unless you know what you're doing.
The easy way to do this is to use AWS's S3 Analytics:
But this takes time to set up. If you need a fix right now, do this:
- Look at all your S3 buckets.
- Consider the objects in that bucket and how your apps access them.
- If they're not accessed frequently, move them to Infrequent Access.
To move objects, set up a lifecycle rule:
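For reference, here is a sketch of a lifecycle configuration that moves objects to Infrequent Access (`STANDARD_IA`) after a set number of days, in the dict shape boto3's `put_bucket_lifecycle_configuration` expects. The prefix, day count, and bucket name are hypothetical. Two things worth knowing: S3 won't transition objects to `STANDARD_IA` until they're at least 30 days old, and that call replaces any lifecycle rules already on the bucket. Infrequent Access also bills a minimum of 128 KB per object and 30 days of storage, so it suits larger, longer-lived objects.

```python
def standard_ia_lifecycle(prefix="", days=30):
    """Build a lifecycle configuration that moves objects to STANDARD_IA after `days` days."""
    if days < 30:
        # S3 won't transition objects to STANDARD_IA until they're at least 30 days old.
        raise ValueError("days must be at least 30 for STANDARD_IA transitions")
    return {
        "Rules": [
            {
                "ID": f"standard-ia-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [{"Days": days, "StorageClass": "STANDARD_IA"}],
            }
        ]
    }


# Applying it with boto3 (not run here; the bucket name is hypothetical):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-example-bucket",
#       LifecycleConfiguration=standard_ia_lifecycle(prefix="logs/", days=60),
#   )
config = standard_ia_lifecycle(prefix="logs/", days=60)
print(config["Rules"][0]["Transitions"])  # [{'Days': 60, 'StorageClass': 'STANDARD_IA'}]
```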
This can be a big win. Most Stax clients don't use enough S3 for it to matter. But for those who do, we've seen up to 20% of their AWS bill saved.
(Stax clients see this analysis done continuously for each bucket.)
Step 6: Buy Reserved Instances (1 to 8 hours)
RIs can save you a lot of money - in the right circumstances.
Reservations are crucial, but wait until your cost baseline is clean. Otherwise you end up with reservations that cost you money rather than saving it.
When reserving capacity, you commit to paying money to AWS over a period. In exchange, you get a significant discount. Only buy reservations for those apps where:
- Your cost baseline is clean.
- Your architecture won't change in the next 6-9 months.
- They cost a significant amount of money.
- They're using EC2, RDS, Redshift, DynamoDB, or ElastiCache.
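One way to see why architectural stability matters: you pay for a reservation every hour of its term whether you use it or not, so it only beats on-demand once the workload has run long enough. The sketch below computes that break-even point; the 40% discount in the example is hypothetical, and with it a 1-year No Upfront RI breaks even after about 7.2 months, which sits inside the 6-9 month guideline above.

```python
HOURS_PER_MONTH = 730


def break_even_months(on_demand_hourly, reserved_hourly, upfront=0.0, term_months=12):
    """Months of on-demand usage that would cost the same as the whole reservation.

    The reservation is paid for every hour of the term whether you use it or not,
    so if the workload disappears before this point, the RI cost you money.
    """
    commitment = upfront + reserved_hourly * HOURS_PER_MONTH * term_months
    return commitment / (on_demand_hourly * HOURS_PER_MONTH)


# Hypothetical rates: a 40% discount on a No Upfront, 1-year RI.
print(round(break_even_months(0.10, 0.06), 1))  # 7.2
```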
When buying RIs:
- Always choose a 1 year term.
- Always buy standard, not convertible RIs.
- Don't specify an availability zone.
This gives you the best balance of discounts vs flexibility.
Which RIs should you actually buy? AWS Cost Explorer does a good job of working that out:
Its recommendations are based on your recent usage, so use them once you've had 30 to 60 days without major changes.
Keep in mind that AWS allows "instance size flexibility" for Linux instances. This means that a smaller size will "flex" up to cover parts of a bigger instance. Conversely, a bigger size will flex down to cover smaller sizes.
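Size flexibility works through normalization factors: each size counts as a number of units (a large is 4, an xlarge is 8), and a regional RI's units can be spread across any sizes in the same family. The sketch below is a simplification that only compares total units per family; it ignores platform, tenancy, region, and the hour-by-hour matching AWS actually does.

```python
# AWS's instance size normalization factors (units per size).
NORMALIZATION = {
    "nano": 0.25, "micro": 0.5, "small": 1, "medium": 2, "large": 4,
    "xlarge": 8, "2xlarge": 16, "4xlarge": 32, "8xlarge": 64,
    "12xlarge": 96, "16xlarge": 128, "24xlarge": 192,
}


def units(instance_type):
    family, size = instance_type.rsplit(".", 1)
    return family, NORMALIZATION[size]


def coverage(reserved, running):
    """Fraction of running normalized units covered by reserved units, per family."""
    result = {}
    for family in {units(t)[0] for t in running}:
        need = sum(u for f, u in map(units, running) if f == family)
        have = sum(u for f, u in map(units, reserved) if f == family)
        result[family] = min(1.0, have / need)
    return result


print(coverage(["m5.large", "m5.large"], ["m5.xlarge"]))  # {'m5': 1.0}
partial = coverage(["m5.xlarge"], ["m5.large"] * 3)
print({f: round(c, 2) for f, c in partial.items()})        # {'m5': 0.67}
```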
Do your modelling on a "no upfront cost" RI. But if you happen to have the cash, you'll always save more money by paying more upfront.
Stax customers see 10% to 20% of their AWS bill saved with reservations.
(Stax pulls the RI recommendations directly from AWS, so clients have convenient access.)
End of the Easy Part
These first six steps are easy. It should take a week or two to finish them for an average account. You could save 30% to 40% of your AWS bill.
The next steps are harder. They require architectural changes. Changing applications themselves is always harder than clicking around in AWS. But there's a lot of power here.
Step 7: Schedule Your Services Off and On
Development or testing infrastructure is probably used during work hours. Outside that time, you can turn it off.
This isn't easy. AWS provide a scheduler to do this for EC2:
It's limited but might be good enough. For other services you have to engineer a system to do this. This could take weeks.
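Whatever does the switching, the core of a scheduler is a "should this be running now?" check, and the arithmetic shows why it's worth it. The sketch assumes hypothetical working hours of 08:00 to 18:00, Monday to Friday, and uses naive local times. Running 50 of the week's 168 hours saves about 70% of instance-hours for those resources; the real saving on your bill is lower, partly because EBS volumes are still billed while their instances are stopped.

```python
from datetime import datetime

HOURS_PER_WEEK = 7 * 24


def in_schedule(moment, start_hour=8, end_hour=18, workdays=range(5)):
    """True if `moment` falls in working hours (Monday=0 ... Friday=4)."""
    return moment.weekday() in workdays and start_hour <= moment.hour < end_hour


def schedule_saving(start_hour=8, end_hour=18, workdays=5):
    """Fraction of always-on instance-hours saved by running only during working hours."""
    hours_on = (end_hour - start_hour) * workdays
    return 1 - hours_on / HOURS_PER_WEEK


print(in_schedule(datetime(2024, 1, 1, 9)))  # True  (a Monday morning)
print(in_schedule(datetime(2024, 1, 6, 9)))  # False (a Saturday)
print(round(schedule_saving(), 2))           # 0.7
```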
We've seen clients save 10% to 35% of their AWS bill by doing this.
Step 8: Auto-Scale Your Services
Every EC2 instance in AWS should run in an auto-scaling group for resilience. This applies even if it doesn't actually auto-scale.
Auto-scaling is also great for cost saving - if your workload suits it. AWS provide extensive tooling for this:
This is more an art than a science. It requires constant tuning and monitoring. It's always a balance between cost, resilience, and quality.
One rule of thumb is to use smaller instance sizes for auto-scaling. The smaller sizes give better granularity and responsiveness.
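A small calculation shows the granularity point. This sketch treats capacity as interchangeable "units" (borrowing the normalization factors, where an xlarge is 8 and a large is 4), which is a simplifying assumption; the demand figure is hypothetical. Because you always round up to a whole instance, bigger instances leave more capacity idle.

```python
import math


def fleet_for(demand_units, units_per_instance):
    """Instances needed to cover demand, and how much capacity is left idle."""
    count = math.ceil(demand_units / units_per_instance)
    idle = count * units_per_instance - demand_units
    return count, idle


# Hypothetical demand of 9 units.
print(fleet_for(9, 8))  # (2, 7)  two xlarges, 7 units idle
print(fleet_for(9, 4))  # (3, 3)  three larges, 3 units idle
```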
This is the next step from scheduling. It tailors your app's supply more closely to demand. But it requires a significant amount of engineering effort.
We've seen clients save 10% to 35% of their AWS bill here too.
Step 9: Use Spot Instances
Spot instances are a different way of paying for EC2. A spot instance comes from AWS's pool of spare capacity. It's heavily discounted - but if AWS needs the capacity back, it can reclaim the instance with a two-minute warning.
Spot instances are the next level of engineering challenge. Your workload needs to be resilient and quick to launch. If you can auto-scale, then you can use spot. But it does come with more challenges.
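Part of being resilient is reacting to that two-minute warning. An instance can read the notice from the instance metadata service at `/latest/meta-data/spot/instance-action`, which returns 404 until an interruption is scheduled. The sketch below uses IMDSv2 (fetch a session token first) and only the standard library; how often you poll, and what you do with the remaining seconds (draining connections, checkpointing work), is up to your app.

```python
import json
import urllib.error
import urllib.request
from datetime import datetime, timezone

IMDS = "http://169.254.169.254/latest"


def fetch_instance_action(timeout=1.0):
    """Return the raw spot interruption notice, or None if none is scheduled.

    Only works on an EC2 instance; uses IMDSv2 (fetch a session token first).
    """
    token_request = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=timeout) as response:
        token = response.read().decode()
    action_request = urllib.request.Request(
        f"{IMDS}/meta-data/spot/instance-action",
        headers={"X-aws-ec2-metadata-token": token},
    )
    try:
        with urllib.request.urlopen(action_request, timeout=timeout) as response:
            return response.read().decode()
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return None  # no interruption scheduled yet
        raise


def seconds_until_interruption(notice, now):
    """Seconds left before AWS reclaims the instance, or None if there's no notice."""
    if notice is None:
        return None
    action = json.loads(notice)
    deadline = datetime.strptime(action["time"], "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    return max(0.0, (deadline - now).total_seconds())


# A notice in the documented shape, parsed offline (no metadata service needed):
sample = '{"action": "terminate", "time": "2024-01-01T00:02:00Z"}'
print(seconds_until_interruption(sample, datetime(2024, 1, 1, tzinfo=timezone.utc)))  # 120.0
```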
We've seen clients save 10% to 35% of their AWS bill here.
Step 10: Talk To AWS
Once your spend gets higher, AWS will give you an Account Manager. Above about $100k per month you can start to talk to them about discounts. There are discounts at the individual service level. There are also whole account discounts, though they start at an even higher spend.
AWS will also give you help reducing your costs. This doesn't sound like it's in their interest. But they really do want you to have the best possible experience on AWS. They know that if you have a good experience, you'll stay around.
(I know this sounds ridiculous, but we've seen it in practice many times with Stax customers.)
How Stax Can Help
We built Stax to help with all of these AWS cost optimization tasks. We’ve done them all ourselves more times than we want to think about. Stax helps us sleep better at night because we’re more comfortable with our AWS. It can help you too.
The key points are:
- Stax gives you visibility of your whole AWS environment.
- Stax notifies you when there’s an unexpected cost change.
- Stax allows you to break up your AWS into parts that make sense to you.
- Stax tells you exactly how much money you’re wasting.
- Stax exposes the Cost Explorer RI recommendations more conveniently.
Aside from cost, Stax also allows you to stay across your compliance posture. Out of the box we check the CIS AWS Foundations Benchmark. You can also configure our checks for your own best practice. This gives us a lot more confidence that our AWS is done right.