AWS Billing Part 3: Savings Plans + Usage Quirks

Jack McClary - Member of Technical Staff, CloudNatix

In the first post in this series, I detailed how AWS CUR files calculate an instance’s usage, along with a quirk of the system in which usage is moved between time blocks, which I call “Usage shifting”. In the second post, I went into detail on how Savings Plans appear in AWS CUR files. In this post, I will walk through how the processes laid out in the other two posts combine in AWS and lead to significantly unexpected behavior.

Things start to get interesting when you combine a usage calculation that is eventually consistent with Savings Plans that have hourly bounds. What we find is that frontloading an instance’s usage can cause it to be billed at the higher on-demand rate even when, at a more granular level, one would expect the usage to fall within a Savings Plan’s commitment.

To make the interplay between the Usage calculation and Savings Plans easier to follow, I will make some changes to the CUR file in examples 1 and 2 that simplify the dynamics of Savings Plans. In these examples, I will collapse the Rate behavior to a single column that represents the rate that would be charged, and I will ignore the SavingsPlanRecurringFee and SavingsPlanNegation LineItemTypes. Please note that an easy place to get confused is that a LineItemType of “SavingsPlanCoveredUsage” represents usage covered by a Savings Plan, while a LineItemType of “Usage” represents usage not covered by a Savings Plan. These choices introduce some inaccuracies, which I will clear up in example 3, but reducing the complexity of each example helps with understanding. In addition to these changes, I will present both the expected values without Usage shifting and the AWS values for each example.

Example 1: Full Savings Plan coverage

In this example, you have a Savings Plan commitment of $1/hr with a 20% discount (covered usage is billed at 80% of the on-demand cost). You use one instance, “A”, with an on-demand rate of $1/hr, starting at 1:30 and ending at 2:45. The expected result without Usage shifting is:

Expected Result

Data Table

And the result that we see back from AWS with Usage shifting is:

AWS Result

Data Table

In this example, because the Savings Plan is never exceeded, even with Usage shifting in the first time block, the overall cost of the AWS result is consistent with our expectations. If the usage values in this example don’t make sense, please refer to the first blog post in this series to understand.
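The arithmetic behind Example 1 can be sketched in Python. The function names are hypothetical, and the shifting rule used here (bill whole hours starting from the first block until the instance's total runtime is used up) is an assumption inferred from the examples in this post, not AWS's documented algorithm.

```python
def wall_clock_usage(start_min, end_min, first_hour, n_hours):
    """Hours used in each hourly block, split by actual overlap."""
    usage = []
    for hour in range(first_hour, first_hour + n_hours):
        block_start, block_end = hour * 60, (hour + 1) * 60
        overlap = max(0, min(end_min, block_end) - max(start_min, block_start))
        usage.append(overlap / 60)
    return usage


def shifted_usage(start_min, end_min, first_hour, n_hours):
    """Assumed shifting model: bill whole hours from the first block onward
    until the instance's total runtime is used up."""
    remaining = (end_min - start_min) / 60
    usage = []
    for hour in range(first_hour, first_hour + n_hours):
        started = (hour + 1) * 60 > start_min  # instance has started by this block
        billed = min(1.0, remaining) if started else 0.0
        usage.append(billed)
        remaining -= billed
    return usage


SP_RATE = 0.80  # $/hr after the 20% discount on the $1/hr on-demand rate

# Instance A runs 1:30-2:45, expressed as minutes since midnight.
expected = wall_clock_usage(90, 165, first_hour=1, n_hours=2)
shifted = shifted_usage(90, 165, first_hour=1, n_hours=2)
print(expected, shifted)  # [0.5, 0.75] [1.0, 0.25]
print(round(SP_RATE * sum(expected), 2), round(SP_RATE * sum(shifted), 2))  # 1.0 1.0
```

Both splits add up to 1.25 hours, and neither block exceeds one instance-hour, so all of the usage stays at the discounted rate and the total is $1.00 either way.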

Example 2: Usage shifting causing Savings Plan overutilization

Issues start to arise when the Savings Plan is exceeded because AWS shifts usage to the start of an instance’s life. This causes usage to be charged at a higher rate than expected. In this example, you again have a Savings Plan commitment of $1/hr at a 20% discount (80% of the cost). This time, however, you use two instances, “A” and “B”, each at an on-demand rate of $1/hr, both starting at 1:30 and ending at 3:30. This looks like:

Expected Result

Data Table

Because the usage exceeds the Savings Plan by one instance-hour in the 2:00-3:00 time period, we expect to see 1 hour of usage at the on-demand price. The behavior we actually see is:

AWS Result

Data Table

The difference is that the usage in the first time block is set to 1 for both instances, so the total usage in the 1:00-2:00 time period appears to be 2.0. That exceeds the Savings Plan commitment, and the excess is billed at full price. The usage is decreased later to account for the initial increase, but the usage as it actually occurred would have fallen within the Savings Plan commitment in that first block and would therefore have been billed at the lower rate. The result is a difference in total cost between what one would expect and what AWS reports.
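A minimal sketch of the cost difference under the simplified model follows. It assumes the plan covers one instance-hour per clock hour at the discounted rate, which is my reading of the “$1/hr commitment” in this example. The constant and function names are hypothetical, and the hourly totals are taken from the description above.

```python
ON_DEMAND_RATE = 1.00    # $/hr per instance, from the example
SP_RATE = 0.80           # $/hr for covered usage (20% discount)
SP_COVERED_HOURS = 1.0   # assumed: instance-hours covered per clock hour


def simplified_cost(hourly_usage):
    """Cost when usage up to the plan's coverage is discounted and the rest is on-demand."""
    total = 0.0
    for used in hourly_usage:
        covered = min(used, SP_COVERED_HOURS)
        overflow = used - covered
        total += covered * SP_RATE + overflow * ON_DEMAND_RATE
    return total


# Combined usage of A and B in the 1:00-2:00, 2:00-3:00, and 3:00-4:00 blocks.
expected_usage = [1.0, 2.0, 1.0]  # 0.5 + 0.5, 1 + 1, 0.5 + 0.5
shifted_usage = [2.0, 2.0, 0.0]   # first block counted as a full hour per instance

print(round(simplified_cost(expected_usage), 2))  # 3.4
print(round(simplified_cost(shifted_usage), 2))   # 3.6
```

Total usage is four instance-hours in both cases. What changes is how many of those hours land in a block where the plan is already used up: one in the expected split, two in the shifted one.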

Now, if you have been paying close attention, you may find yourself saying, “Wait, but don’t you have to commit to spending a certain amount of money as the basis for a Savings Plan? Also, how did those simplifications affect the final result?” Those are the questions we hope to answer in our last example!

Example 3: An accurate picture

For our last example, we will remove the simplifications that I mentioned above. I will change the Rate column for SavingsPlanCoveredUsage and SavingsPlanNegation to show the on-demand rates, and I will show the SavingsPlanNegation and SavingsPlanRecurringFee LineItemTypes. As I warned at the start, this makes the picture more complex. The expected file would look like this:

Expected Result

Data Table

And the real AWS report looks like this:

AWS Result

Data Table

The differences between what we expect and what AWS reports largely follow the same pattern as in example 2. But because we are accounting for the Savings Plan behavior more accurately, we can see that even though there is no reported usage in the 3:00-4:00 time block, we still pay the Savings Plan commitment. This increases the cost from an expected $3.40 to an actual $4.40, a 29% cost increase. Although this is an artificial scenario, we have found this dynamic causing unexpected On-Demand usage in real scenarios. In fact, we discovered this behavior while trying to accurately model and predict Savings Plan utilization so that we could intelligently balance Spot, On-Demand, and SavingsPlanCoveredUsage for our Autopilot customers.
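The $3.40 and $4.40 totals can be reproduced with a short Python sketch. It assumes the plan costs $0.80 in every clock hour and covers up to $1.00 of on-demand usage in that hour. That reading is an assumption, chosen because it matches the totals quoted above. The constant and function names are hypothetical.

```python
HOURLY_FEE = 0.80          # assumed: recurring fee paid every hour, used or not
COVERED_ON_DEMAND = 1.00   # assumed: on-demand value the plan absorbs per hour
ON_DEMAND_RATE = 1.00      # $/hr per instance, from the example


def billed_cost(hourly_usage):
    """Recurring fee for every hour plus on-demand charges for uncovered usage."""
    total = 0.0
    for used in hourly_usage:
        on_demand_value = used * ON_DEMAND_RATE
        overflow = max(0.0, on_demand_value - COVERED_ON_DEMAND)
        total += HOURLY_FEE + overflow
    return total


expected = billed_cost([1.0, 2.0, 1.0])  # 0.80 + 1.80 + 0.80
actual = billed_cost([2.0, 2.0, 0.0])    # 1.80 + 1.80 + 0.80
increase_pct = (actual - expected) / expected * 100

print(f"{expected:.2f} {actual:.2f} {increase_pct:.0f}%")  # 3.40 4.40 29%
```

The last block carries the whole effect: with no usage reported in 3:00-4:00, the $0.80 fee for that hour buys nothing, while the hour of usage it would have covered has already been billed at the on-demand rate in the first block.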

Summary of Examples

From the examples in this blog post, we see that the quirks in the usage calculation, which previously were simply interesting and “eventually consistent”, lead to unexpected behavior when paired with mechanisms that rely on more granular temporal consistency. This can confuse end users. I searched for a line item that negates this difference, but I could not find one in the CUR files from our experiments.

Disclaimer

I would like to point out that according to this response, hourly billed instances are stated as being charged for a full hour when the instance enters the running state. Thus, AWS is not doing anything “wrong” or outside of their SLAs to my knowledge. In fact, by giving increased accuracy in the tail end of an instance’s life they are exceeding their SLA even if it may be in an unexpected way. I would also like to explicitly state that the examples given in this article are based on experiments that we performed, but individual numbers have been tweaked to be easier to understand (nobody wants gross decimals in a blog post). These examples are faithful to the findings from our experiments.

When we discovered this behavior, we reached out to AWS support to understand what was happening. The guidance that we received was that “... the best way to avoid charge discrepancy’s is to avoid stopping and starting the instances constantly which may cause the charge concern. Savings Plan commitments will apply correctly to instances constantly running as there will be no change to the way EC2 is then charged…” which, based on our knowledge, is sound advice.

Why this matters

At CloudNatix, we provide highly optimized and cost-efficient cloud operations for our customers. When integrating Savings Plans with our sophisticated Autopilot capabilities, the possibility of regularly creating and destroying instances for our customers is non-trivial. This atypical utilization of cloud resources must be paired with a deep understanding of cloud systems and infrastructure. This technical depth and attention to detail are what CloudNatix prides itself on and what makes us best in class. We worry about the details so you don’t have to.
