Part 1 – Virtual Compute
When designing infrastructure systems, whether creating new applications or deploying existing software, it’s crucial to manage cost. Costs come from a variety of sources, and every approach to delivering infrastructure has its own tradeoffs and complexities. Cloud infrastructure systems create a whole new range of variables in these complex equations.
In addition, no two clouds are the same! Some bundle components while others offer more granular purchasing. Some bill in different time increments, and many offer a variety of payment structures, each with differing economic ramifications. How do you figure out what each costs and make a choice?
To help you work through this, we’ve created an example. Let’s look at a fairly common scenario: a mobile application with its backend in the cloud. This application shares pictures in some way and has about 5 million monthly active users. Let’s go through which instance types this application needs to meet that user-driven workload, then price out what that costs in an average month on Google Cloud Platform and compare it against Amazon Web Services.
Our example application has 4 components:
- An API frontend that mobile devices will contact for requests and actions. This portion will consume the majority of the compute cycles.
- A static marketing and blog front end.
- An application layer that will process and store images as they come in or are accessed.
- And on the back end, a Cassandra cluster to store operational metadata.
For capacity planning, we have scoped as follows:
- The API frontend instances can respond to roughly 80 requests per second. We expect about 350 requests per second given this number of users. Therefore we should only need four regular instances for this layer.
- The marketing front end shouldn’t need more than two instances for redundancy.
- The application layer will need four instances for image processing and storage control.
- The Cassandra cluster will need five instances with a higher memory footprint. Let’s assume for now that the workload is entirely static, and autoscaling isn’t being used (oh don’t worry, we’ll add that and more back in later).
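As a rough check on the API tier sizing, here is a minimal Python sketch. The function names are hypothetical, and the 80 and 350 requests-per-second inputs are the approximate figures quoted above. A strict ceiling on those numbers suggests five instances; sizing the tier at four, as this example does, implies each instance serves about 87.5 requests per second, slightly above the rough per-instance estimate.

```python
import math

def instances_needed(total_rps, rps_per_instance):
    """Smallest instance count that keeps each instance at or below its rated load."""
    return math.ceil(total_rps / rps_per_instance)

def load_per_instance(total_rps, instance_count):
    """Average requests per second each instance must serve."""
    return total_rps / instance_count

# Figures from the capacity plan above (both are approximate).
print(instances_needed(350, 80))   # strict ceiling on the quoted figures
print(load_per_instance(350, 4))   # load per instance with the four planned instances
```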
Figure 1 shows the logical architecture of our example application:
To explain the nuances of cloud pricing, let’s use Google Cloud Platform and Amazon Web Services as the example cloud infrastructure providers, and start at the most simple, on-demand model. We can use calculators that each provider offers to find out correct pricing quickly:
Please note that we completed these calculations on January 12, 2015, and have included the output prices in this post. Any discrepancies are likely due to pricing or calculator changes following the publishing of this post.
Here is the output of the pricing calculators:
Google Cloud Platform estimate:
Amazon Web Services estimate:
It’s important to note that right away things don’t look equivalent, with Google’s pricing being 38% lower. Why? Google includes an automatic discount called Sustained Usage Discount, which reduces the cost of long-running instances. Since we didn’t autoscale or otherwise vary our system over the course of the month, the full 30% discount applies. Even without that, pricing before the discount comes in at $3729.86, or an 11% discount off Amazon’s on-demand rates. Over the course of a year, going with Google would save you just over $19,000!
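To see why a full month of steady usage earns the full 30%, here is a small sketch. The tier schedule in `TIERS` is an assumption that this post does not spell out: each successive quarter of the month is billed at 100%, 80%, 60% and 40% of the base rate, which averages to 70% over a full month. The function name is hypothetical.

```python
# Assumed schedule: (fraction of the month, multiplier on the base rate).
TIERS = [(0.25, 1.0), (0.25, 0.8), (0.25, 0.6), (0.25, 0.4)]

def billed_fraction(usage_fraction):
    """Fraction of a full month's base price billed for the given usage (0.0 to 1.0)."""
    remaining = usage_fraction
    billed = 0.0
    for width, multiplier in TIERS:
        used = min(remaining, width)
        billed += used * multiplier
        remaining -= used
        if remaining <= 0:
            break
    return billed

base_monthly = 3729.86  # pre-discount Google price quoted above
print(round(base_monthly * billed_fraction(1.0), 2))  # full-month, always-on usage
print(round(billed_fraction(0.5) / 0.5, 2))           # average rate paid for a half month
```

Under that assumed schedule, an instance that runs only half the month pays 90% of the base rate on average, so the discount grows the longer an instance stays up.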
Amazon Web Services has an alternate payment model, where you can make a commitment to run infrastructure for a longer period of time (either 1 or 3 years), and opt to pay some portion of the costs up front, which they call Reserved Instances. Here are the costs for our example app with Amazon’s Reserved Instance pricing:
Amazon Web Services, no-upfront, 1 year estimate:
Over a one-year term with Amazon, if you commit to pay for the instance for that entire period, and you opt for the “no-upfront” option, you still end up with a 13% higher cost than making no commitment to Google.
Amazon Web Services, partial upfront, 1 year estimate:
Effective monthly: $2607.21
If you opt to pay over $18k up front using the “partial upfront” model, you arrive at a lower price, saving $44 (dollars, not thousands) over the course of the year.
Amazon Web Services, all upfront, 1 year estimate:
Effective monthly: $2554.08
If you choose instead to pay 100% of the yearly cost up front, you’d end up saving $681.78 over the course of the year versus Google Cloud Platform, or 2.3%. As you can see, however, the upfront payment is over $30,000!
Similarly, Amazon offers three-year options for the partial upfront and all upfront models:
Partial upfront, 3 year estimate:
Effective monthly: $1664.15
All upfront, 3 year estimate:
Effective monthly: $1563.97
If you’re willing to part with just over $56,000 for the three-year, all upfront Reserved Instance, you’d receive a 40% discount off of Google’s rate, for a total projected gap of over $37k.
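The comparisons above all reduce to multiplying an effective monthly price by the length of the term. The sketch below is one way to reproduce them; the function names are hypothetical, and `GOOGLE_MONTHLY` is the $2610.90 Google figure used in this post (the $3729.86 pre-discount price less 30%). Because the effective monthly prices are rounded to the cent, results can differ from the quoted figures by a few cents.

```python
GOOGLE_MONTHLY = 2610.90  # Google monthly cost after the sustained usage discount

def total_cost(effective_monthly, months):
    """Total spend over the term at a fixed effective monthly price."""
    return effective_monthly * months

def savings_vs_google(aws_effective_monthly, months):
    """Positive when the AWS option is cheaper than Google over the term."""
    return total_cost(GOOGLE_MONTHLY, months) - total_cost(aws_effective_monthly, months)

options = [
    ("AWS partial upfront, 1 year", 2607.21, 12),
    ("AWS all upfront, 3 year", 1563.97, 36),
]
for label, monthly, months in options:
    print(f"{label}: total ${total_cost(monthly, months):,.2f}, "
          f"saves ${savings_vs_google(monthly, months):,.2f} vs Google")
```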
However, as I’m sure you can surmise, a significant upfront commitment and payment create several risks. The bottom line: you’re locked in to a long-term pricing contract, and you risk missing out on substantial savings. Let’s look at why:
- Infrastructure prices will drop, either for Google (which has happened 3 times in the last 12 months, as we’ve reintroduced Moore’s law to the cloud), or for Amazon (which has happened 2 times in the last 12 months). For 2014, this worked out to an average of a 4.85% price reduction per month on Google Cloud Platform. Due to on-demand pricing, any reduction in prices is something you automatically receive on GCP.
- Also, don’t forget, capital is expensive! Most businesses pay a ~7% per year cost of capital, which reduces the value of these up-front purchases significantly. For this example, that adds an effective $11,823.63 to the 3-year all up-front Reserved Instance price from Amazon.
So, let’s revisit that $37,689.40 gap. By adding in the cost of capital, and subtracting likely instance price reductions, even at the most aggressive discount AWS offers, AWS costs $60,244.21 and Google Cloud Platform costs $57,959.57, which equates to a 3.9% cost advantage.
Under conservative assumptions about public cloud pricing dynamics (3% per month price reductions, 7% cost of capital), even 3-year all-upfront RIs from AWS are not cost efficient compared to on-demand pricing with Sustained Use Discounts from Google Cloud Platform.
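The two adjustments can be sketched as follows. This is an illustration under stated assumptions, not the exact model behind the totals above: the price drop is applied as a fixed 3% reduction each month starting from the second month, and the cost of capital is treated as simple, non-compounding interest on the upfront payment. Function names are hypothetical. With these assumptions the Google side comes out at roughly the $57,959.57 quoted above, and the three-year capital cost at roughly $11,823.63.

```python
def declining_on_demand_total(monthly_cost, monthly_price_drop, months):
    """Sum of monthly bills when the price falls by a fixed fraction each month."""
    return sum(monthly_cost * (1 - monthly_price_drop) ** m for m in range(months))

def simple_cost_of_capital(upfront, annual_rate, years):
    """Simple (non-compounding) carrying cost of money paid up front."""
    return upfront * annual_rate * years

google_3yr = declining_on_demand_total(2610.90, 0.03, 36)
aws_upfront = 1563.97 * 36  # 3-year all-upfront payment implied by the effective monthly price
capital_cost = simple_cost_of_capital(aws_upfront, 0.07, 3)

print(f"Google, 36 months with 3% monthly price drops: ${google_3yr:,.2f}")
print(f"Cost of capital on the AWS upfront payment: ${capital_cost:,.2f}")
```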
There are also cost risks to this structure presented by commitment to specific usage choices.
- New instance types might make your old choices inefficient (c3 instances from AWS are substantially more cost efficient for some workloads than older m3 instances, for example).
- Your software might change. For example, what if you improve the efficiency of your software to reduce your infrastructure requirements by 50%? Or what if you re-platform from Windows to Linux? (Reserved Instances require a commit on OS type) Or what if your memory needs to grow, and instances need to switch from standard to high-memory variants?
- Your needs might change. For example, what if a new competitor arrives who takes ½ of your customers, which reduces the load on your infrastructure by 50%?
- What if you picked everything right but the geography, and your app is suddenly popular in Asia or Europe?
The “on-demand” agility and flexibility of cloud computing is supposed to be a huge financial benefit, especially when your requirements change. Let’s imagine that in the second month, several of the risks above actually materialize: you move to the Asian market, resize a few instances to better match the actual workload, and trim the Cassandra cluster’s redundancy a bit because instances with live migration are so reliable. That would look something like Figure 2.
Google Compute Engine estimate:
Amazon Web Services Partial upfront, 1 year, estimate:
Effective monthly: $860.59
This system costs less than ½ of what the original system costs, and is on an entirely different continent, but what does it cost to change your plan? This change costs very little at Google: you don’t pay any direct penalty for changing your infrastructure design. Your only costs would be how long the two different systems are up and running simultaneously to facilitate a zero-downtime cut-over.
In stark contrast, the cost of changing the Amazon system is essentially the total loss of whatever committed funds you applied to earn the discount, plus the new requirement for upfront funds to get an efficient price (and re-commit!) in your new configuration, on top of the above-mentioned dual system usage (which costs more per hour).
Let’s look at this from a cash flow perspective, not even in the worst case, but just assuming that you wanted to break-even with Google pricing on Amazon and chose the partial up front one-year Reserved Instance.
Google: Month 1 usage: $2610.90 + Month 2-13 usage: $909.72 x 12 = $13,527.54
Amazon: Month 1 commit: $18,164.00 + Month 1 usage: $1093.54 + Month 2 commit: $6350.00 + Month 2-13 usage: $331.42 x 12 = $29,584.58
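The cash flow arithmetic above can be checked with a short sketch. The function names are hypothetical; every figure comes from the two lines above. The last helper shows how the upfront commits and monthly usage charges combine into the effective monthly prices quoted earlier ($2607.21 and $860.59).

```python
def google_total(month1_usage, later_monthly, later_months):
    """On-demand only: one month of the old system, then the new system."""
    return month1_usage + later_monthly * later_months

def amazon_total(commit1, usage1, commit2, later_monthly, later_months):
    """Two partial-upfront commits plus the monthly usage charges."""
    return commit1 + usage1 + commit2 + later_monthly * later_months

def effective_monthly(upfront, term_months, monthly_usage):
    """Upfront payment spread over the term, plus the monthly usage charge."""
    return upfront / term_months + monthly_usage

google = google_total(2610.90, 909.72, 12)
amazon = amazon_total(18164.00, 1093.54, 6350.00, 331.42, 12)
print(f"Google 13-month cash out: ${google:,.2f}")
print(f"Amazon 13-month cash out: ${amazon:,.2f}")
print(f"Gap: ${amazon - google:,.2f}")
```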
That’s a big gap, even without figuring in the cost of capital! You can see how risky those commitments can be. AWS has a service to mitigate some of that risk, an RI marketplace, which allows you to attempt to sell Reserved Instance units to other AWS customers. However, as I’m sure you can imagine, this process presents a few risks of its own:
- Are the RIs you’re selling for instance types that are now clearly inefficient for many workloads and therefore not desirable to other customers?
- Will your RIs sell for full price, or at some discount to encourage a sale?
- How many buyers are there in the marketplace, and how quickly will your RIs sell, if at all?
- What if you didn’t start out in the US? The RI Marketplace is only available for customers with a US bank account.
One risk that’s a guaranteed loss: every sale on the RI marketplace comes with a 12% fee, payable to Amazon. Let’s say you have great luck and are able to sell 10 months of your original 12-month RI (they have to be sold in whole-month increments, rounding down) at full original price, which nets you back $13,320.27 after fees. Now your 13-month total is $16,083.19, so you’ve only lost $2,555.65 compared to what you would have paid using Google. But what a hassle, and how much risk did you take on? What if the RIs didn’t sell for a few months? For every month they go unsold, you lose $1,332. Ouch!
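The resale proceeds above follow from a simple pro-rata calculation, sketched here. The function name is hypothetical, and “full original price” is assumed to mean the pro-rata share of the $18,164.00 upfront commit for the whole months remaining.

```python
import math

def resale_proceeds(upfront, term_months, months_remaining, fee_rate):
    """Net proceeds from selling the remaining whole months at full original price."""
    whole_months = math.floor(months_remaining)  # whole-month increments, rounding down
    gross = upfront * whole_months / term_months
    return gross * (1 - fee_rate)

print(round(resale_proceeds(18164.00, 12, 10, 0.12), 2))  # selling 10 remaining months
print(round(resale_proceeds(18164.00, 12, 1, 0.12), 2))   # value of each month that goes unsold
```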
But this is a backwards example, you say: cloud isn’t intended for this kind of static sizing; you’re supposed to be autoscaling to tightly follow load. True! So, let’s imagine that the above reflects the requirements of our steady-state load, and we have four small peaks during the day: morning rush, lunch peak, after-work, and midnight madness, each of which pops at 10x the above workload. (Our application passes the toothbrush test!) Our backend handles these spikes fine, but our web and API tiers need to autoscale dramatically. Let’s say each of these peaks ramps up very rapidly, over the course of five minutes, and lasts for 15 minutes. Note that we see systems that spike at 100x or more, so this scenario isn’t extreme!
This kind of system is pretty easy to build efficiently on Google. Instances take roughly a minute to launch, so we can easily autoscale to accommodate load, and since we charge only a minimum of 10 minutes and bill in per-minute increments, this only adds $110.77 a month to our bill. 10x peaks!
Google Compute Engine estimate:
Monthly additional: $110.77
Building this on AWS is just not as efficient. Because instances take >5 minutes on average to launch, we need to pre-trigger our instance boots (read, timing logic or manual maintenance). Also, AWS bills for instances in full hour increments, so we pay for 60 minutes when we only use ~20, for each of our 4 peaks. This makes the total additional cost $341.60, and without any ability to appropriately discount via reserved instances, that’s a number an AWS customer can’t bring down today.
Amazon Web Services estimate:
Monthly additional: $341.60
+ instance launch management logic (manual ops or development)
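The difference in billing granularity is easy to see in a small sketch. The function names are hypothetical; the rules are the ones described above (per-minute billing with a 10-minute minimum on Google, full-hour increments on AWS). Dollar amounts are left out because they depend on the instance types and counts in each estimate.

```python
import math

def per_minute_billed(minutes_used, minimum_minutes=10):
    """Billed minutes with per-minute increments and a minimum charge."""
    return max(minimum_minutes, math.ceil(minutes_used))

def hourly_billed(minutes_used):
    """Billed minutes when every started hour is charged in full."""
    return math.ceil(minutes_used / 60) * 60

peak_minutes = 20   # roughly 5 minutes of ramp plus 15 minutes of peak
peaks_per_day = 4

print(per_minute_billed(peak_minutes) * peaks_per_day)  # billed minutes per day, per extra instance
print(hourly_billed(peak_minutes) * peaks_per_day)      # same workload under hourly billing
```

For each extra instance, that is 80 billed minutes a day versus 240 for the same four peaks.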
While this spike example is one utilization behavior we see frequently, we also see basic diurnal (daily, aka day/night) variability of anywhere from 2x-5x utilization on almost every customer-facing service. If that natural variation isn’t being followed by use of Autoscaler or other automated resource management, you are definitely leaving money on the table!
While there are many more dimensions to evaluate, hopefully this is a helpful analysis of how pricing systems differ between Google and Amazon. We’re not stopping here; look forward to more comparisons with more cloud providers and more workloads to help you understand exactly what you get for your money.
We are hyper-focused on driving cost out of cloud services, and leading the way with innovations such as Sustained Usage Discounts and per-minute billing. As one of our customers, StarMaker Interactive VP of Engineering Christian F. Howes said, “App Engine’s minute-by-minute scaling and billing saves us as much as $3,000 USD per month.”
We think pricing considerations are critical for users trying to make the best decision they can about infrastructure systems design. I’d love to hear your thoughts on what matters to you in cloud pricing. What areas are confusing, hard to analyze, or hard to predict? What ideas do you have? Reach out!
-Posted by Miles Ward, Global Head of Solutions, Google Cloud Platform