AWS S3 vs Google Cloud vs Azure:
Cloud Storage Performance

I am working on a new cloud service that quickly analyses large amounts of scientific data. So far, our largest customer dataset consists of approximately 3,000 tables, each 40 to 80 MB in size and totalling 150 GB, which we plan to analyse in 10 seconds or less. Because each table can be processed independently, we massively parallelize to achieve this goal; our deployment uses 1,000 vCPUs or more as needed. The difficult part is reading that much data into memory quickly. Currently, nearly 80% of compute time is spent reading data, which brings us to the subject of this three-part series: cloud storage performance. I looked into a few different approaches to this problem, including object storage, database-backed storage, and attached storage, which I will cover in more depth in future posts.

Google released its multi-cloud PerfKitBenchmarker just as I was finishing this write-up. As far as I can tell, this is also the first set of PerfKitBenchmarker results to be published.

Part 1: Object Storage

Obligatory disclosure: Benchmarks are application-specific and fluctuate depending on network load, surrounding VM activity, month of birth, and moon phase. Consider whether these benchmarks are relevant to your application as you read this, and keep in mind that the results may not be representative of your actual experience.

Object storage (Amazon AWS Simple Storage Service (S3), Google Cloud Storage (GCS), and Microsoft Azure Storage) offers a simple PUT/GET/HEAD/LIST interface for storing data that is too large to put in a database. It is attractive because it is inexpensive, provides redundancy for safety, and scales automatically to handle many concurrent requests. The downsides are latency (due in part to each file requiring a new HTTP connection) and throughput that is constrained by network bandwidth and availability.
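To make that interface concrete, here is a minimal sketch using boto3 against S3 (GCS and Azure expose equivalent operations through their own SDKs); the bucket and key names are placeholders, not anything from the benchmarks:

# Minimal sketch of the object-storage interface using boto3 (S3 shown).
# Bucket and key names are placeholders.
import boto3

s3 = boto3.client("s3")

# PUT: upload an object
s3.put_object(Bucket="my-bucket", Key="tables/table-0001.bin", Body=b"example payload")

# HEAD: fetch metadata (size, ETag) without downloading the body
meta = s3.head_object(Bucket="my-bucket", Key="tables/table-0001.bin")
print(meta["ContentLength"])

# GET: download the object
obj = s3.get_object(Bucket="my-bucket", Key="tables/table-0001.bin")
data = obj["Body"].read()

# LIST: enumerate objects under a prefix
listing = s3.list_objects_v2(Bucket="my-bucket", Prefix="tables/")
for entry in listing.get("Contents", []):
    print(entry["Key"], entry["Size"])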

Downloads

When downloading files from a VM hosted by the same vendor in the same region, I looked at two crucial metrics: time to first byte (TTFB), to quantify latency, and throughput (Figure 1). In these plots the percentile (or quantile) is on the x axis and the metric is on the y axis (note the log scale), letting you examine the typical response (median, 0.5) as well as the best- and worst-case behaviours.

Figure 1. Time to first byte (left) and single-stream API throughput (right) for downloads. X axis (p) is percentile/quantile, i.e. median is 0.5. Results obtained via the PerfKitBenchmarker object_storage_service benchmark. Time to first byte tests were run 1000 times. Download throughput tests were run 100 times with objects ranging from 16 KB to 32 MB. All benchmarks used standard storage class buckets. VM and bucket locations: GCE us-central1-a/US; AWS us-east-1a/us-east-1; Azure East US/East US.
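For readers unfamiliar with the metrics, here is a rough sketch, not the PerfKitBenchmarker code itself, of how TTFB and single-stream throughput can be approximated for a single object over plain HTTP; the URL is a placeholder:

# Rough sketch of measuring time-to-first-byte and single-stream throughput
# for one object over HTTP. Illustrative only; the numbers in Figure 1 come
# from PerfKitBenchmarker's object_storage_service benchmark.
import time
import requests

url = "https://storage.example.com/my-bucket/tables/table-0001.bin"  # placeholder

start = time.perf_counter()
resp = requests.get(url, stream=True)   # returns once the response headers arrive
ttfb = time.perf_counter() - start      # rough proxy for time to first byte

nbytes = sum(len(chunk) for chunk in resp.iter_content(chunk_size=64 * 1024))
total = time.perf_counter() - start
print(f"TTFB ~{ttfb * 1e3:.1f} ms, throughput ~{nbytes / total / 1e6:.1f} MB/s")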

Azure and AWS S3 had nearly the same latency, while GCS's latency was more than three times higher. Google, on the other hand, delivered roughly 4x the throughput of Azure and roughly 2x the throughput of S3. The net result is that GCS should finish downloading files larger than about 1 MB faster than Azure, and files larger than about 5 MB faster than S3. (Note the emphasis on finishing the download: if your application can handle streaming input and processes the data faster than it arrives, S3 and Azure will perform better at any file size.) See that unusually flat right tail for AWS S3 at 91 MB/s (732 Mbps)? I am guessing that is S3's maximum throughput.
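Those crossover points fall out of a simple model: total download time is roughly TTFB plus size divided by throughput, and the break-even size is where two providers' lines intersect. The sketch below illustrates the arithmetic with made-up numbers, not the measured medians:

# Break-even file size between two providers under the model
#   time(size) = ttfb + size / throughput
# The figures below are illustrative placeholders, not measured values.

def crossover_size_mb(ttfb_a, thr_a, ttfb_b, thr_b):
    """Size (MB) above which provider A (higher TTFB, higher throughput)
    completes a full download before provider B.
    ttfb in seconds, throughput in MB/s; assumes thr_a > thr_b and ttfb_a > ttfb_b."""
    return (ttfb_a - ttfb_b) / (1.0 / thr_b - 1.0 / thr_a)

# e.g. a provider with 3x the latency but 4x the throughput of another
print(crossover_size_mb(ttfb_a=0.030, thr_a=80.0, ttfb_b=0.010, thr_b=20.0))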

Network throughput caps are a problem

These benchmarks rely heavily on network throughput, so VM type must be taken into account when planning an architecture. In the graph above I have shown results from VM types with enough network bandwidth to reliably measure storage throughput rather than VM network throughput. I will cover network throughput in more detail in a later post, but for now, Figure 2 shows that a 1 vCPU machine is sufficient for GCE and Azure, whereas AWS EC2 required a much larger machine: an 8 vCPU c4.2xlarge would have sufficed, but I used a 16 vCPU c4.4xlarge.

If you download one file per vCPU, few EC2 instance types have enough network throughput to make S3 the bottleneck (i.e. divide network bandwidth by vCPU count, a crude but useful view; Figure 2-right). Even though GCS delivers higher throughput than S3, Google Compute Engine's network throughput is far higher still, so most machine types have more bandwidth per vCPU than GCS can supply. Most Azure instance types have enough bandwidth to accommodate Azure Storage's lower throughput.

Figure 2. Measured intra-datacenter network throughput for varying VM types, total (left) and normalized by CPU count (right). Showing mean and standard deviation. Throughput was measured for 60 seconds using iperf. All tests were repeated two to six times. Shared-core machines were considered to have one vCPU when normalizing by CPU count. GCE tests were run in us-central1-a, except n1-standard-32 which was run in us-central1-b. AWS tests were run in us-east-1a. Azure tests were run in East US. See also [1] and [2].
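The divide-bandwidth-by-vCPU-count heuristic is easy to apply to any instance type you are considering; the instance figures below are placeholders rather than the measurements in Figure 2:

# Crude per-vCPU bandwidth check: if each vCPU downloads one file at a time,
# is the instance's NIC or the storage service the likely bottleneck?
# The instance figures below are placeholders, not measurements.

def per_vcpu_bandwidth(total_mb_s, vcpus):
    return total_mb_s / vcpus

storage_single_stream_mb_s = 50.0   # assumed single-stream object-storage throughput

for name, nic_mb_s, vcpus in [
    ("small-instance", 60.0, 1),
    ("large-instance", 1000.0, 16),
]:
    share = per_vcpu_bandwidth(nic_mb_s, vcpus)
    bottleneck = "storage" if share > storage_single_stream_mb_s else "network"
    print(f"{name}: {share:.0f} MB/s per vCPU -> {bottleneck}-bound")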

To be thorough, I also ran the storage benchmarks on 16 vCPU instances from GCE (n1-standard-16) and Azure (D5 v2), as well as a single-vCPU instance from AWS (m3.medium). The smaller Google and Microsoft VMs had slightly higher storage throughput than the larger ones (not shown), which could be due to how VMs are placed on shared hardware. The m3.medium on AWS was capped at 32.5 MB/s.

GCS multi-region buckets are fantastic

Figure 1 shows the GCS results for a US “multi-region” bucket. S3 buckets are comparable to GCS regional buckets; S3 has no multi-region buckets. Multi-region buckets offer access from any region in that set without incurring data transfer or duplication costs (for comparison, accessing an AWS bucket from another region costs $0.02 per GB). If you have data-processing servers in several locations, this is great. Throughput from both (…all two!) US data centres is great (Figure 3). I cannot find anything definitive in the terms of service, although multi-region buckets also seem to be more fault-tolerant.

Figure 3. Download throughput from all seven zones accessing a US multi-region bucket. (Data from Figures 4-right and 5-right.)
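For reference, the regional-versus-multi-region distinction tested here is just the bucket's location setting; a rough sketch with the google-cloud-storage client (bucket names are placeholders, and gsutil mb -l does the same thing) looks like this:

# Rough sketch of creating a multi-region vs. a regional GCS bucket with the
# google-cloud-storage client. Bucket names are placeholders and must be
# globally unique.
from google.cloud import storage

client = storage.Client()

# Multi-region bucket: data is served from the whole US multi-region
multi = client.create_bucket("my-multiregion-bucket", location="US")

# Regional bucket: data lives in a single region
regional = client.create_bucket("my-regional-bucket", location="us-central1")

print(multi.location, regional.location)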

According to Google's documentation, a regional bucket provides lower latency and higher throughput than a multi-region bucket, so I also tested a us-central1 regional bucket from all four us-central1 Compute Engine zones (Figure 4) and a us-east1 regional bucket from all three us-east1 zones (Figure 5) to see if we could get even faster throughput or improve GCS's poor latency.

Figure 4. Benchmarks for accessing a US multi-region bucket (solid) or a us-central1 regional bucket (dashed) from the four us-central1 zones.
Figure 5. Benchmarks for accessing a US multi-region bucket (solid) or a us-east1 regional bucket (dashed) from the three us-east1 zones.

Regional buckets actually increased latency. Median throughput with regional buckets was the same or slightly higher, while the tails were often worse. Odd.

Uploads

My application does not rely on fast uploads, but I am including some of the benchmark data here for completeness (Figure 6). As with downloads, Google had higher latency but higher throughput, Microsoft had the lowest latency but also the lowest throughput, and AWS was in the middle.

Object storage handles concurrency well

Finally, Google's gsutil package contains a perfdiag command, which can be used with both GCS and S3 because it is built on boto. This tool reads or writes numerous objects concurrently, which adds some extra detail to the story (Table 3). It demonstrates that these systems can handle concurrent requests at the scale we will be using them.
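As a rough illustration of that kind of concurrent access (this is not what perfdiag does internally, just the same spirit), here is a sketch of parallel GETs with boto3 and a thread pool; the bucket and key names are placeholders:

# Sketch of issuing many GETs concurrently against one bucket.
# Bucket and key names are placeholders.
from concurrent.futures import ThreadPoolExecutor
import boto3

s3 = boto3.client("s3")
bucket = "my-bucket"
keys = [f"tables/table-{i:04d}.bin" for i in range(100)]

def fetch(key):
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return len(body)

# Object storage absorbs this level of parallelism without special tuning.
with ThreadPoolExecutor(max_workers=32) as pool:
    total_bytes = sum(pool.map(fetch, keys))

print(f"downloaded {total_bytes / 1e6:.1f} MB across {len(keys)} objects")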

Conclusions

For both uploads and downloads, Amazon and Azure have the lowest latency, while Google has the highest throughput. This means AWS and Azure excel at handling smaller files, while GCS shines at larger files, emphasising the importance of benchmarking with data comparable in size to what your application actually consumes.

When developing high-speed data-processing systems, the significant caps on AWS EC2 network throughput must be taken into account.
When working with data from multiple datacenters within the same multi-region (e.g. a continent), Google's unique multi-region buckets keep costs down.

Object storage scales automatically to deliver high aggregate throughput.
Finally, keep in mind that I am only presenting data from API access (the same boto code for AWS and Google); I have noticed significant differences in performance between clients (vendor-specific CLIs, the node.js API package, cURL'ing URLs, and so on).

While EBS-optimized instances have dedicated EBS NICs, in my own testing these values appear to match the limits of the non-EBS NICs. There are several discrepancies between what I measured and the published limits; for example, an m3.xlarge should be limited to 62.5 MB/s, but I saw 126 MB/s, which is the m3.2xlarge's claimed limit. I would not be shocked if this came down to an AWS configuration setting. The t2.micro's strong network performance also contrasts with what others have experienced. I will go over network throughput in more detail in a future post.
