AWS S3 vs Google Cloud vs Azure:
Cloud Storage Performance

Umair Akbar
Feb 9, 2022

I’m currently working on a brand-new cloud service built specifically for processing large amounts of scientific data quickly. Our largest customer dataset to date consists of about 3,000 tables of 40 to 80 MB each, roughly 150 GB in total, and we want to process it in no more than ten seconds. Since each table can be processed independently, we lean heavily on parallelization, spreading the workload across 1,000 or more vCPUs as needed.

Ingesting that much data into memory quickly is one of the biggest problems we face: data retrieval currently accounts for about 80% of total computation time. That is what led me to focus on cloud storage performance in this three-part blog series. I’ll dig into three different strategies, attached storage, database-backed storage, and object storage, examining each in its own post and weighing its advantages and disadvantages.
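To make that ingest pattern concrete, here is a minimal sketch of the fan-out we rely on, assuming boto3 and placeholder bucket/key names. It is illustrative only, not our actual service code:

```python
# Illustrative sketch of the ingest pattern described above: fan out GETs for
# ~3,000 table objects across many workers so that network transfer, not CPU,
# is the limiting factor. Assumes boto3 and placeholder bucket/key names.
import time
from concurrent.futures import ThreadPoolExecutor

import boto3

BUCKET = "example-dataset-bucket"                            # placeholder
KEYS = [f"tables/table-{i:04d}.bin" for i in range(3000)]    # placeholder keys

s3 = boto3.client("s3")  # boto3 low-level clients can be shared across threads

def fetch(key):
    """Download one table object fully into memory."""
    return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=64) as pool:             # tune per vCPU count
    tables = list(pool.map(fetch, KEYS))
elapsed = time.perf_counter() - start

total_mb = sum(len(t) for t in tables) / 1e6
print(f"fetched {len(tables)} objects, {total_mb:.0f} MB, in {elapsed:.1f} s")
```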

As I was putting this series together, I learned about Google’s timely release of the multi-cloud PerfKitBenchmarker. Interestingly, as far as I can tell, this post also contains the first published set of results obtained with PerfKitBenchmarker.
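For anyone who wants to reproduce the runs, the invocation was roughly as follows, wrapped in Python for consistency with the other examples in this post. The flag names reflect my recollection of PerfKitBenchmarker’s CLI and the zone strings may need adjusting for your account, so check ./pkb.py --help:

```python
# Rough reproduction of the benchmark runs behind this post. The --cloud,
# --benchmarks, and --zones flag names reflect my understanding of
# PerfKitBenchmarker's CLI at the time; verify against ./pkb.py --help.
import subprocess

RUNS = [
    ("GCP", "us-central1-a"),
    ("AWS", "us-east-1a"),
    ("Azure", "eastus"),     # Azure region naming may differ by PKB version
]

for cloud, zone in RUNS:
    subprocess.run(
        [
            "./pkb.py",
            f"--cloud={cloud}",
            "--benchmarks=object_storage_service",
            f"--zones={zone}",
        ],
        check=True,
    )
```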

Part 1: Object Storage

Obligatory disclosure: Benchmarks are application-specific and fluctuate depending on network load, surrounding VM activity, month of birth, and moon phase. Consider whether these benchmarks are relevant to your application as you read this, and keep in mind that the results may not be representative of your actual experience.

Object storage (Amazon’s Simple Storage Service (S3), Google Cloud Storage (GCS), and Microsoft Azure Storage) offers a simple PUT/GET/HEAD/LIST interface for storing data that is too large for a database. It is attractive because it is inexpensive, provides redundancy for safety, and scales automatically to handle many concurrent requests. The downsides are latency (due in part to each file requiring a new HTTP connection) and throughput that is constrained by network bandwidth and availability.
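The entire interface fits in a handful of calls. Here is a minimal boto3 sketch of the four operations, with placeholder bucket and key names; GCS and Azure expose the same operations through their own clients:

```python
# Minimal sketch of the PUT/GET/HEAD/LIST surface using boto3 against S3.
# Bucket and key names are placeholders.
import boto3

s3 = boto3.client("s3")
bucket, key = "example-bucket", "tables/table-0001.bin"

s3.put_object(Bucket=bucket, Key=key, Body=b"some table bytes")  # PUT

head = s3.head_object(Bucket=bucket, Key=key)                    # HEAD: metadata only
print("size:", head["ContentLength"])

obj = s3.get_object(Bucket=bucket, Key=key)                      # GET
data = obj["Body"].read()

listing = s3.list_objects_v2(Bucket=bucket, Prefix="tables/")    # LIST
print([o["Key"] for o in listing.get("Contents", [])])
```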

Downloads

When downloading files from a VM hosted by the same vendor in the same region, I looked at two crucial metrics: time to first byte (TTFB), which quantifies latency, and throughput (Figure 1). In these plots the percentile (or quantile) is on the x axis and the metric is on the y axis (note the log scale), so you can examine the typical response (median, 0.5) as well as best- and worst-case behavior.

Figure 1. Time to first byte (left) and single-stream API throughput (right) for downloads. X axis (p) is percentile/quantile, i.e. median is 0.5. Results obtained via the PerfKitBenchmarker object_storage_service benchmark. Time to first byte tests were run 1000 times. Download throughput tests were run 100 times with objects ranging from 16 KB to 32 MB. All benchmarks used standard storage class buckets. VM and bucket locations: GCE us-central1-a/US; AWS us-east-1a/us-east-1; Azure East US/East US.
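If you want a rough TTFB number without the full PKB harness, timing a streaming GET until the first byte arrives gets you close. This is a simplification of what the benchmark measures, not its actual code, and the bucket/key names are placeholders:

```python
# Rough TTFB approximation: time from issuing the GET until the first byte of
# the response body is available, repeated and summarized by percentile.
import time

import boto3
import numpy as np

s3 = boto3.client("s3")
bucket, key = "example-bucket", "tiny-object"   # placeholder: a ~1-byte object

samples = []
for _ in range(1000):
    start = time.perf_counter()
    body = s3.get_object(Bucket=bucket, Key=key)["Body"]
    body.read(1)                                 # first byte received
    samples.append(time.perf_counter() - start)
    body.close()

for p in (0.05, 0.5, 0.95, 0.99):
    print(f"p{int(p * 100)}: {np.quantile(samples, p) * 1000:.1f} ms")
```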

Azure and AWS S3 had comparable latency, while Google Cloud Storage (GCS) had over triple the latency. Despite the higher latency, GCS provided about 4x the throughput of Azure and 2x that of S3. As a result, GCS should finish downloading files larger than roughly 1 MB sooner than Azure, and files larger than roughly 5 MB sooner than S3. (Note the emphasis on finishing the download.) If your application can handle streaming input and processes the data faster than it downloads, S3 and Azure will perform better for any file size. The flat right tail at 91 MB/s (roughly 730 Mbps) on AWS S3 could suggest a maximum throughput enforced by AWS.
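Those crossover sizes fall out of a simple model, download time ≈ latency + size / throughput. A quick sketch with illustrative numbers of my own choosing (not the measured medians from Figure 1):

```python
# Back-of-the-envelope crossover: provider A (higher latency, higher throughput)
# finishes sooner than provider B for objects larger than the break-even size.
# The latency/throughput values below are illustrative placeholders.
def breakeven_mb(lat_a_s, thru_a_mbs, lat_b_s, thru_b_mbs):
    """Size in MB where lat_a + size/thru_a equals lat_b + size/thru_b."""
    return (lat_a_s - lat_b_s) / (1.0 / thru_b_mbs - 1.0 / thru_a_mbs)

# e.g. A: 90 ms latency at 80 MB/s vs. B: 30 ms latency at 40 MB/s
print(f"break-even: {breakeven_mb(0.09, 80, 0.03, 40):.1f} MB")   # ~4.8 MB
```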

Caps on network throughput are a problem

Network throughput plays a crucial role in these benchmarks, which makes choosing the right VM type an important architectural decision. The results above were obtained from VM types with enough network bandwidth that storage throughput, rather than VM network throughput, was the quantity being measured. I’ll cover network throughput in more detail in a forthcoming post, but Figure 2 provides a preliminary overview: for GCE (Google Compute Engine) and Azure, a machine with just 1 vCPU delivers satisfactory throughput, whereas AWS EC2 requires a substantially larger machine, such as an 8 vCPU c4.2xlarge (for these benchmarks I used a 16 vCPU c4.4xlarge).

It is worth noting that only a few EC2 instance types have enough network throughput to make S3 the bottleneck, assuming one file is downloaded per vCPU (a rough but informative metric; see Figure 2, right). Google Compute Engine, by contrast, offers significantly higher network throughput, with most machine types exceeding GCS’s bandwidth even though GCS already delivers higher throughput than S3. Most of Azure’s instance types likewise have enough bandwidth to accommodate Azure Storage’s lower throughput.

Figure 2. Measured intra-datacenter network throughput for varying VM types, total (left) and normalized by CPU count (right). Showing mean and standard deviation. Throughput was measured for 60 seconds using iperf. All tests were repeated two to six times. Shared-core machines were considered to have one vCPU when normalizing by CPU count. GCE tests were run in us-central1-a, except n1-standard-32 which was run in us-central1-b. AWS tests were run in us-east-1a. Azure tests were run in East US. See also [1] and [2].
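The network numbers in Figure 2 came from iperf run for 60 seconds between two VMs of the same type. Here is a sketch of that kind of probe; it uses iperf3’s JSON output because it is easier to parse (the original runs used plain iperf), and the peer IP is a placeholder for a second VM running iperf3 -s:

```python
# Sketch of an intra-datacenter throughput probe between two identical VMs.
# The peer address is a placeholder; the other VM must be running `iperf3 -s`.
import json
import subprocess

PEER = "10.128.0.2"   # placeholder: internal IP of the second VM

result = subprocess.run(
    ["iperf3", "-c", PEER, "-t", "60", "-J"],   # 60-second run, JSON output
    capture_output=True, text=True, check=True,
)
report = json.loads(result.stdout)
bps = report["end"]["sum_received"]["bits_per_second"]
print(f"throughput: {bps / 8e6:.0f} MB/s ({bps / 1e6:.0f} Mbps)")
```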

For the sake of thoroughness, I conducted storage benchmarks on three specific instances: a 16 vCPU instance (n1-standard-16) from GCE (Google Compute Engine), a 16 vCPU instance (D5_v2) from Azure, and a 1 vCPU instance (m3.medium) from AWS (Amazon Web Services). Interestingly, it was observed that the smaller VMs from Google and Microsoft exhibited slightly higher storage throughput compared to their larger counterparts (although these results are not shown here). This could potentially be attributed to the distribution of VMs on shared hardware within their respective infrastructures. In the case of AWS, the m3.medium instance was observed to be strictly capped at 32.5 MB/s.

GCS multi-region buckets are fantastic

The results presented in Figure 1 for GCS are for a US “multi-region” bucket. S3 has no equivalent; S3 buckets are functionally comparable to GCS regional buckets. A GCS multi-region bucket can be accessed from a set of regions without incurring data transfer charges and without duplicating the data, whereas AWS charges $0.02 per GB to access a bucket from another region. This is particularly advantageous when you have data processing servers in multiple regions. Furthermore, throughput from both US data centers (the only two considered here) is excellent, as shown in Figure 3. While I couldn’t find specifics in the terms of service, it is plausible that multi-region buckets offer better fault tolerance as well.

Figure 3. Download throughput from all seven zones accessing a US multi-region bucket. (Data from Figures 4-right and 5-right.)
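Creating a multi-region versus a regional bucket differs only in the location string. A minimal sketch with the google-cloud-storage client and placeholder bucket names (gsutil mb -l accomplishes the same thing):

```python
# Minimal sketch: the only difference between a multi-region and a regional GCS
# bucket at creation time is the location string. Bucket names are placeholders.
from google.cloud import storage

client = storage.Client()
client.create_bucket("example-multiregion-bucket", location="US")        # multi-region
client.create_bucket("example-regional-bucket", location="us-central1")  # regional
```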

According to Google’s documentation, a regional bucket provides lower latency and higher throughput than a multi-region bucket, so I also tested a us-central1 regional bucket from all four us-central1 Compute Engine zones (Figure 4) and a us-east1 regional bucket from all three us-east1 zones (Figure 5) to see whether I could get even higher throughput or improve GCS’s poor latency.

Figure 4. Benchmarks for accessing a US multi-region bucket (solid) or a us-central1 regional bucket (dashed) from the four us-central1 zones.
Figure 5. Benchmarks for accessing a US multi-region bucket (solid) or a us-east1 regional bucket (dashed) from the three us-east1 zones.

Regional buckets actually increased latency. Median throughput with regional buckets was the same or slightly higher, while the tails were frequently worse. Odd.

Uploads

I do not rely on rapid uploads for my application, but I am providing some of the benchmark data here for completeness’ sake (Figure 6). Similar to downloads, Google had higher latency but faster throughput, Microsoft had the lowest latency but also the lowest throughput, and AWS was in the middle.

Concurrency is well handled by object storage

Finally, Google’s gsutil package includes a perfdiag command, which works with both GCS and S3 because it is built on boto. This tool reads or writes many objects concurrently, which adds some extra detail to the story (Table 3) and demonstrates that these systems can handle concurrent requests at the scale at which we will be using them.
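Here is a sketch of driving perfdiag from Python; the flag meanings below are from my reading of gsutil help perfdiag and may vary by version:

```python
# Sketch of running `gsutil perfdiag` against a placeholder bucket to probe
# concurrent read throughput. Flags as I understand them: -t selects the test
# (rthru = read throughput), -c the concurrency, -n the object count, -s the
# per-object size in bytes. Verify with `gsutil help perfdiag`.
import subprocess

subprocess.run(
    [
        "gsutil", "perfdiag",
        "-t", "rthru",
        "-c", "10",
        "-n", "100",
        "-s", str(16 * 1024 * 1024),   # 16 MiB objects
        "gs://example-bucket",
    ],
    check=True,
)
```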

Conclusions

  • Amazon and Azure have the lowest latency, while Google has the highest throughput for both uploads and downloads. This means AWS and Azure excel at smaller files, whereas GCS excels at larger files, underscoring the need for benchmarking with data comparable in size to what your application consumes.
  • When constructing high-speed data processing systems, the significant constraints on AWS EC2 network performance must be considered.
  • When working with data from various datacenters in the same region (for example, continent), Google’s distinctive multi-region buckets reduce expenses.
  • Object storage scales automatically to provide high aggregate throughput.
  • Lastly, note that I’m only showing data from API access (the exact same boto code for AWS and Google); as expected, I have observed significant differences in performance across other clients (the vendor-specific CLIs, node.js API packages, cURL’ing URLs, etc.).

Links

  1. http://googlecloudplatform.blogspot.com/2015/11/bringing-you-more-flexibility-and-better-Cloud-Networking-performance-GA-of-HTTPS-Load-Balancing-and-Akamai-joins-CDN-Interconnect.html
  2. http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-ec2-config.html. While EBS-optimized instances have separate NICs dedicated to EBS, from my own testing these values seem to be the same as the limits found on the non-EBS NIC. There are some inconsistencies between what I measured and this page; for example, an m3.xlarge should have a limit of 62.5 MB/s, but I observed 126 MB/s, which is their reported limit for an m3.2xlarge. I wouldn’t be surprised if this was a botched AWS config setting. Same goes for the t2.micro’s high network performance, which contradicts what others (such as http://www.azavea.com/blogs/labs/2015/01/selecting-a-nat-instance-size-on-ec2/) have reported.
