Thwarting an Advanced Exploit in GCP in a Hybrid Cloud Run, Dataflow, and KMS Environment

Umair Akbar
6 min read · Nov 12, 2023

RapidFeed, a startup focused on real-time data analytics, built an innovative pipeline using Google Cloud services. By combining Cloud Run, Dataflow, and Key Management Service (KMS), they powered low-latency data processing with automatic scaling and encryption. However, intricate integrations can also create unique security risks. RapidFeed learned this lesson the hard way when an attacker exploited vulnerabilities in their implementation.

For privacy, the company’s name and some details in this account have been changed or omitted. The events described occurred over a year ago and are mostly factual.

Understanding the Moving Parts

Google Cloud Run provides a serverless environment to deploy containerized applications that can instantly scale to meet demand. The Cloud Run runtime manages the underlying infrastructure and scaling logic, allowing developers to focus on application code.

Google Cloud Dataflow facilitates large-scale data processing by providing managed resources for streaming and batch data pipelines. It automates cluster management, scaling, and optimization using Apache Beam and a data-parallel processing model.

Google Cloud KMS enables creating, managing, and using cryptographic keys for encryption and signing, while ensuring keys never leave Google’s secure environment. It provides fine-grained access controls and auditing to manage keys throughout their lifecycle.

This combination enables an event-driven architecture where Cloud Run hosts the application logic, Dataflow manages the streaming data pipeline, and KMS secures the keys for encrypting sensitive data.

Examining the Integration Mechanics

The tight integration between these services is enabled through service accounts and permissions. The Cloud Run application was assigned a service account that has permissions to invoke Dataflow jobs and access keys in KMS. This service account served as the bridge for the Cloud Run instances to communicate with the other two services.

The data pipeline worked as follows:

  1. Events and data streams triggered Cloud Run instances via HTTP requests in real-time
  2. Cloud Run instances started Dataflow jobs to process these event streams with sub-second latency
  3. Dataflow jobs used cryptographic keys held in KMS to encrypt the data
  4. Results from the Dataflow jobs were streamed back to Cloud Run in real-time

This integration improved scalability by leveraging Cloud Run’s auto-scaling capabilities. It also benefited from Dataflow’s distributed data processing architecture. Encrypting the data through KMS enhanced security. However, the symbiotic interaction between the three services also widened the attack surface.

Uncovering the Vulnerabilities in This Architecture

The tight coupling between Cloud Run, Dataflow, and KMS introduced security risks for RapidFeed:

  • Overly permissive service account permissions violating the principle of least privilege
  • Lack of granular IAM controls for managing access to KMS keys
  • Insufficient monitoring of Dataflow jobs and Cloud Run instances
  • Unencrypted data flowing between the services, enabling data exfiltration

An attacker exploiting these vulnerabilities could intercept streaming data and manipulate Dataflow jobs to undermine data integrity. Moreover, unauthorized access to KMS keys would further amplify the potential damage.
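The first two risks in the list above can be checked mechanically. The following is a minimal sketch, not RapidFeed's actual tooling. It assumes an IAM policy exported as JSON (for example with `gcloud projects get-iam-policy --format=json`), which contains a `bindings` list of `role` and `members` entries. The set of roles treated as too broad, the function name, and the service account in the example are illustrative assumptions.

```python
from typing import Iterable

# Roles treated as too broad for a pipeline service account.
# This set is an illustrative assumption; tune it to your own policy.
BROAD_ROLES = frozenset({
    "roles/owner",
    "roles/editor",
    "roles/cloudkms.admin",
    "roles/dataflow.admin",
})


def find_broad_grants(policy: dict,
                      broad_roles: Iterable[str] = BROAD_ROLES) -> list[tuple[str, str]]:
    """Return sorted (member, role) pairs where a service account holds a broad role."""
    broad = set(broad_roles)
    findings = []
    for binding in policy.get("bindings", []):
        role = binding.get("role", "")
        if role not in broad:
            continue
        for member in binding.get("members", []):
            # Only service accounts are in scope for this check.
            if member.startswith("serviceAccount:"):
                findings.append((member, role))
    return sorted(findings)


# Hypothetical policy, for illustration only.
example_policy = {
    "bindings": [
        {"role": "roles/editor",
         "members": ["serviceAccount:run-app@example-project.iam.gserviceaccount.com"]},
        {"role": "roles/cloudkms.cryptoKeyEncrypter",
         "members": ["serviceAccount:run-app@example-project.iam.gserviceaccount.com"]},
        {"role": "roles/owner", "members": ["user:admin@example.com"]},
    ]
}

for member, role in find_broad_grants(example_policy):
    print(f"{member} holds {role}")
```

In this example only the `roles/editor` grant is reported: the narrowly scoped encrypter role passes, and the human owner is outside the scope of a service account review.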

Analyzing the Anatomy of an Attack

This integration was exploited by an attacker who obtained the Cloud Run service account credentials from a compromised developer laptop. Using the service account's far-reaching permissions, the attacker started Dataflow jobs to exfiltrate sensitive data processed by the pipeline in real-time.

Limited monitoring allowed this data exfiltration to go undetected initially. The attacker then used the cryptographic keys held in KMS to decrypt the stolen data. Furthermore, the lack of encryption between services allowed the attacker to intercept streaming data by eavesdropping on the network.

This attack highlighted multiple failures in RapidFeed’s implementation:

  • Permissive service account permissions
  • Unrestricted access to KMS keys
  • Inadequate activity monitoring
  • Lack of encryption-in-transit

A multi-pronged strategy was urgently needed to address these gaps and prevent further damage.

Isolating the Threat in Real-Time

Upon detection of the breach, RapidFeed’s incident response team initiated a series of immediate actions to isolate the threat and mitigate further damage. The first step involved temporarily suspending the compromised Cloud Run service accounts to prevent any further unauthorized Dataflow job executions or KMS key access. Simultaneously, network access controls were tightened, and traffic to and from the affected services was closely monitored to identify and block any suspicious data transfers.

This was followed by a rapid rollback of the Cloud Run environments to a known safe state using pre-configured snapshots. The team also initiated a complete audit of KMS, revoking and rotating all keys that were potentially compromised.

To understand the depth of the intrusion, the incident response team deployed additional logging and monitoring tools across the architecture, enhancing visibility into the real-time operations of Cloud Run and Dataflow. These immediate steps were critical in containing the breach, allowing RapidFeed to regain control of their cloud environment and prevent further unauthorized access or data leakage.

Uncovering the Extent of the Breach

The real-time measures implemented by RapidFeed’s incident response team yielded significant insights into the breach. The enhanced monitoring and logging revealed that the attacker had initiated several unauthorized Dataflow jobs, primarily aimed at data exfiltration. Analysis of network traffic patterns identified unusual outbound connections, indicating attempts at data leakage to external servers. The audit of the KMS revealed that multiple encryption keys had been accessed, though, fortunately, a majority of the sensitive keys remained untouched.

The suspension of the Cloud Run service accounts halted any ongoing unauthorized activities, confirming that these accounts were the primary vector of the attack. Furthermore, the review of the rollback snapshots and comparison with the current state provided a clear picture of the changes made by the attacker, most notably in the configuration of Dataflow jobs and Cloud Run environment variables.

This comprehensive examination not only highlighted the areas of immediate concern but also helped RapidFeed to understand the sophistication of the attack and the attacker’s potential objectives, guiding their subsequent long-term security strategy enhancements.
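The comparison between a known-good snapshot and the current state can be as simple as a dictionary diff. The sketch below is an illustrative assumption about how such a comparison might look, not RapidFeed's tooling. It treats each configuration as a flat mapping such as a set of environment variables, and the variable names and values in the example are hypothetical.

```python
def diff_config(baseline: dict, current: dict) -> dict:
    """Compare two flat config mappings and report added, removed, and changed keys."""
    added = {key: current[key] for key in current.keys() - baseline.keys()}
    removed = {key: baseline[key] for key in baseline.keys() - current.keys()}
    changed = {
        key: (baseline[key], current[key])  # (old value, new value)
        for key in baseline.keys() & current.keys()
        if baseline[key] != current[key]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical environment variables, for illustration only.
snapshot_env = {
    "OUTPUT_TOPIC": "projects/example-project/topics/results",
    "KMS_KEY_NAME": "pipeline-key",
    "LOG_LEVEL": "INFO",
}
current_env = {
    "OUTPUT_TOPIC": "projects/other-project/topics/results",
    "KMS_KEY_NAME": "pipeline-key",
    "DEBUG_ENDPOINT": "http://203.0.113.7/collect",
}

report = diff_config(snapshot_env, current_env)
for section in ("added", "removed", "changed"):
    print(section, report[section])
```

Running the same comparison on a schedule, rather than only after an incident, is one way to surface unexpected configuration changes early.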

A Hardened Security Posture

Several best practices can help mitigate the risks highlighted in this scenario:

  • Implement least privilege for all service accounts and IAM roles. Conduct periodic access reviews and audits.
  • Enable strong access controls for KMS keys including key rotation, auditing, and mandatory encryption.
  • Stream logs from Cloud Run, Dataflow, and KMS into a centralized security information and event management (SIEM) system for threat monitoring and detection.
  • Enforce encryption-in-transit using TLS for all data communication between services, and use VPC Service Controls to restrict where that data can move.
  • Validate Dataflow job configurations and implement input sanitization to prevent data integrity issues.
  • Adopt anomaly detection systems that use machine learning to identify suspicious access patterns and job configurations.

These measures can significantly improve the security posture for a multi-service architecture like RapidFeed’s.
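As a rough illustration of the anomaly detection item, the sketch below uses a plain statistical baseline rather than machine learning: it flags a count, such as Dataflow job launches in the last hour, that sits far above the historical mean. The function name, the threshold of three standard deviations, and the sample counts are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def is_anomalous(history: list[int], latest: int, threshold: float = 3.0) -> bool:
    """Flag `latest` if it is more than `threshold` standard deviations
    above the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return latest > mu  # flat baseline: any increase is unusual
    return (latest - mu) / sigma > threshold


# Hypothetical hourly job-launch counts, for illustration only.
hourly_job_launches = [2, 3, 2, 4, 3, 2, 3]

for observed in (4, 15):
    label = "anomalous" if is_anomalous(hourly_job_launches, observed) else "normal"
    print(observed, label)
```

A baseline like this only looks at spikes in volume. It would not catch a patient attacker who stays within normal rates, which is where the richer models mentioned above, looking at access patterns and job configurations, come in.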

Optimizing for Performance While Prioritizing Security

When safeguarding complex cloud environments, it is vital to balance security with performance. Follow these best practices:

  • Profile workloads and use autoscaling controls judiciously to meet security needs while optimizing costs.
  • Segregate cryptographic workloads from performance-sensitive workloads and cache encrypted data where possible to reduce latency.
  • Use VPC Service Controls for granular access control without compromising performance.
  • Set up separate development, test, and production environments with appropriate security levels for each.
  • Monitor performance metrics from each service and fine-tune configurations to meet security and performance objectives.

Adopting a Continuous Security Mindset

As cloud platforms rapidly evolve, so do the threats against them. A robust defense necessitates continuous security efforts:

  • Actively monitor security bulletins and new features released by Google Cloud. Promptly adopt new capabilities that improve the security posture.
  • Institute strong configuration management practices that allow detecting and rolling back suspicious configuration changes.
  • Conduct regular attack simulation exercises to proactively uncover potential attack vectors before they are exploited.
  • Implement continuous training to keep staff updated on the latest cloud security threats, challenges, and best practices.

Conclusion

RapidFeed’s experience revealed the need for a comprehensive security strategy when integrating multiple cloud services into an interconnected architecture. The technical complexity arising from leveraging Cloud Run, Dataflow, and KMS necessitates balancing performance with security through ongoing vigilance, strict controls, and continuous adoption of new protections. As cloud platforms continue to innovate, adopting an adaptable and proactive approach to security will prove vital in defending against emerging threats.

Umair Akbar

Hi, I'm Umair Akbar. Cloud Engineer. Artificially Intelligent. Experienced in deploying and managing cloud infrastructure, proficient in AWS and Google Cloud