Making Insights-Driven Decisions in an Ecosystem of Ecosystems

Our approach to securing our cloud environments

In this blog post, we explore our data-driven methodology for measuring, securing, and monitoring Block’s distinctive environment. Our objective is to prioritize security and infrastructure initiatives using robust data and insights drawn from our extensive resources. We will discuss our advanced technologies and cloud security strategies, and we will present a case study on managing static credentials in the cloud.


Block is an intricate ecosystem comprising multiple sub-ecosystems, each with unique requirements and diverse cloud use cases. From Bitcoin transactions to traditional banking to hi-fi music streaming, we cover a wide spectrum of services. Our Cloud Security team plays a crucial role in protecting this diverse environment.


Despite our scale, fundamental questions remain relevant:

  1. Paved Roads & North Star Requirements: How do we establish the essential guardrails, frameworks, and patterns?

  2. Guardrails: When is the appropriate time to enforce a guardrail?

  3. Buy vs. Build: How do we determine whether to develop a solution in-house or purchase one?

  4. Developer Empowerment Through Exceptional Solutions: What practices are suitable for each specific business unit, acknowledging that a one-size-fits-all approach is impractical?

Our goal is to reduce uncertainty and unknowns in these decision-making processes by leveraging quantifiable data and various tools at our disposal. This approach empowers our developers to make informed security decisions while fully utilizing cloud resources and computing capabilities.

The Problem

Given the eclectic nature of our ecosystem, how do we determine the relevance of our guardrails and detective controls across the company? This entails answering questions such as:

  • Which businesses are these controls relevant to?
  • Are there common patterns being reused?
  • What classes of problems exist in different business units?

In addition to understanding the state of the ecosystem, we also have to:

  • Identify initiatives based on the data
  • Determine prioritization
  • Identify correct stakeholders and partners to work with

Our Approach

To address the challenges outlined, we developed a solution encompassing three major steps: data collection, contextualization, and validation.

Data-Driven Insights:

By aggregating data from our security tooling and internal data sources, we can extract overarching insights that guide our critical decisions and initiatives. This approach enables us to pinpoint risk, identify trends, and prioritize actions, ensuring that our security measures are not only reactive but also proactive and strategic.

Platform Overview


Our platform is designed to provide comprehensive data-driven insights by integrating various data sources, processing them through a managed infrastructure, and presenting actionable information to stakeholders. The workflow is structured as follows:

Data Sources:

  1. Security Tooling: Integrates data from tools like CSPM, DSPM, and other internal tooling to monitor and manage security posture.

  2. Cloud Providers: Aggregates attribution data from major cloud providers, ensuring comprehensive cloud security management.

  3. Business-Specific Data: Includes data unique to specific business units.

Insights Platform:

Running on Kubernetes, this platform orchestrates the various jobs that process the ingested data. These jobs derive meaningful insights from the raw data collected from the different sources.

Data Lake:

  1. Data Storage: Utilizes robust data warehousing solutions to store and analyze large volumes of data efficiently.

  2. ETL Process: The Extract, Transform, Load (ETL) process ensures data is properly formatted and transferred into the Data Lake, maintaining data integrity and readiness for analysis.
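As a sketch of what one such ETL step might look like, the snippet below normalizes findings from two sources into a single shared schema before loading. The field names and source shapes are entirely illustrative, not Block's actual schemas:

```python
# Hypothetical ETL sketch: two sources, one shared schema.
# Field names ("resourceId", "resource_name", etc.) are illustrative only.

def transform_cspm(finding):
    """Map a CSPM-style finding onto the shared schema."""
    return {
        "source": "cspm",
        "resource": finding["resourceId"],
        "severity": finding["severity"].lower(),
        "observed_at": finding["timestamp"],
    }

def transform_cloud_audit(event):
    """Map a cloud-provider audit event onto the shared schema."""
    return {
        "source": "cloud_provider",
        "resource": event["resource_name"],
        "severity": event.get("severity", "info").lower(),
        "observed_at": event["event_time"],
    }

def run_etl(cspm_findings, audit_events):
    """Extract from both sources, transform, and return load-ready rows."""
    rows = [transform_cspm(f) for f in cspm_findings]
    rows += [transform_cloud_audit(e) for e in audit_events]
    # In production these rows would be written to the warehouse;
    # here we simply return them.
    return rows
```

Keeping every source behind its own `transform_*` function makes it cheap to add a new data source without touching downstream analysis.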


Presentation:

  1. Visualization Tools: Insights and analytics are presented through Looker, Google Sheets, and Google Slides, allowing stakeholders to access, explore, and understand the data in a user-friendly and interactive manner.

By leveraging this platform, we can draw overarching insights that inform critical decisions and initiatives, ensuring our security measures are both effective and aligned with business objectives.

Incorporating these insights has significantly strengthened the partnerships between our cloud security team and the business units. We engage in discussions about the objectives of each initiative and how to measure its risk. This approach encourages everyone to think deeply about security priorities and to look beyond secure-by-default settings alone, as the following case study on static credentials illustrates.

Case Study

Eliminating Static Credentials in the Cloud

This case study illustrates how, using our data-driven insights platform, we embarked on a comprehensive initiative to answer key questions about static credentials in our cloud environment and eliminate them.

Key questions

  1. How many keys?

  2. Who owns the keys?

  3. When were they last used?

  4. What are they used for?

  5. What is the risk level for each key?

  6. Can they be replaced with temporary security credentials?
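One way to keep these questions front and center is to make each one a field in the key inventory itself, so every record either answers the question or exposes a gap. A minimal, hypothetical schema sketch:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical inventory record: one field per key question above.
# Field names are illustrative, not Block's actual schema.
@dataclass
class KeyRecord:
    key_id: str                   # which key (contributes to "how many?")
    owner_team: Optional[str]     # who owns it?
    last_used: Optional[date]     # when was it last used?
    purpose: Optional[str]        # what is it used for?
    risk_level: Optional[str]     # what is its risk level?
    replaceable: Optional[bool]   # can temporary credentials replace it?
```

Making the unknowns `Optional` means a `None` value is itself a data point: it shows exactly which question is still unanswered for that key.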

Data Collection

Welcome to the world of data science!

To tackle the issue of static credentials in our cloud environment, we leveraged various data collection methods:

  1. Leveraging CSPMs, CSPs, and internal tooling: We utilized Cloud Security Posture Management (CSPM) tools, Cloud Service Provider (CSP) APIs, and other internal tools available within our infrastructure to gather comprehensive data on static credentials.

  2. Static and manual data collection: Although collecting static and manual data can be tiresome, it proved effective in ensuring no key was overlooked. This thorough approach was crucial in capturing a complete dataset.

  3. Comprehensive Data Capture: We aimed to capture everything, leaving no stone unturned. This included all forms of data related to keys, their usage, ownership, and associated risks.
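Combining automated and manual sources means the same key can appear several times with different fields filled in. A simple, hypothetical way to union those inventories while letting each source enrich the others:

```python
# Hypothetical merge of key inventories from CSPM, CSP, and manual lists.
# Records are dicts keyed by "key_id"; field names are illustrative.

def merge_key_inventories(*inventories):
    """Union several inventories, de-duplicating by key ID.

    Later sources fill in fields that earlier sources left empty, so a
    manually curated list can enrich automated CSPM/CSP data.
    """
    merged = {}
    for inventory in inventories:
        for record in inventory:
            existing = merged.setdefault(record["key_id"], {})
            for field, value in record.items():
                if value is not None and existing.get(field) is None:
                    existing[field] = value
    return list(merged.values())
```

The "first non-empty value wins" rule is a deliberate simplification; a real pipeline would likely track per-field provenance and resolve conflicts explicitly.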


Contextualization

Once the data was collected, the next step was to contextualize it, providing meaningful insights and actionable information:

  1. Team and business unit mapping: We mapped the collected data to specific teams and business units. This step was essential for understanding ownership and responsibility for each key.

  2. Business context of the keys: We assessed the context in which each key was used, determining whether it was a vendor necessity, part of a platform, or served another purpose. This context was gathered mainly through manual efforts or existing static lists.

  3. Risk measurement: To prioritize our efforts, we measured the risk associated with each credential. This included evaluating privileges, access from outside the organization, last used dates, and other relevant factors that could impact the overall risk level.
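A risk measurement like the one in step 3 can be as simple as a weighted score over a few signals. The sketch below is a toy model with made-up weights, not our actual scoring logic:

```python
from datetime import date

# Toy risk score: higher is riskier. The weights and signals are
# illustrative assumptions, not Block's production model.

def risk_score(key, today=date(2024, 5, 1)):
    score = 0
    if key["privileged"]:          # broad privileges raise risk
        score += 3
    if key["external_access"]:     # usable from outside the org
        score += 3
    days_idle = (today - key["last_used"]).days
    if days_idle > 90:             # long-inactive keys are prime cleanup targets
        score += 2
    elif days_idle > 30:
        score += 1
    return score
```

Even a crude score like this is enough to rank thousands of keys and decide where remediation effort pays off first; the model can be refined once the easy wins are done.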


Validation

To ensure the accuracy and reliability of our data, we conducted a validation process:

  1. Cross-referencing CSPM and CSP data: We cross-referenced data from our CSPM tools with that from our Cloud Service Providers. This step helped in verifying the accuracy of the data collected and the coverage overall.

  2. Validation of owning teams: We validated that the teams associated with each key still existed and added specific attribution logic to ensure accurate mapping of keys to current teams.

  3. Audit log sampling: By sampling keys and analyzing their last used dates based on audit logs, we further ensured the relevance and accuracy of our data. This validation step was crucial in identifying obsolete or high-risk keys.
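The cross-referencing in step 1 boils down to set differences between the two inventories. A hypothetical sketch, assuming each side can produce a list of key IDs:

```python
# Hypothetical cross-reference of CSPM findings against the provider's
# own key list. Inputs are iterables of key IDs.

def coverage_gaps(cspm_keys, csp_keys):
    """Compare key IDs seen by the CSPM against the CSP's own inventory.

    Keys only the provider knows about are CSPM blind spots; keys only
    the CSPM reports may be stale findings worth re-checking.
    """
    cspm, csp = set(cspm_keys), set(csp_keys)
    return {
        "missing_from_cspm": sorted(csp - cspm),
        "stale_in_cspm": sorted(cspm - csp),
    }
```

Both buckets matter: blind spots mean the tooling under-reports risk, while stale findings erode trust in the dashboards built on top of the data.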


Results

Our analysis revealed a significant number of keys being used for SaaS vendor applications and cross-cloud communication. Many of these keys existed because of vendor limitations, such as reliance on outdated infrastructure and lack of support for federation.

To address this issue, we provided teams with automation tools, comprehensive documentation, and robust support to eliminate these keys. Where exceptions were necessary, we offered migration steps and services to transition to better solutions, such as federation. As a result of these efforts, we have successfully reduced the number of static credentials in our GCP environment by 71% in less than a year.


State of GCP service account keys in July 2023


State of GCP service account keys in May 2024

Key insights acquired:

  • 60% of keys last used over 90 days ago: A significant portion of static credentials were inactive, indicating potential security risks and opportunities for optimization.
  • 35% of keys belonged to legacy platforms: Many keys were tied to outdated systems, highlighting the need for modernization and phasing out obsolete infrastructure.
  • Top 5 cloud accounts held 80% of the keys: A large concentration of keys in a few accounts pointed to centralization and potential points of failure, necessitating a review of key management practices.
  • 20% of keys were for vendor usage: Despite advances in security, vendors were still using static keys in 2024. This underscored the need for stricter vendor management and enforcement of security best practices.
  • Blocking key creation in numerous accounts: As we progressed in eliminating static credentials, we identified numerous accounts where we could block the creation of new static keys, further strengthening our security posture.
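Headline figures like the percentages above fall out of the key inventory with a short script. This sketch assumes a hypothetical list of key records carrying `last_used` and `account` fields:

```python
from collections import Counter
from datetime import date

# Hypothetical summary over a key inventory; field names are illustrative.

def summarize(keys, today=date(2024, 5, 1)):
    """Compute headline percentages from a list of key records."""
    total = len(keys)
    inactive = sum(1 for k in keys if (today - k["last_used"]).days > 90)
    by_account = Counter(k["account"] for k in keys)
    top5 = sum(n for _, n in by_account.most_common(5))
    return {
        "pct_inactive_90d": round(100 * inactive / total),
        "pct_in_top5_accounts": round(100 * top5 / total),
    }
```

Numbers like these translate directly into initiatives: a high inactive percentage argues for bulk revocation, while heavy concentration in a few accounts argues for targeted engagement with those account owners.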


Conclusion

Our data-driven methodology has proven instrumental in enhancing the security of Block's diverse cloud environment. By systematically eliminating static credentials, we've significantly reduced potential security risks and streamlined our operations. The insights gained from our comprehensive platform have empowered our teams to make informed, strategic decisions, reinforcing our commitment to robust cloud security. As we continue to evolve, these practices will remain crucial in maintaining and improving the security and efficiency of our ecosystems.
