Your company’s secrets are lying in plain sight. 12.8 million secrets exposed on GitHub in 2023

Klaudia Ciesielska

Today’s technology landscape is facing what can be described as a digital perfect storm. On the one hand, the CrowdStrike 2024 Global Threat Report records an unprecedented 75% year-on-year increase in cloud intrusions. On the other, the Verizon 2024 Data Breach Investigations Report (DBIR) confirms that the use of stolen credentials remains the most common way of initiating an attack, with a human element involved in 68% of all breaches. Finally, the Tenable 2025 Cloud Security Risk Report sheds light on a devastating consequence: 97% of the sensitive data that accidentally leaks from cloud resources is classified as confidential or proprietary, i.e. among companies’ most closely guarded secrets.

The mass migration to the cloud, driven by the promise of scalability and innovation and combined with immense pressure to deploy faster in DevOps cycles, has led to a paradoxical security crisis. Its source is not a flaw in cloud technology, which offers advanced protection mechanisms out of the box. The problem lies in fundamental, yet avoidable, mistakes in identity management practices and in the handling of digital credentials, known as ‘secrets’. These simple mistakes create ideal conditions for attackers, who less and less often need to ‘hack’ into systems. Instead, they simply ‘log in’, using keys that organisations themselves have left in plain sight.

Anatomy of a secret – Digital keys to the kingdom

To fully understand the seriousness of the problem, it is necessary to define precisely what a ‘secret’ is in the context of modern IT. This concept goes far beyond the traditional user password. It is a broad category of digital credentials that are used to authenticate non-human identities – applications, scripts, containers or microservices. They are the digital bloodstream of modern distributed architectures.

A secret is any piece of confidential information whose use must be strictly controlled and limited. Maintaining that tight control is the essence of secret management, and it is extremely difficult to achieve without dedicated tools and rigorous processes.


Secrets have become an essential enabler of communication between system components. They allow scripts in the CI/CD pipeline to access code repositories, containers to retrieve configurations and microservices to call each other securely.

However, this ubiquity leads to a phenomenon known as ‘secrets sprawl’ – the uncontrolled proliferation and dispersion of secrets throughout the IT environment. In dynamic, ephemeral architectures where machine identities are created and destroyed on the fly (e.g. in serverless systems or container orchestrators such as Kubernetes), the number of secrets grows rapidly and managing them becomes a huge challenge.

Compromising a secret is fundamentally more dangerous than stealing a password belonging to a human user. An employee’s password is typically limited by the scope of his or her permissions and often protected by additional layers of security, such as multi-factor authentication (MFA). In contrast, a secret, such as an API key associated with an administrative cloud account, may have permission to create, modify and delete a company’s entire infrastructure. Furthermore, secrets are by definition used by machines, which means their exploitation is automated and can happen on a massive scale within seconds of compromise. An attacker who acquires such a secret does not need to break any security measures: he or she becomes a fully authorised, trusted actor in the system. This is why, as IBM X-Force research points out, incidents in which attackers use valid accounts require response measures almost 200% more complex than the average incident, as security teams must distinguish legitimate activity from malicious activity.

The bill for negligence – The financial and operational cost of leaks

Negligence in the management of secrets is not just a theoretical technical risk. It translates into measurable, often astronomical, financial and operational losses. Leading industry reports, in particular the IBM Cost of a Data Breach Report 2024, make it possible to quantify this cost precisely.

The figures for 2024 are alarming. The average global cost of a single data breach reached a record US$4.88 million, a 10% increase on the previous year and the biggest jump since the pandemic. The figure is even more dramatic in the US, where the average cost of a breach is as high as US$9.36 million. The increase is driven mainly by lost business and post-breach response costs, which together account for the largest share of total losses.

Paradoxically, cloud environments, which offer advanced security tools, are becoming the arena for the most expensive incidents. Breaches involving data stored in the public cloud cost an average of US$5.17 million. The situation is even worse for so-called ‘shadow data’, i.e. information kept in unmanaged data sources that IT departments are often unaware of. As many as 35% of breaches involved such data; their cost was 16% higher, reaching US$5.27 million, and they took almost 25% longer to detect and contain.

IBM’s analysis clearly shows which errors are the most costly. Attacks using stolen or compromised credentials – the core of the secret management problem – cost an average of US$4.81 million. Even more expensive are attacks initiated by malicious insiders, with an average cost of US$4.99 million.

The time dimension is also key. Incidents involving credential theft have the longest lifecycle, with an average of 11 months from breach to full containment. This is almost a year during which the organisation suffers losses and is vulnerable to further action by attackers.

The scale of the breach has a direct bearing on its cost. The most commonly compromised data type is personally identifiable information (PII). In 2024, the average cost per compromised PII record rose to US$173. For companies processing the data of millions of customers, the potential losses become an existential threat.

Investment in proactive management of secrets and cloud configuration offers one of the highest returns on investment (ROI) in cyber security. The IBM report shows that organisations that make extensive use of artificial intelligence and automation in their security processes (key to managing secrets effectively at scale) save an average of US$2.2 million in breach costs compared with those that do not. Since the most expensive and longest-lasting attack vectors are directly linked to identity management failures, spending on secret scanning tools, centralised secret management systems (vaults) and developer training directly targets the risk of the most catastrophic incidents.

The seven deadly sins of managing secrets in the cloud

The billion-dollar losses described in the previous section are rarely the result of sophisticated zero-day attacks. Far more often, their source is a series of basic but catastrophic errors that can be described as the ‘seven deadly sins’ of secret management. Analysis of these errors shows that the problem lies not in a lack of technology, but in process and cultural negligence.

Sin #1: Hard-coding secrets (The Cardinal Sin)

This is the original sin and the most common developer mistake: putting sensitive data, such as API keys or database passwords, directly into source code, configuration files, deployment scripts (e.g. Ansible playbooks) or container images. Such code becomes a ticking time bomb, even if it lives in a private repository. All it takes is one careless push to a public repository, a mistake in repository permissions or access by a former employee for the secret to leak. Rotating a hard-coded secret is an operational nightmare, as it requires a code change and a full deployment cycle. The problem is further compounded by generative AI, which learns from publicly available code, reproduces this bad pattern and suggests ready-made snippets with embedded sample secrets to developers.
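One way to keep a key out of the codebase is to resolve it at runtime from a dedicated store and cache it only briefly. The minimal sketch below assumes a `fetch` callable that wraps whatever vault or cloud secret manager the organisation uses; the `SecretCache` class and the secret name `db-password` are hypothetical. Because the value is re-read once its cache entry expires, a rotated secret takes effect without a code change or redeployment.

```python
import time
from typing import Callable, Dict, Tuple


class SecretCache:
    """Hypothetical helper: resolve secrets at runtime instead of hard-coding them.

    `fetch` returns the current value of a named secret (assumed to wrap a vault
    or cloud secret-manager client). Values are cached for `ttl_seconds`; after
    that, the next read fetches them again, so rotation needs no redeployment.
    """

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # name -> (value, fetched_at)

    def get(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]
        value = self._fetch(name)
        self._cache[name] = (value, now)
        return value


# Usage with an in-memory stand-in for the vault and a controllable clock.
vault = {"db-password": "initial-value"}
fake_time = [0.0]
secrets_cache = SecretCache(vault.__getitem__, ttl_seconds=60, clock=lambda: fake_time[0])

first = secrets_cache.get("db-password")        # fetched from the store
vault["db-password"] = "rotated-value"          # rotation happens in the vault, not in code
still_cached = secrets_cache.get("db-password") # served from cache within the TTL
fake_time[0] = 61.0                             # cache entry expires
after_expiry = secrets_cache.get("db-password") # new value picked up
```

The short TTL is a trade-off: a longer one means fewer calls to the vault, a shorter one means rotated or revoked secrets stop working sooner.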

Sin #2: Over-Privileged Identities

The second common mistake is ignoring the Principle of Least Privilege (PoLP). Machine identities (applications, services) are routinely granted ‘spare’ privileges, much broader than those strictly necessary for their tasks. Compromising the secret associated with such an over-privileged identity acts as a force multiplier for the attack. It gives the attacker wide scope to move around the system (lateral movement), escalate privileges and ultimately take control of the entire cloud environment.
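Over-broad grants can often be spotted mechanically. The hypothetical function below scans an AWS IAM-style JSON policy (statements with `Effect`, `Action` and `Resource` fields) and flags wildcard actions and wildcard resources. It is a rough heuristic sketch, not a substitute for a proper access review or a cloud provider’s own policy analysis tools.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """IAM-style policies allow either a single value or a list of values."""
    return value if isinstance(value, list) else [value]


def find_overbroad_grants(policy: Dict[str, Any]) -> List[str]:
    """Hypothetical linter: flag Allow statements with wildcard actions or resources."""
    findings: List[str] = []
    for index, statement in enumerate(_as_list(policy.get("Statement", []))):
        if statement.get("Effect") != "Allow":
            continue  # Deny statements restrict access, so they are not flagged
        for action in _as_list(statement.get("Action", [])):
            # "*" grants every action; "service:*" grants every action of a service
            if action == "*" or action.endswith(":*"):
                findings.append(f"Statement {index}: wildcard action {action!r}")
        if "*" in _as_list(statement.get("Resource", [])):
            findings.append(f"Statement {index}: applies to all resources ('*')")
    return findings


policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": "s3:*", "Resource": "*"},
        {"Effect": "Allow", "Action": ["s3:GetObject"], "Resource": "arn:aws:s3:::reports/*"},
    ],
}
findings = find_overbroad_grants(policy)
```

Here only the first statement is reported; the second grants a single read action on one bucket prefix, which is what least privilege looks like in practice.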

Sin #3: Static, immortal secrets (Lack of Rotation)

Many organisations have no formal policy or automated process for regularly rotating secrets. Once generated, an API key or password remains active for months or even years. The longer a secret is valid, the more likely it is to be compromised and the more time an attacker has to exploit it. A particularly serious design flaw is a system that has to be stopped whenever a secret changes, which discourages regular rotation for fear of downtime.
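Rotation does not have to mean downtime. A common pattern is to keep two credentials valid during a short overlap window: the service issues a new key, clients switch to it, and only then is the old one retired. The class below is a minimal in-memory sketch of that pattern (`RotatingKeyring` and its methods are hypothetical names); a real deployment would keep the keys in a vault and automate the schedule.

```python
import hmac
import secrets
from typing import Optional


class RotatingKeyring:
    """Hypothetical keyring that accepts the current key and, during an overlap
    window, the previous one, so a key can be rotated without stopping the service."""

    def __init__(self, initial_key: str) -> None:
        self._current = initial_key
        self._previous: Optional[str] = None

    @property
    def current(self) -> str:
        return self._current

    def rotate(self) -> str:
        """Issue a new random key; the old key stays valid until retired."""
        self._previous = self._current
        self._current = secrets.token_urlsafe(32)
        return self._current

    def retire_previous(self) -> None:
        """End the overlap window once all clients use the new key."""
        self._previous = None

    def is_valid(self, presented: str) -> bool:
        candidates = [k for k in (self._current, self._previous) if k is not None]
        # compare_digest compares each candidate in constant time
        return any(hmac.compare_digest(presented.encode(), k.encode()) for k in candidates)


ring = RotatingKeyring(initial_key="old-key")
new_key = ring.rotate()                # both keys are accepted during the overlap
overlap_ok = ring.is_valid("old-key") and ring.is_valid(new_key)
ring.retire_previous()                 # clients have migrated; the old key stops working
old_rejected = not ring.is_valid("old-key")
```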

Sin #4: Improper Storage

Instead of using dedicated, encrypted ‘vaults’, teams often store secrets in highly risky locations: .env files committed to a repository, a database alongside application data, server environment variables or shared network drives. This creates a single point of failure: an attacker who breaks into the database gets, at no extra cost, the keys to every external system and service integrated with the application.
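Vaults are the real fix, but a cheap safety net is to stop the most common risky files from being committed at all. The hypothetical check below takes a list of file paths (for example, the output of `git diff --cached --name-only` in a pre-commit hook) and flags names that usually hold credentials. The pattern list is an assumption to be adapted to each project.

```python
import fnmatch
import os
from typing import Iterable, List

# Hypothetical deny-list of file names that commonly contain credentials.
RISKY_PATTERNS = [".env", ".env.*", "*.pem", "*.p12", "id_rsa", "credentials.json"]
# Templates that contain no real values are explicitly allowed.
ALLOWED_NAMES = {".env.example"}


def risky_paths(paths: Iterable[str]) -> List[str]:
    """Return the paths whose file names match a risky pattern."""
    flagged: List[str] = []
    for path in paths:
        name = os.path.basename(path)
        if name in ALLOWED_NAMES:
            continue
        if any(fnmatch.fnmatch(name, pattern) for pattern in RISKY_PATTERNS):
            flagged.append(path)
    return flagged


# In a pre-commit hook, these paths might come from `git diff --cached --name-only`.
staged = ["app/main.py", "config/.env", ".env.example", ".env.production", "keys/server.pem"]
flagged = risky_paths(staged)
```

A hook built on this would fail the commit whenever `flagged` is non-empty, so the file never reaches the repository history.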

Sin #5: Insecure Transmission

Even the best-guarded secret can be compromised in transit. A common but dangerous practice is to send API keys or passwords via corporate messaging tools (Slack, Microsoft Teams) or email, or even to reveal them while sharing a screen during a presentation. The organisation then loses control over who has seen the secret and where copies of it are stored.

Sin #6: Environment Bleeding

Using the same secrets, especially keys to paid third-party services, in development, test and production environments is asking for trouble. Development and test environments inherently have lower levels of security. Their compromise can lead to a direct threat to production systems. It can also result in the accidental execution of costly production operations during testing, such as sending thousands of emails to real customers.
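Separation can be enforced in code as well as in policy. The sketch below assumes a hypothetical naming convention in which every secret is stored under an `<environment>/<name>` path, and a provider whose live keys carry a recognisable prefix (Stripe, for example, uses `sk_live_` and `sk_test_` for secret keys). Outside production, the resolver refuses both other environments’ secrets and anything that looks like a live key.

```python
from typing import Mapping, Tuple

# Hypothetical prefixes that identify live (production) keys for a provider.
LIVE_KEY_PREFIXES: Tuple[str, ...] = ("sk_live_",)


class EnvironmentScopedSecrets:
    """Hypothetical resolver that only reads the current environment's namespace."""

    def __init__(self, environment: str, store: Mapping[str, str]) -> None:
        self._env = environment
        self._store = store

    def get(self, name: str) -> str:
        path = f"{self._env}/{name}"  # assumed '<environment>/<name>' convention
        if path not in self._store:
            raise KeyError(f"{name!r} is not provisioned for environment {self._env!r}")
        value = self._store[path]
        # Guard against a production key copied into a lower environment by mistake.
        if self._env != "production" and value.startswith(LIVE_KEY_PREFIXES):
            raise PermissionError(f"live key found in {self._env!r}; refusing to use it")
        return value


store = {
    "test/payments-api-key": "sk_test_123",
    "staging/payments-api-key": "sk_live_456",   # misconfiguration: live key in staging
    "production/payments-api-key": "sk_live_789",
}
test_key = EnvironmentScopedSecrets("test", store).get("payments-api-key")
```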

Sin #7: Ignorance and lack of auditing (Lack of Auditing & Education)

The final sin is of an organisational nature. It is the lack of mechanisms for logging and monitoring access to secrets, making it impossible to answer the fundamental question of ‘who, what and when’ in the event of an incident. Combined with the lack of formal procedures in the event of a compromise, this leads to chaos, delayed response and an inability to assess the extent of the damage. Underlying all of this is a lack of education for developers on good security practices.
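Answering ‘who, what and when’ requires every access to a secret to leave a trace. The hypothetical wrapper below records the caller’s identity, the secret’s name, a UTC timestamp and the outcome as a JSON line, and deliberately never logs the secret value itself. The `sink` could be a file, a logging handler or a SIEM forwarder; here it is a plain list for demonstration.

```python
import json
from datetime import datetime, timezone
from typing import Callable


class AuditedSecretStore:
    """Hypothetical wrapper: every secret read produces one audit record."""

    def __init__(self, fetch: Callable[[str], str], sink: Callable[[str], None]) -> None:
        self._fetch = fetch
        self._sink = sink

    def get(self, name: str, accessor: str) -> str:
        record = {
            "ts": datetime.now(timezone.utc).isoformat(),  # when
            "who": accessor,                               # who
            "what": name,                                  # what
            "action": "read",
            "outcome": "error",  # overwritten below unless an unexpected exception occurs
        }
        try:
            value = self._fetch(name)
            record["outcome"] = "granted"
            return value
        except KeyError:
            record["outcome"] = "not_found"
            raise
        finally:
            # Written on success and failure alike; the secret value is never included.
            self._sink(json.dumps(record))


audit_log = []
store = AuditedSecretStore({"crm-api-key": "s3cr3t"}.__getitem__, audit_log.append)
value = store.get("crm-api-key", accessor="invoice-service")
```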

The journey of a compromised key – From public repository to ransomware

To understand how critical the mistakes described in the previous section are, it is necessary to trace the typical lifecycle of a compromised secret. This journey, from accidental exposure to full exploitation by cybercriminals, is often automated and extremely fast.

Phase 1: Harvesting

Modern attackers no longer need to manually scour the internet for vulnerabilities. Public code repositories, in particular GitHub, have become the main source of exposed secrets. The process of acquiring them is fully automated. Both attackers and security researchers use bots that monitor public commits (code changes) on platforms such as GitHub in real time. They scan each new line of code for patterns (using regular expressions and entropy analysis) that match the formats of known API keys, passwords or tokens.
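The detection technique described above can be sketched in a few lines. The example below combines one format-specific rule (AWS access key IDs, which begin with `AKIA` for long-term keys or `ASIA` for temporary ones, followed by 16 uppercase letters or digits) with a generic Shannon-entropy test on quoted strings. The 20-character minimum and the 4.0 bits-per-character threshold are illustrative assumptions; production scanners use far larger, tuned rule sets. The sample values are the placeholder credentials from AWS documentation.

```python
import math
import re
from collections import Counter
from typing import List, Tuple

# Format-specific rule: AWS access key IDs (AKIA = long-term, ASIA = temporary).
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")
# Generic rule: quoted strings of 20+ characters drawn from a base64-like alphabet.
QUOTED_TOKEN = re.compile(r"""["']([A-Za-z0-9+/=_\-]{20,})["']""")


def shannon_entropy(text: str) -> float:
    """Average information per character, in bits: H = -sum(p * log2(p))."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def scan_line(line: str, entropy_threshold: float = 4.0) -> List[Tuple[str, str]]:
    """Return (rule, match) pairs for likely secrets in one line of code.

    The default threshold of 4.0 bits per character is an illustrative assumption.
    """
    findings: List[Tuple[str, str]] = []
    for match in AWS_ACCESS_KEY_ID.finditer(line):
        findings.append(("aws-access-key-id", match.group(0)))
    for match in QUOTED_TOKEN.finditer(line):
        token = match.group(1)
        if shannon_entropy(token) >= entropy_threshold:
            findings.append(("high-entropy-string", token))
    return findings


# Placeholder credentials from AWS documentation examples.
key_line_findings = scan_line('aws_key = "AKIAIOSFODNN7EXAMPLE"')
secret_line_findings = scan_line('aws_secret = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"')
```

The sample key ID scores only about 3.7 bits per character, below the entropy threshold, so it is caught by the format rule alone. This is why scanners combine format-specific patterns with entropy checks rather than relying on either on its own.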

The scale of this phenomenon is striking. According to a GitGuardian report, 12.8 million unique secrets were exposed in public GitHub repositories in 2023 alone, spread across more than 3 million repositories. That is an average of roughly 35,000 new secrets entering the public domain every day.

Crucially, exposure is immediate: a secret pushed to a public repository can be picked up by scanners within seconds. In response to this threat, platforms such as GitHub have implemented their own scanning mechanisms. When they detect a potential secret, they automatically notify the relevant service provider (e.g. AWS, Google, Slack) within just a few seconds. This starts a race against time to see whether the provider invalidates the key before a bot run by attackers intercepts and exploits it. Unfortunately, the data shows that defenders often lose this race, with as many as 91.6% of leaked secrets remaining valid even five days after the leak.

Phase 2: Distribution and Sales

Once obtained, stolen credentials begin a new life on the black market. They are automatically sorted, checked for validity (whether the key still works) and packaged into so-called ‘combo lists’, or sold individually on specialised dark web forums (such as the now-defunct Genesis Market) and on messaging channels such as Telegram.

In this ecosystem, so-called Initial Access Brokers (IABs) – specialised criminal groups that do not carry out ransomware attacks themselves, but focus on selling verified initial access to corporate networks – play a key role. A valid API key to a large company’s cloud environment is an extremely valuable commodity for them.

Phase 3: Exploitation – Case Studies

The theoretical risk materialises in the form of high-profile incidents that perfectly illustrate the mechanisms of attack:

Rabbit R1 (June 2024): A classic example of the effects of hard-coding. API keys to a number of critical services (ElevenLabs, Azure, Yelp, Google Maps) were found hard-coded in the device’s source code. This allowed ethical hackers not only to read users’ query history, but also to modify the assistant’s responses and even to remotely ‘brick’ (permanently disable) the devices.

Dropbox (April 2024): Attackers gained access to the production environment of the Dropbox Sign service by compromising a service account with excessive privileges. This led to the leak of API keys, OAuth tokens and hashes of customers’ passwords, paving the way for the takeover of their accounts.

Trello (January 2024): A publicly accessible Trello API that required no authentication allowed attackers to mass-query email addresses and link them to the public profiles of 15 million Trello users, creating a huge database for further phishing attacks.

Mercedes-Benz (March 2024): A single leaked GitHub access token gave unrestricted access to the company’s internal GitHub Enterprise server, exposing source code, design documents, API keys and other sensitive data.

An attack on secrets is not so much ‘hacking’ in the traditional sense as ‘information arbitrage’. Attackers do not need to break through complex cryptographic protections. Instead, they exploit information asymmetry, finding publicly available but improperly secured keys before the defenders do. The whole process relies on public data and the speed of automation. This completely changes the defensive paradigm: what matters most is no longer perimeter protection, but complete visibility of one’s own assets and the ability to react instantly to one’s own mistakes.

Looking to the future – Preparing for the risks of tomorrow

The threat landscape is constantly evolving. In order to effectively protect secrets, organisations must not only fix the mistakes of the past, but also prepare for the challenges of the future. Three key trends that will define identity management in the coming years are the Zero Trust model, post-quantum cryptography and the growing importance of business context.

Zero Trust as the target architecture

The Zero Trust model is a philosophical and architectural response to the disappearance of the traditional network perimeter. Its fundamental principle is ‘never trust, always verify’. In the context of secret management, this means moving from static, long-lived credentials to ephemeral, short-lived credentials delivered just in time (JIT). Instead of granting an application permanent access to a database, a JIT system generates a unique, temporary password valid only for a single session or even a single query. Once used, the credential is immediately invalidated.
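The sketch below illustrates the core mechanics only: credentials that are short-lived, bound to one scope and usable once. The `JitCredentialBroker` class and the scope strings are hypothetical. Production tools such as HashiCorp Vault’s database secrets engine, which creates database users on demand under a time-limited lease, also create and revoke the underlying accounts in the target system.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Lease:
    identity: str     # which workload the credential was issued to
    scope: str        # what it may be used for, e.g. one database and operation
    expires_at: float


class JitCredentialBroker:
    """Hypothetical broker issuing short-lived, scope-bound, single-use credentials."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._leases: Dict[str, Lease] = {}

    def issue(self, identity: str, scope: str) -> str:
        token = secrets.token_urlsafe(32)
        self._leases[token] = Lease(identity, scope, self._clock() + self._ttl)
        return token

    def redeem(self, token: str, scope: str) -> str:
        """Check a credential and invalidate it in the same step; return the identity."""
        # pop() removes the lease, so the token can never be used twice,
        # even if one of the checks below fails (fail closed).
        lease = self._leases.pop(token, None)
        if lease is None:
            raise PermissionError("unknown or already-used credential")
        if self._clock() >= lease.expires_at:
            raise PermissionError("credential expired")
        if lease.scope != scope:
            raise PermissionError("credential not valid for this scope")
        return lease.identity


now = [0.0]  # controllable clock for the example
broker = JitCredentialBroker(ttl_seconds=30, clock=lambda: now[0])
token = broker.issue("billing-service", scope="db:orders:read")
identity = broker.redeem(token, scope="db:orders:read")
```

A stolen token from this broker is worth little: it expires after seconds, works for one operation only and is useless once the legitimate workload has redeemed it. A fuller version would also purge leases that expire unused.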

This approach drastically reduces the time window in which a compromised secret can be exploited. This is particularly relevant in modern, dynamic environments, such as serverless architectures or containers, where machine identities are inherently ephemeral. The model also applies to the gig economy, where temporary workers and contractors need access to company resources, but only on a limited basis and for a strictly defined period.

Post-quantum cryptography (PQC) and its impact on secrets

The emergence of powerful quantum computers on the horizon poses an existential threat to much of the asymmetric cryptography currently in use (e.g. RSA, ECC), which protects internet communications and secures stored data. The most pressing threat is the ‘Harvest Now, Decrypt Later’ attack: attackers can already capture and store encrypted data, including entire vaults of secrets, with the intention of decrypting it as soon as they gain access to a quantum computer.

The answer to this threat is post-quantum cryptography (PQC) – the development of new algorithms resistant to quantum attacks. However, the transition to PQC is a huge challenge, requiring organisations to build what is known as crypto-agility – the ability to seamlessly exchange cryptographic algorithms without disrupting systems.
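Crypto-agility is largely a matter of not hard-wiring a single algorithm into data formats and code paths. One common pattern, sketched below, stores an algorithm identifier next to every protected value and dispatches on it, so new values use the current default while old ones remain verifiable during a migration. The HMAC-SHA-256 and HMAC-SHA3-256 entries are stand-ins chosen only because they ship with Python; they are not post-quantum schemes. In a real transition, the registry would gain, for example, a standardised post-quantum signature algorithm such as ML-DSA.

```python
import hashlib
import hmac
from typing import Callable, Dict

# Registry of interchangeable algorithms, keyed by an identifier stored with each value.
# The HMAC variants are stand-ins; a PQC migration would register new schemes here.
ALGORITHMS: Dict[str, Callable[[bytes, bytes], bytes]] = {
    "hmac-sha256": lambda key, msg: hmac.new(key, msg, hashlib.sha256).digest(),
    "hmac-sha3-256": lambda key, msg: hmac.new(key, msg, hashlib.sha3_256).digest(),
}
CURRENT_ALGORITHM = "hmac-sha3-256"  # switching algorithms means changing this value


def protect(key: bytes, msg: bytes) -> str:
    """Tag a message with the current algorithm and record which one was used."""
    tag = ALGORITHMS[CURRENT_ALGORITHM](key, msg)
    return f"{CURRENT_ALGORITHM}:{tag.hex()}"


def verify(key: bytes, msg: bytes, protected: str) -> bool:
    """Verify with whichever registered algorithm the stored value names."""
    algorithm, _, tag_hex = protected.partition(":")
    if algorithm not in ALGORITHMS:
        return False  # unknown or retired algorithm
    expected = ALGORITHMS[algorithm](key, msg).hex()
    return hmac.compare_digest(expected, tag_hex)


key = b"demo-key"  # illustrative only; real keys would come from a vault
legacy_value = "hmac-sha256:" + hmac.new(key, b"payload", hashlib.sha256).hexdigest()
new_value = protect(key, b"payload")
```

Retiring the old algorithm at the end of the migration then amounts to removing its entry from the registry, without touching the code that calls `protect` and `verify`.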

Business context: Insurance and regulation

Secret management is no longer solely the domain of technical departments. It has become a key element of business risk management, as reflected in the increasing demands of insurers and regulators.

Cyber Insurance: Insurers are asking increasingly detailed questions about security practices when assessing risk before issuing a policy. Having a mature, documented policy for managing secrets – including their definition, storage policies, access control and rotation – is becoming a prerequisite for insurance and has a direct impact on premiums. Application questionnaires explicitly ask about data encryption, password policies, access control or firewall use.

Compliance:

GDPR: A leaked API key that gives access to the personal data of people in the EU can result in a serious personal data breach. Effective secret management is necessary to meet the requirements of ‘integrity and confidentiality’ (Art. 5(1)(f)) and ‘security of processing’ (Art. 32).

PCI DSS: The Payment Card Industry Data Security Standard contains stringent requirements that relate directly to the management of secrets. These include a ban on using vendor-supplied default passwords (Requirement 2), strong encryption of stored cardholder data (Requirement 3) and strict access controls based on the ‘need to know’ principle and unique user IDs (Requirements 7 and 8).

In light of these trends, secret management must be seen as a strategic investment in business resilience. Technical leaders need to be able to present this topic to boards of directors not as an IT cost, but as a fundamental part of ensuring business continuity, compliance and building long-term competitive advantage in an increasingly uncertain digital world.
