No Secrets Left Behind: Mastering Git Cleanup and Security Best Practices
As a seasoned developer, I can’t help but scratch my head and wonder why anyone would put secrets inside a Git repository. But the fact is, secrets in Git repositories are very much the current state of the world.
The Prevalence of Secrets in Git Repositories
I recently stumbled upon a statistic about public Git repositories: 5 out of every 1,000 commits expose at least one secret. 😮 It’s an open “secret” that most secrets leaked in personal public repositories belong to companies and were “accidentally” pushed to developers’ personal repos.
The Challenge with Private Repositories
I can’t tell which surprised me more: that statistic or my recent encounter with a senior/lead team member who had been pushing secret data to a private repo since the project started. When confronted, the response was: “It is a private repo, and we will change that later.” Both of these statements left me speechless.
The Git Nature of Code Duplication
Source code repositories are meant to be shared: with teammates, within the company, or with the entire world (as is the case for open-source software). Code is copied and transferred everywhere, and Git is designed to allow, even promote, its free distribution. Projects can be cloned onto multiple machines, forked into new projects, distributed to customers, made public, and so forth. Each time a Git project is cloned or forked, its entire history is duplicated along with it. Private repositories don’t openly publish source code to the Internet, but they were never designed as secure storage for sensitive information such as credentials.
The Challenge with Git History
Another important consideration is that code removed from a Git repository is never actually gone. Git keeps track of every change that is made. Code that is removed (or, more precisely, committed over) still exists in the Git history. This means a repository runs much deeper than its current snapshot, and secrets can be buried deep in the history beneath a mass of long-forgotten commits.
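To see this for yourself, here is a minimal Python sketch that reads the output of `git log -p --all` and reports secret-looking lines that some later commit removed. The `SECRET_ASSIGNMENT` pattern and the function names are hypothetical illustrations of my own, not part of any tool; the point is simply that a line deleted in a later commit is still sitting in the patch history.

```python
import re
import subprocess

# Hypothetical pattern for illustration: assignments like API_KEY = "..." or password: '...'
SECRET_ASSIGNMENT = re.compile(
    r"(?i)\b(password|passwd|secret|api[_-]?key|token)\b\s*[:=]\s*['\"]([^'\"]+)['\"]"
)


def find_buried_secrets(log_patch: str) -> list[tuple[str, str]]:
    """Scan `git log -p` style text and return (commit, line) pairs for
    secret-looking lines that a commit removed (lines starting with '-')."""
    findings = []
    commit = ""
    for line in log_patch.splitlines():
        if line.startswith("commit "):
            # "commit <sha> (optional decorations)": keep only the hash.
            commit = line.split()[1]
        elif line.startswith("-") and not line.startswith(("--- a/", "--- /dev/null")):
            # A removed line: the "deleted" content is still recorded in this commit.
            if SECRET_ASSIGNMENT.search(line):
                findings.append((commit, line[1:].strip()))
    return findings


def read_full_history(repo_path: str = ".") -> str:
    """Return the patch text of every commit reachable from any ref."""
    return subprocess.run(
        ["git", "-C", repo_path, "log", "-p", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Against a real repository you would call `find_buried_secrets(read_full_history("path/to/repo"))` (the path is a placeholder). Every hit is a secret that looks “removed” from the current code but is still recoverable by anyone holding a clone.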
Scenarios and Solutions
“We all make mistakes” is one of my mantras as a developer. It shouldn’t be an excuse, but it can be an opportunity to grow. I have been on both sides of the fence: I have unintentionally added the wrong content, and I have also been part of the “cleanup” squad. So I would like to share how you can thoroughly scrub this content from your Git repositories.
Typically, we have two scenarios.
Tools to Safeguard Against Accidental Exposure of Sensitive Information
In reality, both scenarios highlight the pressing need for tools and practices that help developers safeguard their code against accidental exposure of sensitive information. Enter secret scanning tools such as GitGuardian, TruffleHog, and Gitleaks. These allies in the fight against security breaches scan your repositories for known patterns that resemble secrets, such as API keys, passwords, and tokens. Whether it’s application code or an infrastructure-as-code configuration, these tools sift through every line to flag potential leaks before they can be exploited.
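To make the idea concrete, here is a minimal pattern-based scanner in Python. It is a sketch, not how GitGuardian, TruffleHog, or Gitleaks are actually implemented: the rule names and the generic rule are my own assumptions, while the AWS access key ID rule (`AKIA` plus 16 uppercase letters or digits) and the GitHub classic token rule (`ghp_` plus 36 characters) follow those providers’ public token formats.

```python
import re

# Illustrative rules only; dedicated scanners ship large, tuned rule sets.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_pat": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key_header": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    # Hypothetical catch-all: a secret-sounding name assigned a quoted value of 8+ chars.
    "generic_secret": re.compile(
        r"(?i)\b(?:password|secret|api[_-]?key|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan_text(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every rule that matches a line."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

For example, scanning a config file containing `aws_key = "AKIAIOSFODNN7EXAMPLE"` (the placeholder key from AWS’s own documentation) reports a hit on that line. Rules like these are precise for well-known token formats but miss custom secrets, which is one reason dedicated tools maintain much larger rule sets.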
Rewriting Git History
However, identifying secrets is only half the battle. What happens when sensitive data has already made its way into your Git history? This is where tools like git-filter-repo and BFG Repo-Cleaner come into play, offering a lifeline to developers who need to rewrite history, literally. These tools let you purge files or sensitive content from every commit in your repository’s history, erasing the data you never intended to commit from the rewritten history (though not from any clones made before the rewrite). This process, known as history rewriting, is a crucial step in limiting the damage caused by accidental exposure and keeping your secrets from becoming public knowledge.
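git-filter-repo can redact strings across all of history with its `--replace-text` option, which reads an expressions file where each line is a string to find, optionally followed by `==>` and replacement text (the default replacement is `***REMOVED***`). The Python helper below is a hypothetical sketch that builds such a file from a list of leaked values. It assumes plain lines are treated as literal text and rejects values the line-based format could misread.

```python
# Prefixes that could be read as pattern syntax rather than literal text.
SPECIAL_PREFIXES = ("regex:", "glob:", "literal:")


def build_replace_text(secrets: list[str], replacement: str = "***REMOVED***") -> str:
    """Build the contents of an expressions file for `git filter-repo --replace-text`.

    Each output line is a literal string to find, then '==>', then its replacement.
    """
    lines = []
    for secret in secrets:
        # Reject values the format could misinterpret; don't echo the secret itself.
        if (not secret or "\n" in secret or "==>" in secret
                or secret.startswith(SPECIAL_PREFIXES)):
            raise ValueError("cannot safely encode one of the secrets")
        lines.append(f"{secret}==>{replacement}")
    return "".join(f"{line}\n" for line in lines)
```

With the output saved as `expressions.txt` (a placeholder name), the rewrite is run from a fresh clone with `git filter-repo --replace-text expressions.txt`. To drop an entire file instead, `git filter-repo --invert-paths --path <file>` removes it from every commit.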
After the arduous task of scrubbing sensitive data from your Git history, it’s tempting to think the job is done. However, the cleanup is only the first of several critical steps required to ensure the integrity and security of your repository going forward, especially when collaborating on platforms like Bitbucket or GitHub.
Post-Cleanup Actions
Preventing Future Leaks
The best defence against sensitive data exposure is a good offence: implementing strategies that stop sensitive information from being committed in the first place.
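One common preventive measure is a Git pre-commit hook that inspects the staged changes and aborts the commit when something looks like a secret. Below is a minimal Python sketch that flags added lines containing long, high-entropy tokens, a heuristic some secret scanners also use. The function names, the 20-character minimum token length, and the 3.5 bits-per-character threshold are my own assumptions to tune, not recommended values.

```python
#!/usr/bin/env python3
import math
import re
import subprocess
import sys
from collections import Counter

ENTROPY_THRESHOLD = 3.5  # assumed cut-off, in bits per character
# Runs of 20+ characters typical of keys and tokens (base64, hex, URL-safe).
TOKEN = re.compile(r"[A-Za-z0-9+/=_-]{20,}")


def shannon_entropy(s: str) -> float:
    """Shannon entropy of a string, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def suspicious_additions(diff: str) -> list[str]:
    """Return added lines (skipping '+++' file headers) that contain a high-entropy token."""
    flagged = []
    for line in diff.splitlines():
        if line.startswith("+") and not line.startswith(("+++ b/", "+++ /dev/null")):
            if any(shannon_entropy(t) >= ENTROPY_THRESHOLD for t in TOKEN.findall(line)):
                flagged.append(line[1:].strip())
    return flagged


def main() -> int:
    # --cached: only what is staged for this commit; -U0: omit context lines.
    diff = subprocess.run(
        ["git", "diff", "--cached", "-U0"],
        capture_output=True, text=True, check=True,
    ).stdout
    flagged = suspicious_additions(diff)
    for line in flagged:
        print(f"possible secret: {line}", file=sys.stderr)
    return 1 if flagged else 0  # a non-zero exit status makes Git abort the commit
```

To install it, save the code as `.git/hooks/pre-commit`, make it executable, and end the file with `sys.exit(main())` so Git receives the exit status. Hooks are not shared through the repository and can be bypassed with `git commit --no-verify`, so treat this as a safety net rather than a guarantee.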
Conclusion
The journey through cleansing Git repositories and securing CI/CD pipelines underscores a broader theme in software development: the paramount importance of vigilance and proactive security measures. The lessons learned extend beyond the technical nuances of Git history-rewriting tools to fundamental principles every developer should internalize: never store sensitive data in Git, and, as the old but gold rule goes, never blindly trust user input.