The Day After DevOps: When the Real Problems Begin

We built the platform. We shipped the golden paths. We demoed the workflows.

And then, everything fell apart.

The truth? Most platform problems don’t start on Day 1... they begin on Day 2.

When the lights dim, the documentation goes stale, and developers start improvising because the friction is too high.

You won’t hear this in the shiny talks or the well-polished Medium posts. But if you’ve ever lived through a real internal platform rollout, you know what I’m talking about.

Let’s unpack the mess.

Onboarding Is a Lie

Everyone loves to talk about “self-service” and “golden paths”. But here’s what usually happens:

  • Dev joins the team.
  • Gets dropped into an empty Slack channel.
  • Finds three different onboarding guides, all outdated.
  • Ends up asking ChatGPT how to deploy.

Sound familiar?

The truth is: onboarding isn’t a moment. It’s a process.

And it should be part of your platform’s product lifecycle, not a README dumped in a repo that nobody maintains.

  1. Track where people get stuck.
  2. Shadow new hires.
  3. Bake onboarding into your feedback loops.

Golden paths don’t stay golden without polish.

Drift Happens

  • Day 1: Git is your source of truth.
  • Day 30: someone hotfixes in production.
  • Day 90: no one remembers what Git was supposed to reflect.

Drift is real, and it’s not just about infrastructure. It’s about trust. Trust that what you see is what’s running.

Ignoring drift is how you end up debugging a bug that was supposedly “fixed last month”.

  1. Use drift detection, continuous reconciliation, or even periodic diffs.
  2. Flag divergence early, not just when pipelines break.

The platform isn’t stable if Git and reality disagree.
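A periodic diff can start very small. The sketch below assumes you can already load the desired state from Git and the live state from your runtime as plain dictionaries keyed by resource name; the `Drift` record, the `find_drift` function, and the resource fields are all hypothetical names chosen for illustration, not any particular tool's API.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class Drift:
    resource: str
    kind: str  # "missing", "unexpected", or "changed"
    detail: str = ""


def find_drift(
    desired: Mapping[str, Mapping[str, Any]],
    live: Mapping[str, Mapping[str, Any]],
) -> list[Drift]:
    """Compare what Git declares (desired) with what is running (live)."""
    drifts: list[Drift] = []
    for name in sorted(desired.keys() | live.keys()):
        if name not in live:
            # Declared in Git but not running.
            drifts.append(Drift(name, "missing"))
        elif name not in desired:
            # Running but never declared in Git.
            drifts.append(Drift(name, "unexpected"))
        else:
            want, have = desired[name], live[name]
            for field in sorted(want.keys() | have.keys()):
                if want.get(field) != have.get(field):
                    detail = f"{field}: git={want.get(field)!r} live={have.get(field)!r}"
                    drifts.append(Drift(name, "changed", detail))
    return drifts
```

Run on a schedule, a non-empty result is the signal to flag early, well before a pipeline breaks on it.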

Secrets Sprawl Fast

You set up Vault. You made a cool UI. You trained everyone.

And then someone hardcodes an AWS key in values.yaml because “it’s just staging”.

Now multiply that by 12 teams and 6 environments.

Secret sprawl isn’t just a security risk... it’s a sign your platform isn’t respecting developer flow.

  1. Short-lived credentials.
  2. GitHub OIDC.
  3. Secret scanning and feedback loops.
  4. Make the right thing the easiest thing.

Security has to be boring, invisible, and default. Or it will be bypassed.
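To make the secret scanning idea concrete, here is a minimal sketch of a check that could run in a pre-commit hook or a CI step. It assumes the widely documented shape of AWS access key IDs (a prefix such as `AKIA` or `ASIA` followed by 16 uppercase letters or digits); the function name `find_aws_key_ids` is hypothetical, and a real scanner covers far more credential types than this one pattern.

```python
import re

# Assumed pattern: AWS access key IDs start with AKIA (long-lived) or
# ASIA (temporary) followed by 16 uppercase alphanumeric characters.
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def find_aws_key_ids(text: str) -> list[tuple[int, str]]:
    """Return (line number, masked key ID) for each suspected key in text."""
    findings: list[tuple[int, str]] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in AWS_ACCESS_KEY_ID.finditer(line):
            key = match.group(0)
            # Mask the middle so the report itself does not leak the key.
            findings.append((line_no, key[:4] + "*" * 12 + key[-4:]))
    return findings
```

Pointing this at a `values.yaml` before it is committed gives the developer feedback at the moment the shortcut is taken, which is cheaper than finding the key in an audit.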

The Orphaned Infrastructure Graveyard

You killed the feature. You forgot the infrastructure.

Six months later, someone’s still paying for the zombie RDS instance. And nobody wants to delete it, just in case.

This is how you get platform bloat, surprise bills, and resource limits in clusters you thought were empty.

  1. Tag infra with owners and TTLs.
  2. Build cleanup routines into your platform lifecycle.
  3. If no one owns it, it shouldn’t exist.

Infrastructure without purpose becomes legacy by accident.
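The owner and TTL rule can be checked mechanically. This sketch assumes each resource's tags are available as a plain dictionary and that the platform uses two hypothetical tag names, `owner` and `expires-on` (an ISO date such as `2024-01-31`); it only reports candidates and deletes nothing.

```python
from datetime import date
from typing import Mapping


def cleanup_candidates(
    resources: Mapping[str, Mapping[str, str]],
    today: date,
) -> dict[str, str]:
    """Map resource ID to the reason it should be reviewed for cleanup."""
    flagged: dict[str, str] = {}
    for resource_id, tags in resources.items():
        owner = tags.get("owner")
        expires_on = tags.get("expires-on")
        if not owner:
            # If no one owns it, it should not exist.
            flagged[resource_id] = "no owner"
        elif expires_on and date.fromisoformat(expires_on) < today:
            flagged[resource_id] = f"expired on {expires_on}"
        # Owned resources with no TTL are treated as long-lived here.
    return flagged
```

Sending the flagged list to the owning team, or to the platform team when there is no owner, turns "just in case" into an explicit decision to keep or delete.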

The Platform with No Purpose

We built it, but no one uses it.

I’ve heard this more times than I can count.

You launched CI templates. You standardized CD pipelines. You built an internal UI. But devs are still copy-pasting old Actions workflows.

Why? Because your platform is solving your problems, not theirs.

  1. Platforms are products.
  2. They need roadmaps, user interviews, success metrics.
  3. If it’s not helping developers ship faster, safer, and happier, it’s just overhead.

Adoption isn’t guaranteed. It’s earned.

The Day After DevOps

Day 1 is the easy part. It’s the demo, the decks, the dashboards.

But Day 2? That’s when the platform meets reality.

  • People onboard.
  • Infrastructure drifts.
  • Secrets leak.
  • Ownership fades.
  • Tools go unused.

That’s the real test.

If you’re not building for Day 2, you’re just setting yourself up for entropy.


What’s the most painful Day 2 issue you’ve faced? Or maybe you’ve lived through the infra graveyard, the broken onboarding, or the “platform nobody uses” nightmare?

I’d love to hear your tales. Let’s share the scars and build better platforms, together.

Drop your Day 2 war stories in the comments.
