PART 2 // Understanding LLM Behavior in Production // Why Your AI Feature Works in Dev and Breaks in Prod
Every now and then, I get a call from one of the founders I advise on AI-native products: “It was amazing in testing. But now it’s giving weird results in prod.”
What works in staging often fails in prod. Not because the model changed, but because you didn’t understand how it thinks.
The Paradigm Shift: From User to Co-Creator
Here’s the truth: LLMs don’t behave like traditional APIs. They operate probabilistically, which means you’re managing variance, not just logic. Grasping this move from deterministic logic to probabilistic variance is essential for product leaders entering the AI space.
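To make that variance concrete, here is a minimal sketch of how sampling temperature reshapes a toy next-token distribution. The logits and tokens are illustrative, not from any real model: low temperature concentrates probability on the top candidate, high temperature flattens the distribution, which is why identical prompts can yield different outputs run to run.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then normalize into probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy logits for four candidate next tokens (illustrative values).
logits = [2.0, 1.5, 0.5, 0.1]

low = softmax_with_temperature(logits, 0.2)   # sharp: near-deterministic
high = softmax_with_temperature(logits, 1.5)  # flat: high variance

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])

# Sampling from the flatter distribution picks different tokens per call,
# which is the variance you end up managing in production.
tokens = ["the", "a", "an", "one"]
samples = [random.choices(tokens, weights=high)[0] for _ in range(5)]
print(samples)
```

The same four logits go from roughly a 92% chance of the top token at temperature 0.2 to under 50% at temperature 1.5, so "the model changed its answer" is often just sampling doing its job.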
When you ship an LLM-powered product, your users aren’t just “using a feature”; they’re co-authoring the outcome with the model. This co-authorship demands moving beyond traditional user journey mapping to anticipate a far wider range of interactions and outcomes.
That makes your product inherently non-deterministic: the same feature can behave differently across users, sessions, and inputs.

LLMs behave differently depending on prompt phrasing, sampling parameters such as temperature, the context they are given, and the messiness of real user input.
So if you’re not logging and analyzing usage at scale, you’re operating blind. Scalable logging and sophisticated analytics are no longer optional; they are the essential sensors for understanding and managing your AI product in the real world.
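As a sketch of what “logging at scale” can mean in practice, here is a minimal wrapper that records every model call as a structured JSONL line. `call_model` is a placeholder stub standing in for your actual model call, and the field names are assumptions, not a standard:

```python
import json
import time
import uuid

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call; returns a canned response here."""
    return f"echo: {prompt}"

def logged_call(prompt: str, user_id: str,
                log_file: str = "llm_calls.jsonl") -> str:
    """Call the model and append a structured record for later analysis."""
    start = time.time()
    response = call_model(prompt)
    record = {
        "request_id": str(uuid.uuid4()),
        "user_id": user_id,
        "prompt": prompt,
        "response": response,
        "latency_ms": round((time.time() - start) * 1000, 1),
        "timestamp": time.time(),
    }
    with open(log_file, "a") as f:
        f.write(json.dumps(record) + "\n")
    return response
```

Structured records like these are what make the later analysis possible: drift detection, latency tracking, and clustering the prompts your users actually write.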
Real-World Example
Duolingo Max launched with GPT-4 to power Explain My Answer. Before launch, their team battle-tested thousands of edge cases, because they knew LLMs behave differently when exposed to messy, real-world language. Their proactive focus on real-world edge cases, rather than just ideal scenarios, highlights a mature understanding of LLM deployment.
They didn’t just test prompts; they studied model behavior patterns at scale. Moving from individual prompt testing to understanding systemic behavioral patterns is a key differentiator in building robust AI products.
That’s what let them ship something robust, not gimmicky.
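Studying behavior patterns at scale can start very simply. As one illustrative technique (not Duolingo’s actual method): compare a current window of logged responses against a baseline window and flag when a summary statistic, here mean response length, drifts too far:

```python
from statistics import mean, stdev

def length_drift(baseline: list[str], current: list[str],
                 threshold: float = 2.0) -> bool:
    """Flag drift when the current mean response length moves more than
    `threshold` baseline standard deviations from the baseline mean."""
    base_lengths = [len(r.split()) for r in baseline]
    cur_lengths = [len(r.split()) for r in current]
    base_mean, base_sd = mean(base_lengths), stdev(base_lengths)
    if base_sd == 0:
        return mean(cur_lengths) != base_mean
    z = abs(mean(cur_lengths) - base_mean) / base_sd
    return z > threshold
```

Response length is a crude proxy, but the same pattern extends to refusal rates, sentiment, or formatting errors: pick a measurable signal, baseline it, and alert on deviation instead of waiting for user complaints.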
The Technical Jargon
Practical Actions for Product Managers
Suggested Resource for Deeper Learning
OpenAI’s API Behavior Guide covers temperature, max tokens, and managing behavioral drift. Read the documentation, and encourage your teams to actively experiment with these parameters to build an intuitive understanding of their impact on model behavior.
If you don’t understand LLM behavior in the wild, you’re not managing a product; you’re simply crossing your fingers. Proactively understanding and managing LLM behavior is a core differentiator for successful AI product leaders.
Treat your model like a team member, not a static tool. Viewing the LLM as a dynamic team member requiring ongoing guidance and management reflects a mature approach to AI product leadership.