You wouldn’t merge to main without tests, so why ship LLM features without observability? Gal Kleinman, co-founder of Traceloop, just broke down what it takes to see inside your prompts:

• 𝐒𝐭𝐚𝐫𝐭 𝐰𝐢𝐭𝐡 𝐚𝐧 𝐞𝐯𝐚𝐥 𝐬𝐮𝐢𝐭𝐞, 𝐧𝐨𝐭 𝐚 𝐝𝐚𝐬𝐡𝐛𝐨𝐚𝐫𝐝: Treat evals as the spec for your LLM flow; they’re the only way to know if “works on my machine” is actually working for users.

• 𝐈𝐧𝐬𝐭𝐫𝐮𝐦𝐞𝐧𝐭 𝐞𝐯𝐞𝐫𝐲𝐭𝐡𝐢𝐧𝐠 𝐰𝐢𝐭𝐡 𝐎𝐩𝐞𝐧𝐋𝐋𝐌𝐞𝐭𝐫𝐲: One line of code wraps OpenTelemetry around OpenAI, Anthropic, LangChain, vector DBs, the lot, so you trace tokens, cost, and quality from dev to prod.

“𝘛𝘩𝘦 𝘸𝘩𝘰𝘭𝘦 𝘵𝘦𝘳𝘮 𝘰𝘧 𝘨𝘰𝘰𝘥 𝘰𝘳 𝘣𝘢𝘥, 𝘰𝘳 𝘣𝘦𝘵𝘸𝘦𝘦𝘯 𝘴𝘶𝘤𝘤𝘦𝘴𝘴 𝘢𝘯𝘥 𝘧𝘢𝘪𝘭𝘶𝘳𝘦, 𝘪𝘵'𝘴 𝘯𝘰𝘵 𝘵𝘩𝘢𝘵 𝘤𝘭𝘦𝘢𝘳… 𝘪𝘵’𝘴 𝘯𝘰𝘵 𝘢 𝘣𝘪𝘯𝘢𝘳𝘺 𝘳𝘦𝘴𝘶𝘭𝘵.” — 𝘎𝘢𝘭 𝘒𝘭𝘦𝘪𝘯𝘮𝘢𝘯

Read the blog, link in the comments

#AINativeDev #LLMObservability #DevTools #OpenTelemetry
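The eval-as-spec idea from the first bullet can be sketched in a few lines of plain Python. This is a hypothetical illustration, not Traceloop's API: the names (`run_eval`, `score_contains`, `fake_llm`) are made up, and the graded 0–1 score reflects the quote's point that LLM quality is not a binary result.

```python
# Minimal eval-suite sketch: each case yields a graded score, not pass/fail.
# All names here are illustrative, not from any real library.

def score_contains(output: str, required: list[str]) -> float:
    """Fraction of required phrases present in the model output (0.0 to 1.0)."""
    hits = sum(1 for phrase in required if phrase.lower() in output.lower())
    return hits / len(required)

def run_eval(cases: list[dict], generate) -> dict[str, float]:
    """Run every case's prompt through `generate` and return graded scores."""
    return {
        case["name"]: score_contains(generate(case["prompt"]), case["expect"])
        for case in cases
    }

# Stub model for demonstration; swap in a real LLM call in practice.
def fake_llm(prompt: str) -> str:
    return "Paris is the capital of France."

cases = [
    {"name": "capital", "prompt": "Capital of France?", "expect": ["Paris"]},
    {"name": "detail", "prompt": "Capital of France?", "expect": ["Paris", "France"]},
]
scores = run_eval(cases, fake_llm)
```

Running this suite on every change, dev through prod, is what turns “works on my machine” into a measurable claim about user-facing behavior.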
A good reminder that evaluation is not a one-time setup; it needs to evolve with the app.
Listen to the full episode here: http://ainativedev.co/hl2
Really fun chat, Gal Kleinman! Thanks for the insights!
👑