Edition 36 - Improving LLM Safety & Reliability

This month's edition of the Evaluator is packed with cutting-edge insights and practical know-how from our team. Learn how to instrument your LLM app, how to create multi-agent applications, or check out our agents series (now on-demand) for real-world examples of agents in production.

As always, we conclude with some of our favorite news, papers, community threads, and upcoming events.


Improving LLM Safety & Reliability in LLM Applications

Today’s AI engineering loop is brittle: small changes can cause big performance drops. Building better AI requires addressing LLM safety and reliability, and in this blog, we show you how. Eric Xiao reviews the different ways to improve safety and reliability in your LLM applications, including tracing, evaluations, experiments, guardrails, and more. Read it
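To make the guardrails idea concrete, here is a minimal illustrative sketch (not code from the article): an output check that blocks responses matching a pattern you never want to surface, returning a safe fallback instead. The pattern and the fallback message are assumptions for the example.

```python
import re

# Pattern we never want in a response (here: email addresses).
# Purely illustrative -- real guardrails combine many checks
# (PII, toxicity, topic restrictions, hallucination heuristics).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

SAFE_FALLBACK = "Sorry, I can't share that information."

def apply_guardrail(response: str) -> str:
    """Return the response unchanged, or a fallback if it fails the check."""
    if EMAIL_RE.search(response):
        return SAFE_FALLBACK
    return response
```

In production, a failed check would typically also be logged as a trace event so you can evaluate how often the guardrail fires and why.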


LLM Evaluation Course

LLM evaluations can take many forms, from code-based comparisons against ground-truth data to LLM-as-a-Judge queries that validate outputs. This resource by Aparna Dhinakaran and Steven Miller covers the different types of LLM evals, how they are used, and important factors to consider when structuring your LLM evaluation system. Read it
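As a rough sketch of the LLM-as-a-Judge pattern (illustrative only, not the course's code): a judge prompt asks a model to grade each answer against a reference, and a score is aggregated over the dataset. The `call_llm` parameter is a placeholder for whatever model client you use.

```python
# Illustrative LLM-as-a-Judge evaluation. `call_llm` stands in for a
# real model client (OpenAI, Anthropic, etc.) -- an assumption here.
JUDGE_PROMPT = """You are evaluating a question-answering system.
Question: {question}
Answer: {answer}
Reference: {reference}
Respond with exactly one word: "correct" or "incorrect"."""

def judge_answer(question, answer, reference, call_llm):
    """Ask a judge model to grade one answer against a reference."""
    prompt = JUDGE_PROMPT.format(
        question=question, answer=answer, reference=reference
    )
    return call_llm(prompt).strip().lower() == "correct"

def accuracy(examples, call_llm):
    """Fraction of examples the judge marks correct."""
    verdicts = [
        judge_answer(e["question"], e["answer"], e["reference"], call_llm)
        for e in examples
    ]
    return sum(verdicts) / len(verdicts)
```

Constraining the judge to a fixed vocabulary ("correct"/"incorrect") makes its output easy to parse and score, which is one of the structural choices such systems have to make.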


OpenAI's Realtime API

Sally-Ann DeLucia and Aparna Dhinakaran cover how to seamlessly integrate powerful language models into your applications for instant, context-aware responses that drive user engagement. Whether you’re building chatbots, dynamic content tools, or enhancing real-time collaboration, we walk through the API’s capabilities and potential use cases. Read it


Instrumenting Your LLM Application: Arize Phoenix and Vercel AI SDK

Evan Jolley dives into why instrumentation matters for LLM applications, the benefits it brings, and provides a guide to integrating Arize Phoenix with the Vercel AI SDK for observability in Next.js applications. Read it
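To show what instrumentation captures at its core, here is a minimal, dependency-free sketch (an illustration, not the Phoenix or Vercel AI SDK API): each step of an LLM app is wrapped in a named span that records attributes and timing. Real setups would export these spans via OpenTelemetry to a backend like Phoenix.

```python
import time
from contextlib import contextmanager

# In-memory span store; a real tracer exports to a collector instead.
SPANS: list[dict] = []

@contextmanager
def span(name: str, **attributes):
    """Record a named, timed span with arbitrary attributes."""
    start = time.perf_counter()
    record = {"name": name, "attributes": dict(attributes)}
    try:
        yield record
    finally:
        record["duration_s"] = time.perf_counter() - start
        SPANS.append(record)

# Example: wrap a (stubbed) model call and attach its output as an attribute.
with span("llm.call", model="example-model") as s:
    s["attributes"]["output"] = "hello"  # stand-in for a real completion
```

Nesting such spans (retrieval inside a chain inside a request) is what gives you the end-to-end traces that make debugging LLM apps tractable.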


What is AutoGen? 

AutoGen is a framework that helps you easily create multi-agent applications. Multi-agent applications are a relatively recent pattern in which you define multiple LLM agents, each with its own goals and capabilities, and let them work together toward an end goal. John Gilhuly explains how it works. Read it
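The pattern itself is framework-agnostic, so here is a stripped-down sketch of it in plain Python (an illustration of the idea, not AutoGen's actual API): each agent pairs a goal (system prompt) with a model call, and a loop lets two agents exchange messages. `call_llm` is a placeholder for a real model client.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    """A minimal agent: a name, a goal, and a way to call a model."""
    name: str
    system_prompt: str
    call_llm: Callable[[str], str]  # placeholder for a real client

    def respond(self, message: str) -> str:
        return self.call_llm(f"{self.system_prompt}\n\nMessage: {message}")

def run_conversation(a: Agent, b: Agent, opening: str, turns: int = 2):
    """Alternate messages between two agents for a fixed number of turns."""
    transcript = [(a.name, opening)]
    message = opening
    speaker, other = b, a
    for _ in range(turns):
        message = speaker.respond(message)
        transcript.append((speaker.name, message))
        speaker, other = other, speaker
    return transcript
```

Frameworks like AutoGen build on this loop with tool use, termination conditions, and human-in-the-loop steps, but the core idea is the same message-passing cycle.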


On Demand: Building an Agent or Assistant Series

Our five-part series on real-life agents deployed in production is now available to watch. Each session dives into an agent's architecture, the systems used to build it, and lessons learned from running it in production, unpacking a new real-world agent or agent component every week.

If you want a primer first, our previous series is available on-demand, and covers basic agent components, architectures, and frameworks. Watch it


Staff Picks 🤖

Here's a roundup of our team's recent favorites: news, papers, and community threads.

🔥 Just Released: Phoenix 6.0

🔥 Reverse Thinking Makes LLMs Stronger Reasoners

🔥 Probabilistic Weather Forecasting with Machine Learning

🔥 Pydantic AI Launches Agent Framework
