Edition 36 - Improving LLM Safety & Reliability

Arize AI

Arize AI is a unified AI observability and LLM evaluation platform, built for AI engineers, by AI engineers.

Published Dec 6, 2024

This month's edition of the Evaluator is packed with cutting-edge insights and practical know-how from our team. Learn how to instrument your LLM app, how to create multi-agent applications, or check out our agents series (now on-demand) for real-world examples of agents in production.

As always, we conclude with some of our favorite news, papers, community threads, and upcoming events.

Improving LLM Safety & Reliability in LLM Applications

Today’s AI engineering loop is brittle: small changes can cause big performance drops. Building better AI requires addressing LLM safety and reliability, and this blog shows you how. Eric Xiao reviews the ways to improve safety and reliability in your LLM applications, including tracing, evaluations, experiments, guardrails, and more. Read It

LLM Evaluation Course

LLM evaluations can take many forms, from code-based comparisons against ground-truth data to LLM-as-a-judge queries that validate outputs. This resource by Aparna Dhinakaran and Steven Miller covers the different types of LLM evals, how they are used, and important factors to consider when structuring your LLM evaluation system. Read It
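To make the two eval styles mentioned above concrete, here is a minimal sketch in Python. It is not taken from the course: the function names, the judge prompt, and the `fake_llm` stand-in are all hypothetical, and a real setup would replace `fake_llm` with an actual model call.

```python
from typing import Callable


def exact_match_eval(output: str, expected: str) -> bool:
    """Code-based eval: normalized string comparison against ground truth."""
    return output.strip().lower() == expected.strip().lower()


# Hypothetical judge prompt; real templates are usually more detailed.
JUDGE_TEMPLATE = (
    "You are grading an answer.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with exactly one word: correct or incorrect."
)


def llm_judge_eval(question: str, answer: str, call_llm: Callable[[str], str]) -> bool:
    """LLM-as-a-judge eval: ask a model to label the answer."""
    prompt = JUDGE_TEMPLATE.format(question=question, answer=answer)
    label = call_llm(prompt).strip().lower()
    # Compare the whole label, since "incorrect" contains "correct".
    return label == "correct"


def fake_llm(prompt: str) -> str:
    """Assumption: a toy stand-in for a real model call, for illustration only."""
    return "correct" if "Paris" in prompt else "incorrect"


print(exact_match_eval(" Paris ", "paris"))
print(llm_judge_eval("What is the capital of France?", "Paris", fake_llm))
```

The code-based check is cheap and deterministic but only works when a ground-truth answer exists; the judge approach trades some determinism for the ability to grade open-ended outputs.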

OpenAI's Realtime API

Sally-Ann DeLucia and Aparna Dhinakaran cover how to seamlessly integrate powerful language models into your applications for instant, context-aware responses that drive user engagement. Whether you’re building chatbots, dynamic content tools, or enhancing real-time collaboration, we walk through the API’s capabilities and potential use cases. Read it

Instrumenting Your LLM Application: Arize Phoenix and Vercel AI SDK

What is AutoGen?

AutoGen is a framework that helps you easily create multi-agent applications. The multi-agent approach is a relatively recent idea that involves defining multiple LLM agents, each with its own goals and capabilities, and allowing them to work together to achieve an end goal. John Gilhuly explains how it works. Read it
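The multi-agent idea described above can be sketched in plain Python. This is an illustration of the pattern only and does not use AutoGen's API: the `Agent` class, `run_conversation`, and the lambda "models" are hypothetical stand-ins for real LLM-backed agents.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Agent:
    name: str
    goal: str
    respond: Callable[[str], str]  # stand-in for an LLM call


def run_conversation(agents: List[Agent], task: str, max_turns: int = 4) -> List[str]:
    """Pass a message between agents in round-robin order until one says DONE."""
    transcript: List[str] = []
    message = task
    for turn in range(max_turns):
        agent = agents[turn % len(agents)]
        message = agent.respond(message)
        transcript.append(f"{agent.name}: {message}")
        if "DONE" in message:
            break
    return transcript


# Two toy agents with different goals; real agents would call a model here.
writer = Agent("writer", "draft an answer", lambda msg: f"Draft for: {msg}")
reviewer = Agent("reviewer", "approve or request changes", lambda msg: f"DONE, approved: {msg}")

transcript = run_conversation([writer, reviewer], "Summarize the release notes")
for line in transcript:
    print(line)
```

A framework typically adds what this sketch leaves out, such as tool use, richer turn-taking rules, and termination conditions beyond a keyword check.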

On Demand: Building an Agent or Assistant Series

Our 5-part series on real-life agents deployed in production is now available to watch. We dive deep into the agent architectures, the systems used in their development, and lessons learned from using them in production. Each week, we unpack a new example agent or agent component from a real-world deployment.

If you want a primer first, our previous series is available on-demand, and covers basic agent components, architectures, and frameworks. Watch it

Staff Picks 🤖

Here's a roundup of our team's favorite recent news, papers, and community threads.

🔥 Just Released: Phoenix 6.0

🔥 Reverse Thinking Makes LLMs Stronger Reasoners

🔥 Probabilistic Weather Forecasting with Machine Learning

🔥 Pydantic AI Launches Agent Framework
