Structuring LLM Outputs

As large language models (LLMs) continue their rise in real-world applications, one stubborn problem persists: getting them to follow the rules - especially when it comes to structured output.

Whether you're extracting form data, populating APIs, or feeding a database that doesn't like surprises, structured responses are critical. Below are six techniques for wrangling clean, consistent output from your model.


1. Prompt Engineering

Approach: Give the model a detailed prompt and hope for the best (but design like you expect the worst). Use templates, JSON hints, or explicit format requests.

Example:

Please return data in this format:  
{  
  "name": "",  
  "age": "",  
  "email": ""  
}
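
In practice you pair the prompt with a defensive parse step. A minimal sketch, assuming the OpenAI Python SDK (the model name is a placeholder; any chat model works):

import json
from openai import OpenAI

client = OpenAI()

prompt = """Please return data in this format:
{
  "name": "",
  "age": "",
  "email": ""
}"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)

try:
    data = json.loads(response.choices[0].message.content)
except json.JSONDecodeError:
    data = None  # the model ignored the format; retry or fall back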
        

✅ Pros:

  • No libraries needed
  • Great for quick iteration and prototyping

⚠️ Cons:

  • No guarantees—it might decide to write a poem instead
  • Fragile with complex inputs or ambiguous phrasing


2. Regular Expressions (Regex)

Approach: Don’t try to control the output at all; extract what you need afterward with regex. It's a post-processing trick, not a formatting guarantee. You can also combine it with Prompt Engineering to improve the odds of a parsable response (see the sketch at the end of this section).

Example:

import re

match = re.search(r"Name: (\w+), Age: (\d+)", output)
if match:
    name, age = match.group(1), int(match.group(2))

✅ Pros:

  • Lightweight, fast, and dev-friendly
  • Excellent for simple flat data

⚠️ Cons:

  • Falls apart with nested data or slight format drift
  • Fragile
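
As a sketch of the combined approach mentioned above: ask for a rigid one-line format in the prompt, then regex-parse whatever comes back (assuming an OpenAI-style client; the model name is a placeholder):

import re
from openai import OpenAI

client = OpenAI()

reply = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Describe a fictional user in exactly this format: Name: <name>, Age: <age>"}],
).choices[0].message.content

match = re.search(r"Name: (\w+), Age: (\d+)", reply)
person = {"name": match.group(1), "age": int(match.group(2))} if match else None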


3. LangChain’s with_structured_output

Approach: Schema-first meets prompt-driven. Use a Pydantic model to define structure, then let LangChain wrangle the model into shape and validate outputs automatically.

Example:

from pydantic import BaseModel
from langchain_openai import ChatOpenAI

class Person(BaseModel):
    name: str
    age: int
    email: str

llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder: any chat model with structured-output support

structured_llm = llm.with_structured_output(Person)
response = structured_llm.invoke("Generate a fictional user profile.")  # returns a Person instance

✅ Pros:

  • Schema enforcement with clean Python objects
  • Reduces manual validation and weird corner cases

⚠️ Cons:

  • Requires LangChain and a compatible LLM backend
  • Still prompt-driven under the hood, so errors can sneak in if the model misbehaves


4. Handling Complex Schemas with Trustcall, Instructor & Friends

Approach: When simple templates break down (which they will), bring in libraries that intelligently enforce and correct structure. These tools apply schema-aware logic, retries, and targeted fixes rather than re-generating everything.

Libraries:

  • Instructor - patches your LLM client so responses come back as validated Pydantic objects, with automatic re-asks when validation fails
  • Trustcall - applies targeted, patch-style corrections to invalid or partial output instead of regenerating the whole response
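
Example (a minimal sketch using Instructor; the model name below is a placeholder, and the pattern shown is Instructor's patched OpenAI client):

import instructor
from openai import OpenAI
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    email: str

# Patch the client so responses are parsed into the schema, with re-asks on validation errors
client = instructor.from_openai(OpenAI())

person = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    response_model=Person,
    max_retries=2,  # re-ask the model if validation fails
    messages=[{"role": "user", "content": "Generate a fictional user profile."}],
)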

✅ Pros:

  • Built to handle nested and complex data structures
  • Feedback loops improve reliability

⚠️ Cons:

  • Can introduce latency from retries or multi-pass parsing
  • Upfront effort for schema design and integration


5. Constrained Decoding: Control the Output Before It Misbehaves

Approach: Why fix messy outputs when you can prevent them altogether? Constrained decoding guides the model as it generates, limiting the tokens it’s allowed to produce at each step. This method directly manipulates the model’s logits - the raw probabilities behind every word choice - so that only valid continuations are possible.

Instead of saying, “Please return JSON,” constrained decoding says, “You are physically incapable of doing anything else.”

Token-Level Control via Logits Masking

Each token-generation step applies a softmax over the logits. Constrained decoding applies a mask that sets the logits of invalid tokens to -inf, so their probability after the softmax is zero. Only tokens that keep the output valid are left in play.

This transforms the LLM from a creative improviser into a rule-abiding structured generator.
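
To make this concrete, here is a minimal sketch of the masking step in plain PyTorch. It assumes logits is the 1-D next-token distribution and that allowed_token_ids has already been computed from a schema or grammar, which is the hard part that libraries like Outlines handle for you:

import torch

def mask_logits(logits: torch.Tensor, allowed_token_ids: list[int]) -> torch.Tensor:
    # Forbid every token by default, then re-enable the ones that keep the output valid
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0
    return logits + mask  # after softmax, invalid tokens get probability 0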

Check Outlines for examples on:

  • Multiple choice generation
  • Type enforcement (e.g., integers only)
  • Efficient regex-constrained output
  • JSON generation following a Pydantic model
  • Context-free grammar (CFG) based generation
  • “Open functions” (functions with strict signature enforcement)

Example:

from enum import Enum
import outlines

class Food(str, Enum):
    pizza = "Pizza"
    pasta = "Pasta"
    salad = "Salad"
    dessert = "Dessert"

# Placeholder: any locally hosted Hugging Face model works here
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

prompt = "What should we have for dinner?"
generator = outlines.generate.choice(model, Food)
response = generator(prompt)

✅ Pros:

  • Hard guarantees: Invalid output can’t be generated
  • No retries or patching: Structure is enforced mid-generation
  • Ideal for high-stakes systems where output correctness matters

⚠️ Cons:

  • Requires access to low-level generation APIs (not yet supported by OpenAI’s hosted models)
  • Setup is more involved. It may require defining finite state machines, grammars, or schemas
  • Best performance often requires local model deployment or advanced SDKs


6. Vendor-Specific Features (e.g., OpenAI Function Calling / Schema Mode)

Approach: Use built-in structured output features from providers like OpenAI to enforce JSON schema adherence natively. Think of it as “hard mode” validation with none of the manual parsing.

Example:

import json
from openai import OpenAI

client = OpenAI()

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"}
    },
    "required": ["name", "age", "email"],
    "additionalProperties": False  # required by OpenAI's strict structured outputs
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # structured outputs need a model that supports json_schema
    messages=[{"role": "user", "content": "Generate a fictional profile."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "person", "schema": schema, "strict": True},
    },
)

data = json.loads(response.choices[0].message.content)  # parse the schema-conformant JSON

✅ Pros:

  • Outputs guaranteed to match the schema (no funny business)
  • Perfect fit for API and backend integrations

⚠️ Cons:

  • Only available in specific models
  • Requires a fully defined schema upfront


Final Thoughts

Choosing the right strategy depends on how structured your data needs to be, how forgiving your system is of errors, and how much time you have to babysit output. And remember: sometimes, the best approach is a hybrid one. LLMs may be large and powerful, but they still need guardrails.
