Structuring LLM Outputs
As large language models (LLMs) continue their rise in real-world applications, one stubborn problem persists: getting them to follow the rules - especially when it comes to structured output.
Whether you're extracting form data, populating APIs, or feeding a database that doesn't like surprises, structured responses are critical. Below are six techniques for wrangling clean, consistent output from your model.
1. Prompt Engineering
Approach: Give the model a detailed prompt and hope for the best (but design like you expect the worst). Use templates, JSON hints, or explicit format requests.
Example:
Please return data in this format:
{
  "name": "",
  "age": "",
  "email": ""
}
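In code, this usually means embedding the template in the prompt and defensively parsing whatever comes back. A minimal sketch, assuming a generic call_llm(prompt) helper (a stand-in for your provider's API call, not a real library function):

import json

template = '{"name": "", "age": "", "email": ""}'
prompt = f"Please return data in this format:\n{template}\nProfile: Jane Doe, 34, jane@example.com"

raw = call_llm(prompt)  # call_llm is a hypothetical stand-in for your provider's API call
try:
    data = json.loads(raw)
except json.JSONDecodeError:
    data = None  # the model ignored the format - retry or fall back to another technique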
✅ Pros:
- No extra tooling; works with any model.
- Fast to write and iterate on.
⚠️ Cons:
- No guarantees: the model can still return malformed JSON or stray prose.
- You still need validation and retries downstream.
2. Regular Expressions (Regex)
Approach: Don’t try to control the output at all; just extract what you need afterward using regex. It's a post-processing trick, not a formatting guarantee. You can also combine it with prompt engineering to improve the odds of a match.
Example:
import re

# 'output' is the raw text returned by the model
match = re.search(r"Name: (\w+), Age: (\d+)", output)
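Because regex gives no guarantees, it pays to handle the no-match case explicitly. Continuing the example above:

if match:
    person = {"name": match.group(1), "age": int(match.group(2))}
else:
    person = None  # fall back: re-prompt the model or log the raw output for review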
✅ Pros:
- Works with any model output; no changes to the prompt or generation step needed.
- Lightweight and dependency-free.
⚠️ Cons:
- Brittle: small wording changes in the output break the pattern.
- No formatting guarantee; it only recovers whatever happens to be there.
3. LangChain’s with_structured_output
Approach: Schema-first meets prompt-driven. Use a Pydantic model to define structure, then let LangChain wrangle the model into shape and validate outputs automatically.
Example:
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    email: str

# 'llm' is any LangChain chat model that supports structured output (e.g., ChatOpenAI)
structured_llm = llm.with_structured_output(Person)
response = structured_llm.invoke("Generate a fictional user profile.")
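The call returns a validated Person instance rather than raw text, so downstream code can use attribute access directly:

print(response.name, response.age, response.email)  # fields are already type-checked by Pydantic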
✅ Pros:
- Schema and validation are defined once, in code, via Pydantic.
- Works across the providers LangChain supports.
⚠️ Cons:
- Adds a framework dependency.
- Still relies on the underlying model honoring the structured-output request.
4. Handling Complex Schemas with Trustcall, Instructor & Friends
Approach: When simple templates break down (which they will), bring in libraries that intelligently enforce and correct structure. These tools apply schema-aware logic, retries, and targeted fixes rather than re-generating everything.
Libraries:
- Instructor – patches provider clients so responses are parsed and validated against Pydantic models, with automatic retries when validation fails.
- Trustcall – applies patch-style corrections so only the broken parts of an output are regenerated instead of the whole response.
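As a sketch of what this looks like in practice, here is how Instructor wraps an OpenAI client so that failed validations trigger automatic retries (the Person model mirrors the Pydantic schema from the previous section; the model name is illustrative):

import instructor
from openai import OpenAI
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    email: str

# Patch the client so responses are parsed into the Pydantic model,
# with the validation error fed back to the model on retry.
client = instructor.from_openai(OpenAI())

person = client.chat.completions.create(
    model="gpt-4o-mini",
    response_model=Person,
    max_retries=2,
    messages=[{"role": "user", "content": "Generate a fictional user profile."}],
)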
✅ Pros:
- Automatic retries and targeted fixes instead of full re-generation.
- Handles nested and complex schemas better than plain prompting.
⚠️ Cons:
- Extra dependencies and some learning curve.
- Retries add latency and token cost.
5. Constrained Decoding: Control the Output Before It Misbehaves
Approach: Why fix messy outputs when you can prevent them altogether? Constrained decoding guides the model as it generates, limiting the tokens it’s allowed to produce at each step. This method directly manipulates the model’s logits - the raw, pre-softmax scores behind every word choice - so that only valid continuations are possible.
Instead of saying, “Please return JSON,” constrained decoding says, “You are physically incapable of doing anything else.”
Token-Level Control via Logits Masking
At each generation step, the model's logits are turned into token probabilities via a softmax. Constrained decoding applies a mask before that softmax, setting the logits of invalid tokens to -inf so their probability becomes zero. Only tokens that keep the output valid are left in play.
This transforms the LLM from a creative improviser into a rule-abiding structured generator.
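A minimal sketch of the masking idea in plain PyTorch (real libraries recompute the set of allowed token ids from a grammar or schema at every step; here it is hard-coded for illustration):

import torch

def mask_logits(logits: torch.Tensor, allowed_token_ids: list[int]) -> torch.Tensor:
    # Start from a mask that forbids everything...
    mask = torch.full_like(logits, float("-inf"))
    # ...then re-enable only the tokens that keep the output valid.
    mask[allowed_token_ids] = 0.0
    return logits + mask

# Toy example: a 6-token vocabulary where only tokens 1 and 4 are valid next steps.
logits = torch.randn(6)
probs = torch.softmax(mask_logits(logits, [1, 4]), dim=-1)
# probs is zero everywhere except positions 1 and 4, so sampling can only pick a valid token.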
Check the Outlines library for examples of constrained generation in action, including:
- Choosing from a fixed set of options (multiple choice)
- Matching a regular expression
- Following a JSON schema or Pydantic model
- Conforming to a context-free grammar
Example:
from enum import Enum
import outlines

class Food(str, Enum):
    pizza = "Pizza"
    pasta = "Pasta"
    salad = "Salad"
    dessert = "Dessert"

# 'model' is an Outlines-wrapped LLM and 'prompt' is the user request (defined elsewhere)
generator = outlines.generate.choice(model, Food)
response = generator(prompt)  # guaranteed to be one of the four Food values
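The same pattern extends beyond multiple choice. As a hedged sketch using the same 0.x-style outlines.generate API and the model from the example above (the pattern and prompt are illustrative):

# Constrain the output to a phone-number-like pattern instead of an enum.
phone_generator = outlines.generate.regex(model, r"\(\d{3}\) \d{3}-\d{4}")
phone = phone_generator("Give me a phone number for the support desk.")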
✅ Pros:
- Output validity is guaranteed by construction, not checked after the fact.
- No retry loops or re-prompting needed.
⚠️ Cons:
- Requires access to the decoding loop, so it mainly works with local or open-weight models.
- Overly tight constraints can hurt output quality.
6. Vendor-Specific Features (e.g., OpenAI Function Calling / Schema Mode)
Approach: Use built-in structured output features from providers like OpenAI to enforce JSON schema adherence natively. Think of it as “hard mode” validation with none of the manual parsing.
Example:
import json
from openai import OpenAI

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"}
    },
    "required": ["name", "age", "email"],
    "additionalProperties": False  # strict mode requires closed objects
}

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # structured outputs need a model that supports json_schema
    messages=[{"role": "user", "content": "Generate a fictional profile."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "profile", "schema": schema, "strict": True}
    }
)

profile = json.loads(response.choices[0].message.content)
✅ Pros:
- Schema adherence is enforced by the provider; no manual parsing or repair.
- Minimal extra code.
⚠️ Cons:
- Vendor lock-in; features and parameter names vary between providers.
- Only available on models and API versions that support it.
Final Thoughts
Choosing the right strategy depends on how structured your data needs to be, how forgiving your system is of errors, and how much time you have to babysit the output. And remember: sometimes the best approach is a hybrid one. LLMs may be large and powerful, but they still need guardrails.