Structuring LLM Outputs

As large language models (LLMs) continue their rise in real-world applications, one stubborn problem persists: getting them to follow the rules - especially when it comes to structured output.

Whether you're extracting form data, populating APIs, or feeding a database that doesn't like surprises, structured responses are critical. Below are six techniques for wrangling clean, consistent output from your model.


1. Prompt Engineering

Approach: Give the model a detailed prompt and hope for the best (but design like you expect the worst). Use templates, JSON hints, or explicit format requests.

Example:

Please return data in this format:  
{  
  "name": "",  
  "age": "",  
  "email": ""  
}
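
In practice you pair the prompt with a defensive parse step. A minimal sketch, assuming the OpenAI Python SDK (the model name is a placeholder; any chat model works):

import json
from openai import OpenAI

client = OpenAI()

prompt = """Please return data in this format:
{
  "name": "",
  "age": "",
  "email": ""
}"""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": prompt}],
)

try:
    data = json.loads(response.choices[0].message.content)
except json.JSONDecodeError:
    data = None  # the model ignored the format; retry or fall back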
        

✅ Pros:

  • No libraries needed
  • Great for quick iteration and prototyping

⚠️ Cons:

  • No guarantees—it might decide to write a poem instead
  • Fragile with complex inputs or ambiguous phrasing


2. Regular Expressions (Regex)

Approach: Don’t try to control the output at all; extract what you need afterward with regex. It's a post-processing trick, not a formatting guarantee. You can also combine it with Prompt Engineering to improve the odds of a parsable response (see the sketch at the end of this section).

Example:

import re

match = re.search(r"Name: (\w+), Age: (\d+)", output)
if match:
    name, age = match.group(1), int(match.group(2))

✅ Pros:

  • Lightweight, fast, and dev-friendly
  • Excellent for simple flat data

⚠️ Cons:

  • Falls apart with nested data or slight format drift
  • Fragile
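
As a sketch of the combined approach mentioned above: ask for a rigid one-line format in the prompt, then regex-parse whatever comes back (assuming an OpenAI-style client; the model name is a placeholder):

import re
from openai import OpenAI

client = OpenAI()

reply = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Describe a fictional user in exactly this format: Name: <name>, Age: <age>"}],
).choices[0].message.content

match = re.search(r"Name: (\w+), Age: (\d+)", reply)
person = {"name": match.group(1), "age": int(match.group(2))} if match else None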


3. LangChain’s with_structured_output

Approach: Schema-first meets prompt-driven. Use a Pydantic model to define structure, then let LangChain wrangle the model into shape and validate outputs automatically.

Example:

from pydantic import BaseModel
from langchain_openai import ChatOpenAI

class Person(BaseModel):
    name: str
    age: int
    email: str

llm = ChatOpenAI(model="gpt-4o-mini")  # placeholder: any chat model with structured-output support

structured_llm = llm.with_structured_output(Person)
response = structured_llm.invoke("Generate a fictional user profile.")  # returns a Person instance

✅ Pros:

  • Schema enforcement with clean Python objects
  • Reduces manual validation and weird corner cases

⚠️ Cons:

  • Requires LangChain and a compatible LLM backend
  • Still prompt-driven under the hood, so errors can sneak in if the model misbehaves


4. Handling Complex Schemas with Trustcall, Instructor & Friends

Approach: When simple templates break down (which they will), bring in libraries that intelligently enforce and correct structure. These tools apply schema-aware logic, retries, and targeted fixes rather than re-generating everything.

Libraries:

  • Instructor - patches your LLM client so responses come back as validated Pydantic objects, with automatic re-asks when validation fails
  • Trustcall - applies targeted, patch-style corrections to invalid or partial output instead of regenerating the whole response
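
Example (a minimal sketch using Instructor; the model name below is a placeholder, and the pattern shown is Instructor's patched OpenAI client):

import instructor
from openai import OpenAI
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int
    email: str

# Patch the client so responses are parsed into the schema, with re-asks on validation errors
client = instructor.from_openai(OpenAI())

person = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    response_model=Person,
    max_retries=2,  # re-ask the model if validation fails
    messages=[{"role": "user", "content": "Generate a fictional user profile."}],
)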

✅ Pros:

  • Built to handle nested and complex data structures
  • Feedback loops improve reliability

⚠️ Cons:

  • Can introduce latency from retries or multi-pass parsing
  • Upfront effort for schema design and integration


5. Constrained Decoding: Control the Output Before It Misbehaves

Approach: Why fix messy outputs when you can prevent them altogether? Constrained decoding guides the model as it generates, limiting the tokens it’s allowed to produce at each step. This method directly manipulates the model’s logits - the raw probabilities behind every word choice - so that only valid continuations are possible.

Instead of saying, “Please return JSON,” constrained decoding says, “You are physically incapable of doing anything else.”

Token-Level Control via Logits Masking

Each token-generation step applies a softmax over the logits. Constrained decoding applies a mask that sets the logits of invalid tokens to -inf, so their probability after the softmax is zero. Only tokens that keep the output valid are left in play.

This transforms the LLM from a creative improviser into a rule-abiding structured generator.
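
To make this concrete, here is a minimal sketch of the masking step in plain PyTorch. It assumes logits is the 1-D next-token distribution and that allowed_token_ids has already been computed from a schema or grammar, which is the hard part that libraries like Outlines handle for you:

import torch

def mask_logits(logits: torch.Tensor, allowed_token_ids: list[int]) -> torch.Tensor:
    # Forbid every token by default, then re-enable the ones that keep the output valid
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0
    return logits + mask  # after softmax, invalid tokens get probability 0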

Check Outlines for examples on:

  • Multiple choice generation
  • Type enforcement (e.g., integers only)
  • Efficient regex-constrained output
  • JSON generation following a Pydantic model
  • Context-free grammar (CFG) based generation
  • “Open functions” (functions with strict signature enforcement)

Example:

from enum import Enum
import outlines

class Food(str, Enum):
    pizza = "Pizza"
    pasta = "Pasta"
    salad = "Salad"
    dessert = "Dessert"

# Placeholder: any locally hosted Hugging Face model works here
model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")

prompt = "What should we have for dinner?"
generator = outlines.generate.choice(model, Food)
response = generator(prompt)

✅ Pros:

  • Hard guarantees: Invalid output can’t be generated
  • No retries or patching: Structure is enforced mid-generation
  • Ideal for high-stakes systems where output correctness matters

⚠️ Cons:

  • Requires access to low-level generation APIs (not yet supported by OpenAI’s hosted models)
  • Setup is more involved. It may require defining finite state machines, grammars, or schemas
  • Best performance often requires local model deployment or advanced SDKs


6. Vendor-Specific Features (e.g., OpenAI Function Calling / Schema Mode)

Approach: Use built-in structured output features from providers like OpenAI to enforce JSON schema adherence natively. Think of it as “hard mode” validation with none of the manual parsing.

Example:

import json
from openai import OpenAI

client = OpenAI()

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "email": {"type": "string"}
    },
    "required": ["name", "age", "email"],
    "additionalProperties": False  # required by OpenAI's strict structured outputs
}

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # structured outputs need a model that supports json_schema
    messages=[{"role": "user", "content": "Generate a fictional profile."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "person", "schema": schema, "strict": True},
    },
)

data = json.loads(response.choices[0].message.content)  # parse the schema-conformant JSON

✅ Pros:

  • Outputs guaranteed to match the schema (no funny business)
  • Perfect fit for API and backend integrations

⚠️ Cons:

  • Only available in specific models
  • Requires a fully defined schema upfront


Final Thoughts

Choosing the right strategy depends on how structured your data needs to be, how forgiving your system is of errors, and how much time you have to babysit output. And remember: sometimes, the best approach is a hybrid one. LLMs may be large and powerful, but they still need guardrails.
