Extract Structured Data from LLMs: Python Tutorial with Pydantic and Instructor (GPT, Claude, Gemini)
If you've used an LLM in production, you know the drama: the model outputs a beautiful JSON for 3 days, and on the fourth day, it decides to invent a new key, skip a required field, or — worst of all — return plain text inside a block that should be JSON.
The fix has a name: schema enforcement. Until 2025, every team approximated it with a hack: regex, retry loops, post-processing validators. In 2026, the story changed.
"Structured outputs have gone from a hack (regex parsing, retry loops) to a first-class feature across all major LLM providers." — TokenMix.ai, analysis of 2 million API calls
OpenAI, Anthropic, and Google now deliver schema guarantees at the API level. It's no longer "hoping it comes out right" — it's a contract. Each does it their own way, of course. This tutorial shows the 4 paths to extracting structured data in Python, from the most specific to the most universal, with code you can copy and adapt today.
The (Hidden) Cost of Loose JSON
Before diving into the code, it's worth understanding the scale of the problem. An analysis of 2 million API calls published by TokenMix.ai revealed that JSON mode without schema enforcement fails in 8 to 15% of requests. The reasons vary: missing field, wrong type, extra key, or simply invalid JSON.
Each such failure means a new retry call. And each retry doubles latency and burns tokens.
| Output Mode | Failure Rate | Extra Cost per Failure |
|---|---|---|
| Free JSON mode | 8–15% | Retry = 2× latency + discarded tokens |
| Structured output (OpenAI strict) | < 0.1% | 30–300 tokens overhead per call |
| Forced tool use (Anthropic) | ~0% on 7 of 8 Carrick Benchmark schemas | Schema overhead in tool definition |
| Response schema (Gemini) | 0% on accepted schemas | Pre-flight validation rejects invalid schemas |
The conclusion is straightforward: schema enforcement pays for itself — 30 to 300 tokens of overhead per call, which is nothing compared to a 10% retry rate. As TokenMix.ai summarized:
"Schema enforcement is solved — pick OpenAI Structured Outputs, Anthropic tool use, or Gemini schema. Stop building retry logic for malformed JSON."
Let's get to the implementation.
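First, a quick sanity check on the cost claim. The sketch below is back-of-the-envelope token arithmetic, not a reproduction of the TokenMix.ai analysis. It assumes failures are independent and that you retry until a call succeeds, so the expected number of calls is 1 / (1 - failure rate). The function names and the 4,000-token example call are illustrative assumptions. It counts tokens only; the latency added by each retry is not modeled.

```python
def expected_calls(failure_rate: float) -> float:
    """Expected API calls per successful response when retrying until success."""
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


def expected_tokens(tokens_per_call: float, failure_rate: float,
                    schema_overhead: float = 0.0) -> float:
    """Expected tokens spent per successful response, retries included."""
    return (tokens_per_call + schema_overhead) * expected_calls(failure_rate)


def break_even_call_size(schema_overhead: float, failure_rate: float) -> float:
    """Call size (in tokens) above which the schema overhead costs less than
    the retries it avoids. Simplification: the enforced mode never fails."""
    return schema_overhead * (1.0 - failure_rate) / failure_rate


# Hypothetical 4,000-token call: loose JSON at a 10% failure rate versus
# strict mode at a 0.1% failure rate with the worst-case 300-token overhead.
loose = expected_tokens(4000, 0.10)
strict = expected_tokens(4000, 0.001, schema_overhead=300)
print(f"loose JSON:  {loose:.0f} tokens per good response")
print(f"strict mode: {strict:.0f} tokens per good response")
print(f"break-even call size: {break_even_call_size(300, 0.10):.0f} tokens")
```

Under these assumptions, the worst-case 300-token overhead wins on tokens alone once a call exceeds roughly 2,700 tokens, and at the 30-token end of the range the break-even drops to about 270 tokens. Latency and engineering time tilt the balance further toward enforcement.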
Option 1: OpenAI Structured Outputs
OpenAI was the first to offer formal schema guarantees with strict: true in the response_format mode. In 2026, the recommended path is the SDK's parse() method — which already integrates Pydantic natively.
from pydantic import BaseModel, Field
from openai import OpenAI

# Define the schema with Pydantic: strong typing, no surprises
class TicketTriage(BaseModel):
    prioridade: str = Field(description="High, Medium, or Low")
    departamento: str = Field(description="Support, Billing, Technical")
    resumo: str = Field(description="One-sentence summary")
    precisa_escalacao: bool = Field(description="Needs a human?")

client = OpenAI()

# The parse() method guarantees schema compliance
resposta = client.beta.chat.completions.parse(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "Classify the customer's ticket."},
        {"role": "user", "content": "My card was charged twice this month"},
    ],
    response_format=TicketTriage,
)

ticket: TicketTriage = resposta.choices[0].message.parsed
print(f"Department: {ticket.departamento}, Priority: {ticket.prioridade}")
Pros: 99.9% compliance proven in tests with 500k calls (Source: TokenMix.ai). Native integration with Pydantic.
Cons: OpenAI's strict mode rejects 6 out of 8 schemas pre-flight because it requires additionalProperties: false, all fields as required, and does not accept oneOf or type: array in certain cases (Source: Carrick Benchmark, May/2026). In other words: complex schemas can be blocked before they even reach the model.
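Because these rejections happen before the model is even called, it can be worth checking a schema locally first. The sketch below is a hypothetical helper, not part of the OpenAI SDK. It only checks the three constraints named above (additionalProperties: false, every field listed as required, no oneOf) and makes no claim to cover everything strict mode enforces. It takes a plain JSON Schema dict, such as the output of TicketTriage.model_json_schema().

```python
from typing import Any


def strict_mode_violations(schema: dict[str, Any], path: str = "$") -> list[str]:
    """List reasons a JSON schema might be rejected by a strict pre-flight check.

    Hypothetical helper: covers only additionalProperties, required, and oneOf.
    """
    problems: list[str] = []
    if "oneOf" in schema:
        problems.append(f"{path}: uses oneOf")
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        optional = sorted(set(props) - set(schema.get("required", [])))
        if optional:
            problems.append(f"{path}: fields not listed as required: {optional}")
        # Recurse into each property so nested objects are checked too
        for name, sub in props.items():
            problems.extend(strict_mode_violations(sub, f"{path}.{name}"))
    if isinstance(schema.get("items"), dict):
        problems.extend(strict_mode_violations(schema["items"], f"{path}[]"))
    # Pydantic places nested models under $defs, so walk those as well
    for name, sub in schema.get("$defs", {}).items():
        problems.extend(strict_mode_violations(sub, f"{path}.$defs.{name}"))
    return problems


loose_schema = {
    "type": "object",
    "properties": {
        "prioridade": {"type": "string"},
        "precisa_escalacao": {"type": "boolean"},
    },
    "required": ["prioridade"],
}
strict_schema = {
    **loose_schema,
    "required": ["prioridade", "precisa_escalacao"],
    "additionalProperties": False,
}

for problem in strict_mode_violations(loose_schema):
    print(problem)
print(strict_mode_violations(strict_schema))
```

Running a check like this in CI surfaces incompatible schemas at build time instead of as API errors in production.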
Option 2: Anthropic Forced Tool Use
Anthropic treats structured output as a "forced tool call" — you define a tool with input_schema and force the model to use it with tool_choice: {type: "tool", name: "..."}.
from pydantic import BaseModel
from anthropic import Anthropic

class TicketTriage(BaseModel):
    prioridade: str
    departamento: str
    resumo: str
    precisa_escalacao: bool

client = Anthropic()

resposta = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[{
        "name": "extrair_ticket",
        "description": "Classifies a support ticket",
        "input_schema": TicketTriage.model_json_schema(),
    }],
    tool_choice={"type": "tool", "name": "extrair_ticket"},
    messages=[
        {"role": "user", "content": "Subscription renewed but I can't log in"}
    ],
)

# The response arrives as a tool call: pick the tool_use block, then validate it
tool_call = next(block for block in resposta.content if block.type == "tool_use")
ticket = TicketTriage.model_validate(tool_call.input)
Anthropic performs very well on the Carrick Benchmark (May/2026): Claude Sonnet 4.6 produces 100% compliant output for 7 out of 8 tested schemas. The Achilles' heel? Schemas with deep nesting (7 levels), where compliance drops to 0%.
Pros: Flexible approach that accepts any JSON schema, with no draconian pre-flight restrictions. Ideal for hierarchical data up to 4-5 levels.
Cons: The structured data arrives inside a tool_use content block, so reading it takes an extra extraction step. And if your schema is deeply nested, expect failures.
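Since compliance depends on how deep the schema goes, it can help to measure nesting before choosing this path. The sketch below is a hypothetical helper, not part of any SDK. It assumes "depth" means the number of nested object levels and resolves local "#/$defs/..." references, which is how Pydantic's model_json_schema() represents nested models. The example field names are made up for illustration.

```python
from typing import Any, Optional


def schema_depth(node: dict[str, Any], root: Optional[dict[str, Any]] = None,
                 _seen: frozenset = frozenset()) -> int:
    """Count nested object levels in a JSON schema (a flat object has depth 1)."""
    root = node if root is None else root
    ref = node.get("$ref")
    if isinstance(ref, str) and ref.startswith("#/$defs/"):
        if ref in _seen:  # recursive model: stop instead of looping forever
            return 0
        target = root.get("$defs", {}).get(ref.removeprefix("#/$defs/"), {})
        return schema_depth(target, root, _seen | {ref})
    if node.get("type") == "array" and isinstance(node.get("items"), dict):
        # Arrays do not add an object level; look at what they contain
        return schema_depth(node["items"], root, _seen)
    if node.get("type") == "object" or "properties" in node:
        children = node.get("properties", {}).values()
        return 1 + max((schema_depth(c, root, _seen) for c in children), default=0)
    return 0


# Hypothetical ticket schema: ticket -> customer -> list of addresses
schema = {
    "type": "object",
    "properties": {
        "resumo": {"type": "string"},
        "customer": {"$ref": "#/$defs/Customer"},
    },
    "$defs": {
        "Customer": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "addresses": {"type": "array", "items": {"$ref": "#/$defs/Address"}},
            },
        },
        "Address": {"type": "object", "properties": {"city": {"type": "string"}}},
    },
}

depth = schema_depth(schema)
print(f"nesting depth: {depth}")
if depth >= 6:
    print("consider flattening or splitting the schema before forcing a tool call")
```

The threshold of 6 mirrors the "6+ levels" row in the decision guide; treat it as a starting point and adjust it against your own test results.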
Option 3: Gemini response_schema
Google entered the fray with the response_schema parameter, set in the generation config (the config argument in the google-genai SDK used here) together with response_mime_type: "application/json".
from pydantic import BaseModel
from google import genai

class TicketTriage(BaseModel):
    prioridade: str
    departamento: str
    resumo: str
    precisa_escalacao: bool

client = genai.Client()

resposta = client.models.generate_content(
    model="gemini-2.5-pro-3-1",
    contents="The invoice came with the wrong amount, I need help",
    config={
        "response_mime_type": "application/json",
        "response_schema": TicketTriage,
    },
)

# The response text is JSON that follows the schema; validate it into the model
ticket = TicketTriage.model_validate_json(resposta.text)
Gemini Pro 3.1 and Flash 3.5 maintain 100% strict adherence on all schemas accepted by the pre-flight validator (Source: Carrick Benchmark, May/2026). Google's differentiator is validation before the call: if the schema is invalid, the API tells you immediately.
Pros: Zero failure on schemas that pass validation. Clear documentation of what is or isn't accepted.
Cons: Less flexible than Anthropic for complex schemas. The Pydantic ecosystem is not as natively integrated as with OpenAI.
Option 4: Instructor — The Unifier
If you don't want to be tied to a single provider, the Instructor library (11k+ stars on GitHub, 3M+ monthly downloads) unifies all three providers with the same Pydantic-based API. Plus: it does automatic retry with validation.
from pydantic import BaseModel, Field
import instructor
from openai import OpenAI

# Works the same with Anthropic or Google: just swap the client
client = instructor.from_openai(OpenAI())

class TicketTriage(BaseModel):
    prioridade: str = Field(description="High, Medium, or Low")
    departamento: str = Field(description="Support, Billing, Technical")
    resumo: str = Field(description="Concise summary")
    precisa_escalacao: bool = Field(default=False)

# Automatic retry if validation fails
ticket = client.chat.completions.create(
    model="gpt-5.5",
    response_model=TicketTriage,
    messages=[
        {"role": "user", "content": "The system froze on my machine"}
    ],
    max_retries=3,  # Instructor tries again if the JSON is invalid
)

print(f"Priority: {ticket.prioridade}")
Instructor supports 15+ providers — OpenAI, Anthropic, Gemini, DeepSeek, Llama via Together, and more. With max_retries, it detects schema failure and re-calls the model automatically. It's the right choice for those who want portability between providers and robustness without writing boilerplate.
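To make that mechanism concrete, here is a minimal sketch of a validate-and-retry loop of the kind a library can run for you. It is an illustration of the general idea, not Instructor's actual implementation. The names call_with_validation, fake_model, and validate_ticket are hypothetical, and the "model" is a stand-in that returns canned replies so the example runs offline.

```python
import json
from typing import Callable


def call_with_validation(ask: Callable[[list], str],
                         validate: Callable[[dict], dict],
                         messages: list,
                         max_attempts: int = 3) -> dict:
    """Call the model, validate the reply, and re-ask with the error on failure."""
    history = list(messages)  # copy so the caller's list is untouched
    last_error = None
    for _ in range(max_attempts):
        raw = ask(history)
        try:
            return validate(json.loads(raw))
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            last_error = exc
            # Feed the failure back so the next attempt can correct it
            history.append({"role": "assistant", "content": raw})
            history.append({"role": "user",
                            "content": f"Validation failed: {exc}. Return corrected JSON only."})
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error


def validate_ticket(data: dict) -> dict:
    """Toy validator: both fields must be present."""
    missing = sorted({"prioridade", "precisa_escalacao"} - set(data))
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


# Stand-in model: the first reply omits a field, the second one is complete
replies = iter(['{"prioridade": "Alta"}',
                '{"prioridade": "Alta", "precisa_escalacao": true}'])
calls = []


def fake_model(history: list) -> str:
    calls.append(len(history))  # record how much context each attempt saw
    return next(replies)


ticket = call_with_validation(
    fake_model, validate_ticket,
    [{"role": "user", "content": "The system froze on my machine"}],
)
print(ticket, "after", len(calls), "attempts")
```

With a library handling this loop, your application code only declares the schema and a retry budget.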
If you want to dive deeper into agents with tools, also check out our tutorial From Prompt to Agent — Instructor integrates naturally with agent loops.
Which One to Choose? A Decision Guide
Each provider has a sweet spot. Here's a rule of thumb:
| Scenario | Choice |
|---|---|
| You already use OpenAI and the schema is simple (flat, < 5 fields) | client.beta.chat.completions.parse |
| You need nested schemas or medium hierarchy (2-5 levels) | Anthropic tool use |
| You want zero surprises with simple schemas | Gemini response_schema |
| You switch providers or want portability | Instructor (any) |
| Deeply nested schema (6+ levels) | Use Instructor + Anthropic with fallback to chunking |
And loose JSON mode? Only use it for non-critical output, like brainstorming or drafts. For any flow that feeds another system, structured output is not a luxury — it's a requirement.
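The rule of thumb above can be encoded as a small function. This is only a sketch: the function name and parameters are hypothetical, and because the table's scenarios can overlap, the order in which the rows are checked is an assumption (deep nesting first, then portability, then moderate nesting).

```python
def suggest_path(depth: int, field_count: int, *,
                 on_openai: bool = False,
                 needs_portability: bool = False) -> str:
    """Map schema shape and constraints to one of the options in the table."""
    if depth >= 6:
        return "Instructor + Anthropic with fallback to chunking"
    if needs_portability:
        return "Instructor"
    if depth >= 2:
        return "Anthropic tool use"
    if on_openai and field_count < 5:
        return "client.beta.chat.completions.parse"
    return "Gemini response_schema"


# The flat, four-field TicketTriage schema used throughout this tutorial
print(suggest_path(depth=1, field_count=4, on_openai=True))
# A moderately nested schema with no provider lock-in concerns
print(suggest_path(depth=3, field_count=12))
```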
What Comes Next
Structured output is the foundation. The next step is orchestrating multiple calls with validation between stages. If you're already building systems with agents, it's worth checking out the RAG from Scratch tutorial — where structured entity extraction is a critical part of the pipeline.
The evolution is clear: in 2025, the question was "how do I make the LLM return valid JSON?". In 2026, the question is "which provider offers the best schema guarantee for my use case?". The answer changes depending on your schema, your budget, and your tolerance for failure — but one thing is certain: hand-written retry loops are a thing of the past.
Related Articles
Also check out: From Prompt to Agent: Build an AI Assistant with Memory and Tools in Python (2026)
Also check out: What Are Large Language Models (LLMs) and How Are They Transforming Technology
Also check out: Autonomous AI Agents in 2026: how they work, where they are being used, and what to expect