
Extract Structured Data from LLMs: Python Tutorial with Pydantic and Instructor (GPT, Claude, Gemini)

NeuralPulse|May 31, 2026|12 min read

If you've used an LLM in production, you know the drama: the model outputs beautiful JSON for 3 days, and on the fourth day it decides to invent a new key, skip a required field, or, worst of all, return plain text inside a block that should be JSON.

The fix has a name: schema enforcement. Until 2025, every team approximated it with hacks: regex, retry loops, post-processing validators. In 2026, the story changed.

"Structured outputs have gone from a hack (regex parsing, retry loops) to a first-class feature across all major LLM providers." — TokenMix.ai, analysis of 2 million API calls

OpenAI, Anthropic, and Google now deliver schema guarantees at the API level. It's no longer "hoping it comes out right" — it's a contract. Each does it their own way, of course. This tutorial shows the 4 paths to extracting structured data in Python, from the most specific to the most universal, with code you can copy and adapt today.

The (Hidden) Cost of Loose JSON

Before diving into the code, it's worth understanding the scale of the problem. An analysis of 2 million API calls published by TokenMix.ai revealed that JSON mode without schema enforcement fails in 8 to 15% of requests. The reasons vary: missing field, wrong type, extra key, or simply invalid JSON.

Each failure triggers a retry, and each retry roughly doubles that request's latency and throws away the tokens already spent.

Output Mode	Failure Rate	Cost
Free JSON mode	8–15%	Each retry = 2× latency + discarded tokens
Structured output (OpenAI strict)	< 0.1%	30–300 tokens of overhead per call
Forced tool use (Anthropic)	~0% on 7 of 8 schemas (Carrick Benchmark)	Schema overhead in the tool definition
Response schema (Gemini)	0% on accepted schemas	Pre-flight validation rejects invalid schemas
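To put rough numbers on that trade-off, here is a minimal sketch. It assumes failures are independent and that every failure triggers one full retry of the same request; the 2,000-token request size and the 150-token overhead are illustrative values, not figures from the analysis, and `expected_tokens_per_success` is a hypothetical helper name.

```python
def expected_tokens_per_success(base_tokens: int, failure_rate: float,
                                overhead_tokens: int = 0) -> float:
    """Expected tokens spent to obtain one valid response.

    Assumption: each call fails independently with probability
    `failure_rate`, and a failed call is retried in full. The expected
    number of calls is then 1 / (1 - failure_rate).
    """
    if not 0 <= failure_rate < 1:
        raise ValueError("failure_rate must be in [0, 1)")
    expected_calls = 1 / (1 - failure_rate)
    return (base_tokens + overhead_tokens) * expected_calls


# Illustrative inputs: a 2,000-token request, 10% failures in loose JSON mode,
# versus 0.1% failures with 150 tokens of schema overhead per call.
loose = expected_tokens_per_success(2000, 0.10)
strict = expected_tokens_per_success(2000, 0.001, overhead_tokens=150)

print(f"loose JSON mode: {loose:.1f} tokens per valid response")
print(f"schema enforced: {strict:.1f} tokens per valid response")
```

The break-even point depends on request size and on where in the 30 to 300 token range your schema lands, so it is worth running this with your own numbers. The sketch also counts tokens only; the extra latency of each retry comes on top.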

The conclusion is straightforward: schema enforcement pays for itself. It costs 30 to 300 tokens of overhead per call, which is small next to a 10% retry rate that resends the whole request. As TokenMix.ai summarized:

"Schema enforcement is solved — pick OpenAI Structured Outputs, Anthropic tool use, or Gemini schema. Stop building retry logic for malformed JSON."

Let's get to the implementation.

Option 1: OpenAI Structured Outputs

OpenAI was the first to offer formal schema guarantees, with strict: true in response_format. In 2026, the recommended path is the SDK's parse() method, which accepts a Pydantic model directly.

from pydantic import BaseModel, Field
from openai import OpenAI

# Define the schema with Pydantic: strong typing, no surprises
class TicketTriage(BaseModel):
    prioridade: str = Field(description="High, Medium, or Low")
    departamento: str = Field(description="Support, Billing, Technical")
    resumo: str = Field(description="One-sentence summary")
    precisa_escalacao: bool = Field(description="Does it need a human?")

client = OpenAI()

# The parse() method guarantees schema compliance
resposta = client.beta.chat.completions.parse(
    model="gpt-5.5",
    messages=[
        {"role": "system", "content": "Classify the customer's ticket."},
        {"role": "user", "content": "My card was charged twice this month"},
    ],
    response_format=TicketTriage,
)

ticket: TicketTriage = resposta.choices[0].message.parsed
print(f"Department: {ticket.departamento}, Priority: {ticket.prioridade}")

Pros: 99.9% compliance proven in tests with 500k calls (Source: TokenMix.ai). Native integration with Pydantic.

Cons: OpenAI's strict mode rejects 6 out of 8 schemas pre-flight because it requires additionalProperties: false, all fields as required, and does not accept oneOf or type: array in certain cases (Source: Carrick Benchmark, May/2026). In other words: complex schemas can be blocked before they even reach the model.
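To catch those rejections before sending a request, a rough local lint can help. The sketch below checks only the rules named above (additionalProperties: false, every field listed in required, no oneOf) on a plain JSON Schema dict. It is not OpenAI's validator and does not cover the full strict-mode rule set; `strict_mode_problems` is a hypothetical helper.

```python
from typing import Any


def strict_mode_problems(schema: dict[str, Any], path: str = "$") -> list[str]:
    """List violations of the three strict-mode rules named in the text.

    Assumption: the schema is inline (no $ref resolution) and only
    `properties` and `items` are walked.
    """
    problems: list[str] = []
    if "oneOf" in schema:
        problems.append(f"{path}: oneOf is not accepted")
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: fields not listed in required: {', '.join(missing)}")
        for name, sub in props.items():
            problems.extend(strict_mode_problems(sub, f"{path}.{name}"))
    if schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_mode_problems(schema["items"], f"{path}[]"))
    return problems


# A schema with one optional field and no additionalProperties setting
draft = {
    "type": "object",
    "properties": {"resumo": {"type": "string"}, "nota": {"type": "string"}},
    "required": ["resumo"],
}
for problem in strict_mode_problems(draft):
    print(problem)
```

Running it on a schema before a deploy is cheap, and the path prefix in each message points at the nested object that needs fixing.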

Option 2: Anthropic Forced Tool Use

Anthropic treats structured output as a "forced tool call" — you define a tool with input_schema and force the model to use it with tool_choice: {type: "tool", name: "..."}.

from pydantic import BaseModel
from anthropic import Anthropic

class TicketTriage(BaseModel):
    prioridade: str
    departamento: str
    resumo: str
    precisa_escalacao: bool

client = Anthropic()

resposta = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    tools=[{
        "name": "extrair_ticket",
        "description": "Classifies a support ticket",
        "input_schema": TicketTriage.model_json_schema(),
    }],
    tool_choice={"type": "tool", "name": "extrair_ticket"},
    messages=[
        {"role": "user", "content": "My subscription renewed but I can't log in"}
    ],
)

# The response arrives as a tool call; pick the tool_use block and validate its input
tool_call = next(block for block in resposta.content if block.type == "tool_use")
ticket = TicketTriage.model_validate(tool_call.input)

Anthropic performs very well on the Carrick Benchmark (May/2026): Claude Sonnet 4.6 produces 100% compliant output for 7 out of 8 tested schemas. The Achilles' heel? Schemas with deep nesting (7 levels), where compliance drops to 0%.

Pros: Flexible approach that accepts a wide range of JSON schemas, with no draconian schema restrictions. Ideal for hierarchical data up to 4-5 levels.

Cons: The output arrives wrapped in a tool call, so you need an extra extraction step. And if your schema is very nested, prepare for failures.
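Since nesting depth is the deciding factor here, it can be useful to measure it before picking a provider. The sketch below counts nested object levels in a JSON Schema dict. `schema_depth` is a hypothetical helper, and it makes simplifying assumptions: it follows only `properties`, `items`, and local `#/$defs/...` references, and ignores anyOf/allOf.

```python
from typing import Any, Optional


def schema_depth(schema: dict[str, Any],
                 root: Optional[dict[str, Any]] = None,
                 _seen: tuple[str, ...] = ()) -> int:
    """Return the number of nested object levels in a JSON Schema.

    Assumption: nested models are either inline or referenced as
    "#/$defs/Name" in the root schema. Recursive references are cut off.
    """
    root = schema if root is None else root
    ref = schema.get("$ref")
    if isinstance(ref, str) and ref.startswith("#/$defs/"):
        name = ref.split("/")[-1]
        if name in _seen:  # recursive model: stop instead of looping forever
            return 0
        return schema_depth(root["$defs"][name], root, _seen + (name,))
    children = list(schema.get("properties", {}).values())
    if isinstance(schema.get("items"), dict):
        children.append(schema["items"])
    deepest = max((schema_depth(child, root, _seen) for child in children), default=0)
    return deepest + (1 if schema.get("type") == "object" else 0)


# A ticket with a nested customer object, which itself holds an address object
ticket_schema = {
    "type": "object",
    "properties": {
        "resumo": {"type": "string"},
        "cliente": {"$ref": "#/$defs/Cliente"},
    },
    "$defs": {
        "Cliente": {
            "type": "object",
            "properties": {
                "nome": {"type": "string"},
                "endereco": {"type": "object", "properties": {"cidade": {"type": "string"}}},
            },
        }
    },
}
print(schema_depth(ticket_schema))
```

If the result approaches the 7-level case from the benchmark, flattening the model or splitting the extraction into several calls is worth considering.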


Option 3: Gemini response_schema

Google entered the fray with the response_schema parameter in the generation config (the config argument in the google-genai SDK), combined with response_mime_type: "application/json".

from pydantic import BaseModel
from google import genai

class TicketTriage(BaseModel):
    prioridade: str
    departamento: str
    resumo: str
    precisa_escalacao: bool

client = genai.Client()

resposta = client.models.generate_content(
    model="gemini-2.5-pro-3-1",
    contents="The invoice came with the wrong amount, I need help",
    config={
        "response_mime_type": "application/json",
        "response_schema": TicketTriage,
    },
)

# The response text is JSON that matches the schema; validate it into the model
ticket = TicketTriage.model_validate_json(resposta.text)

Gemini Pro 3.1 and Flash 3.5 maintain 100% strict adherence on all schemas accepted by the pre-flight validator (Source: Carrick Benchmark, May/2026). Google's differentiator is validation before the call: if the schema is invalid, the API tells you immediately.

Pros: Zero failure on schemas that pass validation. Clear documentation of what is or isn't accepted.

Cons: Less flexible than Anthropic for complex schemas. The Pydantic ecosystem is not as natively integrated as with OpenAI.

Option 4: Instructor — The Unifier

If you don't want to be tied to a single provider, the Instructor library (11k+ stars on GitHub, 3M+ monthly downloads) unifies all three providers with the same Pydantic-based API. Plus: it does automatic retry with validation.

from pydantic import BaseModel, Field
import instructor
from openai import OpenAI

# Works the same with Anthropic or Google: just swap the client
client = instructor.from_openai(OpenAI())

class TicketTriage(BaseModel):
    prioridade: str = Field(description="High, Medium, or Low")
    departamento: str = Field(description="Support, Billing, Technical")
    resumo: str = Field(description="Concise summary")
    precisa_escalacao: bool = Field(default=False)

# Automatic retry if validation fails
ticket = client.chat.completions.create(
    model="gpt-5.5",
    response_model=TicketTriage,
    messages=[
        {"role": "user", "content": "The system froze on my machine"}
    ],
    max_retries=3,  # Instructor tries again if the JSON is invalid
)

print(f"Priority: {ticket.prioridade}")

Instructor supports 15+ providers — OpenAI, Anthropic, Gemini, DeepSeek, Llama via Together, and more. With max_retries, it detects schema failure and re-calls the model automatically. It's the right choice for those who want portability between providers and robustness without writing boilerplate.
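Conceptually, validation-driven retry looks like the loop below. This is a simplified illustration of the idea, not Instructor's implementation: it uses only the standard library, `validate` is a hand-rolled stand-in for Pydantic validation, and `fake_model` is a hypothetical stub that replaces a real API call so the example runs offline.

```python
import json
from typing import Callable

# Expected fields and their Python types (stand-in for a Pydantic model)
REQUIRED = {"prioridade": str, "departamento": str, "resumo": str, "precisa_escalacao": bool}


def validate(raw: str) -> dict:
    """Parse the reply and check required fields; raise ValueError on any problem."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected_type in REQUIRED.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        if not isinstance(data[key], expected_type):
            raise ValueError(f"wrong type for field: {key}")
    return data


def extract_with_retries(call_model: Callable[[list], str], messages: list,
                         max_attempts: int = 3) -> dict:
    """Call the model, validate, and feed the error back until the output is valid."""
    messages = list(messages)  # do not mutate the caller's list
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(messages)
        try:
            return validate(raw)
        except ValueError as err:
            last_error = err
            # Show the model its own reply and the validation error
            messages.append({"role": "assistant", "content": raw})
            messages.append({"role": "user", "content": f"Validation failed: {err}. Return corrected JSON."})
    raise RuntimeError(f"no valid output after {max_attempts} attempts") from last_error


def fake_model(messages: list) -> str:
    """Hypothetical stub: the first reply drops fields, the corrected one is complete."""
    if len(messages) == 1:
        return '{"prioridade": "High"}'
    return json.dumps({"prioridade": "High", "departamento": "Technical",
                       "resumo": "System froze", "precisa_escalacao": False})


ticket = extract_with_retries(fake_model, [{"role": "user", "content": "The system froze on my machine"}])
print(ticket["departamento"])
```

The key design choice is feeding the validation error back into the conversation, so the next attempt has a concrete reason to change its output instead of repeating the same mistake.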

If you want to dive deeper into agents with tools, also check out our tutorial From Prompt to Agent — Instructor integrates naturally with agent loops.

Which One to Choose? A Decision Guide

Each provider has a sweet spot. Here's a rule of thumb:

Scenario	Choice
You already use OpenAI and the schema is simple (flat, < 5 fields)	`client.beta.chat.completions.parse`
You need nested schemas or medium hierarchy (2-5 levels)	Anthropic tool use
You want zero surprises with simple schemas	Gemini response_schema
You switch providers or want portability	Instructor (any)
Deeply nested schema (6+ levels)	Use Instructor + Anthropic with fallback to chunking
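The table above can be encoded as a small helper if you want the rule of thumb in code. This is only a sketch: `choose_approach` is a hypothetical function, and the order in which the rules are checked (deep nesting first, then portability) is an assumption, since the table does not rank its rows.

```python
def choose_approach(nesting_levels: int, field_count: int,
                    already_on_openai: bool = False,
                    needs_portability: bool = False) -> str:
    """Map schema shape and constraints to one of the options in the table.

    Assumption: rules are checked in the order below; the thresholds
    (6+ levels, 2-5 levels, fewer than 5 fields) come from the table.
    """
    if nesting_levels >= 6:
        return "Instructor + Anthropic, with fallback to chunking"
    if needs_portability:
        return "Instructor"
    if nesting_levels >= 2:
        return "Anthropic tool use"
    if already_on_openai and field_count < 5:
        return "client.beta.chat.completions.parse"
    return "Gemini response_schema"


# A flat four-field ticket on a team that already uses OpenAI
print(choose_approach(nesting_levels=1, field_count=4, already_on_openai=True))
```

Treat the output as a starting point for discussion, not a verdict; budget and failure tolerance also weigh on the final choice.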

And loose JSON mode? Only use it for non-critical output, like brainstorming or drafts. For any flow that feeds another system, structured output is not a luxury — it's a requirement.

What Comes Next

Structured output is the foundation. The next step is orchestrating multiple calls with validation between stages. If you're already building systems with agents, it's worth checking out the RAG from Scratch tutorial — where structured entity extraction is a critical part of the pipeline.

The evolution is clear: in 2025, the question was "how do I make the LLM return valid JSON?". In 2026, the question is "which provider offers the best schema guarantee for my use case?". The answer changes depending on your schema, your budget, and your tolerance for failure — but one thing is certain: hand-written retry loops are a thing of the past.

Also check out: From Prompt to Agent: Build an AI Assistant with Memory and Tools in Python (2026)

Also check out: What Are Large Language Models (LLMs) and How Are They Transforming Technology

Also check out: Autonomous AI Agents in 2026: how they work, where they are being used, and what to expect
