
World Models 2026: Goodbye, Next Token? — The ML That Actually Understands the Physical World

NeuralPulse | May 19, 2026 | 11 min read

Ask GPT-5 what happens if you drop a pen in the air. It will answer something about gravity. Ask how long it takes to fall from a height of one meter: the math comes out right. Now ask if a bowling ball the size of an orange can pass through a hole the size of an apple. Trillion-parameter models, trained on text from billions of pages, get it wrong. Not because they don't know the answer — because they don't have a model of the real world.

This paradox is the engine behind the biggest shift in machine learning since the invention of Transformers in 2017. While the entire industry chased scaling language models — bigger, more data, more GPUs — a group of researchers was asking the uncomfortable question: what if predicting the next token simply isn't enough?

In 2026, the answer arrived with force. Yann LeCun, the Turing Award winner who spent 12 years at Meta, left the company, founded AMI Labs, and raised $1.03 billion in a single seed round, the largest in European history. Fei-Fei Li, the godmother of computer vision, has raised $1.23 billion with World Labs. NVIDIA, without fanfare, released SANA-WM as open source. And MIT Technology Review placed World Models on its list of the 10 things that matter most in AI right now (MIT Technology Review, April 2026).

In our view, this is not a passing fad. It is the beginning of the end of the "next token prediction" era as the dominant paradigm.

The Hole at the Heart of Transformers

LLMs are machines for completing textual patterns. Impressive, yes. Useful, yes. But they have no built-in model of the physical world. A Transformer trained on text is never told that solid objects do not pass through each other, that gravity pulls things down, or that cause and effect have a direction. What it learns is that, in its training text, "drop a pen" is very often followed by "falls."

It's statistics, not understanding.
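To make "statistics, not understanding" concrete, here is a minimal sketch of next-word prediction as pure counting. It is a bigram model over a tiny made-up corpus, which is an assumption for illustration only: real LLMs use Transformers over long contexts, not bigram tables, but the training signal is the same kind of thing.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus; the sentences are illustrative, not real training data.
CORPUS = [
    "drop a pen and it falls",
    "drop a pen and it falls down",
    "drop a pen and it bounces",
    "drop a ball and it falls",
]

def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Turn the raw counts after `prev` into a probability distribution."""
    total = sum(counts[prev].values())
    return {word: n / total for word, n in counts[prev].items()}

counts = train_bigrams(CORPUS)
probs = next_word_probs(counts, "it")
print(probs)  # {'falls': 0.75, 'bounces': 0.25}
```

Nothing in the table of counts encodes gravity. "falls" wins only because it follows "it" more often in the corpus.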

World Models are the attempt to fix this. Instead of learning only the statistical distribution of tokens, they learn representations of the world — how objects behave, how actions produce consequences, how physical space works. The difference is not incremental: it is philosophical.

An LLM answers "the sky is blue" because that phrase appears again and again in its training text. A world model aims, in principle, to represent sunlight and atmosphere well enough that the blue follows from Rayleigh scattering. One captures the surface of text. The other targets the data-generating process.

"World models — these systems that learn from watching the world and the actions taken in it — are a fundamentally new kind of foundation model. They can compute what was previously uncomputable." — Packy McCormick, editor of Not Boring (Newsletter, Mar 19, 2026)

$2.3 Billion in 90 Days: The Money Arrived

The market doesn't usually bet $2.3 billion on ML philosophy. But that is roughly what AMI Labs and World Labs have raised between them, with about $2 billion of it arriving in the first quarter of 2026 alone.

AMI Labs, founded by Yann LeCun, raised $1.03 billion in a seed round at a pre-money valuation of $3.5 billion (TechCrunch, Mar 9, 2026; PitchBook, Mar 2026). It's the largest seed in European history — and the money came from funds that typically invest in biotech and semiconductors, not ML research.

Across the Atlantic, Fei-Fei Li's World Labs has raised $1.23 billion since 2024, with $1 billion of that in February 2026 alone (World Labs blog, Feb 2026; The AI Innovator, Mar 2026). The stated goal: build intelligent spatial models that understand 3D geometry, physics, and object interaction.
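The funding figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers in this section. The one assumption is the usual convention that post-money valuation equals pre-money valuation plus the new money, with no other adjustments.

```python
# Figures in billions of USD, taken from the paragraphs above.
AMI_SEED = 1.03             # AMI Labs seed round, March 2026
AMI_PRE_MONEY = 3.5         # AMI Labs pre-money valuation
WORLD_LABS_TOTAL = 1.23     # World Labs, cumulative since 2024
WORLD_LABS_FEB_2026 = 1.0   # World Labs, February 2026 alone

combined_total = AMI_SEED + WORLD_LABS_TOTAL
raised_q1_2026 = AMI_SEED + WORLD_LABS_FEB_2026

# Assumption: post-money = pre-money + new money.
ami_post_money = AMI_PRE_MONEY + AMI_SEED
ami_stake_sold = AMI_SEED / ami_post_money

print(f"combined total raised: ${combined_total:.2f}B")   # $2.26B
print(f"raised in Q1 2026:     ${raised_q1_2026:.2f}B")   # $2.03B
print(f"AMI post-money:        ${ami_post_money:.2f}B")   # $4.53B
print(f"AMI stake sold:        {ami_stake_sold:.1%}")     # 22.7%
```

The headline figure of about $2.3 billion matches the combined total. The amount raised within the first quarter of 2026 is closer to $2.03 billion.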

AMI Labs CEO Alexandre LeBrun was candid about the moment. In an interview with TechCrunch, he said:

"My prediction is that 'world models' will be the next buzzword. In six months, every company will call itself a world model to raise funding." — Alexandre LeBrun, CEO of AMI Labs (TechCrunch, Mar 9, 2026)

In other words, even the hype has been predicted. But concrete results are arriving before the buzz.

LeWorldModel: 15 Million Parameters, One GPU, No Excuses

In March 2026, AMI Labs published LeWorldModel (LeWM) on arXiv (2603.19312). The numbers are almost an insult to the LLM industry:

| Feature | LeWorldModel (AMI Labs) | GPT-4 (reference) |
| --- | --- | --- |
| Parameters | 15 million | ~1.8 trillion |
| GPUs to train | 1 | ~25,000 |
| Training time | Hours | Months |
| Planning | 48x faster than DinoWM | No planning |
| Detects physics violations? | Yes | No |

Read that again: a model with 15 million parameters, trained in hours on a single GPU, can detect physics violations — something GPT-4, with 1.8 trillion parameters, cannot reliably do.
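One common way to test for this ability, assumed here rather than taken from the LeWM paper, is violation of expectation: the model predicts the next frame, and a large prediction error ("surprise") flags something physically implausible. The sketch below uses a hand-written free-fall predictor as a stand-in for a learned world model. All function names are hypothetical.

```python
G = 9.81   # gravitational acceleration, m/s^2
DT = 0.1   # time between frames, s

def predict_next(height, velocity):
    """Hand-written stand-in for a learned world model: one step of free fall."""
    new_velocity = velocity - G * DT
    new_height = height + new_velocity * DT
    return new_height, new_velocity

def simulate(h0, steps):
    """Generate a physically plausible sequence of heights, starting at rest."""
    heights, h, v = [h0], h0, 0.0
    for _ in range(steps):
        h, v = predict_next(h, v)
        heights.append(h)
    return heights

def surprise_per_frame(heights):
    """Prediction error for each frame, given the two frames before it."""
    errors = []
    for i in range(2, len(heights)):
        velocity = (heights[i - 1] - heights[i - 2]) / DT  # estimated from the video
        predicted, _ = predict_next(heights[i - 1], velocity)
        errors.append(abs(predicted - heights[i]))
    return errors

def flag_violations(heights, threshold=0.5):
    """Indices of frames whose surprise exceeds the threshold (in metres)."""
    errors = surprise_per_frame(heights)
    return [i for i, e in enumerate(errors, start=2) if e > threshold]

plausible = simulate(10.0, 10)
impossible = list(plausible)
impossible[5] += 2.0  # the ball jumps two metres upward for a single frame

print(flag_violations(plausible))   # []
print(flag_violations(impossible))  # [5, 6, 7]
```

The jump is flagged at the frame where it happens and at the two frames after it, because the velocity estimated from the corrupted frame is also wrong. A learned model would compute the same kind of error in its embedding space instead of in metres.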

The secret is JEPA (Joint Embedding Predictive Architecture), which LeCun has been advocating since 2022. Unlike autoregressive language models, which predict each token individually, JEPA learns to predict abstract representations of the world. It doesn't try to guess the exact pixels of the next video frame: it tries to predict the essence of what will happen. It's like the difference between drawing every strand of a person's hair and making a sketch that captures the silhouette and movement.
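The difference between predicting pixels and predicting representations can be sketched in a few lines. This is a toy under loud assumptions: the encoder and predictor are hand-written, whereas in a real JEPA both are learned, and a frame here is just four numbers (two for object position, two for unpredictable texture noise).

```python
import random

def encode(frame):
    """Toy encoder: keep the two 'object position' values, drop the noisy ones.
    In a real JEPA the encoder is learned, not hand-written."""
    return frame[:2]

def predict_latent(z, action):
    """Toy predictor: the action shifts the object in embedding space."""
    return [z[0] + action[0], z[1] + action[1]]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def make_frame(position, rng):
    """Frame = [x, y, noise, noise]; the noise stands in for unpredictable texture."""
    return [position[0], position[1], rng.random(), rng.random()]

rng = random.Random(0)
frame_t = make_frame([1.0, 2.0], rng)
action = [0.5, -0.5]
frame_next = make_frame([1.5, 1.5], rng)

# JEPA-style objective: compare predicted and actual embeddings.
latent_loss = mse(predict_latent(encode(frame_t), action), encode(frame_next))

# Pixel-space objective: even with the position right, the noise cannot be guessed.
pixel_guess = predict_latent(frame_t[:2], action) + frame_t[2:]
pixel_loss = mse(pixel_guess, frame_next)

print(latent_loss)      # 0.0
print(pixel_loss > 0)   # True
```

The embedding-space loss is zero because the encoder discards what cannot be predicted. The pixel-space loss stays positive no matter how good the dynamics model is.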

The result: the model is small, fast, and — most importantly — generalizes to situations it has never seen. Yann LeCun celebrated on Facebook:

"JEPA are finally easy to train end-to-end without any tricks! Excited to introduce LeWorldModel: a stable, end-to-end JEPA that learns world models directly from pixels, no heuristics. 15M params, 1 GPU, and full planning <1 second." — Yann LeCun, Turing Award Winner, AMI Labs (Official Facebook, Mar 2026)

The phrase "no heuristics" is the most important part. For years, world model architectures required engineering tricks, specific regularizations, and manual tweaks to work. LeWM shows that the JEPA architecture has matured to the point of training end-to-end, without gimmicks. This is a milestone.
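For background, the tricks used by joint-embedding methods in general are usually described as guards against representation collapse: an encoder that maps every input to the same vector makes the prediction loss trivially zero. The sketch below shows that failure mode with hypothetical toy functions. It is not a description of how LeWM is trained or how it avoids collapse.

```python
def collapsed_encoder(frame):
    """Degenerate encoder: maps every frame to the same embedding."""
    return [0.0, 0.0]

def healthy_encoder(frame):
    """Toy non-degenerate encoder: keeps the first two values of the frame."""
    return frame[:2]

def identity_predictor(z):
    """Predicts 'no change' in embedding space."""
    return z

def latent_loss(encoder, predictor, frame_t, frame_next):
    """Mean squared error between predicted and actual next embedding."""
    predicted = predictor(encoder(frame_t))
    target = encoder(frame_next)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

def embedding_variance(encoder, frames):
    """Average per-dimension variance of the embeddings; 0 signals collapse."""
    embeddings = [encoder(f) for f in frames]
    dims = len(embeddings[0])
    total = 0.0
    for d in range(dims):
        values = [e[d] for e in embeddings]
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values) / len(values)
    return total / dims

frames = [[0.0, 1.0, 0.3], [2.0, 5.0, 0.9], [7.0, 3.0, 0.1]]
pairs = list(zip(frames, frames[1:]))

collapsed_losses = [latent_loss(collapsed_encoder, identity_predictor, a, b) for a, b in pairs]
print(collapsed_losses)                                # [0.0, 0.0]
print(embedding_variance(collapsed_encoder, frames))   # 0.0
print(embedding_variance(healthy_encoder, frames) > 0) # True
```

A perfect loss with zero embedding variance means the model has learned nothing. Monitoring a variance statistic like this one is a simple way to notice the problem.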

SANA-WM: NVIDIA Opens Up

If AMI Labs proved that world models work with 15 million parameters, NVIDIA went in the opposite direction and showed what happens when you scale the idea.

In May 2026, NVIDIA released SANA-WM, an open-source world model with 2.6 billion parameters (arXiv 2605.15178; MarkTechPost, May 16, 2026). The capabilities are impressive:

  • Generates 60-second videos at 720p resolution
  • 6-DoF camera control (six degrees of freedom) — you decide where the camera looks
  • Runs on a single GPU (an RTX 5090 suffices)
  • 36x more throughput than comparable open baselines
  • Code and weights available on NVlabs GitHub

SANA-WM is among the first open-source world models at this scale that any researcher can download, run, and modify. That changes the research landscape. Until now, large world models were mostly the territory of labs with nine-digit budgets. With SANA-WM, any research group with a good GPU can start experimenting.
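To illustrate what six degrees of freedom means for camera control, here is a minimal sketch of a camera pose and a simple orbit trajectory. The `CameraPose` class and `orbit_trajectory` function are hypothetical names for illustration and are not SANA-WM's actual interface. The sketch also assumes yaw is the heading angle in the horizontal plane, measured from the +x axis.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class CameraPose:
    """Hypothetical 6-DoF pose: 3 translations (metres) + 3 rotations (radians)."""
    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float

def orbit_trajectory(radius, height, num_frames):
    """Camera circling the vertical axis through the origin, always facing it."""
    poses = []
    for i in range(num_frames):
        angle = 2 * math.pi * i / num_frames
        poses.append(CameraPose(
            x=radius * math.cos(angle),
            y=radius * math.sin(angle),
            z=height,
            roll=0.0,
            pitch=0.0,
            yaw=angle + math.pi,  # heading points back toward the axis
        ))
    return poses

poses = orbit_trajectory(radius=2.0, height=1.5, num_frames=8)
print(len(poses))   # 8
print(poses[0])
```

A camera-controllable video model would take one pose like this per frame as conditioning. How SANA-WM encodes poses internally is not covered in this article.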

The Physics of What's to Come

The transition from Transformers to World Models won't happen overnight. LLMs remain extraordinary for language, code, and encyclopedic knowledge. But the paradigm is shifting — and the evidence is clear:

  1. Hybrid architectures: JEPA, Mamba, Titans-style memory, and hybrid MoE are reported to deliver 4 to 17x more efficiency than pure scaling (NextBigFuture, Apr 2026; MIT Technology Review), the same trend that has small language models (SLMs) gaining ground in 2026. The future isn't bigger: it's more efficient.
  2. The physical gap: applications involving the real world (robotics, autonomous cars, manufacturing, simulation) cannot advance without models that understand physics. That's where world models come in.
  3. The investment race: $2.3 billion committed to two companies is not a sign of random hype. It's a sign that some of the planet's biggest investors believe this will generate returns.
  4. Open source: with NVIDIA's SANA-WM and LeWorldModel demonstrating viability, the barrier to entry has dropped. The world model research ecosystem is likely to grow quickly in the coming months.
  5. The LeCun factor: when one of the fathers of modern deep learning leaves a comfortable position at Meta, raises over a billion dollars, and publishes results that challenge the dominant paradigm, it's prudent to pay attention.

LLMs vs World Models: The Turning Point Table

| Dimension | LLMs (GPT, Claude, Gemini) | World Models (JEPA, SANA-WM, LeWM) |
| --- | --- | --- |
| Goal | Predict next token | Understand the world's generative process |
| Input | Text | Pixels, video, actions, physical world |
| Physical understanding | Statistical (texts about physics) | Structural (causal simulation) |
| Planning | Limited (textual reasoning) | Direct planning (LeWM: 48x faster than DinoWM) |
| Generalization | Within textual distribution | Outside visual/physical distribution |
| Efficiency (params) | Trillions | Millions to billions |
| Maturity | Commercial product | Advanced research (2026: inflection point) |

The table above is not a verdict — it's a moment in time. In 2024, world models were an academic curiosity. In 2025, they became a promise. In 2026, they are a new category of foundation model.
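The planning row in the comparison can be made concrete. With a world model, an agent can imagine the outcome of many candidate action sequences and pick the best one before acting. The sketch below uses random shooting, one of the simplest planners of this kind, with a toy one-dimensional dynamics function standing in for a learned model. It is an illustration of the general idea, not LeWM's planner.

```python
import random

def world_model_step(state, action):
    """Toy stand-in for a learned dynamics model: the action nudges a 1-D position."""
    return state + action

def rollout(state, actions):
    """Imagine the outcome of an action sequence without touching the real world."""
    for action in actions:
        state = world_model_step(state, action)
    return state

def plan(state, goal, horizon=5, candidates=200, seed=0):
    """Random-shooting planner: sample action sequences, keep the best imagined one."""
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = abs(rollout(state, actions) - goal)  # distance to goal at the end
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost

actions, cost = plan(state=0.0, goal=2.0)
print(actions)
print(cost)
```

Planning speed here depends on how cheap one call to `world_model_step` is, which is why a small, fast model matters. A text-only model has no equivalent of `rollout` over physical states.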

"AI systems have already gained impressive mastery over the digital world, but the physical world is still humanity's domain. To get there, many researchers believe, you need something called a world model." — Grace Huckins, AI Reporter, MIT Technology Review (Apr 2026)

What to Expect

World models won't kill LLMs. But they will steal center stage. In two years, the dominant ML architecture will likely not be purely autoregressive — it will be hybrid, combining textual representations with models of the physical world. The companies betting on this now (AMI Labs, World Labs, NVIDIA) may be ahead of a new technological cycle.

MIT was right to put world models on its list of what matters. The question now is not "if" they will transform the paradigm — it's "when" and "who" will lead.

And frankly, a 15-million-parameter model that understands physics better than a 1.8-trillion-parameter one deserves to be taken seriously.

Frequently Asked Questions about World Models

What exactly are world models?

World models are machine learning architectures that learn representations of the physical world — how objects behave, how actions generate consequences, and how space works — rather than just predicting the next token in a text sequence.

Will world models replace LLMs?

No. The trend is toward hybrid architectures that combine the textual understanding of LLMs with the physical understanding of world models. LLMs remain the best choice for language, code, and encyclopedic knowledge.

What is the difference between JEPA and Transformers?

Autoregressive Transformers predict each token individually, and frontier models have grown to an estimated trillions of parameters. JEPA (Joint Embedding Predictive Architecture) instead learns to predict abstract representations, capturing the essence of what will happen without needing to guess every pixel or word.

Why is LeWorldModel so efficient?

LeWorldModel uses the JEPA architecture, which predicts an abstract representation rather than every detail, so it gets by with only 15 million parameters. That allows training in hours on a single GPU, unlike frontier LLMs, which require thousands of GPUs and months of training.
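A quick back-of-the-envelope calculation shows why the parameter count matters for hardware. The sketch below counts only the memory needed to store the weights, assuming 4 bytes per parameter for 32-bit floats and 2 bytes for 16-bit floats. The GPT-4 figure is this article's approximate number, not an officially disclosed one, and training needs considerably more memory than the weights alone.

```python
BYTES_PER_PARAM_FP32 = 4  # 32-bit floats
BYTES_PER_PARAM_FP16 = 2  # 16-bit floats

def weight_memory_gb(num_params, bytes_per_param):
    """Memory needed just to store the weights, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9

LEWM_PARAMS = 15_000_000          # from the article
GPT4_PARAMS = 1_800_000_000_000   # the article's approximate, unofficial figure

lewm_gb = weight_memory_gb(LEWM_PARAMS, BYTES_PER_PARAM_FP32)
gpt4_gb = weight_memory_gb(GPT4_PARAMS, BYTES_PER_PARAM_FP16)
ratio = GPT4_PARAMS / LEWM_PARAMS

print(f"LeWM weights (fp32):  {lewm_gb:.2f} GB")   # 0.06 GB
print(f"GPT-4 weights (fp16): {gpt4_gb:.0f} GB")   # 3600 GB
print(f"parameter ratio:      {ratio:,.0f}x")      # 120,000x
```

At about 60 MB, the smaller model's weights fit many times over in the memory of a single consumer GPU. Several terabytes of weights do not, even at half precision.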
