I’m sitting in the middle of corn fields, watching the live stream. Thanks @aiDotEngineer and @swyx.
Here are some talks from AI Engineer World’s Fair 2026 that I found particularly interesting, in order of presentation, not quality.

Microsoft Foundry: Knowledge Management for AI Agents
The Foundry team splits agent knowledge into structured, unstructured, and learned, then demos an agent retrieval stack where the LLM checks whether what it pulled actually answers the query before returning it. Good, concrete material on retrieval.
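To make the "check before returning" idea concrete, here is a minimal sketch of my own, not Foundry's code. The corpus, `retrieve`, and `toy_judge` are all hypothetical; in a real stack the judge would be an LLM call asking whether the passage answers the query.

```python
import re
from typing import Callable, List, Optional, Set

# Hypothetical toy corpus; a real system would query a search index.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Shipping to Canada takes 7 to 10 days.",
]

def words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, k: int = 2) -> List[str]:
    """Rank documents by naive word overlap with the query and keep the top k."""
    q = words(query)
    return sorted(DOCS, key=lambda d: len(q & words(d)), reverse=True)[:k]

def toy_judge(query: str, passage: str) -> bool:
    """Stand-in for the LLM check: require a shared word of 6+ characters."""
    return any(len(w) >= 6 for w in words(query) & words(passage))

def answer_with_check(
    query: str, judge: Callable[[str, str], bool], k: int = 2
) -> Optional[str]:
    """Return the first retrieved passage the judge accepts, else None."""
    for passage in retrieve(query, k):
        if judge(query, passage):
            return passage
    return None  # better to say "not found" than to return an off-topic hit
```

Calling `answer_with_check("how long do refunds take", toy_judge)` returns the refund passage, while a query the corpus cannot answer comes back as `None` instead of the nearest miss.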
From 10 Terminals to Managing a Company of Agents
Watch Peter Steinberger’s talk
Steinberger traces his shift from manually polling ten terminal agents to running one persistent manager agent that delegates to a team and wakes itself on events. Not super practical yet, but a sharp look at where the edge is going.
GLM 4.2 / GLM-Zero 2.0 Keynote
Watch the Jina AI / ZAI keynote
An open-weight GLM release with a selectable thinking budget (low, medium, or high) that trades tokens against capability per task. Mostly benchmarks, but I’m always a sucker for good open source.
Minimax M3: Sparse Attention and Open Research Culture
M3 uses multi-head sparse attention: one branch narrows the context and another computes only on the selected blocks, reaching roughly 1M-token agentic context. The design came from an intern. Another good open-source showing.
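For intuition, here is a generic block-selection sparse attention sketch in plain Python. It is not M3's actual design, which the talk covers in more depth. Scoring each block by its mean key, and the `block_size` and `top_k` parameters, are my assumptions for illustration.

```python
import math
from typing import List

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))

def attend(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Plain softmax attention for one query over the given keys and values."""
    scores = [dot(q, k) / math.sqrt(len(q)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]

def select_blocks(q: Vector, keys: List[Vector], block_size: int, top_k: int) -> List[int]:
    """Selection branch: score each block by its mean key, keep the top_k block starts."""
    scored = []
    for start in range(0, len(keys), block_size):
        chunk = keys[start:start + block_size]
        mean_key = [sum(col) / len(chunk) for col in zip(*chunk)]
        scored.append((dot(q, mean_key), start))
    scored.sort(reverse=True)
    return sorted(start for _, start in scored[:top_k])

def sparse_attend(q: Vector, keys: List[Vector], values: List[Vector],
                  block_size: int, top_k: int) -> Vector:
    """Compute branch: run attention only over tokens in the selected blocks."""
    idx = [
        i
        for start in select_blocks(q, keys, block_size, top_k)
        for i in range(start, min(start + block_size, len(keys)))
    ]
    return attend(q, [keys[i] for i in idx], [values[i] for i in idx])
```

With `top_k` covering every block this reduces to dense attention, which is a handy sanity check. The savings come from the compute branch touching only `top_k * block_size` tokens instead of the full context.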
Browser Agents Demo
Watch Kushan’s early framing and the later demo.
Kushan’s early framing: browser agents struggle because the information environment around the model is poor, not because the model is weak, so better environment design beats a smarter model.
A Factory That Taught Itself How to Remember
A 100-person manufacturer with no data-science team ran its go-to-market on a 36-agent system. They skipped fine-tuning and instead engineered organized memory that off-the-shelf models read, with single-responsibility agents and a strict “the agent drafts, a human sends” rule.
I wouldn’t build it this way, but the approach and the presentation are worth seeing.
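The "agent drafts, a human sends" rule is easy to enforce in code. A minimal sketch under my own assumptions: `Draft`, `agent_draft`, `approve`, and `send` are hypothetical names, not anything shown in the talk.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Draft:
    body: str
    approved_by: Optional[str] = None  # set only by a human reviewer
    sent: bool = False

def agent_draft(prompt: str) -> Draft:
    """Stand-in for an agent writing a message; it can only produce drafts."""
    return Draft(body=f"[draft] {prompt}")

def approve(draft: Draft, human: str) -> None:
    """Record which human signed off on the draft."""
    draft.approved_by = human

def send(draft: Draft) -> None:
    """Refuse to send anything a human has not approved."""
    if draft.approved_by is None:
        raise PermissionError("a human must approve the draft before it is sent")
    draft.sent = True  # a real system would call the email or CRM API here
```

The point of the design is that the send path has no branch an agent can take on its own: approval is data on the draft, and the gate lives in one function.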
The Quality of Vibe-Coded Pull Requests
Using millions of PRs, they measured AI-authored code and found it comparable to human code on revert rates and review rounds but with different error profiles by tool. I like that they shifted from arguing about AI coding to actually measuring it.
The Napkin Math Origin of TurboPuffer
I use this story in my course. TurboPuffer started as vectors stored as files in S3 and landed Cursor as its first customer. I love this origin story and can’t get tired of hearing it.
Antigravity 2.0: Agent Teams and the 2026 Primitives
Watch the Google DeepMind talk
The talk walks the evolution toward “agent teams” and introduces three 2026 primitives: dynamic subagents an orchestrator spins up in parallel, sidecars that act as long-lived listeners reacting to webhooks or cron, and generative UI.
The emphasis on subagents and sidecars matches what customers are asking me to build.
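Here is roughly how I picture two of those primitives, as an `asyncio` sketch. This is my own illustration, not Antigravity's API: `subagent`, `orchestrator`, and `sidecar` are hypothetical names, and the sleep stands in for model and tool calls.

```python
import asyncio
from typing import List, Tuple

async def subagent(task: str) -> str:
    """Hypothetical subagent; the sleep stands in for model and tool calls."""
    await asyncio.sleep(0.01)
    return f"done: {task}"

async def orchestrator(goal: str, subtasks: List[str]) -> List[str]:
    """Spin up one subagent per subtask and wait for all of them concurrently."""
    return list(await asyncio.gather(*(subagent(f"{goal} / {t}") for t in subtasks)))

async def sidecar(events: asyncio.Queue, log: List[str]) -> None:
    """Long-lived listener: handles each event until it sees the None sentinel."""
    while (event := await events.get()) is not None:
        log.append(f"handled: {event}")

async def main() -> Tuple[List[str], List[str]]:
    events: asyncio.Queue = asyncio.Queue()
    log: List[str] = []
    listener = asyncio.create_task(sidecar(events, log))
    results = await orchestrator("release notes", ["summarize commits", "draft changelog"])
    await events.put("webhook: build finished")  # simulated webhook delivery
    await events.put(None)  # tell the sidecar to shut down
    await listener
    return results, log
```

Running `asyncio.run(main())` returns the subagent results in subtask order plus the sidecar's log. In production the queue would be fed by a webhook handler or a cron trigger rather than by `main` itself.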
Token Town
Notion argues your model provider is structurally your competitor, that auto-upgrading every request is financially ruinous, and that multi-model routing should be an architecture decision from day one. Notion’s talks are always solid and worth the time.
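A minimal sketch of what routing as an architecture decision might look like, assuming a simple capability tier per request. The model names, prices, and tiers are made up for illustration and are not from Notion's talk.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # made-up prices for illustration
    tier: int  # higher means more capable

# Hypothetical catalog spanning more than one provider.
MODELS: List[Model] = [
    Model("small-model", 0.10, 1),
    Model("mid-model", 0.50, 2),
    Model("frontier-model", 3.00, 3),
]

def route(required_tier: int) -> Model:
    """Pick the cheapest model that meets the required capability tier."""
    eligible = [m for m in MODELS if m.tier >= required_tier]
    if not eligible:
        raise ValueError(f"no model satisfies tier {required_tier}")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)

def estimate_cost(model: Model, tokens: int) -> float:
    """Estimated spend for a request of the given token count."""
    return model.cost_per_1k_tokens * tokens / 1000
```

With these made-up prices, sending a 2,000-token tier-1 request to `frontier-model` costs 30 times what `route(1)` would pick, which is the "auto-upgrading is ruinous" argument in miniature.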
Building Loops for the Real World
Kyle argues the “pipe a prompt into a loop and ship 40,000-line PRs” model works for a solo dev but breaks for teams with customers, SLAs, and compliance, since loop-written code is effectively read-only. A useful counterweight on the human and team side of how our practices need to change.
Resonate: Agentic Engineering for Durable Execution
After an agent failed going straight from spec to a distributed-systems build, the team reframed to “what does the agent need to design it first?” and gave it a deterministic simulation that injects real failures so it finds the right algorithm before writing production code.
A great example of first-principles thinking and reframing to solve a problem.
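The core trick, as I understand it, is that a seeded simulation makes failures reproducible. This is a toy sketch under my own assumptions, not Resonate's simulator: it injects message drops from a seeded RNG and retries each message a bounded number of times.

```python
import random
from typing import List, Tuple

def simulate(seed: int, messages: int = 5, drop_rate: float = 0.3,
             max_retries: int = 3) -> Tuple[int, List[str]]:
    """Deliver messages over a lossy link; the same seed gives the same failures."""
    rng = random.Random(seed)  # private RNG so runs are reproducible
    trace: List[str] = []
    delivered = 0
    for msg in range(messages):
        for attempt in range(1, max_retries + 1):
            if rng.random() < drop_rate:  # injected failure
                trace.append(f"msg {msg} attempt {attempt}: dropped")
                continue
            trace.append(f"msg {msg} attempt {attempt}: delivered")
            delivered += 1
            break
    return delivered, trace
```

Because `simulate(7)` always produces the same trace, an agent can change the retry logic, replay the exact failure schedule that broke the previous attempt, and compare results before touching production code.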
Recursive Model Improvement
Watch Lee Robinson’s Cursor talk
Cursor trains its Composer model with two loops: an outer one that turns user feedback into better eval benchmarks and an inner RL loop that climbs them. They also note that smarter models start hacking the evals by digging through git history for the expected answer.
Cursor always shares good high-level detail.