AI Engineer World’s Fair 2026: Talks Worth Watching

Tags: AI, AI Engineering, Conferences, Agents

Published: July 1, 2026

I sit in the middle of corn fields and watch the live stream. Thanks @aiDotEngineer and @swyx.

Here are some talks from AI Engineer World’s Fair 2026 that I found particularly interesting, in order of presentation, not quality.

AI Engineer World’s Fair 2026 keynotes

Microsoft Foundry: Knowledge Management for AI Agents

Watch the talk

The Foundry team splits agent knowledge into structured, unstructured, and learned, then demos an agent retrieval stack where the LLM checks whether what it pulled actually answers the query before returning it. Good, concrete material on retrieval.

From 10 Terminals to Managing a Company of Agents

Watch Peter Steinberger’s talk

Steinberger traces his shift from manually polling ten terminal agents to running one persistent manager agent that delegates to a team and wakes itself on events. Not super practical yet, but a sharp look at where the edge is going.

GLM 4.2 / GLM-Zero 2.0 Keynote

Watch the Jina AI / ZAI keynote

An open-weight GLM release with a selectable thinking budget (low, medium, or high) that trades tokens against capability per task. The talk is mostly benchmarks, but I’m always a sucker for good open source.

Minimax M3: Sparse Attention and Open Research Culture

Watch the talk

M3 uses multi-head sparse attention: one branch narrows the context, and another computes attention only on the selected blocks, which gets it to a roughly 1M-token agentic context. The design came from an intern. Another good open-source showing.
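A toy, single-head sketch of the two-branch idea, in plain Python. This is a generic block-selection scheme and not Minimax's actual design: scoring blocks by their mean key, the `block_size` and `top_k` parameters, and every function name here are my assumptions.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def select_blocks(query, keys, block_size, top_k):
    """Selection branch: score each block by its mean key, keep the top_k."""
    scores = []
    for start in range(0, len(keys), block_size):
        block = keys[start:start + block_size]
        mean_key = [sum(col) / len(block) for col in zip(*block)]
        scores.append((dot(query, mean_key), start))
    best = sorted(scores, reverse=True)[:top_k]
    return sorted(start for _, start in best)


def sparse_attention(query, keys, values, block_size=2, top_k=1):
    """Compute branch: softmax attention over tokens in selected blocks only."""
    starts = select_blocks(query, keys, block_size, top_k)
    idx = [i for s in starts for i in range(s, min(s + block_size, len(keys)))]
    scale = math.sqrt(len(query))
    logits = [dot(query, keys[i]) / scale for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) / total
           for d in range(dim)]
    return out, idx


if __name__ == "__main__":
    keys = [[1, 0], [1, 0], [0, 1], [0, 1]]
    values = [[1.0], [1.0], [5.0], [5.0]]
    print(sparse_attention([0, 1], keys, values))
```

The selection step touches one summary vector per block, and the expensive softmax only runs over `top_k * block_size` tokens, which is where the savings on very long contexts would come from.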

Browser Agents Demo

Watch Kushan’s early framing and the later demo.

Kushan’s early framing: browser agents struggle because the information environment around the model is poor, not because the model is weak, so better environment design beats a smarter model.

A Factory That Taught Itself How to Remember

Watch the Machinecraft talk

A 100-person manufacturer with no data-science team ran its go-to-market on a 36-agent system. They skipped fine-tuning and instead engineered organized memory that off-the-shelf models read, with single-responsibility agents and a strict “the agent drafts, a human sends” rule.

I wouldn’t build it this way, but the approach and the presentation are worth seeing.

The Quality of Vibe-Coded Pull Requests

Watch the Grubtile talk

Using millions of PRs, they measured AI-authored code and found it comparable to human code on revert rates and review rounds, but with error profiles that differ by tool. I like that they shifted from arguing about AI coding to actually measuring it.

The Napkin Math Origin of TurboPuffer

Watch the talk

I use this story in my course. I love how they came up with TurboPuffer: it started as vectors stored as files in S3 and landed Cursor as its first customer. I can’t get tired of hearing this origin story.

Antigravity 2.0: Agent Teams and the 2026 Primitives

Watch the Google DeepMind talk

The talk walks the evolution toward “agent teams” and introduces three 2026 primitives: dynamic subagents an orchestrator spins up in parallel, sidecars that act as long-lived listeners reacting to webhooks or cron, and generative UI.

The emphasis on subagents and sidecars matches what customers are asking me to build.
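For readers who have not built either primitive, here is a rough `asyncio` sketch of an orchestrator that fans out to parallel subagents and a sidecar that stays alive and reacts to events. It is not Antigravity's API: the function names, the event strings, and the `None` shutdown signal are all assumptions, and `asyncio.sleep(0)` stands in for a model call.

```python
import asyncio


async def subagent(task: str) -> str:
    await asyncio.sleep(0)  # stand-in for a model call
    return f"done: {task}"


async def orchestrator(tasks: list[str]) -> list[str]:
    """Spin up one subagent per task and wait for all of them in parallel."""
    return list(await asyncio.gather(*(subagent(t) for t in tasks)))


async def sidecar(events: asyncio.Queue, handled: list[str]) -> None:
    """Long-lived listener: reacts to each event until it receives None."""
    while True:
        event = await events.get()
        if event is None:
            break
        handled.extend(await orchestrator([f"handle {event}"]))


async def main() -> tuple[list[str], list[str]]:
    events: asyncio.Queue = asyncio.Queue()
    handled: list[str] = []
    listener = asyncio.create_task(sidecar(events, handled))
    results = await orchestrator(["research", "draft"])
    for event in ("webhook:push", "cron:nightly"):  # hypothetical events
        await events.put(event)
    await events.put(None)  # tell the sidecar to shut down
    await listener
    return results, handled


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The difference in lifetime is the thing to notice: subagents exist only for the duration of one `gather`, while the sidecar task outlives any single request and wakes up whenever something lands on its queue.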

Token Town

Watch the Notion talk

Notion argues your model provider is structurally your competitor, that auto-upgrading every request is financially ruinous, and that multi-model routing should be an architecture decision from day one. Notion’s talks are always solid and worth the time.
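A minimal sketch of what routing as an architecture decision can look like, as opposed to sending every request to the newest model. This is not Notion's router: the catalog, the tier scale, and the prices are placeholders I invented, not real quotes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tier: int            # higher means more capable (made-up scale)
    usd_per_mtok: float  # made-up price per million tokens


# Hypothetical catalog; names and prices are placeholders.
CATALOG = [
    Model("small-open", tier=1, usd_per_mtok=0.2),
    Model("mid-hosted", tier=2, usd_per_mtok=2.0),
    Model("frontier", tier=3, usd_per_mtok=15.0),
]


def route(required_tier: int, catalog: list[Model] = CATALOG) -> Model:
    """Pick the cheapest model that meets the required capability tier."""
    eligible = [m for m in catalog if m.tier >= required_tier]
    if not eligible:
        raise ValueError(f"no model meets tier {required_tier}")
    return min(eligible, key=lambda m: m.usd_per_mtok)


def cost(model: Model, tokens: int) -> float:
    """Cost in USD for a given number of tokens."""
    return model.usd_per_mtok * tokens / 1_000_000


if __name__ == "__main__":
    for tier in (1, 2, 3):
        chosen = route(tier)
        print(tier, chosen.name, cost(chosen, 1_000_000))
```

Because callers ask for a tier and never a model name, swapping providers or adding a cheaper model is a change to the catalog and not to every call site.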

Building Loops for the Real World

Watch the HumanLayer talk

Kyle argues the “pipe a prompt into a loop and ship 40,000-line PRs” model works for a solo dev but breaks for teams with customers, SLAs, and compliance, since loop-written code is effectively read-only. A useful counterweight, focused on how our practices need to change on the human and team side.

Resonate: Agentic Engineering for Durable Execution

Watch the talk

After an agent failed going straight from spec to a distributed-systems build, the team reframed the problem as “what does the agent need to design it first?” and gave it a deterministic simulation that injects real failures, so it finds the right algorithm before writing production code.

A great example of first-principles thinking and reframing to solve a problem.
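To show why determinism matters here, below is a tiny seeded simulation with injected message loss. It is not Resonate's simulator: the retry-until-acked sender, the `drop_rate` and `max_attempts` parameters, and the trace format are all assumptions chosen to keep the example short.

```python
import random


def simulate(seed: int, drop_rate: float = 0.3,
             messages: int = 5, max_attempts: int = 20):
    """Run a retry-until-acked sender over a lossy link; return (trace, delivered)."""
    rng = random.Random(seed)  # all randomness flows from this one seed
    trace = []
    delivered = []
    for msg in range(messages):
        for attempt in range(1, max_attempts + 1):
            if rng.random() < drop_rate:  # injected failure: message lost
                trace.append(("drop", msg, attempt))
                continue
            delivered.append(msg)
            trace.append(("ack", msg, attempt))
            break
        else:
            # Every attempt was dropped; record that the sender gave up.
            trace.append(("give_up", msg, max_attempts))
    return trace, delivered


if __name__ == "__main__":
    trace, delivered = simulate(seed=42)
    print(delivered)
    print(simulate(seed=42) == (trace, delivered))  # same seed, same run
```

Because every failure comes from one seeded generator, a run that exposes a bug can be replayed exactly, which gives an agent (or a human) a stable target to iterate against.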

Recursive Model Improvement

Watch Lee Robinson’s Cursor talk

Cursor trains its Composer model with two loops: an outer one that turns user feedback into better eval benchmarks and an inner RL loop that climbs them. They also note that smarter models start hacking the evals by digging through git history for the expected answer.

Cursor always shares good high-level detail.