AI Engineer World’s Fair 2026: Talks Worth Watching

I sit in the middle of corn fields and watch the live stream. Thanks @aiDotEngineer and @swyx.

Here are some talks from AI Engineer World’s Fair 2026 that I found particularly interesting, in order of presentation, not quality.

Day 1

Microsoft Foundry: Knowledge Management for AI Agents

Watch the talk

URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=1800s

The Foundry team splits agent knowledge into structured, unstructured, and learned, then demos an agent retrieval stack where the LLM checks whether what it pulled actually answers the query before returning it. Good, concrete material on retrieval.
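To make the "check before returning" step concrete, here is a minimal sketch. It is not Foundry's stack: `keyword_retriever` and `toy_judge` are hypothetical stand-ins for a search index and an LLM call, included only so the control flow runs on its own.

```python
from typing import Callable, Optional

# Hypothetical stand-ins: a real system would call a search index and an LLM.
Retriever = Callable[[str], list[str]]
Judge = Callable[[str, str], bool]  # (query, passage) -> does it answer the query?

def retrieve_verified(query: str, retrievers: list[Retriever], judge: Judge) -> Optional[str]:
    """Return the first retrieved passage the judge accepts, else None."""
    for retrieve in retrievers:
        for passage in retrieve(query):
            if judge(query, passage):
                return passage
    return None  # nothing passed the check; the caller can rephrase or escalate

# Toy stubs so the sketch runs without any external service.
def keyword_retriever(query: str) -> list[str]:
    docs = ["Invoices are due in 30 days.", "Refunds take 5 business days."]
    words = query.lower().split()
    return [d for d in docs if any(w in d.lower() for w in words)]

def toy_judge(query: str, passage: str) -> bool:
    # Assumption: a real judge would be an LLM call; here every query word must appear.
    return all(w in passage.lower() for w in query.lower().split())

print(retrieve_verified("refunds days", [keyword_retriever], toy_judge))
```

The retriever returns both documents because each contains "days", and the judge is what filters out the invoice passage.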

From 10 Terminals to Managing a Company of Agents

Watch Peter Steinberger’s talk

URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=3492s

Steinberger traces his shift from manually polling ten terminal agents to running one persistent manager agent that delegates to a team and wakes itself on events. Not super practical yet, but a sharp look at where the edge is going.

GLM 4.2 / GLM-Zero 2.0 Keynote

Watch the Jina AI / ZAI keynote

URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=4031s

An open-weight GLM release with a selectable thinking budget (low, medium, or high) that trades tokens against capability per task. Mostly benchmarks, but I’m always a sucker for good open source.

Minimax M3: Sparse Attention and Open Research Culture

Watch the talk

URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=4872s

M3 uses multi-head sparse attention: one branch narrows the context and another computes only on the selected blocks, which gets it to roughly 1M tokens of agentic context. The design came from an intern. Another good open-source showing.
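A rough sketch of the select-then-compute idea, for one query and one head in plain Python. This is not Minimax's design: scoring each block by its mean key is my assumption, and `block_size` and `top_k` are illustrative parameters.

```python
import math

def block_sparse_attention(q, keys, values, block_size, top_k):
    """Attend from one query vector to only the top_k highest-scoring key blocks."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    blocks = [range(s, min(s + block_size, len(keys)))
              for s in range(0, len(keys), block_size)]

    # Selection branch: score each block cheaply using the mean of its keys (an assumption).
    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block) for d in range(len(q))]
        return dot(q, mean_key)

    chosen = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)[:top_k]
    selected = [i for b in sorted(chosen) for i in blocks[b]]

    # Compute branch: ordinary scaled softmax attention, restricted to the selected positions.
    scale = 1.0 / math.sqrt(len(q))
    logits = [dot(q, keys[i]) * scale for i in selected]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    out = [sum(w * values[i][d] for w, i in zip(weights, selected)) / total
           for d in range(len(values[0]))]
    return out, selected

keys = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
values = [[1.0], [1.0], [5.0], [5.0]]
out, selected = block_sparse_attention([0.0, 1.0], keys, values, block_size=2, top_k=1)
print(selected, out)
```

The cost of the compute branch scales with `top_k * block_size` rather than with the full context length, which is the point of selecting first.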

Browser Agents Demo

Watch Kushan’s early framing and the later demo.

URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=6934s

Later demo URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=9726s

Kushan’s early framing: browser agents struggle because the information environment around the model is poor, not because the model is weak, so better environment design beats a smarter model.

The Quality of Vibe-Coded Pull Requests

Watch the Grubtile talk

URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=11710s

Using millions of PRs, they measured AI-authored code and found it comparable to human code on revert rates and review rounds but with different error profiles by tool. I like that they shifted from arguing about AI coding to actually measuring it.

The Napkin Math Origin of Turbo-Puffer

Watch the talk

URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=14656s

I use this story in my course. TurboPuffer started as vectors stored as files in S3 and landed Cursor as its first customer. I love how they got there, and I never get tired of hearing the origin story.

Antigravity 2.0: Agent Teams and the 2026 Primitives

Watch the Google DeepMind talk

URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=17100s

The talk walks through the evolution toward “agent teams” and introduces three 2026 primitives: dynamic subagents an orchestrator spins up in parallel, sidecars that act as long-lived listeners reacting to webhooks or cron, and generative UI.

The emphasis on subagents and sidecars matches what customers are asking me to build.
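For readers who have not built these yet, here is a minimal `asyncio` sketch of the first two primitives. It does not use Antigravity's API. The `subagent` worker, the event name, and the `None` shutdown sentinel are all hypothetical.

```python
import asyncio

async def subagent(task: str) -> str:
    # Hypothetical worker; a real one would call a model and tools.
    await asyncio.sleep(0)  # yield control, standing in for I/O
    return f"done: {task}"

async def orchestrator(tasks: list[str]) -> list[str]:
    # Dynamic subagents: one per task, all running concurrently.
    return list(await asyncio.gather(*(subagent(t) for t in tasks)))

async def sidecar(events: asyncio.Queue, handled: list[str]) -> None:
    # Long-lived listener: reacts to each event until it sees the None sentinel.
    while (event := await events.get()) is not None:
        handled.extend(await orchestrator([f"{event}/triage", f"{event}/fix"]))

async def main() -> list[str]:
    events: asyncio.Queue = asyncio.Queue()
    handled: list[str] = []
    listener = asyncio.create_task(sidecar(events, handled))
    await events.put("webhook:build_failed")  # hypothetical event name
    await events.put(None)                    # shut the sidecar down
    await listener
    return handled

results = asyncio.run(main())
print(results)
```

In a real deployment the queue would be fed by a webhook handler or a cron trigger instead of `main`.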

Token Town

Watch the Notion talk

URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=21600s

Notion argues your model provider is structurally your competitor, that auto-upgrading every request is financially ruinous, and that multi-model routing should be an architecture decision from day one. Notion’s talks are always solid and worth the time.
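As a sketch of what treating routing as an architecture decision can look like, the snippet below picks the cheapest model that meets a request's needs. The model names, prices, and the `tier` field are invented for illustration and do not come from the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Model:
    name: str
    cost_per_1k_tokens: float  # made-up prices for illustration
    tier: int                  # higher means more capable (an assumed, coarse scale)

CATALOG = [
    Model("small-model", 0.1, 1),
    Model("mid-model", 1.0, 2),
    Model("frontier-model", 10.0, 3),
]

def route(required_tier: int, catalog: list[Model] = CATALOG) -> Model:
    """Pick the cheapest model whose tier meets the request's needs."""
    eligible = [m for m in catalog if m.tier >= required_tier]
    if not eligible:
        raise ValueError(f"no model satisfies tier {required_tier}")
    return min(eligible, key=lambda m: m.cost_per_1k_tokens)

print(route(1).name, route(3).name)
```

Because callers ask for a capability tier rather than a model name, swapping providers becomes a catalog change instead of a code change.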

Building Loops for the Real World

Watch the HumanLayer talk

URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=25001s

Kyle argues the “pipe a prompt into a loop and ship 40,000-line PRs” model works for a solo dev but breaks for teams with customers, SLAs, and compliance, since loop-written code is effectively read-only. A useful counterweight on the human and team side of how our practices need to change.

Resonate: Agentic Engineering for Durable Execution

Watch the talk

URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=26100s

After an agent failed going straight from spec to a distributed-systems build, the team reframed to “what does the agent need to design it first?” and gave it a deterministic simulation that injects real failures so it finds the right algorithm before writing production code.

A great example of first-principles thinking and reframing to solve a problem.
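A toy version of the idea, assuming a lossy network as the failure being injected. This is not Resonate's simulator. The `send` function, the drop rate, and the two delivery strategies are hypothetical, and the seeded RNG is what makes every failure reproducible.

```python
import random

def simulate(strategy, seed: int, drop_rate: float = 0.5, messages: int = 20) -> bool:
    """Run a delivery strategy against a seeded lossy network; True if every message arrived."""
    rng = random.Random(seed)  # seeded, so the same seed replays the same failures

    def send(_msg) -> bool:
        # Hypothetical network: drops a message with probability drop_rate.
        return rng.random() >= drop_rate

    return all(strategy(send, m) for m in range(messages))

def fire_and_forget(send, msg) -> bool:
    return send(msg)

def retry_up_to(limit: int):
    def strategy(send, msg) -> bool:
        # Stop at the first successful send, give up after `limit` attempts.
        return any(send(msg) for _ in range(limit))
    return strategy

seeds = range(50)
print("fire_and_forget passes:", sum(simulate(fire_and_forget, s) for s in seeds))
print("retry_up_to(30) passes:", sum(simulate(retry_up_to(30), s) for s in seeds))
```

An agent can iterate on the strategy against this harness, and any failing seed can be replayed exactly while it debugs.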

Recursive Model Improvement

Watch Lee Robinson’s Cursor talk

URL: https://www.youtube.com/watch?v=htM02KMNZnk&t=30278s

Cursor trains its Composer model with two loops: an outer one that turns user feedback into better eval benchmarks and an inner RL loop that climbs them. They also note that smarter models start hacking the evals by digging through git history for the expected answer.

Cursor always shares good high-level detail.

Day 2

A very high-quality day of talks. It hurt to leave some out.

Field Guide to Fable

Watch Thariq Shihipar’s talk (Anthropic)

URL: https://www.youtube.com/watch?v=4sX_He5c4sI&t=980s

Classic Thariq. He explains agentic coding intuitively, and it’s an easy watch.

In the Land of AI Agents, the Verifiers Are King

Watch Tariq Shaukat’s talk (Sonar)

URL: https://www.youtube.com/watch?v=4sX_He5c4sI&t=2180s

This felt like where the field is going. The code we’re generating isn’t all trustworthy, so the verifier has to become part of the production system rather than an afterthought.

Memory Harnesses for Long-Running Research Agents

Watch Stefania Druga’s talk (Sakana.ai)

URL: https://www.youtube.com/watch?v=4sX_He5c4sI&t=10280s

A useful experiment on recall and memory. The takeaway: memory isn’t just dumping more stuff into context; it’s a write/read policy you can measure.
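Here is a small sketch of what "a policy you can measure" might mean in practice. It is not the harness from the talk: the word-overlap ranking, the facts, and the queries are all made up, and recall is just the fraction of queries whose answer appears in the top `read_k` results.

```python
def run_episode(facts, should_write, read_k, queries):
    """Store facts per a write policy, read the top-k per query, and report recall."""
    memory = [f for f in facts if should_write(f)]  # write policy
    hits = 0
    for query, answer in queries:
        # Read policy (an assumption): rank stored facts by word overlap with the query.
        overlap = lambda f: len(set(f.split()) & set(query.split()))
        ranked = sorted(memory, key=overlap, reverse=True)
        hits += answer in ranked[:read_k]
    return hits / len(queries)

facts = ["alice likes tea", "bob likes coffee", "the sky was grey today"]
queries = [("alice likes what", "alice likes tea"), ("bob likes what", "bob likes coffee")]

write_everything = lambda fact: True
write_only_alice = lambda fact: fact.startswith("alice")

print(run_episode(facts, write_everything, 1, queries))
print(run_episode(facts, write_only_alice, 1, queries))
```

With a number like this in hand, changing the write or read policy becomes an experiment rather than a guess.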

The Last Thing AI Will Take Away From Software People

Watch the talk

URL: https://www.youtube.com/watch?v=4sX_He5c4sI&t=11015s

This is the kind of thing I teach in my courses: what is the point of the project? That framing is the real scarce skill.

The Era of (Auto) Research

Watch Elie Bakouch’s talk (Prime Intellect)

URL: https://www.youtube.com/watch?v=4sX_He5c4sI&t=11780s

Anyone who follows Prime Intellect knows they produce great research. Lots of good data here on agents doing auto research.

An AI Agent Became the #1 Contributor in OpenAI’s Hiring Challenge

Watch Zhengyao Jiang’s talk (Weco AI)

URL: https://www.youtube.com/watch?v=4sX_He5c4sI&t=18380s

Good practical advice wrapped in an interesting example. Agents are already strong execution amplifiers when the environment is well-shaped.

Reflective Optimization

Watch Lakshya Agrawal’s talk (GEPA)

URL: https://www.youtube.com/watch?v=4sX_He5c4sI&t=20180s

Good background on GEPA and optimization, which everyone should understand. The important idea is that traces are not just debugging output; they’re raw material for improving prompts, harnesses, skills, and other text artifacts.

Autoresearch for Kernels

Watch Tejas Bhakta’s talk (Morph)

URL: https://www.youtube.com/watch?v=4sX_He5c4sI&t=21680s

Another practical example, this one using kernels. Kernels are a great domain because the verifier is real: correctness and speed. The human gives the high-level idea, the agent searches variants, and a hard verifier decides.
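The search-and-verify loop can be sketched without a GPU. The example below uses plain Python functions in place of real kernels, and the two candidate variants are hypothetical. The verifier rejects anything that disagrees with a trusted reference and only then measures speed.

```python
import time

def reference(xs):
    # Trusted but slow baseline: sum of squares.
    total = 0
    for x in xs:
        total += x * x
    return total

# Hypothetical candidates an agent might propose.
def variant_builtin(xs):
    return sum(x * x for x in xs)

def variant_wrong(xs):
    # Less work, but incorrect: silently drops the last element.
    return sum(x * x for x in xs[:-1])

def verify(candidate, inputs):
    """Hard gate: return None on any mismatch, otherwise the measured seconds."""
    if any(candidate(xs) != reference(xs) for xs in inputs):
        return None
    start = time.perf_counter()
    for xs in inputs:
        candidate(xs)
    return time.perf_counter() - start

inputs = [list(range(n)) for n in (1, 10, 1000)]
timings = {f.__name__: verify(f, inputs) for f in (variant_builtin, variant_wrong)}
accepted = {name: t for name, t in timings.items() if t is not None}
best = min(accepted, key=accepted.get)
print(best)
```

Correctness is checked before timing, so a fast but wrong variant never competes on speed.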

The Log Is the Agent

Watch Isha’s talk (Amnara)

URL: https://www.youtube.com/watch?v=4sX_He5c4sI&t=23248s

I’m a big fan of logs, so this one clicked.

Autoresearch in the Wild / Productizing Loops

Watch Roland’s talk (Introspection)

URL: https://www.youtube.com/watch?v=4sX_He5c4sI&t=23405s

More good stuff on loops. Failures become evals, repeated behavior becomes skills, and taste becomes system behavior over time.

Day 3

I watched the final day of @aiDotEngineer. Here are the talks I found most useful, in order of presentation, not quality.

Tokens Should Have Jobs (Katelyn Lesse and Angela Jiang, Anthropic)

Watch Katelyn Lesse and Angela Jiang’s talk (Anthropic)

URL: https://www.youtube.com/watch?v=I2cbIws9j10&t=7012s

Interesting idea, and it made me think. Instead of treating tokens as a generic pile of compute, assign them jobs: execute, advise, grade, or reflect. The practical question becomes not “did more tokens improve the average score?” but “which token roles give the cheapest reliable answer?”

Loophole (Brendan Rappazzo)

Watch Brendan Rappazzo’s talk

URL: https://www.youtube.com/watch?v=I2cbIws9j10&t=16915s

Nice creative thinking. The idea of using adversarial agents to stress-test policies, constitutions, terms, or agent permissions has value. It is basically fuzzing for rules: you generate edge cases, then use them to patch the policy or escalate ambiguity to a human.
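A minimal sketch of the fuzzing analogy. It swaps the adversarial agents for plain enumeration of boundary values, and `refund_policy` is a hypothetical rule with a deliberate gap, not anything from the talk.

```python
from itertools import product

def refund_policy(days_since_purchase: int, opened: bool) -> str:
    # Hypothetical policy under test, with a deliberate gap.
    if days_since_purchase < 30 and not opened:
        return "refund"
    if days_since_purchase > 30:
        return "deny"
    return "undefined"  # the written rule never says what happens here

def fuzz(policy):
    """Enumerate boundary inputs and collect the ones the policy fails to decide."""
    cases = product([0, 29, 30, 31], [False, True])
    return [case for case in cases if policy(*case) == "undefined"]

gaps = fuzz(refund_policy)
print(gaps)  # [(0, True), (29, True), (30, False), (30, True)]
```

Each gap is either a patch to the policy or a question for a human, for example what happens on exactly day 30.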

We Let an AI Agent Execute Bash and Lived to Talk About It (Sarah Sanders, PostHog)

Watch Sarah Sanders’ talk (PostHog)

URL: https://www.youtube.com/watch?v=I2cbIws9j10&t=20106s

Good stuff on agent security lessons and practical guardrails. The big lesson is that prompts are not security. If an agent can execute commands, you need deterministic enforcement.
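As one illustration of deterministic enforcement, the check below runs in ordinary code before any command reaches a shell. It is not PostHog's implementation: the allowlist and the forbidden characters are assumptions, and a real deployment would still need a sandbox around whatever gets through.

```python
import shlex

ALLOWED = {"ls", "cat", "grep"}       # hypothetical allowlist
FORBIDDEN_CHARS = set(";|&$`><\n")    # no chaining, substitution, or redirection

def is_allowed(command: str) -> bool:
    """Deterministic gate: the model's output is checked by code, not by a prompt."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        argv = shlex.split(command)
    except ValueError:  # unbalanced quotes and similar parse errors
        return False
    return bool(argv) and argv[0] in ALLOWED

for cmd in ["ls -la", "rm -rf /", "ls; rm -rf /", "cat 'unterminated"]:
    print(repr(cmd), is_allowed(cmd))
```

The gate gives the same answer for the same command no matter how the request is phrased to the model.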

How We Solved Agent Building (Andrew Qu, Vercel)

Watch Andrew Qu’s talk (Vercel)

URL: https://www.youtube.com/watch?v=I2cbIws9j10&t=23407s

Good practical example of the modern way of building agents: away from chains of narrow agents and toward a strong model with tools, files, a sandbox, and domain context.

Agents Without Code (Philipp Schmid, Google DeepMind)

Watch Philipp Schmid’s talk (Google DeepMind)

URL: https://www.youtube.com/watch?v=I2cbIws9j10&t=24899s

Good on the fundamentals. As models improve, harnesses should often get simpler, not more complicated. Instead of writing a custom Python function for every action or wiring up chains, give the agent general tools, a sandbox, credentials handled outside the model, and behavior defined through files and skills.