Lovable shifts from app builder to general agent
Lovable is pivoting away from an app-making focus toward a general-purpose agent.
My personalized AI news feed, curated from newsletters and deduplicated automatically.
Powered by OpenHands + Claude Sonnet 4 • Updated every 30 minutes
A new memory system for agents reportedly reaches roughly 99% of state-of-the-art performance.
Sahil has converted his book into a set of agent skills aimed at founders.
Simon Willison discusses engineering practices that help coding agents succeed.
Andrej Karpathy explores agents, autoresearch, and the emerging loopy era of AI in a piece worth reading or listening to in full.
A Tailwind founder shares a walkthrough on using Claude Code for design workflows.
Ghost Pepper offers a fully local, hold-to-talk speech-to-text experience on macOS.
A resource explains how to deploy multiple OpenClaw agents with secure controls.
The author argues that having an agent interview you captures preferences and helps overcome blank‑page paralysis, sharing how it shaped his course planning.
Claude Code can schedule recurring cloud tasks and, when connectors are missing, drive apps directly on your computer; Cowork now supports projects.
Factory Missions introduces long-running agents that plan and execute large software projects like full app builds.
ChatGPT now stores uploaded files in a library for easier reuse, while OpenAI is moving toward a simplified superapp experience.
Cursor released Composer 2, revealed to be tuned from Kimi 2.5, and launched the Glass UI; its self-benchmark comparisons sparked criticism.
The companies launched TERAFAB, described as the largest chip manufacturing facility with 1TW/year capacity.
A Sequoia partner claims the market is underestimating xAI and outlines why it will dominate AI.
Codebase to Course is a skill that converts codebases into more visual, interactive learning experiences.
Uni-1 adds a canvas workflow and multiple outputs per prompt, though generating many outputs can be slow.
A guide suggests ways to boost GPT 5.4's frontend design quality and adds a frontend skill to Codex.
Cord is highlighted as a flexible orchestration tool that allows models to split work into parallel tracks and share context without hardcoded plans. It aims to address rigidity in early orchestration frameworks.
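As a rough illustration of that parallel-tracks pattern (hypothetical names throughout, not Cord's actual API), a minimal asyncio sketch where concurrent tracks write into a shared context the others can read:

```python
import asyncio

# Hypothetical sketch of the pattern described above, not Cord's actual API.
shared_context: dict[str, str] = {}

async def call_model(prompt: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for a real, concurrently awaited model call
    return f"(result for: {prompt!r})"

async def run_track(name: str, task: str) -> None:
    # Each track sees whatever other tracks have already written.
    prior = "\n".join(f"{k}: {v}" for k, v in shared_context.items())
    shared_context[name] = await call_model(f"{task}\ncontext:\n{prior or 'none'}")

async def main() -> None:
    # In the real tool the model decides this split at runtime; it is
    # hardcoded here only to keep the sketch self-contained.
    tracks = {
        "research": "survey existing orchestration frameworks",
        "design": "draft an interface for sharing context between tracks",
    }
    await asyncio.gather(*(run_track(n, t) for n, t in tracks.items()))
    print(shared_context)

asyncio.run(main())
```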
Emdash provides a workspace-oriented orchestration approach that lets developers run multiple coding agents concurrently in isolated environments. The goal is to reduce the friction of juggling terminals and serial runs.
The piece notes a move from chat-history memory toward procedural skill stores and context files that save successful workflows as reusable instructions. This approach aims to improve reliability while reducing compute costs.
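A minimal sketch of what such a procedural skill store could look like; the file layout and function names are assumptions, not taken from the piece:

```python
from pathlib import Path

SKILLS_DIR = Path("skills")  # hypothetical layout, not from the piece

def save_skill(name: str, steps: list[str]) -> None:
    """Persist a workflow that just succeeded as a reusable instruction file."""
    SKILLS_DIR.mkdir(exist_ok=True)
    body = f"# Skill: {name}\n" + "\n".join(f"{i+1}. {s}" for i, s in enumerate(steps))
    (SKILLS_DIR / f"{name}.md").write_text(body)

def load_skills_for(task: str) -> str:
    """Naive retrieval: inline any stored skill whose name appears in the task."""
    matches = [p.read_text() for p in SKILLS_DIR.glob("*.md") if p.stem in task]
    return "\n\n".join(matches)

# After a successful run, store the procedure; before the next related task,
# prepend matching skills to the prompt instead of replaying full chat history.
save_skill("deploy-staging", ["run tests", "build image", "push to staging"])
print(load_skills_for("please deploy-staging the new build"))
```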
A study benchmarking coding agents on standard tests and the new AGENTBENCH shows auto-generated context files lowered success rates and increased inference costs, with only modest gains from developer-written files. The findings suggest guidance files are not a guaranteed improvement.
The newsletter cites Meyerovich’s view that teams should keep agent components only if they improve measured outcomes such as task success, speed, safety, or cost. It emphasizes defining clear evals rather than relying on intuition.
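To make "clear evals" concrete, a hypothetical A/B harness in the spirit of that advice: run the same task suite with and without a component and keep it only if the measured outcomes improve. All names and numbers below are placeholders:

```python
import statistics

def run_task(task: str, use_context_file: bool) -> dict:
    # Stand-in for a real agent run; return whatever your eval records.
    return {"success": True, "seconds": 12.0, "cost_usd": 0.03}

def evaluate(tasks: list[str], use_context_file: bool) -> dict:
    runs = [run_task(t, use_context_file) for t in tasks]
    return {
        "success_rate": sum(r["success"] for r in runs) / len(runs),
        "median_seconds": statistics.median(r["seconds"] for r in runs),
        "total_cost_usd": sum(r["cost_usd"] for r in runs),
    }

tasks = ["fix failing test", "add endpoint", "refactor module"]
baseline = evaluate(tasks, use_context_file=False)
treatment = evaluate(tasks, use_context_file=True)
print(baseline, treatment)  # keep the component only if treatment measurably wins
```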
Wampler’s article describes the PARK stack—built on PyTorch, AI models and agents, Ray, and Kubernetes—as a foundation for running computationally intensive agent experiments at scale. The focus is on enabling rigorous evaluation for production readiness.
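The Ray piece of such a stack can be sketched in a few lines. This is a generic fan-out of agent experiments using Ray's public API (`ray.init`, `@ray.remote`, `ray.get`), not code from Wampler's article:

```python
import ray

ray.init()

@ray.remote
def run_experiment(config: dict) -> dict:
    # Stand-in for one agent rollout; in the PARK framing this is where
    # the PyTorch model and agent harness would actually run.
    return {"config": config, "score": 0.0}

configs = [{"temperature": t} for t in (0.0, 0.3, 0.7)]
futures = [run_experiment.remote(c) for c in configs]
print(ray.get(futures))  # Kubernetes supplies the cluster underneath
```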
The essay argues that building reliable AI agents requires rigorous engineering and evaluation, not just layering on more architectural components. It cautions that complexity can add cost and coordination overhead without improving real-world performance.
A perplexity-based evaluation puts Kimi K2.5 on top while commenters debate methodology and training claims.
The post outlines PCIe bottlenecks, stability issues, and power constraints, recommending alternatives like Proxmox or PCIe switches.
Redditors debate whether to buy now or wait, citing price increases, rentals, and performance for gaming and local AI.
Meta execuhired the Dreamer team into MSL shortly after the podcast, giving the consumer agent startup a major distribution partner.
The post repairs broken attention/expert layers across quantizations and shares LM Studio settings and merge steps.
A guide lays out progression from raw prompting to multi-agent orchestration, noting when users hit ceilings at each level.
A 57M-parameter model with 99.9% binary weights runs in WASM at ~12 tok/s and works offline in-browser.
Claude Cowork/Code adds macOS research preview control of mouse, keyboard, and screen, expanding agents beyond APIs and browsers.
Tweets highlight momentum for Hermes Agent, T3 Code, Command Center, and Parchi as evidence of richer, parallel agent harnesses.
Practitioners report over-agentic behavior and fragility in top models, urging tighter loops with traces, evals, and production feedback.
The work extends Darwin Gödel Machine ideas so agents can improve the improvement procedure itself, with cross-domain transfer claims.
RLLM trains a generative reward model on-policy to cover easy, hard, and non-verifiable tasks under one post-training approach.
The project claims <10 hours and <$100 per environment while yielding harder browser tasks where open models score below 50%.
A high-engagement overview lists RLHF, RLAIF, RLVR, process rewards, self-feedback, and critique-based methods as a taxonomy.
The model reports stable end-to-end JEPA training from pixels with 15M params and sub-second planning without heavy tricks.
A thread on Anthropic’s biology-of-LLM work highlights circuit-level mapping while noting models may not verbalize their own reasoning.
Antonio Orvieto argues adaptive-optimizer theory can explain scaling laws and reduce brute-force hyperparameter sweeps.
Google Devs and LlamaIndex show structured financial PDF extraction gains and introduce LiteParse for fast, low-cost parsing.
Instant Grep offers regex search over millions of files in milliseconds, directly improving agentic coding workflows.
Weaviate/LightOn discussions argue late interaction is now practical and cheaper than cross-encoders for code-heavy retrieval.
Sakana released a Japanese consumer chat product backed by Namazu alpha models tuned for local context and reduced bias.
The subscription bundles text, speech, music, video, and image APIs under a single predictable price.
Uni-1 is pitched as a model that thinks and generates pixels simultaneously for generative media workflows.
Kimodo is trained on 700 hours of mocap and supports both human and robot skeletons, with availability on Hugging Face.
The release adds Flash-Attention 4 support via cutlass.cute kernels for faster attention workloads.
The update highlights major memory savings and points to AsyncGRPO as the next optimization.
MolmoPoint uses grounding tokens instead of coordinate regression and reports 61.1 on ScreenSpotPro.
The applied AI workflow combines LLM ensembles, novelty search, hypothesis generation, and human verification over massive social data.
A recap maps the landscape across ByteDance, Alibaba, Tencent, Baidu, and other labs with rapid open-weight activity.
Token-usage rankings highlight Chinese models at the top and note ByteDance still lacks open-weight releases.
The company says it will keep releasing a full series of open models across sizes, fueling community anticipation.