Tuesday Evening Edition

35+ Tweets · 16h Window

OpenAI Ships the Fleet

3 items

OpenAI released GPT-5.4 mini and nano: smaller, cheaper models that run 2x faster than their predecessors and are optimized for coding, computer use, and subagents.

The mini model is available immediately in ChatGPT, Codex, and the API. Benchmarks show strong performance on SWE-Bench Pro (57.7%), Terminal-Bench 2.0 (75.1%), and GPQA Diamond (93%). The strategic move here is filling out the model family below the flagship: GPT-5.4 already serves as the frontier model, and these smaller variants let developers build agentic workflows at a fraction of the cost. Simon Willison calculated that the nano model could describe every image in his 76,000-photo library for just $52, putting vision tasks that were prohibitively expensive a year ago at well under a tenth of a cent per image.

@OpenAI · 6h, @simonw · 3h — combined 440K+ views
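The per-image figure behind Willison's estimate falls straight out of the two numbers quoted above. A minimal sketch, assuming the $52 covers the whole library and ignoring how token counts vary from image to image:

```python
# Per-image cost implied by Willison's estimate for GPT-5.4 nano.
TOTAL_COST_USD = 52.0   # quoted total for the full library
PHOTO_COUNT = 76_000    # photos in the library

cost_per_image = TOTAL_COST_USD / PHOTO_COUNT
cost_per_thousand = cost_per_image * 1_000

print(f"per image: ${cost_per_image:.5f}")             # per image: $0.00068
print(f"per 1,000 images: ${cost_per_thousand:.2f}")   # per 1,000 images: $0.68
```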

OpenAI also rolled out "Subagents" for Codex, letting developers spin up specialized AI agents that tackle different parts of a complex workflow in parallel.

The feature keeps the main agent's context window clean by delegating subtasks to child agents. The timing looks deliberate: the feature arrives alongside the mini/nano models that make running multiple agents economically viable. Dominik Kundel from the OpenAI Devs team shared a practical tip: have Codex analyze its own past conversations to auto-generate rules files, so future runs stay sandboxed without needing full access.

@OpenAIDevs · Mar 16, @dkundel · Mar 16 — 56K+ views on Kundel's tip alone
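Why delegation keeps the parent's context clean is easiest to see in a toy model. The sketch below is hypothetical and is not the Codex API: the `Agent` class, `run`, and `orchestrate` are invented names, and a "model call" is simulated by appending scratch notes. Children run concurrently and only their one-line summaries reach the main agent.

```python
import asyncio
from dataclasses import dataclass, field

# Hypothetical parent/child agent model; not how Codex is implemented.
@dataclass
class Agent:
    name: str
    context: list[str] = field(default_factory=list)  # everything this agent has seen

    async def run(self, task: str) -> str:
        # Stand-in for a model call: the verbose intermediate work stays in this agent's context.
        self.context.append(f"task: {task}")
        self.context.extend(f"scratch step {i} for {task}" for i in range(3))
        await asyncio.sleep(0)  # yield control so sibling agents interleave
        return f"{self.name} finished {task!r}"

async def orchestrate(main: Agent, subtasks: list[str]) -> list[str]:
    # One child per subtask, all run concurrently; gather preserves input order.
    children = [Agent(name=f"sub-{i}") for i in range(len(subtasks))]
    summaries = await asyncio.gather(*(c.run(t) for c, t in zip(children, subtasks)))
    # Only the short summaries enter the main agent's context.
    main.context.extend(summaries)
    return list(summaries)

main = Agent(name="main")
results = asyncio.run(orchestrate(main, ["write tests", "update docs", "fix lint"]))
print(len(main.context))  # 3
```

The parent ends up with three lines of context regardless of how much scratch work each child produced, which is the property the feature is built around.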

Ethan Mollick observed that a knowledge-work platform built around GPT-5.4 Pro-level intelligence "would be really useful," noting the gap between Pro and other models on complex intellectual work remains stark.

He specifically wished for a Codex-like platform with shared file spaces and subagents — suggesting the current product packaging still doesn't match the model's capability. Separately, @scaling01 called OpenAI's accompanying price hike "completely unnecessary," reflecting a growing tension in the community between excitement over model quality and frustration over pricing.

@emollick · 6h, 22K views · @scaling01 · 6h, 76K views

The Claude Ecosystem Explodes

5 items

Anthropic engineer Boris Cherny published the internal playbook for Claude Code Skills — revealing how Anthropic's own team builds, structures, and auto-improves the modular skill system that has become Claude Code's most popular extension point.

The post, titled "Lessons from Building Claude Code: How We Use Skills," hit 485K views and 3K likes in six hours, making it one of the most-engaged Anthropic developer posts in months. The key insight: skills are flexible and easy to create, but that flexibility makes it hard to know what works best — so Anthropic codified their internal patterns into a public guide.

@bcherny · 6h, 485K views — the engagement suggests developers are hungry for official best practices

Anthropic shipped "Dispatch" in Claude Cowork — a persistent conversation that runs on your computer, accepts messages from your phone, and returns finished work.

Felix Rieseberg announced it as a research preview: you dispatch a task to Claude running on your desktop, walk away, check in from your phone, and come back to completed output. The "walkie-talkie" metaphor gained instant traction, with Daniel San and Om Patel both testing it within hours of launch and confirming the phone-to-desktop loop works as advertised.

@felixrieseberg · 2h, 43K views · @dani_avila7 · 1h · @om_patel5 · 1h

Ole Lehmann built an auto-improvement system for Claude Skills using Karpathy's autoresearch method — a single meta-skill that runs any other skill, scores the output, identifies failures, makes one small change, and keeps iterating until the score improves.

The approach borrows the hill-climbing loop from Karpathy's research methodology and applies it to prompt engineering. Lehmann claims most Claude skills fail 30% of the time without users noticing, and the auto-improver catches and fixes these silent failures on autopilot.

@itsolelehmann · 4h, 30K views
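The loop Lehmann describes (run, score, make one small change, keep it only if the score improves) is hill climbing. A toy sketch under loud assumptions: a "skill" is just a list of instruction strings, and the hypothetical `toy_score` replaces "run the skill and grade the output" with a fixed checklist. Lehmann's actual meta-skill does this through Claude itself, not through code like this.

```python
from typing import Callable

def hill_climb(
    skill: list[str],
    score: Callable[[list[str]], float],
    candidate_edits: list[str],
    max_rounds: int = 5,
) -> tuple[list[str], float]:
    """First-improvement hill climbing: try one edit at a time, keep it only if the score rises."""
    best, best_score = list(skill), score(skill)
    for _ in range(max_rounds):
        improved = False
        for edit in candidate_edits:
            if edit in best:
                continue
            trial = best + [edit]  # one small change
            trial_score = score(trial)
            if trial_score > best_score:  # strict improvement only
                best, best_score = trial, trial_score
                improved = True
        if not improved:
            break  # local optimum: no single edit helps
    return best, best_score

# Hypothetical grader: the fraction of required behaviours the skill's instructions cover.
REQUIRED = {"cite sources", "stay under 200 words"}

def toy_score(skill: list[str]) -> float:
    return len(REQUIRED & set(skill)) / len(REQUIRED)

best, best_score = hill_climb(
    ["summarize the article"],
    toy_score,
    ["add emojis", "cite sources", "stay under 200 words"],
)
print(best, best_score)
```

The "add emojis" edit is tried and discarded because it does not move the score, which is the keep-only-if-better discipline in miniature.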

A tenant fed their entire 47-page lease to Claude after a $400 rent increase, and Claude flagged a clause pointing to $6,200 in overcharges over 18 months, a figure the landlord's lawyer confirmed.

The tweet from @Argona0x went mega-viral at 925K views, becoming the week's most-shared "Claude in the wild" story. The attached spreadsheet showed a detailed rent overcharge calculation cross-referenced against LA County code. The enabler is Claude's long context window, which lets a non-lawyer hand over an entire contract at once instead of excerpting only the parts they think matter.

@Argona0x · 4h, 925K views, 5.4K likes — the single most viral Claude use-case story this week
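The headline numbers imply an average monthly overcharge. A quick check, assuming (the story does not say) that the overcharge accrued evenly across the 18 months:

```python
# Average monthly overcharge implied by the story's totals.
TOTAL_OVERCHARGE_USD = 6_200
MONTHS = 18  # assumption: overcharge spread evenly over this period

monthly = TOTAL_OVERCHARGE_USD / MONTHS
print(f"${monthly:.2f} per month")  # $344.44 per month
```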

Tom Dörr open-sourced "Stop Slop," a Claude skill specifically designed to strip AI writing patterns from prose — the predictable phrases, structures, and rhythms that mark AI-generated text.

The skill ships with a reference library of patterns to avoid and teaches Claude to recognize and rewrite its own verbal tics. At 137K views and 2.3K likes, it reflects a maturing ecosystem where users are now building tools to improve AI output quality rather than just generating more of it.

@tom_doerr · 13h, 137K views
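Stop Slop works as a Claude skill, so Claude itself does the recognizing and rewriting. The "reference library of patterns" half can still be illustrated with a plain detector. This is a swapped-in technique (regular expressions), and the three patterns below are hypothetical examples, not the skill's actual library:

```python
import re

# Hypothetical patterns; Stop Slop's real reference library is not reproduced here.
SLOP_PATTERNS = {
    "delve": r"\bdelv(e|es|ing)\b",
    "important to note": r"\bit(?:'s| is) important to note\b",
    "fast-paced world": r"\bin today's fast-paced world\b",
}

def flag_slop(text: str) -> list[str]:
    """Return the label of every pattern found in text (case-insensitive), in library order."""
    return [
        label
        for label, pattern in SLOP_PATTERNS.items()
        if re.search(pattern, text, flags=re.IGNORECASE)
    ]

draft = "In today's fast-paced world, it's important to note that we must delve deeper."
print(flag_slop(draft))  # ['delve', 'important to note', 'fast-paced world']
```

A detector like this only finds the tics; the skill's value is in having the model rewrite around them.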

Infrastructure & Developer Tools

3 items

NVIDIA open-sourced OpenShell at GTC — a sandboxed runtime for AI agents that restricts access to terminals, files, AWS keys, and network resources through declarative policies.

The Apache 2.0 project addresses a real and growing problem: most coding agents today run with full access to your system, including credentials and sensitive files. OpenShell provides a container-like execution environment where agents get only the permissions explicitly granted. The timing is notable: it arrived the day after Jensen Huang's GTC keynote framing AI as civilization's operating system, a tacit acknowledgment that an operating system needs security boundaries.

@livingdevops · 13h
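"Only the permissions explicitly granted" means deny by default. A minimal sketch of that evaluation rule, assuming a hypothetical policy shape (`Policy`, `is_allowed`, and the field names are invented here; OpenShell's real policy format may look quite different):

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

# Hypothetical policy shape, not OpenShell's schema.
@dataclass(frozen=True)
class Policy:
    allowed_paths: tuple[str, ...] = ()  # glob patterns the agent may touch
    allowed_hosts: tuple[str, ...] = ()  # network destinations the agent may reach
    allowed_env: tuple[str, ...] = ()    # environment variables exposed to the agent

def is_allowed(policy: Policy, kind: str, target: str) -> bool:
    """Deny by default: a request passes only if some rule explicitly matches it."""
    rules = {
        "file": policy.allowed_paths,
        "net": policy.allowed_hosts,
        "env": policy.allowed_env,
    }.get(kind, ())  # unknown request kinds get no rules, hence are denied
    return any(fnmatchcase(target, pattern) for pattern in rules)

policy = Policy(allowed_paths=("/workspace/*",), allowed_hosts=("pypi.org",))
print(is_allowed(policy, "file", "/workspace/src/main.py"))     # True
print(is_allowed(policy, "file", "/home/me/.aws/credentials"))  # False
print(is_allowed(policy, "env", "AWS_SECRET_ACCESS_KEY"))       # False
```

The design choice that matters is the empty default: forgetting a rule blocks access rather than granting it.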

alphaXiv launched an MCP (Model Context Protocol) server for arXiv, giving research agents fast multi-turn retrieval, keyword search, and embedding search across millions of papers.

The announcement framed it as letting "research agents stand on the shoulders of giants." MCP keeps consolidating as the standard interface layer between AI agents and data sources: it is the protocol Anthropic introduced for Claude's tool access, and its adoption by research infrastructure suggests the ecosystem is converging on it.

@askalphaxiv · 3h, 24K views, 489 likes
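Under the hood, an MCP tool invocation is a JSON-RPC 2.0 request with method `tools/call`. A minimal sketch of building one; the tool name `search_papers` and its arguments are hypothetical (a client would discover alphaXiv's real tools via `tools/list` after the `initialize` handshake):

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)

# Hypothetical tool name and argument names, for illustration only.
req = make_tool_call(1, "search_papers", {"query": "mixture of experts routing", "limit": 5})
print(req)
```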

Jeffrey Emanuel shared his "Agentic Coding Flywheel" — a methodology for orchestrating swarms of AI agents using exhaustive markdown plans, polished beads (reusable components), and a systematic stack.

The guide has become a frequently referenced playbook in the AI coding community, with Emanuel noting he posts the same link multiple times daily because people keep asking about his methodology. The approach treats AI agents less like tools and more like a managed workforce requiring plans, reviews, and iteration cycles.

@doodlestein · 6h, 7.1K views

Local AI & Hardware

1 item

The Apple M5 Max MacBook Pro with 128GB RAM hit 108 tokens per second running Qwen 3.5 (35B parameters, 3B active) locally — unoptimized, out of the box.

@krunkosaurus's benchmark went viral at 306K views, with the "Local AI is here" declaration resonating across the community. The key detail: Qwen 3.5 uses a mixture-of-experts architecture in which only 3B of the 35B parameters are active per token, so each generated token touches a small slice of the weights while the M5 Max's 128GB of unified memory holds the whole model. This is a qualitatively different moment for local inference: a reasoning-capable model generating text faster than most people can read.

@krunkosaurus · 19h, 306K views, 4K likes · unified memory supplies the capacity; MoE sparsity keeps per-token memory traffic low
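Back-of-the-envelope numbers show why sparsity matters for decode speed. A rough sketch under stated assumptions: 4-bit quantized weights (the benchmark's actual format is not given), and only weight reads are counted (KV cache and activations are ignored):

```python
TOTAL_PARAMS = 35e9
ACTIVE_PARAMS = 3e9
BYTES_PER_PARAM = 0.5   # ASSUMPTION: 4-bit quantized weights
TOKENS_PER_SEC = 108    # reported decode speed

active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
resident_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9         # every expert must sit in memory
read_per_token_gb = ACTIVE_PARAMS * BYTES_PER_PARAM / 1e9  # only routed weights are read per token
implied_gb_per_sec = read_per_token_gb * TOKENS_PER_SEC    # weight traffic at the reported speed

print(f"active fraction: {active_fraction:.1%}")               # active fraction: 8.6%
print(f"resident weights: {resident_gb:.1f} GB")               # resident weights: 17.5 GB
print(f"weights read per token: {read_per_token_gb:.1f} GB")   # weights read per token: 1.5 GB
print(f"implied weight traffic: {implied_gb_per_sec:.0f} GB/s")  # implied weight traffic: 162 GB/s
```

Under these assumptions a dense 35B model would need to read roughly 12x more weight data per token, which is the gap the MoE design closes.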

AI Meets the Physical World

2 items

Dilum Sanjaya built a working smart home dashboard from an idea generated with Nano Banana 2, coding with Gemini 3.1 Pro and rendering 3D assets with Tripo — end to end from concept to interactive prototype.

The video demo shows a polished dashboard with room controls, energy monitoring, security cameras, and climate management. It's a tidy example of the new stack: ideation with one model, implementation with another, and visual assets from a third, all stitched together by a single person.

@DilumSanjaya · 6h, 65K views

Rahi Anil Barve released MANN-PISHACH, an 80-minute experimental film built entirely on a home PC with two actors, iPhone footage, and hand-drawn elements.

The zero-budget production demonstrates what's possible when AI-assisted tools meet passionate indie filmmaking. At 31K views and 729 likes, it resonated with the creative community as proof that the barrier to feature-length filmmaking has collapsed to the cost of a home PC.

@BarveRahi · 7h, 31K views

Deep Thinking & Science

3 items

JJ (Joseph Jacks) argued that AI's real bottleneck is architecture, not compute, using Paramecium caudatum as his case study.

The single-celled organism, roughly the width of a human hair, has no brain, no neurons, and no central nervous system, yet coordinates ~100,000 cilia to swim, eat, and reproduce using a decentralized microtubule lattice. Jacks's implication: if nature solved complex coordination without centralized processing, our obsession with scaling GPU clusters may be missing a more fundamental architectural insight. The post referenced Stuart Hameroff and Anirban Bandyopadhyay's work on biological computing.

@JosephJacks_ · 9h, 24K views, 559 likes

Michael Strong argued that Einstein's genius was not raw cognitive power but intellectual integrity — a relentless honesty about what he understood and what he did not.

Strong noted that Einstein's definition of "simultaneity," the conceptual move that led to special relativity, is original but not complex. The implication for the AI age: the bottleneck in reasoning is rarely compute or knowledge, but the willingness to sit with uncertainty rather than reaching for a premature answer.

@flowidealism · 6h, 8.4K views
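For a sense of how simple the definition is: in his 1905 paper Einstein sends a light signal from clock A at A-time $t_A$, reflects it off clock B at B-time $t_B$, and receives it back at A at A-time $t'_A$. The two clocks are, by definition, synchronous when the outbound and return trips take equal time:

```latex
t_B - t_A = t'_A - t_B
\quad\Longleftrightarrow\quad
t_B = \tfrac{1}{2}\left(t_A + t'_A\right)
```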

Paige Bailey took a thousand oral histories from the Computer History Museum and made them fully searchable and deeply interconnected at f0lkl0r3.dev.

The project is a love letter to computing's oral tradition — the bizarre anecdotes, the personalities, the decisions that shaped an industry, now navigable as a knowledge graph rather than isolated transcripts. Grace Hopper, Reynold B. Johnson, and James H. Williams are among the searchable figures.

@DynamicWebPaige · 7h, 1.7K views — small audience but high signal for anyone who cares about computing's history

Gems & Signals

4 items

MiroFish simulates up to 1,000,000 AI agents debating geopolitical futures: you upload a news event, it builds a relationship graph between entities, launches agents with different belief systems, and lets them influence each other's positions over simulated time.

The demo showed agents clustering into Bull Coalitions, Bear Alliances, Contrarians, and Neutral Pools around prediction market outcomes. It's less a forecasting tool than a way to stress-test how narratives propagate through belief networks.

@de1lymoon · Mar 16, 136K views, 1.2K likes
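How "influence each other's positions" can produce distinct camps rather than one consensus is shown by a classic opinion-dynamics rule. This sketch swaps in the Hegselmann-Krause bounded-confidence model, named plainly because MiroFish's own mechanics are not described in the post; the belief values and `eps` threshold are made up for illustration:

```python
# Hegselmann-Krause bounded-confidence model: a textbook opinion-dynamics rule,
# used here for illustration only. It is not MiroFish's mechanism.

def hk_step(beliefs: list[float], eps: float) -> list[float]:
    """Each agent adopts the mean belief of every agent within eps of its own (itself included)."""
    updated = []
    for b in beliefs:
        peers = [other for other in beliefs if abs(other - b) <= eps]
        updated.append(sum(peers) / len(peers))
    return updated

# Hypothetical beliefs in [0, 1], e.g. each agent's probability for a market outcome.
beliefs = [0.05, 0.10, 0.15, 0.20, 0.50, 0.80, 0.85, 0.90, 0.95]
for _ in range(20):
    beliefs = hk_step(beliefs, eps=0.2)

clusters = sorted({round(b, 3) for b in beliefs})
print(clusters)  # [0.125, 0.5, 0.875]
```

Agents only listen to others who already roughly agree with them, so the population settles into a bear-leaning camp, an isolated neutral, and a bull-leaning camp. Widening `eps` lets the camps reach each other and tends to merge them.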

Freya Holmér, the math visualization creator, said she'd been feeling stressed and made "a little prototype" — a colorful puzzle game that hit 1.2M views and 26K likes in seven hours.

No AI angle here, just pure creative craft from one of the feed's most beloved figures. The engagement is a reminder that this feed — and the community around it — still responds most viscerally to things people make with their own hands.

Pure craft, no AI required — and it outperformed most technical announcements

Kurtis The Quant used Claude Cowork to turn an academic paper on Dynamic Factor Allocation via Momentum-Based Regime Switching into a working ETF dashboard with live regime detection — in a single session.

The before-and-after (dense paper → interactive dashboard) is one of the cleanest demonstrations of what "10x productivity" actually looks like when applied to specialized domain work.

@Quant_Kurtis · 9h, 19K views
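The core idea of momentum-based regime switching (label the market regime from recent returns, then shift factor weights accordingly) can be sketched in a few lines. This is a deliberately crude stand-in, not the paper's model: the 12-period lookback, the sign rule, the factor names, and the allocation weights are all hypothetical.

```python
def trailing_return(prices: list[float], lookback: int) -> float:
    """Simple return over the last `lookback` periods."""
    if len(prices) <= lookback:
        raise ValueError("need more than `lookback` prices")
    return prices[-1] / prices[-1 - lookback] - 1.0

def detect_regime(prices: list[float], lookback: int = 12) -> str:
    """Label the regime by the sign of trailing momentum (a crude rule, for illustration)."""
    return "risk_on" if trailing_return(prices, lookback) > 0 else "risk_off"

# Hypothetical factor tilts per regime; each row sums to 1.
ALLOCATIONS = {
    "risk_on": {"momentum": 0.5, "quality": 0.3, "low_vol": 0.2},
    "risk_off": {"momentum": 0.1, "quality": 0.4, "low_vol": 0.5},
}

# Illustrative monthly index levels, not real data.
monthly_prices = [100, 101, 103, 102, 104, 106, 105, 107, 109, 108, 110, 112, 111]
regime = detect_regime(monthly_prices)
weights = ALLOCATIONS[regime]
print(regime, weights)
```

A live dashboard would rerun `detect_regime` as new prices arrive; research models typically replace the sign rule with something statistically richer.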

The feed is saturated with Claude and OpenAI ecosystem stories running in parallel, with the interesting signal being how naturally the conversation flows between them.

People are comparing, building with both, treating them as complementary rather than competing. The real competition isn't between companies anymore; it's between the pace of tooling and the pace of human adaptation.

A meta-signal about ecosystem maturity and integration

This Tuesday evening, Anthropic shipped Dispatch and the Skills playbook in the same news cycle as OpenAI's mini/nano models and Codex subagents, and the community is processing both launches at once. Meanwhile, local AI reached a qualitative milestone (108 tokens per second on consumer hardware), and a tenant used Claude's long-context capability to audit a 47-page lease and surface $6,200 in overcharges, a reminder that the most impactful uses of AI today are often the unglamorous ones.