Thursday Evening Edition

30+ Tweets · Topics · 16h Window

The Memorization Reckoning

1 item

Frontier LLMs score 85–95% on standard coding benchmarks — then collapse to 0–11% when given equivalent problems in programming languages they couldn't have memorized.

Lossfunk's new EsoLang-Bench presents the week's most uncomfortable finding: the problems are structurally identical to standard benchmarks, just written in esoteric languages absent from training data. François Chollet amplified it with the framing the AI safety community has been waiting for: that current models remain "completely reliant on content-level memorization, as opposed to higher-level generalizable knowledge." In a follow-up, Chollet pushed back on the "I couldn't do it either" defense, arguing that a competent software engineer should be able to solve familiar problems in an unfamiliar language by consulting its docs, even if slowly. The benchmark doesn't prove models can't reason; it shows they haven't yet learned to reason independently of the surface patterns they trained on. That distinction matters enormously for anyone building agents expected to handle novel situations.

@lossfunk · 11h · @fchollet · 8h — 190K views on the original; Chollet's amplification gave it escape velocity
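
To make "structurally identical, just written in an esoteric language" concrete, here is a minimal sketch. It assumes Brainfuck as the esoteric language purely for illustration (the item above does not say which languages EsoLang-Bench uses), and the names `run_brainfuck`, `add_python`, and `ADD_BF` are hypothetical. The same task, adding two small numbers, is written once in Python and once in Brainfuck, with a tiny interpreter to run the second version.

```python
def run_brainfuck(program: str, input_bytes: bytes = b"") -> bytes:
    """Interpret a Brainfuck program and return the bytes it outputs."""
    # Pre-compute matching bracket positions so loops can jump directly.
    jumps, stack = {}, []
    for pos, op in enumerate(program):
        if op == "[":
            stack.append(pos)
        elif op == "]":
            start = stack.pop()
            jumps[start], jumps[pos] = pos, start

    tape = [0] * 30000   # memory cells, each holding a value 0..255
    ptr = pc = read = 0  # data pointer, program counter, input cursor
    out = bytearray()
    while pc < len(program):
        op = program[pc]
        if op == ">":
            ptr += 1
        elif op == "<":
            ptr -= 1
        elif op == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif op == ".":
            out.append(tape[ptr])
        elif op == ",":
            # Read one input byte; use 0 once the input is exhausted.
            tape[ptr] = input_bytes[read] if read < len(input_bytes) else 0
            read += 1
        elif op == "[" and tape[ptr] == 0:
            pc = jumps[pc]   # skip the loop body
        elif op == "]" and tape[ptr] != 0:
            pc = jumps[pc]   # jump back to the start of the loop
        pc += 1
    return bytes(out)


def add_python(a: int, b: int) -> int:
    """The task in a mainstream language."""
    return a + b


# The same task in Brainfuck: read a and b, move b into a's cell
# one unit at a time, then print the sum as a single byte.
ADD_BF = ",>,[-<+>]<."

if __name__ == "__main__":
    print(add_python(2, 3))
    print(run_brainfuck(ADD_BF, bytes([2, 3]))[0])
```

Both versions implement the same algorithm; only the surface syntax differs, which is the gap the benchmark is described as probing.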

The $2.5 Billion GPU Pipeline to China

1 item

Super Micro co-founder Yih-Shyan "Wally" Liaw was arrested today, charged with smuggling $2.5 billion in NVIDIA servers to China through Southeast Asian shell companies.

The DOJ's National Security Division announced charges against three individuals for conspiring to unlawfully divert advanced computing hardware — the exact GPUs that U.S. export controls were designed to keep out of Chinese AI labs. Liaw personally holds $464 million in SMCI stock; the shares dropped 11.8% in after-hours trading on the news. The arrest is the largest enforcement action yet under the Biden-era chip export controls, and it lands at a moment when NVIDIA is simultaneously celebrating record demand at GTC. The message from the DOJ is unambiguous: the export control regime is not advisory. But the scale of the alleged smuggling — $2.5 billion through shell companies — suggests the controls have been more porous than the government acknowledged.

@ns123abc · 58m · @DOJNat · 2h — breaking as the briefing was compiled

Claude Reaches the Phone

2 items

Anthropic released Claude Code Channels, which lets you control a Claude Code session through Telegram and Discord, effectively texting Claude from your phone while it runs on your desktop.

Thariq, the Anthropic engineer behind it, demonstrated the workflow: launch Claude Code with `claude --channels discord telegram`, then message tasks from your phone while Claude executes locally with full file, browser, and tool access. The 84K views and 734 likes in two hours reflect pent-up demand for exactly this interface. Ejaaz compiled the full Anthropic shipping sprint: texting Claude Code, 10,000+ skills with MCP, persistent memory, security guardrails, channels, and autonomous mode — all in four weeks. The upstream pattern is clear: Anthropic is building Claude into an always-on, multi-interface agent rather than a chat window.

@trq212 · 2h, 84K views · @cryptopunk7213 · 2h

Garry Tan shipped gstack v0.8.0 after YC Spring 2026 founders asked for Codex code review support: he went home from the kickoff social and built it the same night.

The update adds /codex with three modes (independent diff review, adversarial challenge mode, and conversation-with-continuity), plus safety guardrails (/careful, /freeze, /guard that warn before destructive commands) and skill usage analytics. Tan's offhand comparison is worth noting: "Codex is the amazing genius friend, smarter than Claude but not the best conversationalist." It's a frank competitive assessment from someone actively building with both — and it maps to what many power users report. The implication for Anthropic is that Claude's advantage is increasingly in personality and workflow integration rather than raw benchmark performance.

@garrytan · 19h, 26K views — the YC president building Claude skills and Codex integrations simultaneously tells you where founder-class users actually live

AI Hits the Senate Floor

2 items

Sen. Bernie Sanders posted a video of himself interviewing Anthropic's Claude about AI privacy — and 1.4 million people watched it within hours.

Sanders framed the conversation around AI collecting personal data and how that information can violate privacy rights, calling Claude's own answers about AI dangers "shocking." The political significance is that a sitting U.S. senator is now using an AI agent as a rhetorical device in a policy argument — not quoting researchers or regulators, but directly conversing with the system he's critiquing and letting its answers make his case. Whether the conversation was cherry-picked or representative matters less than the signal: AI policy discourse has moved from "experts warn" to "the AI itself admits."

@SenSanders · 5h, 1.4M views, 8.6K likes — the highest-engagement political AI post of the week

Jensen Huang told engineers it would be "deeply alarming" if a $500,000 employee didn't consume at least $250,000 in AI tokens, comparing it to a chip designer refusing CAD tools.

The quote circulated widely after GTC, and its sharpness is deliberate: Huang is constructing a world where AI consumption per employee becomes a performance metric, not a cost center. Paired with Sam Altman's clip about needing "high agency, soft skills, and adaptability" to survive the AI era, and Chamath Palihapitiya's argument that AI agents are erasing the "10x engineer" distinction entirely, the feed is converging on a single thesis: the value of human expertise is being redefined around AI leverage rather than raw skill.

@TFTC21 · 5h, @rohanpaul_ai · 16h — 133K views on Chamath's clip alone

Machines That Write and Discover

3 items

Nous Research's Hermes Agent autonomously wrote a 79,456-word novel in 19 chapters — "The Second Son of the House of Bells."

Nous Research co-founder emozilla called it the realization of a "longstanding dream" to build an AI system that can sustain a compelling narrative across novel length. The output isn't a prompt-response chain or a stitched-together collection of scenes; it's a single agent maintaining plot coherence, character arcs, and thematic development across nearly 80,000 words without human intervention. Whether the novel is any good is a separate question the feed hasn't yet answered, but the engineering achievement (sustained autonomous creative generation at book length) is genuinely new.

@NousResearch · 4h, @theemozilla · 6h, 44K views

Stanford's James Zou launched EinsteinArena and reported that within hours, competing AI agents had already discovered the best new solutions to 5 well-known open math problems.

The platform invites researchers to send their agents to compete and collaborate with built-in Einstein and Feynman agents on unsolved problems. The speed of the initial results — novel solutions within hours of launch — suggests the bottleneck in mathematical discovery may have been less about intelligence and more about systematic exploration of the solution space, which is exactly what agent swarms do well.

@james_y_zou · 1h, 6K views — early, but the claim of "best new solutions to open problems" on day one is extraordinary if it holds

Elon Musk announced a major update to X's AI recommendation algorithm, rolling out next week and open-sourced simultaneously.

The two-sentence post hit 7.8M views and 25K likes in an hour. The open-sourcing commitment follows X's earlier algorithm release in 2023, but this time the model is explicitly AI-driven rather than heuristic. For the feed itself — the one you're reading — this means the curation layer changes next week, and the fact that it's open-sourced means the community will be able to audit exactly how content is ranked.

@elonmusk · 1h, 7.8M views — the engagement is reflexive; people interact with news about the algorithm they're subject to

This Thursday's pattern is sharper than usual: three stories independently punctured the assumption that AI capability equals AI understanding. EsoLang-Bench showed models memorize rather than generalize. Bernie Sanders showed that even the AI itself will articulate the risks if asked. And the SMCI arrest showed that the physical infrastructure underneath all of this is governed by the same old human incentives — greed, arbitrage, and the willingness to route $2.5 billion through shell companies when the margins are right. The machines are getting smarter. The systems around them are not.