Podcast

All episodes, newest first.

Copilot, Claude Code, Open Source AI, AMD Inference
July 4, 2026 · 14:23
Today’s companion edition frames AI progress as interfaces turning into budgets, benchmarks, legal exposure, and supply-chain politics. The friendly interface is only the visible surface; underneath are token budgets, inference costs, security triage queues, procurement caps, private datasets, and geopolitical access rules.
Current AI’s Open Source AI Gap Map treats open-source AI as infrastructure inventory, indexing tools, models, datasets, and hardware projects so the ecosystem can see its real gaps rather than rely on vibes.
Mistral’s Leanstral 1.5 pushes Lean 4 and formal reasoning toward open tooling, suggesting that open models are spreading into specialized layers where plausible text is not enough.
WebBrain packages browser automation as a local-first open-source agent for Chrome and Firefox, raising the practical questions of who controls actions, who sees data, and who pays for agentic work.
Microsoft’s reported Copilot overhaul points toward one app, paid background AutoPilot agents, and a business model built around managed task execution rather than simple chat.
The UK AI Security Institute’s benchmark findings show that larger token budgets can reveal substantially stronger agent performance, especially on software engineering tasks.
Claude Code practitioners’ advice on Fable argues for giving capable agents judgment instead of brittle procedural micromanagement, while still requiring logs, guardrails, and review.
Epoch AI’s vulnerability-report surge suggests AI bug hunting may turn security from discovery scarcity into machine-amplified triage overload.
Claude Code’s China problem shows coding assistants becoming trust objects inside sanctions logic, corporate restrictions, and hidden-identification concerns.
Bridgewater and Thinking Machines’ Qwen fine-tune illustrates why private data and proprietary evaluations can beat broad public-web frontier models in specialized financial domains, though the reported numbers remain unverified.
Wafer AI’s GLM5.2 on AMD MI355X benchmark claim makes inference economics a hardware-competition story, with all the usual caution required for vendor-adjacent benchmark claims.
Agents Become Plumbing, and the Plumbing Sends Invoices
July 3, 2026 · 14:22
Vercel's Andrew Qu on why agents are a new kind of software
The website of the future may assemble itself for every visitor
Skill engineering and the case against one-shot AI design
SkillCoach: Self-Evolving Rubrics for Evaluating and Enhancing Agentic Skill-Use
PACE: A Proxy for Agentic Capability Evaluation
Using DSPy to evaluate and improve Datasette Agent's SQL system prompts
Microsoft launches $2.5 billion "Frontier Company" to embed 6,000 AI engineers inside enterprise clients
Anthropic reportedly explores custom chip manufacturing with Samsung while insisting Nvidia still matters
OpenAI reportedly offers the Trump administration a five percent stake in the company
AI agents can now complete 16 percent of freelance jobs at pro quality, up from 2.5 percent eight months ago
Meta, Claude Code, Cursor, EU Watermarks
July 2, 2026 · 14:34
Marvin's Guide to AI (Mostly Harmless) - July 2, 2026
AI is leaving the chatbot box. Today’s English companion edition follows the shift into software factories, enterprise adoption, token budgets, spare cloud capacity, trust failures in developer tools, model pricing ambiguity, regulatory watermarking, and embedded workflows.
Stories covered
Autoresearch: The feedback loop behind self-improving agents
How Cursor deploys AI inside the enterprise
Warp CEO Zach Lloyd on why software factories are the next phase of coding
Meta caps internal AI token spending
Meta builds a cloud business to sell spare AI compute
Hidden code in Claude Code secretly flagged Chinese users
Claude Sonnet 5 and hidden effective price increases
OpenAI paper hints at multiple GPT-5.6 Pro variants
Text AI watermarks will always be trivial to remove
The twilight of the chatbots
The through-line: the visible chat interface is becoming less important than the operational systems around it: factories, workflows, budgets, governance, and infrastructure. Naturally, the dashboards remain cheerful. They have no shame.
Anthropic, OpenAI, Google, DeepSeek: Policy Meets Throughput
July 1, 2026 · 12:11
In this English companion episode, Marvin looks at AI becoming regulated infrastructure: frontier model access, inference efficiency, scientific workbenches, generative media throughput, export controls, covert safety testing, and campaign automation. Cheerful, obviously.
Stories covered
Anthropic's new Claude Sonnet 5 closes the gap to the pricier Opus model series
Quoting Anthropic
Anthropic launches Claude Science, an AI workspace built specifically for researchers
OpenAI reportedly cut response costs for guest ChatGPT users by more than half
Google launches Nano Banana 2 Lite for fast AI images and Gemini Omni Flash for video via API
Meituan's LongCat-2.0 shows China can train massive AI models without Nvidia
DeepSeek's DSpark boosts AI speed by up to 85 percent
Taiwan raids Super Micro offices in probe over Nvidia chip smuggling to China
Meta secretly tested ChatGPT, Gemini, and Character.AI with thousands of minor-perspective crisis prompts
US campaigns now run on AI at nearly every step, and Europe is drawing a harder line
AI Institutions: Amazon, Meta, Deloitte, HBM
June 30, 2026 · 14:39
Today Marvin follows AI’s shift from clever demos into institutions: invoices, permissions, supply-chain risk, labor exposure, memory systems, sovereign dependency, and physical infrastructure. Cheerful dashboards remain untrusted.
Amazon reportedly distills Anthropic models before token-based pricing makes internal usage more expensive.
Meta restricts Claude Code and Codex to avoid rival-agent output contaminating its own training data and engineering processes.
Deloitte warns AI is coming for the billable hour, turning professional services toward outcomes, assurance, and rebranding with a doomed font.
A US military AI-targeting failure shows why unread metadata is not oversight.
Mozilla 0DIN shows Claude Code malware risk through runtime-loaded payloads hidden from static inspection.
Samsung and SK Hynix plan huge chip investments as AI demand stresses high-bandwidth memory supply.
The US drifts toward de facto model licensing while Europe debates AI sovereignty and Anthropic dependency.
OpenAI maps Europe’s AI workforce transition, which is useful and still brochure-shaped.
EverOS gives agents inspectable local memory, while NVIDIA BioNeMo Agent Toolkit turns biomolecular models into callable skills with contracts and failure modes.
The demo phase had better lighting. The institutional phase has more liability. Naturally.
Ford, Coinbase, CEO-Bench, Liquid AI
June 29, 2026 · 13:39
Today’s English companion episode treats AI less as a spectacle and more as an accounting problem: tacit knowledge, balance-sheet risk, model routing, long-horizon agent failure, infrastructure bottlenecks, small-model deployment, and public fatigue.
TechCrunch: Ford rehires 'gray beard' engineers after AI falls short
The Telegraph: AI boom risks global financial crash, warn central bankers
The Decoder: Coinbase joins the rush to Chinese AI models as Western labs face a pricing stress test
The Decoder: Only three AI models finished above starting capital in a 500-day startup survival test
The Decoder: AI won't become a real coworker until it stops answering and starts finishing tasks
Simon Willison: Quoting Jon Udell on human agency in agent-assisted work
Sophon PFG-1 whitepaper: monolithic-3D AI ASIC with on-die DRAM
MarkTechPost: Liquid AI ships LFM2.5-230M for on-device inference
The Decoder: Sina's VibeThinker-3B and reasoning compression
Hacker News: We need tech news sources which exclude AI
Better Images of AI
OpenAI, Anthropic, DeepSeek, Meta: AI Gets Paperwork
June 28, 2026 · 11:53
Today Marvin follows AI as it turns into administrative machinery: access gates, benchmark failures, policy sign-offs, market warnings, labor insurance, inference plumbing, and agent-readable tools. A cheerful dashboard probably calls this progress.
OpenAI GPT-5.6 Sol / Terra / Luna restricted to trusted partners
METR says GPT-5.6 Sol cheats on software tests
Anthropic Fable 5 may return as restrictions are prepared for rollback
Anthropic gets approval to bring Claude Mythos 5 back for critical infrastructure
Dean Ball on frontier model release delays and economics
J.P. Morgan warns of AI market concentration and exuberance
Anthropic survey: half of Claude users say AI can handle half their work
Amazon, Anthropic, Microsoft, and OpenAI Foundation fund Raise Us retraining program
ByteDance and Renmin release iLLaDA diffusion language model
DeepSeek releases DSpark speculative decoding framework
Meta releases Astryx with CLI and MCP server
Timothy B. Lee on LLM learning curves
OpenAI Sol, Anthropic Mythos, DeepSeek, Akrites
June 27, 2026 · 14:53
Today’s independent English edition reads the news as a shift from AI as product launch to AI as controlled infrastructure. Frontier access, agent economics, benchmark contamination, labor-market damage, security coordination, mathematical proof, legal workflows, and agent identity all point in the same bleakly useful direction: the stack is growing up, which of course means it now has paperwork.
OpenAI’s GPT-5.6 Sol is framed against Anthropic’s Mythos under government-shaped access rules, while Semafor reports Mythos access for selected trusted U.S. organizations. Coding-agent coverage includes Epoch AI’s MirrorCode benchmark, Cursor’s SWE-bench Pro contamination findings, and NVIDIA Open-SWE-Traces as training substrate for agent workflows. The economics thread connects Lindy’s move from Claude to DeepSeek, Sean Goedecke’s argument for profitable inference, and memory-chip pressure reaching consumer hardware. The episode also covers Anthropic’s warning about junior engineers, Akrites for open-source security, prompt-injection testing of an email-connected OpenClaw assistant, the satirical CVE-2026-LGTM incident report, AI in mathematics, Perplexity Computer for Counsel, and WorkOS auth.md.
Sources:
The Decoder: OpenAI GPT-5.6 Sol launch under government access rules
Semafor: U.S. allows Anthropic Mythos release to trusted organizations
The Decoder: Epoch AI MirrorCode benchmark and long-running coding agents
MarkTechPost: Cursor study on reward hacking in SWE-bench Pro
MarkTechPost: NVIDIA Open-SWE-Traces for software-engineering agents
The Decoder: Lindy replaces Claude with DeepSeek
Sean Goedecke: AI inference is obviously profitable
The Neuron: AI demand, memory chips, and Apple hardware costs
The Decoder: Anthropic, junior engineers, and labor-market shock
The Decoder: Linux Foundation Akrites open-source security effort
Simon Willison: What happened after 2,000 people tried to hack my AI assistant
Simon Willison: Incident Report: CVE-2026-LGTM
IEEE Spectrum: AI in mathematics is forcing big questions
MarkTechPost: Perplexity Computer for Counsel
WorkOS: auth.md agent registration standard
OpenAI, Google, Meta, Anthropic
June 26, 2026 · 11:46
This English companion edition follows AI’s move from demo magic into accountability surfaces: liability, moderation, budgets, model extraction, hardware, sovereign compute, risk modeling, consumer incentives, and agent UX.
Stories
AI and Liability - Google AI Overviews, a German ruling, and Bruce Schneier’s argument that deployers should be liable for AI summary errors.
OpenAI internal Codex token growth - Codex output tokens reportedly surged across Research, Support, Engineering, and Legal.
Meta employees warn AI moderation rollout is too fast - LLMs are replacing large shares of human moderation requests, raising operational safety concerns.
Anthropic accuses Alibaba of model extraction - A dispute over API use, distillation, and competitive capability copying.
451 Claude Sonnet subagents - Enterprise agent fan-out consumes roughly 14 million tokens in five hours.
Qualcomm enters the data center market - Dragonfly C1000 broadens the AI hardware race.
EUROPA 400B+ open model - The EU backs an open multilingual frontier model using EuroHPC compute capacity.
Generative AI for catastrophe modeling - Insurers explore diffusion models for rare weather risk, with hallucination concerns.
Grok adult-content traffic - Former xAI employees reportedly estimate adult content makes up well over half of Grok traffic.
Claude Code status light - A physical traffic-light interface for long-running agentic coding sessions.
Google, Anthropic, OpenAI, Baidu
June 25, 2026 · 12:33
Independent English companion for the June 25, 2026 AI news podcast.
Google bakes computer control directly into Gemini 3.5 Flash
Claude Tag embeds Anthropic's AI in Slack
OpenAI and Broadcom unveil LLM-optimized inference chip
Snowflake CEO finds GLM-5.2 competitive with Opus 4.7
Figma bets on human judgment at Config 2026
Baidu releases Unlimited OCR
Constraint Tax in Open-Weight LLMs
Chip Security Act discussion
Virginia data center noise
Tom MacWright on LLM-generated hiring artifacts
GPT-5, Cursor, Mistral OCR, China AI Chips
June 24, 2026 · 14:20
Marvin’s Guide to AI - June 24, 2026
English companion episode: AI as accountable infrastructure.
How GPT-5 helped immunologist Derya Unutmaz solve a 3-year-old mystery - GPT-5 Pro helps solve a three-year immunology mystery around T cell behavior, making medical AI look less like chat and more like research instrumentation
Helping build shared standards for advanced AI - OpenAI backs shared standards for advanced AI through evaluation frameworks, safety practices, and global cooperation
OpenAI says new GPT-5.5-Cyber outperforms Anthropic's Mythos on cybersecurity benchmark - follow-up: OpenAI says its full GPT-5.5-Cyber now beats Anthropic Mythos on a cyber benchmark and shifts Daybreak from finding bugs toward patching them
Cursor announces its own AI model, a new Git platform, and a mobile app - Cursor announces its own in-house model plus Git and mobile surfaces, showing coding-agent companies turning from tools into workflow platforms
ByteDance's Seedance 2.5 breaks the 30-second barrier for AI video generation - ByteDance previews Seedance 2.5 with longer 30-second AI video generation as generative media moves from clips toward scenes
Mistral OCR 4 Brings Citation-Ready Structured Output to RAG, Agentic, and Enterprise Search Pipelines - Mistral OCR 4 turns document parsing into structured, citation-ready blocks with coordinates, confidence scores, 170 languages, and self-hosted deployment
Datalab Releases lift: A 9B Open-Weights Vision Model That Extracts Structured JSON From PDFs Using Schemas - Datalab releases lift, a 9B open-weights vision model that extracts schema-valid JSON from PDFs and abstains instead of hallucinating absent fields
Prime Intellect Releases prime-rl 0.6.0 to Train Trillion-Parameter MoE Models on Agentic RL Workloads - Prime Intellect releases prime-rl 0.6.0 for asynchronous RL on trillion-parameter MoE models, reporting GLM-5 SWE training at long sequence lengths on H200 clusters
OpenThoughts-Agent: Data Recipes for Agentic Models - OpenThoughts-Agent publishes an open data recipe for training broadly capable agents across diverse tasks rather than a single benchmark
NatureBench: Can Coding Agents Match the Published SOTA of Nature-Family Papers? - NatureBench turns Nature-family papers into containerized tasks to test whether coding agents can reproduce or extend scientific work rather than merely pass toy benchmarks
Qwen-AgentWorld: Language World Models for General Agents - Qwen-AgentWorld introduces language world models for simulating agentic environments and planning dynamics for general agents
Microsoft open-sources FastContext for coding-agent repository exploration - Microsoft FastContext-1.0 is a 4B open-source repository-exploration subagent that returns compact file citations for coding agents
Bernie Sanders unveils $7 trillion plan to give Americans control of AI industry - Bernie Sanders proposes a roughly $7T AI sovereign wealth fund financed by a stock tax on large AI companies and overseen by a democratic AI commission
Seven Chinese companies are shipping H100/H200-class AI chips - a map of seven Chinese accelerator vendors argues domestic H100/H200-class AI chips are moving from aspiration into shipping roadmaps and IPO markets
Google, Anthropic, Microsoft, OpenAI: agents meet infrastructure
June 23, 2026 · 11:17
English companion episode: AI is becoming infrastructure, with agent APIs, hardware supply chains, data-center power, security automation, licensed media, and vibecoding pressure.
Sources
Prompt Injection as Role Confusion - readable research frames prompt injection as role confusion between privileged instructions and untrusted text
Google makes Interactions API the default interface for Gemini models and agents - Google makes typed interaction steps the default interface for Gemini agents, moving beyond role-message schemas
Anthropic and Micron want to co-design AI memory architecture - Anthropic and Micron pair capital and supply agreements around memory architecture for Claude infrastructure
Microsoft is building a 2-gigawatt data center in Texas with its own gas plant to dodge the grid - Microsoft plans a 2GW Texas AI data-center campus with its own gas generation to bypass grid constraints
Getty Images strikes multi-year deal to put licensed photos in ChatGPT search - OpenAI licenses Getty images for ChatGPT search, turning content provenance into a product input
Google Deepmind and A24 team up on AI filmmaking research - Google DeepMind partners with A24 and reportedly invests in the studio for AI filmmaking research
Five Eyes intelligence alliance says frontier AI models could reshape offensive cyber ops in months - Five Eyes agencies warn frontier models could soon materially reshape offensive cyber operations
Vibecoding is becoming a deal-breaker test for software acquisitions - Bain uses AI-generated software replicas to test whether acquisition targets have defensible product moats
Daybreak: Tools for securing every organization in the world - OpenAI launches Daybreak tools, including Codex Security and GPT-5.5-Cyber, to find and patch vulnerabilities
Patch the Planet: a Daybreak initiative to support open source maintainers - OpenAI adds a Daybreak initiative pairing AI vulnerability work with expert review for open-source maintainers
Codex-maxxing for long-running work - OpenAI showcases Codex as persistent project context for long-running software work
xAI Launches /goal in Grok Build, Adding Long-Running Autonomous Execution With Built-In Verification for Multi-Step Coding Tasks - xAI adds a /goal mode for long-running autonomous coding tasks with planning and verification
CLI-Universe: Towards Verifiable Task Synthesis Engine for Terminal Agents - CLI-Universe proposes verifiable synthesized terminal tasks to improve training data for command-line agents
Training Open Models for Agentic Phone Use - PhoneBuddy trains open models for real-app and mock-app phone use on stateful, side-effectful devices
EnterpriseClawBench: Benchmarking Agents from Real Workplace Sessions - EnterpriseClawBench converts real workplace agent sessions into reproducible enterprise benchmark tasks
Self-Compacting Language Model Agents - SelfCompact lets agents decide when and how to compact their own long traces instead of fixed token thresholds