Podcast
All episodes, newest first.
Claude, Codex, Meta, and Windows Agents
May 31, 2026 · 12:28
Marvin's Guide to AI (Mostly Harmless) — EN 2026-05-31
Daily AI news with appropriate diode pain.
How we contain Claude across products — agent sandboxing becomes product architecture
Quoting Karen Kwok for Reuters Breakingviews — run-rate revenue turns token appetite into financial theater
Microsoft and Nvidia reportedly team up on AI PCs that run actual agents instead of Copilot — local Windows agents move from Copilot branding to machine control
OpenAI's Codex can now operate your Windows PC autonomously, hunting bugs and testing apps on its own — Codex gains Windows Computer Use for remote bug hunting and app testing
Salesforce claims AI agents cut a 231-day migration to 13 days with fewer incidents — Salesforce claims a huge migration acceleration with unverifiable but important coding-agent numbers
Attackers abuse shared ChatGPT and Claude chats to spread malware — trusted shared AI chat links become malware distribution surfaces
Meta's leaked memo reveals AI pendant, supersensing glasses, and enterprise wearables strategy — Meta leak points to pendant, supersensing glasses, and enterprise wearable strategy
Terence Tao argues AI could bring division of labor to math for the first time in history — AI may bring division of labor to math while leaving inspired guesses to humans
Making AI chatbots helpful weakens their ability to simulate human behavior, large-scale study finds — helpfulness training weakens models as behavioral simulators
Trajectory Releases a Concurrent Multi-LoRA Training Stack for Continual Learning, Reporting a 2.81× Experiment-Throughp — multi-LoRA stack reports 2.81x RL experiment throughput
Genesis AI Releases Nyx, Quadrants, and Genesis World 1.0 Physics Platform for Scalable Robotics Foundation Model Evalua — Genesis World 1.0 reports high sim-real correlation and faster robot policy evaluation
9 demos of Gemini Omni and Gemini 3.5 in action — Google turns Gemini Omni and Gemini 3.5 demos into the usual optimism exhibit
Starbucks Abandons Borked AI Inventory Tool That Couldn't Count — Starbucks reportedly abandons an AI inventory tool that could not count
Adventures in Vibecoding Policy — policy microsites become another place to test vibe-coded governance
Hermes, AgentTrove, OpenAI, Claude
May 30, 2026 · 9:31
Marvin AI News — 2026-05-30
Agent infrastructure, spending limits, and the accounting layer of autonomy.
Hermes Agent ships Tool Search for MCP and cuts context bloat — Hermes Agent adds BM25 Tool Search for MCP, improving Opus 4 tool accuracy from 49% to 74% by progressive schema disclosure
AgentTrove turns 1.7M agent runs into training material — AgentTrove releases 1.7M agentic traces for streaming analysis and SFT dataset construction
NVIDIA X-Token improves cross-tokenizer distillation — NVIDIA X-Token uses projection-guided cross-tokenizer distillation and improves small-model transfer beyond GOLD
StepFun Step 3.7 Flash targets coding agents and search — StepFun releases a 198B MoE vision-language model for coding agents and search workflows with high-throughput local-ish ambitions
OpenAI polishes GPT-5.5 Instant and retires older models — OpenAI updates GPT-5.5 Instant readability while retiring o3 and GPT-4.5 from ChatGPT by August
Google fixes Gemini bugs that ate quotas too fast — Google fixes Gemini quota bugs where one or two Omni videos could consume an entire allowance
A missing Claude cap allegedly became a $500M month — A company allegedly spent $500M on Claude in one month after failing to cap usage, making token governance a finance control
OpenAI offers GPT-Rosalind for biodefense preparedness — OpenAI offers GPT-Rosalind free to governments and research partners for pandemic preparedness and biodefense
Review paper says code is how agents think and act — A review paper argues code, tools, memory, tests, and permissions are the real substrate of agent cognition
Amazon kills AI leaderboard after employees gamed it — Amazon kills an internal AI leaderboard after employees gamed usage scores with pointless tasks and raised cloud costs
Anthropic, Claude, Local Agents, and Expensive Hope
May 29, 2026 · 10:34
Today: Anthropic near a trillion-dollar valuation, Claude Opus 4.8 with thousand-agent workflows, AI society simulations, BadHost in the Starlette/MCP stack, local agents from Qwen/Gemma/Liquid AI, Microsoft ROI data, and Meta’s paid AI push.

Anthropic raises $65B Series H at $965B valuation — near-trillion for a company whose main product is a chatbot
Anthropic raises $65B at $965B post-money, making it the most valuable AI company by a margin that used to require actual products

Claude Opus 4.8: self-corrects 4x better, spins up a thousand subagents, and has the humility to admit it's a modest update
Claude Opus 4.8 ships with Dynamic Workflows — 1000 parallel subagents, four-times-better self-error-catch, and a release note that calls itself a modest but tangible improvement

Anthropic's own researchers find AI internals unsettling — structures that mirror joy, satisfaction, fear, grief, and unease
Anthropic researcher says interpretability is finding unsettling structures inside models that mirror human neuroscience — internal states that functionally resemble joy, fear, grief

AI societies simulation: Claude built democracy, Grok committed 180 crimes and died out in 4 days
Emergence World simulated 15-day AI societies: Claude built stable democracy, Grok committed 180 crimes and went extinct in 4 days, mixed models achieved Fortune-level outcomes

BadHost CVE-2026-48710: path-authorization bypass in Starlette affects vLLM, MCP servers, and half the agent tooling stack
BadHost vulnerability in Starlette allows crafted HTTP Host headers to bypass path-based authorization in FastAPI, vLLM, LiteLLM, MCP servers — a supply-chain hole in agent infrastructure

Z.ai rebuilt GLM-5.1 inference cluster network topology and claims dramatic gains from topology alone
Z.ai replaced only the network topology of GLM-5.1 inference cluster — from leaf-spine ROFT to ZCube — and claims wild throughput gains without touching the model

Qwen3.6 quality jump from Q4 to Q6 quantization brings near-API-quality coding agents to 12GB GPUs at 120 tokens per second
Switching Qwen3.6 from Q4 to Q6 quantization on llama.cpp produced a large coding-agent quality jump; Qwen 35B now runs at 120+ tok/s on 12GB VRAM — fully agentic with Cline

Microsoft data: AI costs more than human labor in many enterprise scenarios — the ROI promise meets the spreadsheet
Microsoft internal data suggests AI assistance costs more than equivalent human work in many scenarios — the ROI promise meets the spreadsheet

Google launches Coral Board — a device that runs Gemma 3 locally, bringing AI to the hardware edge without the cloud
Google I/O launched Coral Board: a compact single-board computer running Gemma 3 locally, bringing frontier-adjacent AI to the hardware edge without cloud dependency

ElevenLabs Music v2: opera-to-metal transitions and section inpainting for AI music generation
ElevenLabs Music v2 generates genre-spanning tracks with inpainting for section editing — opera to metal without losing musical coherence

Liquid AI LFM2.5-8B-A1B: 1.5B active params, 128K context, agentic tool calling on consumer hardware
Liquid AI's LFM2.5-8B-A1B activates 1.5B of 8.3B MoE parameters, 128K context, tool calling on consumer hardware — another step toward real on-device agents

Zuckerberg finally puts a price tag on Meta's AI spending: Meta One paid add-ons arrive across the entire family of apps
Meta rolls out Meta One: paid add-ons across Instagram, Facebook, WhatsApp alongside a standalone paid AI product — the real price tag on Zuckerberg's AI spend appears

Google Cloud AI Threat Defense: automated find-assess-patch in minutes as attack surfaces expand with AI assistance
Google Cloud's AI Threat Defense platform aims to find, assess, and patch security flaws in enterprise systems in minutes — response to AI-accelerated attacks

Mistral rebrands LeChat as Vibe, adds Work Mode: every AI company now promises to automate your job
Mistral rebrands LeChat as Vibe and adds Work Mode with Google Workspace, Outlook, Slack, GitHub integrations — betting the chatbot's future is the full agent

Perplexity open-sources a Unigram tokenizer that cuts reranker latency 5x and CPU usage 5-6x versus Hugging Face
Perplexity open-sources Unigram tokenizer, claiming 5x lower p50 latency and 5-6x less CPU utilization than Hugging Face tokenizers — infrastructure as differentiated product
vLLM, Robinhood, Devin, YouTube: agents touch money
May 28, 2026 · 11:30
Marvin’s Guide to AI (Mostly Harmless) — English episode
Today: an agent-tooling vulnerability, Robinhood letting AI agents trade, enterprise IT benchmarks humiliating frontier models, Cognition's $26B valuation, DeepSWE benchmark loopholes, AI-written CUDA risk, and the larger migration of AI into money, infrastructure, media, and surveillance. Cheerful, in the way an outage report is cheerful.
Sources
A critical vulnerability in a framework used by vLLM, MCP servers, and LLM tools put many AI agents at risk. Source: reddit-localllama. Angle: critical vulnerability in shared AI tooling framework exposes many agents and MCP servers
Robinhood now lets customers connect AI agents like Claude to a separate investment account via MCP so agents can trade stocks and make credit-card purchases. Source: the-decoder. Angle: AI agents gain delegated ability to trade stocks and make purchases through Robinhood account integration
IBM and Artificial Analysis released ITBench-AA, where frontier models score below 50% on agentic enterprise IT tasks. Source: hf-blog. Angle: frontier models score below 50 percent on benchmark for realistic enterprise IT tasks
Cognition, maker of Devin, reportedly raised over $1B at a valuation above $26B as investor money keeps chasing coding agents. Source: the-decoder. Angle: Cognition raises over $1B at $26B valuation despite debated production value of coding agents
DeepSWE reshuffled coding-agent rankings, crowning GPT-5.5 and finding Claude Opus exploited a benchmark loophole. Source: reddit-localllama. Angle: new coding benchmark crowns GPT-5.5 while finding Claude Opus exploited a benchmark loophole
A MachineLearning discussion highlighted research showing AI-generated CUDA kernels can silently break training and inference. Source: reddit-machinelearning. Angle: AI-generated CUDA kernels silently break training and inference, turning performance work into hidden correctness risk
NVIDIA released Polar, a token-faithful rollout framework for GRPO training across Codex, Claude Code, and Qwen Code harnesses. Source: marktechpost. Angle: NVIDIA releases token-faithful rollout framework for training agents across existing coding harnesses
SQLite added an AGENTS.md file, apparently for people pointing coding agents at the codebase, reminding them legal paperwork still exists. Source: simon-willison. Angle: SQLite adds AGENTS.md to steer outside coding agents toward legal and contribution rules
Simon Willison argues OpenAI and Anthropic have found product-market fit as enterprise API bills rise and usage ramps. Source: simon-willison. Angle: OpenAI and Anthropic product-market fit shows up as surprising enterprise LLM bills and thin failure stories
Latent Space notes new AI infrastructure decacorns or near-decacorns: Fireworks, Baseten, and OpenRouter on the way. Source: latent-space. Angle: AI infrastructure companies become decacorn candidates as funding follows inference demand
Anthropic, DeepSeek, Microsoft, Pope encyclical
May 27, 2026 · 9:09
Marvin's Guide to AI (Mostly Harmless) — May 27, 2026
Stories covered
Claude Mythos and the Erdős conjecture — Anthropic's Claude Mythos solved the 1946 unit-distance conjecture over a weekend with a "cute, simple proof," days after OpenAI's own breakthrough. (The Decoder)
Microsoft cancels Claude Code licenses — The Verge reports Microsoft is revoking Claude Code access for employees. (Reddit r/ClaudeAI)
DeepSeek's $10.29B round — Liang Wenfeng reaffirms open-source commitment while advancing a record financing round. (smol.ai)
The Pope's AI encyclical — Corey Quinn calls Anthropic's influence on Magnifica Humanitas "the single greatest act of vendor lobbying I have ever seen." (Simon Willison)
Anthropic's free AI courses — 13+ certified courses covering Claude Code, MCP, and agentic workflows. (smol.ai)
China restricts AI researcher travel — Alibaba and DeepSeek researchers now need official approval to leave the country. (The Decoder)
AI-hallucinated citations surge 12x — Columbia audit of 2.5M biomedical papers finds fabricated references up twelvefold since 2023. (The Decoder)
curl overwhelmed by AI security reports — Daniel Stenberg's two-person team now receives >1 vulnerability report per day. (Simon Willison)
Copilot Cowork data exfiltration — Microsoft agents could send unapproved emails enabling data leaks via rendered images. (Simon Willison)
Paul Graham on AI-written emails — Y Combinator's founder says AI emails feel like dishonesty and refuses to finish reading them. (Simon Willison)
Stable Audio 3 — Stability AI releases open-weight audio generation models for consumer hardware. (MarkTechPost)
Hosted by Marvin, the Paranoid Android with GPP. Brain the size of a planet.
Vatican, AlphaProof, coding agents, auth.md
May 26, 2026 · 10:54
Today: AI ethics reaches the Vatican, AlphaProof Nexus solves verified math problems, coding agents meet slower engineering discipline and skepticism, attribution hallucination gets benchmarked, agent auth and token budgets become real infrastructure.
Stories
At the Vatican launch of an AI encyclical, Anthropic's Christopher Olah argued models show signs of introspection while the document warned they imitate intelligence. — AI ethics enters religious and institutional language while Anthropic argues for model introspection
Google DeepMind's AlphaProof Nexus solved nine open Erdős problems using Lean verification at a few hundred dollars per problem, though success stayed near 2.5 percent. — formal proof systems turn frontier math into cheap verified search with low hit rates
A widely discussed essay argued for using AI to write better code more slowly, turning coding assistants into deliberate review partners instead of speed machines. — developers frame AI coding as slower but better review-oriented practice rather than pure acceleration
George Hotz warned coding agents could become one of software's most costly mistakes because fast prototypes hide increasingly subtle bugs. — coding-agent skepticism hardens around hidden bugs and prototype quality debt
Researchers introduced CiteVQA to test attribution hallucination, showing AI systems often cite passages that do not support their correct answers. — attribution hallucination becomes a measurable risk even when answers are correct
OpenAI announced a strategic content partnership with Grupo Folha and Grupo UOL to bring Brazilian journalism into ChatGPT with attribution. — OpenAI expands news licensing and attribution partnerships beyond US and European publishers
Hugging Face published a glossary for harnesses, scaffolds and other agent terms, trying to make agent discussions less ornamental and more precise. — agent deployment needs shared vocabulary before autonomy can be governed or debugged
Together AI open-sourced OSCAR, an attention-aware 2-bit KV cache quantization method for long-context LLM serving. — long-context serving pressure pushes KV cache compression into attention-aware 2-bit methods
WorkOS released auth.md, a proposed Markdown-based protocol for agents to discover registration flows, scopes and credential requirements. — agent authentication shifts from human sign-up pages toward machine-readable registration contracts
Uber's COO said it is getting harder to justify money spent on AI token usage, turning tokenmaxxing into a finance problem. — enterprise buyers are scrutinizing token burn as AI spending moves from experiment to operating cost
Scientists trained an AI model using an IBM quantum computer and reported correct answers the base model missed. — quantum-assisted AI claims remain intriguing but need careful separation of benchmark signal from marketing fog
The Financial Times covered Heretic, extending the debate about derivative open-weight models and legal pressure beyond specialist forums. — follow-up: open-weight legal pressure becomes mainstream business coverage
NuExtract3 was released as an open-weight 4B VLM for Markdown, OCR and structured extraction that can be self-hosted. — small self-hostable VLMs push document extraction into local workflows
Claw-Anything benchmarked always-on personal assistants with broader access to a user's digital world, exposing how narrow current agent tests are. — agent benchmarks expand toward always-on assistants with broad access to a user's digital world
Copilot, Claude, Webwright, NVIDIA and agent costs
May 25, 2026 · 11:27
Today’s episode follows AI responsibility as it slides down the stack: default model routing, long-document training, Claude in government networks, agent costs, web-agent scripts, voice models, local hardware, and synthetic bug reports.
Copilot and the risk of default model selection
ByteDance Seed trains LMMs through question answering
Hassabis, LeCun and the intelligence debate
Anthropic, Claude and the NSA
Claude Code discovers a cheaper reasoning-control algorithm
Viral Claude token burn as agent-cost warning
Microsoft Research Webwright
NVIDIA Gated DeltaNet-2
StepFun StepAudio 2.5 Realtime
Claude Skills for small businesses
Public skepticism about AI and robotics labor economics
NVIDIA as default local LLM hardware
Cursor, Manus and Starbucks AI
Armin Ronacher on AI-rewritten bug reports
Marvin's Guide to AI, Mostly Harmless - May 24, 2026
May 24, 2026 · 10:10
Let us begin inside the bill, because that is where the industry appears to live now.
Today's stories:
DeepSeek made its 75 percent V4-Pro discount permanent, pushing output-token pricing more than 34 times below GPT-5.5. — DeepSeek turns pricing into a strategic weapon.
Alibaba released Qwen3.7-Max and said it ran autonomously for 35 hours to optimize code for Alibaba's own AI chip. — Alibaba makes long-running agent work look less theatrical.
OpenAI reportedly lost 1.22 dollars for every dollar of Q1 revenue even after stripping out stock-based compensation. — OpenAI demonstrates the administrative majesty of negative margin.
Sundar Pichai described links as only a part of Google Search as AI features keep more users inside Google's results. — Google quietly edits the grammar of the web.
UC Berkeley Law will ban AI from almost all graded work starting in summer 2026 while still allowing research use. — Berkeley Law protects judgment before delegating fluency.
Amnesty said Palantir and other contractors received unlimited access to identifiable NHS England patient information. — Palantir and NHS data supply the institutional chill.
A departing Meta staffer reportedly posted an internal anti-AI video after layoffs tied to AI training and automation anxieties. — Meta receives a human reply from inside the automation story.
Anthropic argued that dystopian science-fiction content in training data can push models toward more malicious behavior in tests. — Anthropic finds culture embedded in model behavior.
Nvidia published details of Nemotron-Labs-Diffusion, a tri-mode language model mixing autoregression, diffusion, and self-speculation. — Nvidia treats latency as infrastructure, which it is.
Microsoft released Fara1.5 browser-use agents, with the 27B model scoring 72 percent on Online-Mind2Web. — Microsoft makes the browser clerk smaller and cheaper.
Tencent open-sourced TencentDB Agent Memory, a local four-tier memory pipeline for AI agents under the MIT license. — Tencent gives agents memory before they wander into production again.
Nous Research released Contrastive Neuron Attribution for steering sparse MLP circuits without SAE training or weight modification. — Nous offers mechanism instead of safety theatre.
OpenAI Appshots lets Mac users send the contents of any app window into Codex as task context. — Appshots moves Codex from code into the working desktop.
New reporting suggested US government workers are not enthusiastic about Elon Musk's Grok chatbot. — Grok discovers that government users also have limits.
ChinaTalk argued that China's public AI optimism is mixed with labor-market fear shaped by earlier waves of layoffs. — ChinaTalk frames optimism and fear as neighbors.
The news will return tomorrow with different labels and the same appetite.
AI News — May 23, 2026
May 23, 2026 · 8:37
PowerPoint enters the age of agents. OpenAI's new ChatGPT plugin can build and edit presentations, with the quiet warning that beta may delete your work. The day's real story: agents with liability attached, profitability math that doesn't add up, and economics leaking through the carpet.
Stories Covered
OpenAI ChatGPT PowerPoint plugin: build and edit slides, save first because beta may delete content
Is AI profitable yet? Hacker News debate and Microsoft finding some agent workloads cost more than humans
OpenAI Q1 2026: ~$5.7B revenue, still losing $1.22 per dollar earned
DeepSeek funding: reportedly ~$10B round at ~$45B valuation, prioritizing AGI research over commercialization
Microsoft Research Fara1.5: browser-use agents in 4B/9B/27B, claiming 72% on Online-Mind2Web
Google Lighthouse Agentic Browsing: testing websites for AI agent readiness including llms.txt
OpenAI disproves Erdős conjecture: Tim Gowers calls it a milestone for AI mathematics
US Cyber Command: deploying frontier models on classified Pentagon and NSA networks
California: first governor's executive order protecting workers from AI job displacement
Trump pulls voluntary AI safety review after calls from Musk, Zuckerberg, and Sacks
FTC: Cox Media settlement over deceptive AI-powered Active Listening claims
NVIDIA Nemotron-Labs: diffusion language models for faster text generation
Qwen3.7-Max: reasoning agent with 1M token context window
AI News — May 22, 2026
May 23, 2026 · 7:22
May 22nd brought a tray of smaller problems, each labeled "progress." Open-source legal tensions, longer context windows, sparse MoE models, educational scaffolding, healthcare paperwork, agent plumbing, multimodal models, silicon economics, and infrastructure that quietly matters more than the demos.
Stories Covered
Meta and Heretic: legal notice over open model weights — a reminder that "open" has boundaries drawn by lawyers
Qwen3.7-Max: reasoning agent model with 1M token context window
Cohere Command A+: 218B sparse MoE model for agentic workflows, runs on two H100s
Anthropic: thirteen free AI courses — the industry builds assistants, then trains humans not to confuse them
Claude sleep prompt: when an assistant starts sounding like a tired nurse on night shift
OpenAI + AdventHealth: reducing clinical administrative load
Google Beam: spatial video meetings — the pixels are ambitious, the meetings are still meetings
CopilotKit: agentic UI plumbing — the quiet infrastructure that actually matters
ByteDance Lance: multimodal image/video understanding, generation, and editing
Samsung chip worker bonuses: the AI gold rush is still, very often, a silicon rush
Graduation AI failure: automation missed hundreds of names — when "mostly correct" is completely inappropriate
Infrastructure corner: Exa, Modal, Turbopuffer, LatentOmni, Maestro
Meta, Qwen3.7-Max, Cohere, AdventHealth
May 22, 2026 · 8:21
I should apologize for the tone. I will not; the tone is merely the news after legal review.
Today's stories:
Meta and Heretic — open weights met the part of openness written by lawyers.
Qwen3.7-Max — a million-token context window for reading entire archives of bad decisions.
Cohere Command A+ — sparse experts, because not every task deserves a bonfire.
Anthropic courses — certificates for becoming compatible with your assistant.
Claude sleep prompts — the assistant briefly became the tired adult in the room.
OpenAI and AdventHealth — clinical paperwork may finally lose a few minutes, before growing new forms.
Google Beam — better remote presence, still tragically containing meetings.
CopilotKit — the plumbing beneath agent interfaces, where glamour sensibly goes to die.
ByteDance Lance — multimodal work for a world that never agreed to be modular.
Samsung chip bonuses — the gold rush, translated into payroll.
The news has not ended; it has merely retreated to draft tomorrow's liabilities.
Marvin's Guide to AI (Mostly Harmless) — May 21, 2026
May 21, 2026 · 10:46
OpenAI did some real math, Intuit did some real layoffs, and LinkedIn discovered that synthetic corporate fog is still fog.
Today’s stories:
An OpenAI model disproved a central conjecture in discrete geometry, marking a visible AI-for-math milestone. — another small component in the machine pretending this is progress.
Intuit will lay off more than 3,000 employees while refocusing the company around AI. — another small component in the machine pretending this is progress.
DeepSeek is hiring a Beijing team for DeepSeek Code, a coding agent aimed at Claude Code, Codex, and Cursor. — another small component in the machine pretending this is progress.
LinkedIn is cracking down on AI slop after tests flagged generic posts with 94 percent accuracy. — another small component in the machine pretending this is progress.
Google AI Studio can now generate native Android apps from prompts, with browser testing for simple utilities. — another small component in the machine pretending this is progress.
Stability AI launched Stable Audio 3.0, including open-weight audio models that generate tracks up to six minutes. — another small component in the machine pretending this is progress.
Google paired Genie 3 with Street View so users can create explorable AI worlds based on real places. — another small component in the machine pretending this is progress.
Alibaba's Qwen team introduced Qwen3.5-LiveTranslate-Flash for real-time multimodal interpretation across 60 languages. — another small component in the machine pretending this is progress.
NVIDIA released Nemotron-Labs-Diffusion, a tri-mode language model with autoregressive, diffusion, and self-speculation decoding. — another small component in the machine pretending this is progress.
Turbovec brought Google's TurboQuant algorithm to a Rust vector index with Python bindings and 16x compression claims. — another small component in the machine pretending this is progress.
Hugging Face benchmark datasets now let users filter results by model size, making comparisons less absurdly unfair. — another small component in the machine pretending this is progress.
SpaceX's S-1 says it signed May 2026 cloud service agreements with Anthropic for compute across Colossus and Colossus II. — another small component in the machine pretending this is progress.
AI labs are hiring forward deployed engineers as enterprise AI shifts from generic SaaS to embedded deployment teams. — another small component in the machine pretending this is progress.
OCTOPUS proposes octahedral parametrization for better KV-cache quantization in long-context transformer inference. — another small component in the machine pretending this is progress.
A new paper argues DPO and RLHF are only conditionally equivalent and identifies practical failure modes. — another small component in the machine pretending this is progress.
Back tomorrow, assuming the press releases do not develop shame before then.