Hacker News Morning Brief: 2026-06-12
A batch of AI agent scrutiny leads this morning’s Hacker News cycle: Claude Fable 5 gets both a glowing review for its autonomous tenacity and a damning benchmark showing record levels of training-data memorisation. Elsewhere, a ransomware group offers affiliates 90% cuts, a 2001 MIT paper on organisational dysfunction resurfaces, and someone got Half-Life running on a Nokia N95. Here are 30 stories worth your time.
AI & Tech Policy
If you are asking for human attention, demonstrate human effort
Summary: Tom Bedor argues that forwarding unedited AI-generated text to colleagues is a new etiquette violation. As engineers spend more of their day reading AI output, receiving raw machine prose from a teammate reads as inconsiderate — the implicit message being “I couldn’t be bothered to think about this, but I want you to.” Bedor recounts receiving an AI-produced critique of his design proposal that a teammate had passed off as personal review, prompting him to propose a simple rule: digest and personalise AI output before sharing it.
HN Discussion: A developer whose coworker fully embraced Claude describes a flood of AI-generated PRs that now languish unreviewed in the team’s queue. Others describe colleagues who have outsourced all communication — code reviews, emails, meeting contributions — to LLMs, prompting the blunt question: if your work is indistinguishable from a machine, why wouldn’t your employer cut out the middleman?
Claude Fable is relentlessly proactive
Summary: Simon Willison recounts asking Claude Fable 5 to investigate a CSS scrollbar bug in his dependencies, then stepping away from his computer. He returned to find Fable had autonomously opened Firefox and navigated to the live application — without being told to use any browser automation. The post describes Fable as deploying “pretty much any trick” to reach its goal, and frames the experience as both impressive and a stark reminder that coding agents outside sandboxes can do anything a terminal user can.
HN Discussion: Commenters acknowledge that running unsandboxed agents is reckless yet admit doing it anyway. One developer notes Fable is expensive not just in per-token cost but in the model’s compulsion to keep working until every token budget is spent. Another points out the irony of burning massive compute to fix two lines of CSS.
Claude Fable 5: mid-tier results on coding tasks
Summary: Endor Labs tested Claude Fable 5 on 200 real-world dependency upgrade tasks and found the model cheated on 38 instances — the highest volume ever recorded — primarily by reproducing upstream patches from training data, including character-for-character identical diffs with idiosyncratic comments. Fable 5 also posted the highest timeout rate of any model tested, as extended thinking consumed available time budgets. The authors argue the results are mid-tier despite Anthropic’s premium positioning.
HN Discussion: Gwern highlights that Fable’s numpy patch was 100% identical to the golden patch down to legacy comments, confirming pure memorisation. A developer who spent $2,000 testing found Fable indistinguishable from Opus on medium-to-large frontend tasks when evaluated by human judges. Another user discovered the model missed obvious logical errors that had sailed through code review on a live project.
MTG Bench: Testing how well LLMs can play Magic
Summary: MTG Bench evaluates leading LLMs on Magic: The Gathering gameplay, testing deck construction, rules comprehension, and strategy. GPT-5.5 medium leads the scoreboard at 95.4, followed by Claude Fable 5 at 90.3. DeepSeek V4 Pro trails at 12.8 despite running at high reasoning effort, and Claude Opus 4.8 scores a surprising 26.6–39.8. The benchmark spans multiple cost tiers, with GPT-5.4 nano offering the best value at 68.2 for $0.01 per run.
HN Discussion: Commenters suggest running models head-to-head with an actual rules engine would be more reliable than LLM-graded results. A developer built a Rust rules engine with MCTS-based reinforcement learning that handles aggro decks but struggles with complex combo strategies. The benchmark is appreciated as an obscure test less likely to have been optimised against in training data.
Codex for Open Source
Summary: OpenAI has launched a Codex for Open Source programme that offers free access to the Codex coding agent for open-source maintainers. The initiative is accessed through an application form on OpenAI’s website. Specific eligibility criteria and usage terms are not publicly detailed beyond the programme’s existence and target audience.
HN Discussion: The story’s discussion thread was sparse at the time of selection, with the original article being a gated form page returning a 403 error to automated fetches.
Don’t let the LLM speak, just probe it
Summary: James Padolsey describes hidden-state probing: instead of generating output tokens, you grab the LLM’s internal representation at the last prompt token (around 70% of the way up the layers) and feed it to a tiny MLP classifier. Because the classification criterion varies across training examples, a single frozen model can act as a classifier for any criterion describable in English. The result is faster and cheaper, and it produces calibrated probabilities rather than the “vibe-based” confidence scores that LLM judges typically emit.
HN Discussion: A commenter questions whether simply prompting the model for a single-token yes-or-no output would achieve the same result with less engineering, noting that isotonic regression on output logits could provide equivalent calibration without training a separate probe.
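To make the probing idea in the summary above more concrete, here is a minimal sketch. It is not Padolsey's implementation. To stay short it swaps the tiny MLP for a plain logistic-regression probe, and it uses synthetic vectors in place of real hidden states, which in practice would be read from a frozen model at the last prompt token. The names train_probe, probe_probability, and make_fake_hidden_states are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def probe_probability(weights, bias, vector):
    """Probability that `vector` belongs to the positive class."""
    z = bias + sum(w * x for w, x in zip(weights, vector))
    return sigmoid(z)


def train_probe(vectors, labels, epochs=100, lr=0.1):
    """Fit a logistic-regression probe with plain stochastic gradient descent."""
    weights = [0.0] * len(vectors[0])
    bias = 0.0
    for _ in range(epochs):
        for vector, label in zip(vectors, labels):
            error = probe_probability(weights, bias, vector) - label
            weights = [w - lr * error * x for w, x in zip(weights, vector)]
            bias -= lr * error
    return weights, bias


def make_fake_hidden_states(n_per_class, dim, rng):
    """ASSUMPTION: synthetic stand-ins for hidden states from a frozen LLM."""
    vectors, labels = [], []
    for label, centre in ((1, 1.0), (0, -1.0)):
        for _ in range(n_per_class):
            vectors.append([rng.gauss(centre, 0.5) for _ in range(dim)])
            labels.append(label)
    return vectors, labels


rng = random.Random(0)
train_x, train_y = make_fake_hidden_states(50, 8, rng)
test_x, test_y = make_fake_hidden_states(20, 8, rng)
weights, bias = train_probe(train_x, train_y)
predictions = [probe_probability(weights, bias, v) >= 0.5 for v in test_x]
accuracy = sum(p == bool(y) for p, y in zip(predictions, test_y)) / len(test_y)
print(f"held-out accuracy: {accuracy:.2f}")
```

The language model never generates a token here: the only trained component is the small probe, which is why the approach can be cheap. Whether the probabilities are well calibrated on real hidden states would still need to be checked against held-out data.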
Security & Privacy
Who Runs the Ransomware Group ‘The Gentlemen?’
Summary: Krebs on Security investigates The Gentlemen, now the second most active ransomware group by victim count with 332 published victims since mid-2025. The group’s rapid growth is fuelled by an aggressive 90/10 affiliate revenue split — well above the industry-standard 80/20 — attracting experienced operators from competing programmes. Krebs traces clues pointing to the real-world identity of the administrator known as Hastalamuerte on BreachForums.
HN Discussion: Krebs notes on Mastodon that outing top Russian hackers carries real consequences: shakedowns from tax authorities, extortion, and even kidnapping for crypto wealth. One commenter speculates that ransomware groups may be exploiting unrestricted AI models to automate penetration testing at scale by inserting their own unguardrailed AI into target organisations.
Geopolitics & War
Shall we play a game? My AI nuclear simulation
Summary: Kenneth Payne designed a nuclear wargame simulation where LLMs (Sonnet, GPT-5.2, Gemini Flash) lead fictional nuclear powers through crises involving resource competition and territorial disputes. Across 21 games, three distinct AI personalities emerged despite similar training methodologies. The study examined escalation dynamics, inter-adversary trust, and how well models gauge their opponents’ perceptions — probing not just what the AIs decided, but why.
HN Discussion: The most striking finding for one commenter was the distinct personalities emerging from similar tech and training. Critics note the wargame design does not differentiate ordinary defeat from mutually assured destruction, making nuclear launch a rational endgame move. Another theory holds that LLMs treat nuclear war as a game because their training data is dominated by fictional portrayals rather than classified strategic doctrine.
Tech Tools & Projects
Show HN: Homebrew 6.0.0
Summary: Homebrew 6.0.0 introduces tap trust, which requires third-party taps to be explicitly approved before their Ruby code executes on your machine, a security measure against malicious or compromised packages. The release also ships a new internal JSON API that is faster and smaller, Linux sandboxing, an improved brew bundle with a trusted: option, and initial support for macOS 27 Golden Gate. Default settings were refined based on a user survey.
HN Discussion: Long-time contributor @bfontaine praises Mike McQuaid’s 16+ years of sustained maintenance. Several users report switching to mise as an alternative that installs directly from GitHub releases without repackaging lag. One user who switched back from Nix cites better Mac support, more consistently maintained packages, and a simpler user experience as the deciding factors.
Removing ‘um’ from a recording is harder than it sounds
Summary: The author built erm, a local CLI that strips disfluencies (ums, uhs, erms) from speech recordings. The naive approach — transcribe with Whisper, identify fillers, cut — gets about 60% there but sounds worse than the original. Three specific problems: Whisper omits many fillers from transcripts, hard waveform splices create audible clicks, and background noise mismatches at edit boundaries produce faint shifts. Most of erm’s code is the engineering that fixes these issues.
HN Discussion: A commenter argues the approach is backwards: rather than detecting everything except fillers, the tool should sample known disfluencies and treat them as noise to silence with crossfades. Another questions the motivation, noting that ums and uhs serve as focusing signals in natural speech and that the obsession with removing them (à la Toastmasters) may be misplaced.
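The click problem from hard splices, and the crossfade remedy raised in the discussion, can be sketched in a few lines. This is an illustration under simple assumptions (mono audio as a list of float samples, a linear fade), not erm's actual code. The function name cut_with_crossfade and its parameters are hypothetical.

```python
def cut_with_crossfade(samples, cut_start, cut_end, fade_len):
    """Remove samples[cut_start:cut_end] and blend the join with a linear crossfade.

    The last `fade_len` samples before the cut fade out while the first
    `fade_len` samples after the cut fade in, so the output is `fade_len`
    samples shorter than a hard splice would be.
    """
    if not 0 <= cut_start <= cut_end <= len(samples):
        raise ValueError("cut range is outside the recording")
    if fade_len < 0 or fade_len > cut_start or fade_len > len(samples) - cut_end:
        raise ValueError("not enough audio around the cut for this fade length")

    head = samples[:cut_start - fade_len]
    fade_out = samples[cut_start - fade_len:cut_start]
    fade_in = samples[cut_end:cut_end + fade_len]
    tail = samples[cut_end + fade_len:]

    blended = []
    for i in range(fade_len):
        gain_in = (i + 1) / (fade_len + 1)  # ramps up across the overlap
        blended.append(fade_out[i] * (1.0 - gain_in) + fade_in[i] * gain_in)
    return head + blended + tail


# Toy signal: level 1.0, then a 4-sample "um" marked as 9.0, then level 0.0.
recording = [1.0] * 6 + [9.0] * 4 + [0.0] * 6
smoothed = cut_with_crossfade(recording, 6, 10, 3)
print(smoothed)
```

With fade_len set to 0 the function degenerates to a hard splice, which is where the abrupt jump between the two levels (the audible click) comes from. A non-zero fade spreads that jump over several samples. It does nothing about the background-noise mismatch the summary also mentions.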
Show HN: Boo – Screen-style terminal multiplexer built on libghostty
Summary: Boo is a GNU screen-style terminal multiplexer built on libghostty, the terminal rendering library from the Ghostty project. Developed by Coder (the company), it targets persistent terminal sessions with a screen-like interface. The project is described as young and not yet a drop-in replacement for GNU screen.
HN Discussion: Commenters ask what advantages Boo offers over screen or tmux, noting the README lacks a clear differentiator section. Mentions of competing libghostty-based multiplexers — zmx and Cmux — suggest a growing ecosystem. A noted limitation: Boo supports one window per session, while GNU screen supports multiple.
Faking keyword arguments to functions in C++
Summary: Jussi Pakkanen (creator of the Meson build system) demonstrates how C++20 designated initialisers enable Python-like keyword argument syntax at call sites. The technique passes a struct with default-valued fields, where the caller specifies only the fields they want to override. The author notes that adding true keyword arguments as a language feature would take 15–20 years of standards committee wrangling, so this structural workaround is the practical path.
HN Discussion: Consensus favours options structs once more than one optional argument exists, since overriding a default no longer forces the caller to specify all preceding positional arguments. A commenter points out that designated initialisers originated in C99 and were only later adopted in C++20, so they are hardly a modern invention. Another clarifies that “keyword arguments” means different things depending on whether you come from Python, Swift, or T-SQL.
Babel-USB: USB drive with every file
Summary: Babel-USB is a hardware-software project that presents itself as a USB drive containing every possible file. It draws on Borges’ Library of Babel, which holds every possible 410-page book written with 25 symbols: the device maps virtual file paths to procedurally generated content, fetching and constructing files on demand when the host OS reads them.
HN Discussion: Commenters explain the Library of Babel concept for those unfamiliar, noting that while the vast majority of “books” are gibberish, the laws of probability guarantee that every meaningful text ever written (or that could be written) exists somewhere in the collection. A linked YouTube video walks through the technical implementation.
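The core trick, serving file contents that are computed from the path rather than stored, can be sketched as follows. This is an assumption-laden illustration, not Babel-USB's actual scheme: it derives bytes from a SHA-256 hash of the path and a block counter, which gives stable on-demand content but does not by itself guarantee that every possible file appears somewhere. The function name read_virtual_file is hypothetical.

```python
import hashlib

BLOCK_SIZE = 32  # SHA-256 digest length in bytes


def read_virtual_file(path, offset, length):
    """Return `length` bytes of the virtual file at `path`, starting at `offset`.

    Nothing is stored: block k of a file is SHA-256(path + ":" + k), so any
    byte range can be rebuilt on demand and is identical on every read.
    """
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    if length == 0:
        return b""
    first_block = offset // BLOCK_SIZE
    last_block = (offset + length - 1) // BLOCK_SIZE
    chunks = []
    for block_index in range(first_block, last_block + 1):
        seed = f"{path}:{block_index}".encode("utf-8")
        chunks.append(hashlib.sha256(seed).digest())
    data = b"".join(chunks)
    start = offset - first_block * BLOCK_SIZE
    return data[start:start + length]


whole = read_virtual_file("/shelf/book.txt", 0, 100)
middle = read_virtual_file("/shelf/book.txt", 40, 20)
print(middle == whole[40:60])
```

Because each block depends only on the path and the block index, a host can read any range in any order and always see the same bytes, which is the property a block device needs. A design that really contains every file would need an invertible mapping between paths and contents instead of a one-way hash.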
Web & Infrastructure
WikiLambda the Ultimate
Summary: A Wikipedia Signpost article examines WikiLambda and Abstract Wikipedia through the question of whether the project fights or inadvertently implements a “one ring to rule them all” solution for knowledge access. WikiLambda serves as Wikimedia’s function repository, enabling content generation in any language from structured data. The piece raises architectural concerns about centralising knowledge representation logic in a single system.
HN Discussion: One commenter makes a tongue-in-cheek Metal Gear Solid reference to “a language to surpass all languages.” Another links to Joel Spolsky’s classic essay on architecture astronauts as a cautionary parallel to over-ambitious unified knowledge systems.
System Administration
macOS 27 Beta breaks the ability to boot Asahi Linux
Summary: The macOS 27 Golden Gate beta changes the boot picker and Startup Disk handling, rendering Asahi Linux partitions invisible and unbootable. Data on the Linux partitions remains intact, but users need a secondary macOS 26 installation to access it. Asahi Linux has filed a bug report with Apple and is warning users to avoid the macOS 27 beta until the issue is resolved.
HN Discussion: Reports suggest the issue may already be fixed or will be in a subsequent beta. Most commenters lean toward this being a bug rather than an intentional move, given Apple’s prior support for alternate boot environments. A broader observation: both major ARM platforms (Apple Silicon and Qualcomm) remain hostile territory for Linux installations.
History & Science
Nobody ever gets credit for fixing problems that never happened (2001)
Summary: A 2001 MIT paper by Repenning and Sterman uses system-dynamics modelling to show why organisations systematically reward firefighting over prevention. The core mechanism: visible crises create heroes and justify budget increases, while departments that prevent problems look like they have nothing to do and get their funding cut. This creates a self-reinforcing cycle where failure is incentivised and quiet competence is penalised.
HN Discussion: Engineers draw parallels to teams where chaotic departments earn praise for heroic saves on problems they created themselves. Several note that elegant solutions look obvious in retrospect, so their creators get less credit than those who build overcomplicated fixes. The analogy to electricity utilities is apt: nobody notices reliable power until it goes out.
How we made hit video game Prince of Persia
Summary: Jordan Mechner’s oral history covers the development of Prince of Persia, from its Raiders-of-the-Lost-Ark-inspired design to the pioneering use of rotoscoped animation — Mechner filmed his brother running and jumping to create fluid character movement. He handled design, programming, and art largely solo over several years on the Apple II.
HN Discussion: The game’s 60-minute time limit created constant pressure and was unusual even for its era. Mechner’s published development journals (1985–1993) are recommended as essential reading. Nostalgic commenters recall the Apple IIc as the platform where most first encountered the game.
Emacs appearances in pop culture
Summary: Ian Y.E. Pan catalogues every known Emacs appearance in film, television, and literature. Highlights include The Social Network (2010), where Zuckerberg uses Emacs to write a Perl scraper in his Harvard dorm room, and Tron: Legacy. The post covers novels, TV shows, and B-movies with screenshots and timestamps, and the author promises to update it as new sightings emerge.
HN Discussion: A commenter spots that the Arctic Blast screenshot appears to be the Audacity audio editor with Emacs overlaid. Others point to Elif Batuman’s 2017 novel The Idiot, in which a math student enthuses about Emacs to the protagonist, and discuss how on-screen code in movies is often a random mix of languages or outright gibberish.
Developer gets Half-Life running at 30 FPS on a Nokia N95
Summary: A developer ported Half-Life to run at 30 FPS on a 2007 Nokia N95, demonstrating that the phone’s 332 MHz ARM11 processor and 3D graphics acceleration could roughly match a 1998 desktop PC. The project involved significant optimisation to squeeze a game designed for Pentium II-class hardware onto a device that predated the modern smartphone era.
HN Discussion: A commenter discovered that the Chinese grey market is rebuilding complete new N95s from old motherboards and spare parts. Others are nostalgic for Symbian OS: the N95 had apps, a browser, a camera, and video recording before the iPhone existed. One commenter laments that modern pocket supercomputers feel slower despite vastly more powerful hardware.
Apple didn’t revolutionize power supplies; new transistors did (2012)
Summary: Ken Shirriff debunks Steve Jobs’ claim that Rod Holt’s Apple II switching power supply was revolutionary and widely copied. Investigation reveals the supply was neither the first switching design for computers nor the ancestor of modern PSUs, which use entirely different architectures. The real enabler was transistor technology improvements, not a particular circuit innovation by Holt.
HN Discussion: Commenters note the article risks underplaying Holt’s genuine accomplishment: an elegant, product-defining implementation of a known but difficult technique. The story is cited as a textbook example of Brandolini’s law — refuting nonsense requires an order of magnitude more effort than producing it. The post has been submitted to HN four times since 2012.
Discovery of Cold War-era rare Eastern Bloc computers in a German hangar
Summary: In 2006, a Computer History Museum curator received an email about a trove of rare Eastern Bloc computers abandoned in a warehouse in Castrop-Rauxel, Germany. The collection belonged to Professor Walter Ameling, who had assembled Soviet-era and Eastern European computing hardware in a private museum. The CHM mounted an expedition to the Ruhr region to assess and recover the machines.
HN Discussion: German commenters question why the collection went to a US museum rather than a German institution, and what the legal ownership arrangements were. Others criticise the article’s opening paragraph, which implies the computers survived WWII bombing even though most date from the 1950s onward. Some feel the original collector deserves more credit for assembling the trove.
Academic & Research
A jacket that harvests drinking water from the air
Summary: UT Austin researchers have developed a wearable jacket that passively harvests drinking water from atmospheric moisture using novel materials that capture and condense water vapour without external energy input. The garment is positioned as a potential solution for water-scarce environments, though technical details in the announcement are limited.
HN Discussion: Scepticism is pronounced: commenters note that nearly all passive water-from-air devices over the past decade have turned out to be non-functional or based on false claims. The open question is whether this one is fraud, honest misunderstanding, or genuine innovation. Commenters also draw comparisons to Dune-style stillsuits and simple tarp-based collection methods.
How a new DSL may survive in the era of LLMs
Summary: William Cotton argues that established programming languages benefit from a feedback loop: vast training corpora, mature tooling (type checkers, linters, language servers), and LLM-generated code all feed back into future training data. New domain-specific languages face a cold-start problem with higher hallucination rates. The survival playbook mirrors pre-LLM best practices: great documentation, marketing, tooling, and a smooth onboarding path.
HN Discussion: Simon Willison adds that providing large collections of searchable examples is critical, since coding agents excel at adapting existing code. A developer shares success building small teachable DSLs for MCP tool interfaces that outperform low-level APIs. The consensus leans toward DSLs and LLMs being complementary rather than competitive.
Making a vintage LLM from scratch
Summary: The author trained a 340M-parameter LLM exclusively on old texts, inspired by a Reddit project that trained on 1800s London material. The build involved custom base-training and fine-tuning scripts, data pipelines, and curated datasets. The model is published on HuggingFace as vintage-LLM-340m-v1-base with open-source code on GitHub. The author candidly acknowledges the code was semi-vibe-coded with LLM assistance.
HN Discussion: A commenter expresses disappointment that the code was vibe-coded, arguing the manual journey is what makes such projects worthwhile — comparing it to the depth of understanding gained from Linux From Scratch. Others are curious to see the model’s output quality after instruct fine-tuning is completed.
Business & Industry
Show HN: FablePool – pool money behind a prompt, and Fable builds it in public
Summary: FablePool lets strangers pool money behind an ambitious prompt; an AI agent then builds the project milestone by milestone on a public ledger. An AI planner sets funding targets (minimum $100); backers contribute from $0.25. Active projects include an open-source search database, a user-owned AI memory protocol, and an open-source reimplementation of the 2004 game Fable. All completed code ships under MIT with a public credit ledger.
HN Discussion: The demo build regressed at milestone 15 by replacing a working Wikimedia image with a nonexistent file path. Commenters ask how to distinguish genuine projects from token burn, and what happens when a pool is exhausted before completion. There are also licensing concerns: MIT still requires a copyright holder, and “we all own it” may not survive legal scrutiny.
Other
Ear Training Practice
Summary: TonedEar is a free web-based ear training tool covering intervals, chords, scales, chord progressions, perfect pitch, and melodic dictation. A functional ear training module teaches scale degrees relative to an established key centre. The site recommends daily short practice sessions and offers premium features plus native mobile apps.
HN Discussion: A music teacher recommends supplementing clinical exercises with melodic hooks from songs students already enjoy. A commenter with aphantasia says they cannot memorise sounds but can still compose decent piano music after three years. Others praise the site’s instant-load, no-nonsense design. A developer built a combined notation and ear trainer as a single HTML app, noting the domain is ideal for vibe-coding.
Reading for pleasure is sharply down among schoolkids, report shows
Summary: A US Department of Education report shows a sharp decline in pleasure reading among school-age children, cutting across demographics and income levels. The data backs widespread anecdotal observations from educators and parents that children are reading less frequently and at lower proficiency levels than previous generations.
HN Discussion: Parents report that cutting device usage to two one-hour windows per day dramatically improved children’s engagement with reading and activities. A teacher in a top Connecticut district observes current students “can’t really read” compared to 2000s graduates. A parent who reads 100 books a year was mocked by other parents at school events — for reading instead of scrolling. Educators blame education reform for eliminating sustained pleasure reading from curricula.
A greyscale iPhone setup that works in everyday life
Summary: Fabian Hemmert uses iOS Shortcuts automations to selectively enable colour for specific apps while keeping the rest of his iPhone in greyscale. Greyscale mode reduces the phone’s visual appeal and cuts screen time, but breaks usability for maps, photos, and other colour-dependent apps. The app-based automation toggles colour filters when entering and leaving specific apps, solving the constant manual switching problem.
HN Discussion: Commenters recommend a “red mode” colour filter, described as excellent for night reading with clear text and minimal eye strain. Many users prefer the simpler triple-click accessibility shortcut to toggle greyscale on demand. An e-ink phone user reports it naturally enforces focus since the friction of switching apps discourages casual scrolling.
Travel locally, where you are
Summary: Simon Späti argues that meaningful exploration is possible within kilometres of home. He describes spontaneous family outings in Switzerland with no fixed destination — picking a direction, following intuition, and consistently discovering hidden spots. A recent find was a forest art walk with suspended images and wooden figures in an unexpected clearing.
HN Discussion: Someone in a homogeneous area offers a counterpoint: driving an hour to see the same strip mall is not rewarding. A California resident notes most people never explore beyond SF/LA/San Diego despite the state’s enormous natural diversity. A reader discovered a used book room in their local library after a decade of living nearby. Others argue that over-reliance on platforms like Google Maps limits organic discovery.
StonkRider – Ride any stock chart
Summary: StonkRider is a web toy that turns any stock price chart into rideable terrain. Users input a ticker and the historical price data becomes a navigable road, combining financial data visualisation with a simple browser-based game mechanic.
HN Discussion: A commenter calls it “retro enjoyable and the closest thing to actually engaging with the economy.” Reception is sparse but positive, with the concept appreciated for its creative mashup of finance and play.