
Chess engines do weird stuff

Hacker News · Feb 17, 2026 · Collected from RSS

Summary

Related: https://cosmo.tardis.ac/files/2026-02-12-az-rl-and-spsa.html
Comments: https://news.ycombinator.com/item?id=47049845 (32 points, 2 comments)

Full Article

Things LLM people can learn from

Training method

Since AlphaZero, lc0-style chess engines have been trained with RL. Specifically, you have the engine (search + model) play itself a bunch of times, and train the model to predict the outcome of the game. It turns out this isn't necessary. A good model vs. a bad model is worth ~200 elo, but search is worth ~1200 elo, so even a bad model + search is essentially an oracle compared to a good model without search, and you can distill from bad model + search → good model. So RL was necessary, in some sense, only once. Once a good model with search was trained, every future engine (including their competitors![1]) can distill from that, and doesn't have to generate games (which is expensive). lc0 trained their premier model, BT4, with distillation, and it got worse when they put it in the RL loop.

What makes distillation from search so powerful? People often compare it to distilling from best-of-n in RL, which I think is a limited analogy: a chess engine that runs the model on 50 positions is roughly equivalent to a model 30x larger, whereas LLM best-of-50 is generously worth a model 2x larger. Perhaps this is why people wanted test-time search to work so badly when RLVR was right under their noses.

Training at runtime

A recent technique applies the distillation trick at runtime. You evaluate early positions with your NN, then search them and get a more accurate picture. If your network says a position is +0.15 pawns better than search says, subtract 0.15 pawns from future evaluations. Your network adapts live to the position it's in!

Training on winning

The fundamental training objective of distilling from search is almost, but not quite, what we actually care about: winning. The two are very correlated, but we don't actually care how well the model estimates one position; we care how well it performs after search, after looking at 100 positions. To fix this, lc0 uses a weird technique called SPSA: you randomly perturb the weights in two directions, play a bunch of games, and go in the direction that wins more.[2] This works very well and can gain +50 elo on small models.[3]

Consider for a moment how insane it is that this works at all. You're modifying the weights in purely random directions. You have no gradient whatsoever. And yet it works quite well: +50 elo is worth ~1.5x model size, or about a year's worth of development effort. The main issue is that it's wildly expensive. A single step requires playing thousands of games, with dozens of moves per game and hundreds of position inferences per move. Like LLMs, you train for a long time on a pseudo-objective that's close to what you want, then for a short time on a very expensive and limited objective that's closer to what you want.

Tuning through C++

The underlying technique of SPSA can be applied to literally any number in your chess program. Modify the number, see whether the engine wins more or loses more, and move in the direction that wins more. You have a hand-tuned heuristic that if search finds a checkmate you should back off by depth 1? Replace that with thousandths of a depth and tune it with SPSA; it turns out the optimal value is actually to back off by depth 1.09, which nets you 5 elo. You can do this for every number in your search algorithm. You can do something that looks a lot like gradient descent through arbitrary C++, because you have a grading function: winning.

Minimal sketches of these tricks follow.
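First, distillation from search. This is a minimal sketch in Python/PyTorch rather than the engines' C++; ValueNet, search_eval, and all the sizes are hypothetical stand-ins, not lc0's actual code. The point is just the shape of the objective: regress the student onto what (any) model + search says, with no self-play outcomes anywhere.

    # Minimal distillation-from-search sketch. `batches` yields position
    # features; `search_eval` is the hypothetical teacher: a (possibly weak)
    # model plus search, evaluated on the same positions.
    import torch
    import torch.nn as nn

    class ValueNet(nn.Module):
        """Tiny value head: position features -> eval in [-1, 1]."""
        def __init__(self, n_features: int = 768):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_features, 256), nn.ReLU(),
                nn.Linear(256, 1), nn.Tanh(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.net(x).squeeze(-1)

    def distill(student: ValueNet, batches, search_eval, lr: float = 1e-3):
        """Regress the student onto the teacher's search-backed evaluations."""
        opt = torch.optim.Adam(student.parameters(), lr=lr)
        for features in batches:              # features: (B, n_features)
            target = search_eval(features)    # teacher eval after search: (B,)
            loss = nn.functional.mse_loss(student(features), target)
            opt.zero_grad()
            loss.backward()
            opt.step()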
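Next, the runtime trick. A minimal sketch assuming scalar pawn-unit evals; LiveCalibrator and the exponential decay are my inventions, and lc0's actual mechanism may differ.

    # Track how far the raw NN eval runs from search, and correct for it.
    class LiveCalibrator:
        def __init__(self, decay: float = 0.9):
            self.bias = 0.0     # running estimate of (nn eval - search eval)
            self.decay = decay  # how quickly old observations fade

        def observe(self, nn_eval: float, search_eval: float) -> None:
            """Call whenever a position has been both evaluated and searched."""
            self.bias = self.decay * self.bias \
                + (1 - self.decay) * (nn_eval - search_eval)

        def corrected(self, nn_eval: float) -> float:
            """Future raw evals, shifted by the learned offset."""
            return nn_eval - self.bias

So if the net keeps saying +0.15 pawns more than search does, corrected() ends up subtracting roughly 0.15 from subsequent evaluations.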
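Then SPSA itself, following the description in note 2 below. play_match is a hypothetical oracle: it plays a batch of games between the two perturbed networks and returns the first one's win rate. Sketched in Python/NumPy for brevity.

    import numpy as np

    def spsa_step(weights: np.ndarray, play_match, lr: float = 0.01,
                  delta: float = 0.05) -> np.ndarray:
        # Per note 2: every parameter gets a random +1 or -1; no gradient anywhere.
        direction = np.random.choice([-1.0, 1.0], size=weights.shape)
        plus = weights + delta * direction
        minus = weights - delta * direction
        # Play the two perturbed versions against each other.
        if play_match(plus, minus) > 0.5:
            return weights + lr * direction   # positive side won: step that way
        return weights - lr * direction       # negative side won: step the other way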
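And the "Tuning through C++" version is the same loop shrunk to a single scalar; again play_match is a hypothetical win-rate oracle, here comparing two engine builds that differ only in one constant.

    import random

    def tune_constant(value: float, play_match, lr: float = 0.01,
                      delta: float = 0.02) -> float:
        sign = random.choice([-1.0, 1.0])
        if play_match(value + delta * sign, value - delta * sign) > 0.5:
            return value + lr * sign
        return value - lr * sign

Iterated on the mate back-off depth starting from 1.0, this is the kind of loop that lands on a value like 1.09.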
Weird architecture

lc0 uses a standard-ish transformer architecture, which they found to be hundreds of elo better than their old convolution-based models. It's still confusing to me that the transformer's inductive biases apply to literally every application imaginable. The only substantial architectural change they use is "smolgen", a system for generating attention biases. They claim smolgen is a ~1.2x throughput hit but an accuracy win equivalent to a 2.5x model size. Why is it so good? I find all the explanations poor. (A rough sketch of the idea follows the notes.)

Notes

1. Their primary competitor, Stockfish, uses their data, and so do most engines. Some engines don't, mostly because they care about originality.
2. More specifically, they take a particular part of the weights, and for every parameter you randomly choose +1 or −1; this is the direction tensor. You then create two versions of the network, one where weight += direction and one where weight -= direction, and play these two versions against each other. If the positive version wins, you do weight += direction * lr. If the negative version wins, you do weight -= direction * lr.
3. Up to about 15 elo on larger models.
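For concreteness, here is one way "generating attention biases" could look, sketched in Python/PyTorch. This is only the shape of the idea under my own assumptions (a per-head additive bias produced from a compressed view of the whole board); it is not lc0's actual smolgen implementation, and every name and size here is made up.

    import torch
    import torch.nn as nn

    class BiasedAttention(nn.Module):
        """Self-attention whose logits get an input-dependent additive bias."""

        def __init__(self, d_model: int = 256, n_heads: int = 8,
                     n_tokens: int = 64, d_compress: int = 32):
            super().__init__()
            self.n_heads, self.n_tokens = n_heads, n_tokens
            self.d_head = d_model // n_heads
            self.qkv = nn.Linear(d_model, 3 * d_model)
            # "smolgen"-like path: compress every token, flatten the whole
            # board, then emit one (n_tokens x n_tokens) bias per head.
            self.bias_gen = nn.Sequential(
                nn.Linear(d_model, d_compress),
                nn.Flatten(start_dim=1),          # (B, n_tokens * d_compress)
                nn.Linear(n_tokens * d_compress, n_heads * n_tokens * n_tokens),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (B, n_tokens, d_model); output projection omitted for brevity.
            B = x.shape[0]
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            q = q.view(B, self.n_tokens, self.n_heads, self.d_head).transpose(1, 2)
            k = k.view(B, self.n_tokens, self.n_heads, self.d_head).transpose(1, 2)
            v = v.view(B, self.n_tokens, self.n_heads, self.d_head).transpose(1, 2)
            logits = q @ k.transpose(-2, -1) / self.d_head ** 0.5
            bias = self.bias_gen(x).view(B, self.n_heads, self.n_tokens, self.n_tokens)
            attn = torch.softmax(logits + bias, dim=-1)   # biased attention
            return (attn @ v).transpose(1, 2).reshape(B, self.n_tokens, -1)

Unlike a fixed positional bias, the bias here depends on the current position, which would let square-to-square attention patterns shift with the position, though whether that is actually why it helps, I can't say.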

