
Why I don't think AGI is imminent

Hacker News · Feb 15, 2026 · Collected from RSS

Summary

Article URL: https://dlants.me/agi-not-imminent.html
Comments URL: https://news.ycombinator.com/item?id=47028923
Points: 58 · Comments: 124

Full Article

February 14, 2026

Contents

Issue 1: cognitive primitives, embodied cognition
What about world models?
Benchmarking the gap
Addendum: Gemini 3 Deep Think and inference-time compute (added after initial publication)
Issue 2: Architecture
Revision: Chain of Thought changes this picture (added after initial publication)
The Discourse Problem
Research Labs and Secrecy
What does this mean?

The CEOs of OpenAI and Anthropic have both claimed that human-level AI is just around the corner — and at times, that it's already here. These claims have generated enormous public attention. There has been some technical scrutiny of these claims, but critiques rarely reach the public discourse. This piece is a sketch of my own thinking about the boundary between transformer-based large language models and human-level cognition. I have an MS degree in Machine Learning from over a decade ago, and I don't currently work in the field of AI, but I am well-read on the underlying research. If you know more than I do about these topics, please reach out and let me know; I would love to develop my thinking on this further.

Issue 1: cognitive primitives, embodied cognition

Research in evolutionary neuroscience has identified a set of cognitive primitives that are hardwired into vertebrate brains: among them a sense of number, object permanence, causality, spatial navigation, and the ability to distinguish animate from inanimate motion. These capacities are shared across vertebrates, from fish to ungulates to primates, pointing to a common evolutionary origin hundreds of millions of years old.

Language evolved on top of these primitives — a tool for communication where both speaker and listener share the same cognitive foundation. Because both sides have always had these primitives, language takes them for granted and does not state them explicitly.

Consider the sentence "Mary held a ball." To understand it, you need to know that Mary is an animate entity capable of intentional action, that the ball is a separate, bounded, inanimate object with continuous existence through time, that Mary is roughly human-sized and upright while the ball is small enough to fit in her hand, that her hand exerts an upward force counteracting gravity, that the ball cannot pass through her palm, that releasing her grip would cause the ball to fall, and that there is one Mary and one ball, each persisting as the same entity from moment to moment, each occupying a distinct region of three-dimensional space. All of that is what a human understands from four words, and none of it is in the text. Modern LLMs are now trying to reverse-engineer this cognitive foundation from language, which is an extremely difficult task.

I find this a useful framing for understanding many of the observed limitations of current LLM architectures. For example, transformer-based language models can't reliably do multi-digit arithmetic because they have no number sense, only statistical patterns over digit tokens. They can't generalize simple logical relationships — a model trained on "A is B" can't infer "B is A" — because they lack the compositional, symbolic machinery.
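To make the digit-token point concrete, the following minimal sketch (my own illustration, assuming the tiktoken package and its cl100k_base encoding, which is one common BPE vocabulary and not necessarily what any particular frontier model uses) shows how multi-digit operands are carved into arbitrary chunks before a model ever sees them.

```python
# Sketch: what a language model actually receives when asked to add two
# large numbers. Assumes the tiktoken package; "cl100k_base" is one widely
# used BPE encoding, not necessarily any given model's tokenizer.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["7 + 5 =", "123456789 + 987654321 ="]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{text!r} -> {pieces}")

# The long operands come out as short digit-group chunks rather than whole
# numbers, so the model predicts over digit tokens with no built-in sense
# of magnitude or place value.
```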
One might object: modern AIs are now being trained on video, not just text. And it's true that video prediction can teach something like object permanence. If you want to predict the next frame, you need to model what happens when an object passes behind an occluder, which is something like a representation of persistence. But I think the reality is more nuanced.

Consider a shell game: a marble is placed under one of three cups, and the cups are shuffled. A video prediction model might learn the statistical regularity that "when a cup is lifted, a marble is usually there." But actually tracking the marble through the shuffling requires something deeper — a commitment to the marble as a persistent entity with a continuous trajectory through space. That's not merely a visual pattern.
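To see why the distinction matters, here is a toy simulation (purely illustrative, not a model of any real video system): one guesser commits to the marble as a persistent entity and updates its position through every swap, the other relies on a surface regularity and ignores the swap history.

```python
# Toy shell game, purely illustrative: contrast entity tracking with a
# surface heuristic that never follows the marble through the swaps.
import random

def play_round(n_swaps=10, n_cups=3):
    marble = random.randrange(n_cups)          # cup hiding the marble
    start = marble
    swaps = []
    for _ in range(n_swaps):
        a, b = random.sample(range(n_cups), 2)
        swaps.append((a, b))
        if marble == a:                        # the marble moves with its cup
            marble = b
        elif marble == b:
            marble = a
    return start, swaps, marble

def tracker_guess(start, swaps):
    """Commit to the marble as a persistent entity and follow it."""
    pos = start
    for a, b in swaps:
        if pos == a:
            pos = b
        elif pos == b:
            pos = a
    return pos

def heuristic_guess(start, swaps):
    """Surface pattern: remember where the marble started, ignore the swaps."""
    return start

rounds = [play_round() for _ in range(10_000)]
track_acc = sum(tracker_guess(s, sw) == m for s, sw, m in rounds) / len(rounds)
heur_acc = sum(heuristic_guess(s, sw) == m for s, sw, m in rounds) / len(rounds)
print(f"entity tracking accuracy:   {track_acc:.2f}")  # follows the trajectory
print(f"surface heuristic accuracy: {heur_acc:.2f}")   # roughly chance level
```

The tracker needs an explicit, updatable state for "the marble"; the heuristic gets by on statistics and lands near chance.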
The shortcomings of visual models align with this framing. Early GPT-based vision models failed at even basic spatial reasoning. Much of the recent progress has come from generating large swaths of synthetic training data. But even in this, we are trying to learn the physical and logical constraints of the real world from visual data. The results, predictably, are fragile. A model trained on synthetic shell game data could probably learn to track the marble. But I suspect that learning would not generalize to other situations and relations — it would be shell game tracking, not object permanence.

Developmental psychologist Elizabeth Spelke's research on "core knowledge" has shown that infants — including blind infants — represent objects as bounded, cohesive, spatiotemporally continuous entities. This isn't a learned visual skill. It appears to be something deeper: a fundamental category of representation that the brain uses to organize all sensory input. Objects have identity. They persist. They can't teleport or merge. This "object-ness" likely predates vision itself — it's rooted in hundreds of millions of years of organisms needing to interact with things in the physical world, and I think this aspect of our evolutionary "training environment" is key to our robust cognitive primitives.

Organisms don't merely observe reality to predict what happens next. They perceive in order to act, and they act in order to perceive. Object permanence allows you to track prey behind an obstacle. Number sense lets you estimate whether you're outnumbered. Logical composition enables tool construction and use. Spatial navigation helps you find your way home. Every cognitive primitive is directly linked to action in a rich, multisensory, physical world.

As Rodney Brooks has pointed out, even human dexterity is a tight coupling of fine motor control and rich sensory feedback. Modern robots do not have access to sensory information that is nearly as rich. While LLMs have benefited from vast quantities of text, video, and audio available on the internet, we simply don't have large-scale datasets of rich, multisensory perception coupled to intentional action. Collecting or generating such data is extremely challenging.

What about world models?

What if we built simulated environments where AIs could gather embodied experience? Would we be able to create learning scenarios where agents could learn some of these cognitive primitives, and could that generalize to improve LLMs? There are a few papers I found that poke in this direction.

Google DeepMind's SIMA 2 is one. Despite the "embodied agent" branding, SIMA 2 is primarily trained through behavioral cloning: it watches human gameplay videos and learns to predict what actions the players took. The reasoning and planning come from its base model (Gemini Flash-Lite), which was pretrained on internet text and images — not from embodied experience. There is an RL self-improvement stage where the agent does interact with environments, but this is secondary; the core intelligence is borrowed from language pretraining. SIMA 2 reaches near-human performance on many game tasks, but what it's really demonstrating is that a powerful language model can be taught to output keyboard actions.

Can insights from world-model training actually transfer to and improve language understanding? DeepMind's researchers explicitly frame this as a trade-off between two competing objectives: "embodied competence" (acting effectively in 3D worlds) and "general reasoning" (the language and math abilities from pretraining). They found that baseline Gemini models, despite being powerful language models, achieved only 3-7% success rates on embodied tasks — demonstrating that embodied competence is not something that emerges from language pretraining. After fine-tuning on gameplay data, SIMA 2 achieved near-human performance on embodied tasks while showing "only minor regression" on language and math benchmarks. But notice the framing: the best case is that embodied training doesn't hurt language ability too much. There's no evidence that it improves it. The two capabilities sit in separate regions of the model's parameter space, coexisting but not meaningfully interacting. LLMs have billions of parameters, and there is plenty of room in those weights to predict language and to model a physical world separately. Bridging that gap — using physical understanding to actually improve language reasoning — remains undemonstrated.

DeepMind's Dreamer 4 also hints at this direction. Rather than borrowing intelligence from a language model, Dreamer 4 learns a world model from gameplay footage, then trains an RL agent within that world model through simulated rollouts where the agent takes actions, observes consequences provided by the world model, and updates its policy. This is genuinely closer to perception-action coupling: the agent learns through acting. However, the goal of this research is not general intelligence — it's sample-efficient control for robotics. The agent is trained and evaluated on predefined task milestones (get wood, craft pickaxe, find diamond), scored by a learned reward model. Nobody has tested whether the representations learned through this sort of training generalize to reasoning, language, or anything beyond the specific control tasks they were trained on. The gap between "an agent that learns to get diamonds in Minecraft through simulated practice" and "embodied experience that produces transferable cognitive primitives" is enormous and entirely unexplored.
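To make that training loop concrete, here is a heavily simplified sketch of the pattern described above (a policy trained entirely on imagined rollouts inside a world model). It is my own toy illustration, not Dreamer 4's architecture: the learned video world model is replaced by a hand-written corridor dynamics function, and the policy is a tabular softmax updated with a REINFORCE-style gradient.

```python
# Toy sketch of "learn a policy inside a world model via imagined rollouts".
# Not Dreamer 4: the learned dynamics/reward model is stubbed out with a
# hand-written 1-D corridor (states 0..5, reward for sitting on the goal).
import math
import random

N_STATES, GOAL, HORIZON = 6, 5, 12
ACTIONS = [-1, +1]                                   # step left / step right

def world_model(state, action):
    """Stand-in for a learned dynamics + reward model."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, (1.0 if next_state == GOAL else 0.0)

prefs = [[0.0, 0.0] for _ in range(N_STATES)]        # tabular softmax policy

def action_probs(state):
    exps = [math.exp(p) for p in prefs[state]]
    z = sum(exps)
    return [e / z for e in exps]

def imagined_rollout():
    """Act only inside the world model; no real environment is touched."""
    state, traj, ret = 0, [], 0.0
    for _ in range(HORIZON):
        a_idx = random.choices(range(len(ACTIONS)), weights=action_probs(state))[0]
        traj.append((state, a_idx))
        state, reward = world_model(state, ACTIONS[a_idx])
        ret += reward
    return traj, ret

def update_policy(traj, ret, baseline, lr=0.1):
    """REINFORCE-style update: reinforce actions from high-return rollouts."""
    advantage = ret - baseline
    for state, a_idx in traj:
        probs = action_probs(state)
        for i in range(len(ACTIONS)):
            grad_log_pi = (1.0 if i == a_idx else 0.0) - probs[i]
            prefs[state][i] += lr * advantage * grad_log_pi

baseline = 0.0
for _ in range(2000):
    traj, ret = imagined_rollout()
    update_policy(traj, ret, baseline)
    baseline = 0.99 * baseline + 0.01 * ret          # running-average baseline

avg = sum(imagined_rollout()[1] for _ in range(200)) / 200
print(f"average imagined return after training: {avg:.2f}")
```

The point of the sketch is the data flow: every transition the policy learns from is produced by the world model, which is why the open question of whether the learned representations transfer beyond the control task matters so much.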
As far as I understand, we don't know how to:

- embed an agent in a perception-action coupled training environment
- create an objective and training process that leads it to learn cognitive primitives like spatial reasoning or object permanence
- leverage this to improve language models or move closer to general artificial intelligence

Recent benchmarking work underscores how far we are. Stanford's ENACT benchmark (2025) tested whether frontier vision-language models exhibit signs of embodied cognition — things
