Testing Super Mario Using a Behavior Model Autonomously

Hacker News · Feb 20, 2026 · Collected from RSS

Summary

Article URL: https://testflows.com/blog/testing-super-mario-using-a-behavior-model-autonomously-part1/
Comments URL: https://news.ycombinator.com/item?id=47092348
Points: 11
Comments: 3

Full Article

Autonomous testing is one of the most powerful approaches for exploring vast state spaces in complex systems. Rather than manually writing test cases for every scenario, autonomous systems can systematically explore millions of states, discovering edge cases that human testers would never think to check.

In this two-part follow-up, we'll continue the Super Mario Bros. testing series by implementing the autonomous testing approach presented by Antithesis, where they autonomously play and beat the game. Later, in Part 2, we'll plug in the behavior model we developed in Part 1 and Part 2 of our first series to validate correctness in real time during autonomous exploration. In the process, we demonstrate the true power of autonomous testing and behavior models: systematically exploring massive state spaces while writing your validation once, then plugging it into any testing driver, including our very own autonomous system that discovers new paths while validating behavior at every step.

Here's what we're aiming for: our own autonomous state space exploration that will allow Mario to complete levels of Super Mario Bros. without any human guidance.

[Video: Autonomous exploration: Completing Level 1]

The complete implementation is open source and available in our Examples/SuperMario repository. Clone the code and let's dive into how autonomous state space exploration works!

```shell
git clone --branch v2.0 --single-branch https://github.com/testflows/Examples.git && cd Examples/SuperMario
```

The proposed approach

Antithesis's article presents a surprisingly simple yet powerful algorithm for autonomous exploration of Super Mario.
At its core is a mutation-based input generator that randomly flips input bits to create variations:

```python
import random


def generate_input(starting_byte, flip_probability, input_length):
    """Return `input_length` input bytes, each mutated from the previous one."""
    inputs = []
    next_byte = starting_byte
    for _ in range(input_length):
        # Flip each of the 8 bits independently with the given probability.
        for j in range(8):
            if random.random() < flip_probability:
                next_byte ^= (1 << j)
        inputs.append(next_byte)
    return inputs
```

This random mutation approach treats game inputs (right, left, jump, action, down, enter) as bytes, where each bit represents whether a given key is pressed (1) or not (0). The algorithm flips individual bits with a small probability (typically around 10%), which is equivalent to XORing the current input with a random mask: bits where the mask is 1 are flipped, and bits where the mask is 0 are preserved. By flipping bits, the algorithm generates variations of input sequences, exploring different paths through the game.

[Figure: Input: Random input generation]

The reason for choosing this input generation algorithm is that it better mimics how the game is meant to be played: a key that is currently pressed is likely to remain pressed in the next frame, while another key can be added at the same time. For example, you hold down the right key while also pressing the jump or action buttons.

However, random input generation alone is not enough. A randomly moving Mario will inevitably die by running into enemies or falling into pits. Therefore, the exploration itself can never be one-shot. Instead, you have to store traveled paths (input sequences) and have a strategy to pick a sequence for the next iteration.

These traveled paths effectively define Mario's state because the game is deterministic. A system is deterministic if the same input always produces the same output. Therefore, starting from the same position and applying the same input sequence will always lead to the same Mario position in the game.
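To make the byte encoding described above concrete, here is a minimal sketch of how an input byte could be decoded into pressed keys and how a single XOR with a mask flips selected bits. The bit layout in `KEYS` is an assumption made for illustration (the article lists the keys but does not say which bit maps to which key), and `pressed_keys` and `random_mask` are hypothetical helper names.

```python
import random

# Hypothetical bit layout: bit 0 = right, bit 1 = left, and so on.
# The article names these keys but does not specify their bit positions.
KEYS = ["right", "left", "jump", "action", "down", "enter"]


def pressed_keys(byte):
    """Return the names of the keys whose bit is set in `byte`."""
    return [name for bit, name in enumerate(KEYS) if byte & (1 << bit)]


def random_mask(flip_probability, rng, bits=8):
    """Build a mask in which each bit is 1 with probability `flip_probability`."""
    mask = 0
    for j in range(bits):
        if rng.random() < flip_probability:
            mask |= 1 << j
    return mask


current = 0b00000001        # only "right" is held (under the assumed layout)
mask = 0b00000100           # suppose the random mask selected the "jump" bit
next_byte = current ^ mask  # XOR flips bit 2 and preserves every other bit

print(pressed_keys(current))    # keys held in the current frame
print(pressed_keys(next_byte))  # keys held in the next frame
```

Flipping bits one at a time, as `generate_input` does, gives the same result as building the whole mask first and applying one XOR, because each bit is decided independently.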
Because the game is deterministic, when we pick a traveled path we can replay it and then try to continue it with new mutations.

The path selection requires a fitness function. For Super Mario, a simple criterion is to favor paths with the highest x-axis position, since winning the game requires advancing to the right. However, always picking the path with the highest fitness score doesn't work: there will be many cases where the path ends in a state from which no further exploration is possible, for example right before touching a Goomba, or in the air right before falling into a pit. Such states are not recoverable and lead to dead ends.

To overcome this problem, it's not enough to keep just the best path we've found so far. Instead, we need to maintain a collection of paths with different fitness scores and use a probability distribution function to pick the next path to explore. This way, we're more likely to pick paths with higher scores while still giving paths with lower scores a chance to be explored.

[Figure: Paths: Selecting path]

The beauty of this state space exploration approach lies in its simplicity. You don't need to understand the game's mechanics or hand-craft complex strategies. The mutation process naturally discovers interesting behaviors through random exploration, guided by fitness scoring that rewards progress through the game world.
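The probabilistic path selection described above can be sketched with weighted sampling. The article does not say which distribution is used, so this example assumes the simplest option: weights proportional to the score, plus a small constant so that low-scoring paths always remain selectable. The name `select_path` and the `floor` parameter are hypothetical, and scores are assumed to be non-negative.

```python
import random


def select_path(paths, rng, floor=1.0):
    """Pick one stored path, favoring higher scores without excluding low ones.

    `paths` is a list of (input_sequence, score) pairs with non-negative scores.
    `floor` is an assumed constant added to every weight so that a path with
    score 0 can still be chosen.
    """
    weights = [score + floor for _, score in paths]
    return rng.choices(paths, weights=weights, k=1)[0]


rng = random.Random(42)
paths = [([1, 1, 5], 120.0), ([1, 1, 1], 90.0), ([1, 0, 0], 10.0)]

# Count how often each path is picked over many draws.
counts = {120.0: 0, 90.0: 0, 10.0: 0}
for _ in range(10_000):
    _, score = select_path(paths, rng)
    counts[score] += 1

print(counts)
```

The highest-scoring path is picked most often, but the path with score 10 still gets a share of the draws, which is what lets the search back out of a dead end.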
Characteristics of the proposed approach

Let's step back and examine how the proposed exploration system works:

- We generate random input by flipping bits with small probability (typically ~10%), producing a sequence of button presses.
- We build a path by recording the input sequence along with a score quantifying how far Mario progresses (for Super Mario, based on x-axis position).
- We store these paths in a collection, maintaining a population of different trajectories through the game.
- We select a path using a probability distribution function that favors higher-scoring paths while still giving lower-scoring paths a chance.
- Determinism enables resuming in exactly the same state: because the game always produces identical results for the same input sequence, we can replay any stored path to reach that specific game state, then continue exploring from there with new mutations.
- This cycle of select→replay→mutate→evaluate repeats continuously, systematically exploring the state space by building on previously discovered paths.

Comparing the approach to Genetic Algorithm

Let's see how these characteristics map to a canonical Genetic Algorithm:

| Our System | GA Concept |
|---|---|
| Collection of stored paths | Population of individuals |
| Input sequence (button presses) | Genotype encoding behavior |
| Game state (Mario position, score) | Phenotype (observable result) |
| Progress scoring function | Fitness function |
| Path probability distribution selection function | Selection with elite bias |
| Bit-flip input generation | Mutation operator |
| (none) | Crossover (recombination) |
| Each exploration iteration | Generation cycle |

Taking these concepts broadly, without nit-picking, the proposed approach is essentially a Genetic Algorithm: it maintains a population, scores fitness, selects promising candidates, and mutates them to explore variations. However, a skeptic might raise two concerns: the absence of crossover, and whether an input sequence truly qualifies as a genotype.
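Before turning to those two concerns, the select→replay→mutate→evaluate cycle listed above can be sketched end to end on a toy stand-in for the game. Everything here is an assumption made for illustration and is not the article's implementation: the one-dimensional world, the two-button encoding (`RIGHT`, `JUMP`), the pit positions, the score-plus-one weighting, and the function names `replay`, `mutate`, and `explore`.

```python
import random

RIGHT, JUMP = 0b01, 0b10   # hypothetical two-button encoding for the toy world
PITS = {5, 11, 12, 18}     # cells that kill Mario unless he is airborne
GOAL = 25                  # x position that counts as finishing the level


def replay(inputs):
    """Deterministically run an input sequence from the start; return (x, alive)."""
    x = 0
    for byte in inputs:
        if byte & RIGHT:
            x += 1
        airborne = bool(byte & JUMP)
        if x in PITS and not airborne:
            return x, False
        if x >= GOAL:
            break
    return x, True


def mutate(start_byte, flip_probability, length, rng):
    """Generate `length` new inputs by flipping bits of the previous input."""
    out, byte = [], start_byte
    for _ in range(length):
        for j in range(2):
            if rng.random() < flip_probability:
                byte ^= 1 << j
        out.append(byte)
    return out


def explore(iterations, rng, chunk=4, flip_probability=0.1):
    """Return an input sequence that reaches GOAL, or None if the budget runs out."""
    paths = [([], 0)]  # population of (input sequence, score); starts with the empty path
    for _ in range(iterations):
        # Select: favor high scores but keep every stored path selectable.
        weights = [score + 1 for _, score in paths]
        base, _ = rng.choices(paths, weights=weights, k=1)[0]
        # Mutate: extend the chosen path with freshly generated inputs.
        start = base[-1] if base else RIGHT
        candidate = base + mutate(start, flip_probability, chunk, rng)
        # Replay and evaluate: rerun from the start and score by x position.
        x, alive = replay(candidate)
        if alive:
            paths.append((candidate, x))
            if x >= GOAL:
                return candidate
    return None


winning = explore(iterations=2000, rng=random.Random(0))
print("reached goal" if winning else "budget exhausted")
```

Because `replay` is deterministic, a stored input sequence is all that is needed to get back to a given state; the sketch never saves the game state itself. A real driver would replace `replay` with calls into the game and would likely cap the population size, which this sketch leaves unbounded.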
Genotype mapping

Let's address the genotype question first. In traditional GAs, genotypes often encode multiple behavioral strategies that apply broadly: "jump more often," "play aggressively," "avoid edges." In our system, the input sequence encodes something more specific: the ability to reach a particular state. But this is a form of behavioral encoding! Our input sequences are genes that enable reaching specific game states, a valid specialization of the general multi-gene GA framework where each path represents a complete behavioral strategy for reaching a specific position.

Absence of crossover

As for the absence of crossover, it's important to understand what crossover actually is: an evolutionary optimization technique that progresses the population by recombining currently present genetic material. Crossover combines beneficial traits from different individuals to potentially create better offspring without requiring new mutations. However, crossover is not strictly required: it's an evolutionary optimization, not a fundamental requirement. Mutation alone can effectively explore the state space, particularly when paths build incrementally as in our case. Therefore, the proposed approach remains a valid GA, just one that currently relies solely on mutation for variation rather than using both mutation and crossover for evolution.

Why does this map to Genetic Algorithms?

The fact that this state space exploration technique maps so closely onto a Genetic Algorithm is not a coincidence; it reveals something fundamental about both testing and evolution. When exploring complex state spaces, you need:

- A way to maintain progress (population of paths)
- A way to focus on promising areas (selection by fitness)
- A way to discover new possibilities (mutation)
- A way to reach specific states (genes as enablers)

This is exactly what biological evolution does.
Genes aren't just instructions: they're traits that enable organisms to reach and survive in environmental states not yet mastered by the population. Our input sequences serve the same role: enabling Mario to reach game states not yet explored.

One might question whether GA truly applies to deterministic systems where the same inputs always produce the same results. However, determinism actually makes GAs more powerful by providing perfect reproducibility. We can define non-determinism as simply not having control over all inputs; apparent randomness is often just hidden state. Deterministic systems like Super Mario make this explicit, giving us perfect reproducibility for controlled evolutionary experiments.

Recognizing the approach as a mutation-based Genetic Algorithm unlocks decades of evolutionary computation research and opens a wide range of possibilities.

Concrete implementation of autonomous exploration

Unfortunately, the original article did not present a concrete implementation. Since the approach is essentially a Genetic Algorithm, there are countless variations to explore in how we generate inputs, select paths, score progress, and manage stored paths. After som
