Hacker News · Feb 17, 2026 · Collected from RSS
Andrej Karpathy showed us the GPT algorithm. I wanted to see the hardware limit. The punchline: I made it go 4,600x faster in pure C code, with no dependencies, using a compiler with SIMD auto-vectorisation!

Andrej recently released microgpt.py - a brilliant, atomic look at the core of a GPT. As a low-latency developer, I couldn't resist seeing how fast it could go when you get closer to the metal. So, just for funzies, I spent a few hours building microgpt-c, a zero-dependency, pure C99 implementation featuring:

- 4,600x faster training vs the Python reference (tested on a MacBook Pro M2 Max); on Windows it is 2,300x faster.
- SIMD auto-vectorisation for high-speed matrix operations.
- INT8 quantisation (reducing weight storage by ~8x). Training is slightly slower, but the storage reduction is significant.
- Zero dependencies - just pure logic.

The amalgamation image below is just for fun (and to show off the density!), but the GitHub repo contains the fully commented, structured code for anyone who wants to play with on-device AI. I have started to build something useful, like a simple C code static analyser - I will do a follow-up post. Everything else is just efficiency... but efficiency is where the magic happens.

Comments URL: https://news.ycombinator.com/item?id=47042014
Points: 31 # Comments: 1
# MicroGPT-C

A zero-dependency, pure C99 implementation of a GPT-style character-level language model. The algorithm faithfully matches Andrej Karpathy's microgpt.py — same architecture, same training loop, same sampling — but compiles to native code with optional compiler-driven SIMD auto-vectorisation for dramatically faster training and inference.

Train a GPT in 20 ms. Generate names in microseconds. No Python. No PyTorch. No GPU.

## What Is This?

MicroGPT-C is a minimal, readable implementation of a GPT (Generative Pre-trained Transformer) — the same family of models behind ChatGPT, but stripped down to its essential algorithm. It trains a tiny character-level language model that learns to generate realistic human names from scratch. The goal is education and experimentation: understand how attention, backpropagation, and the Adam optimiser actually work at the lowest level, without any framework abstractions.

| Audience | Value |
|---|---|
| Students & educators | Study attention, softmax, Adam, and backprop in readable C — no framework magic |
| Embedded / edge engineers | Entire model fits in < 50 KB RAM; runs on MCUs with no runtime dependencies |
| Researchers | Auditable baseline for quantisation, custom layers, or optimiser experiments |
| Rapid prototypers | Train → iterate in milliseconds; test tokenisers, vocabularies, data formats |

## Quick Start

```
# Linux / macOS
chmod +x build.sh
./build.sh
./build/microgpt
```

```
:: Windows
build.bat
build\Release\microgpt.exe
```

The build automatically copies data/names.txt next to the executable.

## Performance

Measured on the same workload (1,000 training steps, 20 inference samples) — C vs the reference Python:

| Metric | Python | C (fp64) | Speedup |
|---|---|---|---|
| Training time | ~93 s | 0.02 s | ~4,600× |
| Training throughput | ~0.1 k tok/s | ~289 k tok/s | ~2,800× |
| Steps/sec | ~11 | ~40,000 | ~3,600× |
| Inference time | ~0.74 s | < 1 ms | ~700×+ |
| Inference rate | ~27 samples/s | 20,000 samples/s | ~740× |
| Token throughput | — | 109,000 tok/s | — |

INT8 quantised build: ~25% slower training than fp64 on this tiny model, but ~8× smaller weight storage — ideal for constrained devices.

## Architecture

A single-layer, decoder-only Transformer following the GPT-2 design:

```
Input → Token Embed + Pos Embed
      → RMSNorm → Self-Attention (4 heads, causal) → Residual
      → RMSNorm → MLP (fc1 → ReLU → fc2, 4× width) → Residual
      → Linear (lm_head) → Softmax → next-token probabilities
```

| Parameter | Value |
|---|---|
| Embedding dim | 16 |
| Attention heads | 4 |
| Layers | 1 |
| Context length | 16 |
| Total parameters | ~4,600 |
| Weight memory (fp64) | ~37 KB |
| Weight memory (INT8) | ~4.6 KB |
| Training memory | ~144 KB |
| Inference memory | < 50 KB |

Training uses the Adam optimiser with linear learning-rate decay (configurable in microgpt.h).

## Build Options

### Build scripts (recommended)

| Platform | Standard | SIMD (faster) |
|---|---|---|
| Linux/macOS | ./build.sh | ./build.sh --simd |
| Windows | build.bat | build.bat simd |

### SIMD auto-vectorisation

The --simd flag enables compiler-driven auto-vectorisation of the core dot products, matrix multiplications, and normalisations. On x86-64 the compiler targets the best available instruction set (SSE4, AVX2, etc.) via -march=native; on MSVC it enables /arch:AVX2. This gives a measurable speed-up on larger models without any hand-written intrinsics — the compiler rewrites the scalar loops into SIMD instructions automatically (a small illustration follows the build commands below).

```
# Linux / macOS — auto-detect best ISA
./build.sh --simd

# CMake directly
cmake -DMICROGPT_SIMD=ON ..
cmake --build . --config Release
```
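To make the idea concrete, here is a toy sketch of the kind of plain C99 loop an auto-vectoriser acts on. It is illustrative only, not a kernel taken from microgpt.c: the function name and sizes are made up, and whether a given loop is actually vectorised depends on the compiler and flags (element-wise updates like this one vectorise readily, while dot-product reductions may additionally need flags such as -ffast-math).

```c
/* axpy_demo.c - illustrative only; not code from microgpt.c.
 * Each element of y is independent, so an optimising compiler
 * (gcc/clang with -O3 -march=native, or MSVC with /O2 /arch:AVX2)
 * can typically rewrite this scalar loop as packed SIMD
 * multiply-adds without any intrinsics in the source. */
#include <stdio.h>

static void axpy(double *y, const double *x, double s, int n) {
    for (int i = 0; i < n; i++)
        y[i] += s * x[i];          /* candidate for auto-vectorisation */
}

int main(void) {
    enum { N = 16 };               /* same order of size as the embedding dim */
    double x[N], y[N];
    for (int i = 0; i < N; i++) { x[i] = 0.1 * i; y[i] = 1.0; }
    axpy(y, x, 2.0, N);
    printf("y[0]=%.2f y[%d]=%.2f\n", y[0], N - 1, y[N - 1]);
    return 0;
}
```

Compilers can report which loops they vectorised (for example GCC's -fopt-info-vec or Clang's -Rpass=loop-vectorize), which is a handy way to confirm that a --simd build is doing what you expect.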
### INT8 quantised build

Weights are stored as 8-bit integers with per-matrix scales — the forward pass dequantises on the fly; Adam updates an fp64 master copy and requantises each step. This reduces weight storage by ~8× (37 KB → 4.6 KB) at a small accuracy/speed trade-off. (A minimal illustrative sketch of this scheme appears at the end of this README.)

| Platform | Standard | SIMD |
|---|---|---|
| Linux/macOS | ./build_quantised.sh | ./build_quantised.sh --simd |
| Windows | build_quantised.bat | build_quantised.bat simd |

### CMake directly

```
mkdir build && cd build
cmake ..
cmake --build . --config Release

# With INT8 quantisation
cmake -DQUANTIZATION_INT8=ON ..

# With SIMD auto-vectorisation
cmake -DMICROGPT_SIMD=ON ..

# Both
cmake -DQUANTIZATION_INT8=ON -DMICROGPT_SIMD=ON ..
```

## Project Layout

| Path | Description |
|---|---|
| microgpt.h | Model config, public API declarations |
| microgpt.c | Core engine: model, forward/backward, Adam, data loading |
| main.c | Entry point: load data → train → generate samples |
| microgpt_amalgamated.c | Single-file build — same algorithm, no header needed |
| data/names.txt | Training data (one name per line, ~32k names) |
| CMakeLists.txt | CMake build (C99, Release, optional SIMD / INT8) |

## Single-File Build

microgpt_amalgamated.c is a self-contained single file containing the full GPT algorithm — data loading, training, and inference. No header file needed:

```
# Compile directly (no CMake required)
cc -O2 -o microgpt microgpt_amalgamated.c -lm
cp data/names.txt . && ./microgpt

# Or via CMake
cmake --build build --config Release --target microgpt_amalgamated
./build/microgpt_amalgamated
```

## Requirements

- C99 compiler (GCC, Clang, MSVC)
- CMake 3.10+
- No other dependencies

## License

MIT — see LICENSE and source file headers.

Author: Ajay Soni (ajay.soni@enjector.com), Enjector Software Ltd.
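As a closing illustration of the per-matrix INT8 scheme described under Build Options, here is a minimal, self-contained sketch. The QMatrix struct and the quantise/deq names are hypothetical and do not come from microgpt.c; they only mirror the idea stated above of keeping an fp64 master copy for Adam, requantising after each step, and dequantising on the fly in the forward pass.

```c
/* int8_sketch.c - a hypothetical illustration of per-matrix INT8
 * quantisation; names and layout are NOT taken from microgpt.c. */
#include <math.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int8_t *q;      /* quantised weights, one byte each              */
    double  scale;  /* per-matrix scale: real value ~ q[i] * scale   */
    int     n;      /* number of weights in this matrix              */
} QMatrix;

/* Requantise from the fp64 master copy (e.g. after an Adam step). */
static void quantise(QMatrix *m, const double *master) {
    double max_abs = 1e-12;                 /* avoid divide-by-zero */
    for (int i = 0; i < m->n; i++) {
        double a = fabs(master[i]);
        if (a > max_abs) max_abs = a;
    }
    m->scale = max_abs / 127.0;
    for (int i = 0; i < m->n; i++) {
        double q = round(master[i] / m->scale);
        if (q > 127.0)  q = 127.0;          /* clamp to int8 range */
        if (q < -127.0) q = -127.0;
        m->q[i] = (int8_t)q;
    }
}

/* Dequantise a single weight on the fly during the forward pass. */
static double deq(const QMatrix *m, int i) { return m->q[i] * m->scale; }

int main(void) {
    double master[4] = { 0.31, -0.07, 1.20, -0.55 };
    int8_t storage[4];
    QMatrix m = { storage, 0.0, 4 };
    quantise(&m, master);
    for (int i = 0; i < 4; i++)
        printf("w=%.4f  q=%4d  deq=%.4f\n", master[i], m.q[i], deq(&m, i));
    return 0;
}
```

Storing one byte per weight plus a single double per matrix is what gives the roughly 8× reduction over fp64 quoted above (37 KB → 4.6 KB), at the cost of the small rounding error visible in the deq column.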