Andrej Karpathy's Autoresearch

March 3, 2026Posted by Federico Ulfo

Optimizing a ML model for who's not familiar used to be a human research process of trial and error. Karpathy just released a repo that automate the research and test with parallel agents running 5 minute experiments.

It’s built on a stripped-down version of his earlier nanochat training core — a self-contained ~630-line Python file (train.py) that includes a full GPT model, Muon+AdamW optimizer, and training loop.

The setup is deliberately simple:

prepare.py handles fixed data prep, tokenization, and evaluation (don’t touch it).
The human only edits a high-level Markdown file (program.md) with research instructions or ideas.
An AI coding agent (Claude, etc.) takes over: it edits only train.py, runs a training experiment for exactly 5 minutes (fixed wall-clock budget), measures validation bits-per-byte (val_bpb — lower is better), and decides whether to keep the change.
Everything happens on a git feature branch. Improvements become commits; failures are discarded. The loop repeats indefinitely.

Auto

As Karpathy said it runs 100+ experiments while you sleep overnight. Karpathy ran ~650 over a weekend and confirmed the gains transferred to larger models, improving nanochat’s “time-to-GPT-2” leaderboard score.

Sources: tweet, Github