Current
AutoResearch
A minimal autonomous agent setup by Andrej Karpathy that runs overnight ML experiments by modifying, training, and evaluating code without human intervention.
Signal
AutoResearch demonstrates a tightly constrained autonomous research loop: an agent edits a single training file, runs fixed-duration experiments, evaluates on a consistent metric, and iterates — roughly 100 experiments overnight on one GPU.
Context
The design is deliberately minimal. Agents modify only train.py, every run is capped at five minutes of wall-clock time, and a single metric (validation bits-per-byte) provides a clean comparison baseline. Human direction is encoded in a program.md file, framing the human role as programming the research organization rather than individual experiments. This is an early but concrete demonstration of autonomous ML experimentation at a scale that previously required teams.
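The constraint architecture described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names (`research_loop`, `run_train_script`, `propose_edit`, `revert`), the use of a subprocess timeout to enforce the cap, and the convention that train.py prints its final validation bits-per-byte on the last line of stdout are all assumptions made for the example. Only the five-minute budget, the single editable file, and the single metric come from the description.

```python
import math
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, Optional

BUDGET_SECONDS = 5 * 60  # per-run wall-clock cap stated in the text


def bits_per_byte(total_loss_nats: float, total_bytes: int) -> float:
    """Convert summed cross-entropy (in nats) over the validation bytes to bits per byte."""
    return total_loss_nats / (math.log(2) * total_bytes)


def runs_per_night(hours: float, budget_seconds: int = BUDGET_SECONDS) -> int:
    """Upper bound on runs that fit in `hours`, ignoring startup and evaluation overhead."""
    return int(hours * 3600 // budget_seconds)


@dataclass
class Result:
    experiment_id: int
    val_bpb: Optional[float]  # None if the run crashed, timed out, or printed no metric
    kept: bool


def run_train_script(path: str = "train.py",
                     budget_seconds: int = BUDGET_SECONDS) -> Optional[float]:
    """Hypothetical runner: assumes the script prints val bpb as its last stdout line."""
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True, text=True,
            timeout=budget_seconds, check=True,
        )
        return float(proc.stdout.strip().splitlines()[-1])
    except (subprocess.SubprocessError, OSError, ValueError, IndexError):
        return None


def research_loop(propose_edit: Callable[[int], None],
                  run: Callable[[], Optional[float]],
                  revert: Callable[[], None],
                  n_experiments: int) -> list[Result]:
    """Edit, run, compare on one metric, keep the change only if it improves."""
    best = float("inf")
    history: list[Result] = []
    for i in range(n_experiments):
        propose_edit(i)   # hypothetical hook: the agent rewrites train.py here
        bpb = run()       # fixed-budget run returning val bpb, or None on failure
        kept = bpb is not None and bpb < best
        if kept:
            best = bpb
        else:
            revert()      # hypothetical hook: restore the last kept train.py
        history.append(Result(i, bpb, kept))
    return history
```

Under these assumptions, `runs_per_night(8)` returns 96, which is consistent with the figure of roughly 100 experiments overnight. Passing the hooks in as callables keeps the loop itself independent of how the agent edits the file or how changes are rolled back.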
Relevance
For Openflows, this current surfaces a structural shift in how research labor is organized. The constraint architecture — single file, fixed budget, clear metric — is as interesting as the autonomy itself. It models how meaningful human-in-the-loop framing can be preserved even as execution is delegated.
Current State
Functional proof-of-concept. Minimal codebase (~300 lines of core training code). Requires a single NVIDIA GPU and Python 3.10+. Early community engagement.
Open Questions
- Which research tasks are well-suited to fixed-budget autonomous iteration, and which require more fluid human guidance?
- How should experiment provenance and agent-generated hypotheses be documented for reproducibility?
- What happens to scientific judgment when the volume of automated experiments exceeds human review capacity?
Connections
- Linked to autonomous-research-accountability as its founding current and primary design reference.
- Linked to andrej-karpathy as the direct operator behind this project.
- Linked to local-inference-baseline as a downstream use of accessible local compute.
Updates
2026-03-15: The project has transitioned from early community engagement to widespread adoption, now holding 34.2k stars and 4.6k forks. Active development is evidenced by 80 open pull requests and 40 issues, indicating the project has evolved beyond its initial proof-of-concept phase.