Autonomous AI Coding Clears 60,000-Line Ceiling: MirrorCode Benchmark Released
Epoch AI released the MirrorCode benchmark on June 26, 2026, to test how far autonomous AI coding can go without human supervision, using an evaluation that is verifiable end to end. The benchmark supplies models with 25 compiled programs and their documentation, but no source code, internet access, or guidance during the run, and asks them to write new source code that reproduces each program's behavior exactly. Claude Opus 4.7 reportedly reimplemented gotree, a roughly 16,000-line Go toolkit with more than 40 commands, in 14 hours at a cost of $251; Epoch AI estimates a human engineer would need 2 to 17 weeks for the same task. In a result that was not part of the April preliminary release, Opus 4.7 also reimplemented pkl, a program of about 60,000 lines of code, in what is described as the largest public autonomous reimplementation to date. The full release expands on the April 10 findings by publishing frontier-model comparisons, including OpenAI’s GPT-5.5 and Google’s Gemini 3.1 Pro Preview, and an open-source scaffold covering 22 of the 25 target programs.







