Microsoft’s AutoDev uses AI agents to write, test, and fix code autonomously, hitting 91.5% on HumanEval in Docker.