The researchers argue that humans remain the bottleneck in AI improvement because both models and agent scaffolds still require manual design and correction. They propose SIA, a self-improving loop in which a Feedback-Agent updates both an agent’s harness (its scaffold) and its model weights.
They test SIA on three tasks: legal classification, GPU kernel optimization, and single-cell RNA denoising. On all three, combining harness and weight updates beats harness-only improvement, with reported gains of 25.1% over the prior state of the art (SOTA) on LawBench, 12.4% faster GPU kernels, and 20.4% over prior SOTA on denoising. The researchers conclude that harness updates improve how agents act and search, while weight updates build domain-specific intuition.
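
To make the idea of updating a harness and weights together more concrete, here is a toy sketch in Python. It is not the paper's method: `Agent`, `score`, `feedback_step`, and `improve` are hypothetical names, the "harness" is reduced to a single integer, the "weights" to a single float, and the feedback step is plain hill climbing on a made-up score. It only illustrates why a loop that can edit both parts can reach states that a harness-only loop cannot.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Agent:
    harness_steps: int  # hypothetical stand-in for the harness (e.g. a search budget)
    weight: float       # hypothetical stand-in for the model weights


def score(agent: Agent) -> float:
    """Made-up task score that peaks at harness_steps=8 and weight=1.0."""
    return -abs(agent.harness_steps - 8) - abs(agent.weight - 1.0)


def feedback_step(agent: Agent, rng: random.Random, update_weights: bool) -> Agent:
    """Propose one harness edit and, optionally, one weight edit.

    The best proposal is kept only if it strictly improves the score.
    """
    candidates = [
        replace(agent, harness_steps=agent.harness_steps + rng.choice([-1, 1]))
    ]
    if update_weights:
        candidates.append(
            replace(agent, weight=agent.weight + rng.choice([-0.1, 0.1]))
        )
    best = max(candidates, key=score)
    return best if score(best) > score(agent) else agent


def improve(agent: Agent, rounds: int = 200, seed: int = 0,
            update_weights: bool = True) -> Agent:
    """Run the toy improvement loop for a fixed number of rounds."""
    rng = random.Random(seed)
    for _ in range(rounds):
        agent = feedback_step(agent, rng, update_weights)
    return agent


if __name__ == "__main__":
    start = Agent(harness_steps=2, weight=0.0)
    print("harness only:", improve(start, update_weights=False))
    print("harness + weights:", improve(start, update_weights=True))
```

In this toy, the harness-only run plateaus by construction, because part of the score depends on `weight`, which that run never touches. The gap says nothing about the size of the gains the researchers report.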

Sources: paper