More Research

June 13, 2026

Arbor (Renmin University and Microsoft Research): a research agent organized around a persistent hypothesis tree that links hypotheses, artifacts and evidence across sessions. It beat Codex and Claude Code on six real research tasks under the same budget and hit 86.36% Any Medal on MLE-Bench Lite. Released under Apache 2.0; installs into Codex and Claude Code as a skill suite. Sources: paper, GitHub
LoopMDM: selectively looping the early-middle layers of a masked diffusion LM matches same-size MDMs with up to 3.3x fewer training FLOPs, and the loop count doubles as an inference-time compute dial. Sources: paper
SkillOpt (Microsoft): train the skill file, not the weights. An optimizer model edits a single skill document based on scored rollouts, lifting GPT-5.5 by up to +24.8 points inside Codex. Sources: paper, GitHub
DRPO (Tencent/NUS/UIUC): replaces hard trust-region masks in LLM RL with a smooth advantage-weighted regularizer for more stable training. The post-R1 RLVR refinement stream continues. Sources: paper
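
To make the LoopMDM item above concrete, here is a minimal sketch of what "looping a span of layers" with a tunable loop count could look like. It is not the paper's architecture: the layers are toy affine maps standing in for transformer blocks, and the names `make_layer`, `looped_forward`, `layer_applications`, `loop_start`, `loop_end` and `n_loops` are all hypothetical, chosen only for illustration.

```python
from typing import Callable, List, Sequence

Layer = Callable[[List[float]], List[float]]


def make_layer(scale: float, shift: float) -> Layer:
    """Toy stand-in for a transformer block: an elementwise affine map."""
    def layer(h: List[float]) -> List[float]:
        return [scale * x + shift for x in h]
    return layer


def looped_forward(
    h: List[float],
    layers: Sequence[Layer],
    loop_start: int,
    loop_end: int,
    n_loops: int,
) -> List[float]:
    """Apply layers[:loop_start] once, layers[loop_start:loop_end] n_loops
    times (reusing the same weights each pass), then layers[loop_end:] once."""
    if not 0 <= loop_start <= loop_end <= len(layers):
        raise ValueError("loop span must lie inside the layer stack")
    if n_loops < 1:
        raise ValueError("n_loops must be at least 1")
    for layer in layers[:loop_start]:
        h = layer(h)
    for _ in range(n_loops):
        for layer in layers[loop_start:loop_end]:
            h = layer(h)
    for layer in layers[loop_end:]:
        h = layer(h)
    return h


def layer_applications(n_layers: int, loop_start: int, loop_end: int, n_loops: int) -> int:
    """Number of layer evaluations in one forward pass (a rough compute proxy)."""
    return n_layers + (n_loops - 1) * (loop_end - loop_start)


if __name__ == "__main__":
    stack = [make_layer(1.0, 1.0) for _ in range(4)]  # each layer adds 1
    for n in (1, 2, 3):
        out = looped_forward([0.0], stack, loop_start=1, loop_end=3, n_loops=n)
        print(n, out, layer_applications(4, 1, 3, n))
```

In this toy setup the parameter count stays fixed while `n_loops` scales the number of layer evaluations, which is the sense in which a loop count can act as a compute dial at inference time.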
