AI Socratic
  • Arbor (Renmin University and Microsoft Research): a research agent organized around a persistent hypothesis tree that links hypotheses, artifacts and evidence across sessions. It beat Codex and Claude Code on six real research tasks under the same budget and hit 86.36% Any Medal on MLE-Bench Lite. Apache 2.0; installs into Codex and Claude Code as a skill suite. Sources: paper, GitHub
  • LoopMDM: selectively looping the early-middle layers of a masked diffusion LM matches same-size MDMs with up to 3.3x fewer training FLOPs, and the loop count doubles as an inference-time compute dial. Sources: paper
  • SkillOpt (Microsoft): train the skill file, not the weights. An optimizer model edits a single skill document based on scored rollouts, lifting GPT-5.5 by up to 24.8 points inside Codex. Sources: paper, GitHub
  • DRPO (Tencent/NUS/UIUC): replaces hard trust-region masks in LLM RL with a smooth advantage-weighted regularizer for more stable training. The post-R1 RLVR refinement stream continues. Sources: paper
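
To make the Arbor item above more concrete, here is a minimal sketch of what a persistent hypothesis tree could look like. It is not Arbor's actual schema or API: the `HypothesisNode` class, its field names, the status labels and the JSON file format are all assumptions chosen for illustration. The only idea taken from the summary is that hypotheses, artifacts and evidence are linked in a tree that survives between sessions.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class HypothesisNode:
    """One node of a hypothetical hypothesis tree (illustrative, not Arbor's schema)."""
    statement: str
    status: str = "open"  # assumed labels: "open", "supported", "refuted"
    artifacts: list = field(default_factory=list)  # e.g. paths to scripts or logs
    evidence: list = field(default_factory=list)   # short notes on observed results
    children: list = field(default_factory=list)   # refinements of this hypothesis

    def add_child(self, statement):
        """Attach a more specific hypothesis under this one and return it."""
        child = HypothesisNode(statement)
        self.children.append(child)
        return child


def save_tree(root, path):
    """Write the whole tree to disk so a later session can resume from it."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(asdict(root), f, indent=2)


def _from_dict(data):
    """Rebuild a node and all of its descendants from plain dictionaries."""
    node = HypothesisNode(
        statement=data["statement"],
        status=data["status"],
        artifacts=list(data["artifacts"]),
        evidence=list(data["evidence"]),
    )
    node.children = [_from_dict(child) for child in data["children"]]
    return node


def load_tree(path):
    """Load a tree previously written by save_tree."""
    with open(path, encoding="utf-8") as f:
        return _from_dict(json.load(f))
```

In this sketch, one session would record a refuted idea with `child.status = "refuted"` and call `save_tree(root, "tree.json")`; a later session calls `load_tree("tree.json")` and can skip branches that have already been ruled out.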