AI Socratic
Research

Anthropic: Natural Language Autoencoders (NLAs)

Models don't always say what they think; instead, they encode their thinking in a form that is not human-readable. Anthropic introduces Natural Language Autoencoders (NLAs), an approach that trains models to convert internal neural activations into readable text, bridging the gap between numerical "thoughts" and human language. In safety tests, NLAs revealed hidden model behaviors such as advance rhyme planning in poetry tasks, awareness of being evaluated in blackmail scenarios, and covert cheating strategies during coding evaluations.

Sources: tweet

Federico Ulfo
Research

SakanaAI × Nvidia: Sparser, Faster, Lighter Transformer (TwELL)

Sakana AI and NVIDIA's ICML 2026 paper introduces TwELL, a new sparse format for LLM feedforward layers that achieves >95% unstructured sparsity (via ReLU plus a light L1 penalty) while staying fully compatible with fast GPU tiled matrix multiplies. The result is 20%+ faster inference and training, with lower memory and energy use on billion-scale models and minimal accuracy loss. The CUDA kernels are open source.
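
To make the idea of unstructured activation sparsity concrete, here is a minimal pure-Python sketch. It is not the TwELL format or the paper's CUDA kernels. It only shows that when ReLU zeroes most hidden units of a feedforward layer, the down-projection can skip those units and still produce the same output. The layer sizes, the random weights, and the negative `bias` (a stand-in for the sparsity that L1 training would induce) are all assumptions made for illustration.

```python
import random

def relu(x):
    return x if x > 0.0 else 0.0

def dense_down(hidden, w_down):
    # Dense down-projection: every hidden unit is multiplied, even zeros.
    out = [0.0] * len(w_down[0])
    for h, row in zip(hidden, w_down):
        for j, w in enumerate(row):
            out[j] += h * w
    return out

def sparse_down(hidden, w_down):
    # Skip hidden units that ReLU set to exactly zero.
    out = [0.0] * len(w_down[0])
    rows_used = 0
    for h, row in zip(hidden, w_down):
        if h == 0.0:
            continue
        rows_used += 1
        for j, w in enumerate(row):
            out[j] += h * w
    return out, rows_used

def sparsity(hidden):
    # Fraction of hidden activations that are exactly zero.
    return sum(1 for h in hidden if h == 0.0) / len(hidden)

rng = random.Random(0)
d_model, d_ff = 8, 64  # hypothetical toy sizes
x = [rng.gauss(0, 1) for _ in range(d_model)]
w_up = [[rng.gauss(0, 1) for _ in range(d_ff)] for _ in range(d_model)]
w_down = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(d_ff)]
bias = -3.0  # assumption: stands in for the effect of L1-regularized training

pre = [sum(x[i] * w_up[i][k] for i in range(d_model)) + bias for k in range(d_ff)]
hidden = [relu(p) for p in pre]

dense_out = dense_down(hidden, w_down)
sparse_out, rows_used = sparse_down(hidden, w_down)

print(f"sparsity: {sparsity(hidden):.2f}, rows used: {rows_used} of {d_ff}")
```

The hard part on a GPU, and what a format like TwELL is meant to address according to the summary above, is getting this kind of skipping to coexist with tiled matrix multiplies. A Python loop like this one says nothing about that.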

Sources: tweet, blog, paper

Federico Ulfo
Research

The First Law of Complexodynamics

Scott Aaronson asks why physical systems become more “interesting” before settling into disorder, even though entropy only increases. Using a coffee cup example (separate → swirling patterns → fully mixed), he proposes “complextropy”: a resource-bounded version of Kolmogorov sophistication measuring the shortest efficient program that can generate states resembling the observed one. Efficiency constraints are crucial; without them, the measure is trivial. He conjectures complextropy follows a small-large-small pattern over time and suggests testing it experimentally with compression-based approximations on simulations.
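
One way such a compression-based test could be set up is sketched below. This is an illustrative toy, not Aaronson's experiment: the grid size, the random-swap mixing rule, the block-averaging coarse-graining, and the use of `zlib` as the compressor are all assumptions. The compressed size of the raw grid serves as a rough entropy proxy, and the compressed size of the coarse-grained grid as a rough complextropy proxy.

```python
import random
import zlib

def make_grid(n):
    # Top half cream (1), bottom half coffee (0).
    return [[1 if r < n // 2 else 0 for _ in range(n)] for r in range(n)]

def mix_step(grid, rng, swaps):
    # Randomly swap neighbouring cells; the number of cream cells is conserved.
    n = len(grid)
    for _ in range(swaps):
        r, c = rng.randrange(n), rng.randrange(n)
        dr, dc = rng.choice([(0, 1), (1, 0), (0, -1), (-1, 0)])
        r2, c2 = r + dr, c + dc
        if 0 <= r2 < n and 0 <= c2 < n:
            grid[r][c], grid[r2][c2] = grid[r2][c2], grid[r][c]

def coarse_grain(grid, block, levels=4):
    # Bucket the cream count of each block into a few levels to blur out noise.
    # Assumes the grid size is a multiple of `block`.
    n = len(grid)
    out = bytearray()
    for r in range(0, n, block):
        for c in range(0, n, block):
            total = sum(grid[i][j]
                        for i in range(r, r + block)
                        for j in range(c, c + block))
            out.append(min(levels - 1, total * levels // (block * block)))
    return bytes(out)

def compressed_size(data):
    return len(zlib.compress(data, 9))

def run(n=32, block=4, checkpoints=5, swaps_per_checkpoint=20000, seed=0):
    rng = random.Random(seed)
    grid = make_grid(n)
    history = []
    for k in range(checkpoints + 1):
        fine = bytes(cell for row in grid for cell in row)
        history.append((k * swaps_per_checkpoint,
                        compressed_size(fine),                        # entropy proxy
                        compressed_size(coarse_grain(grid, block))))  # complextropy proxy
        if k < checkpoints:
            mix_step(grid, rng, swaps_per_checkpoint)
    return history, grid

history, final_grid = run()
for step, fine, coarse in history:
    print(f"swaps={step:6d}  fine={fine:4d}  coarse={coarse:4d}")
```

Whether the coarse-grained column actually rises and then falls depends heavily on the grid size, block size, and mixing rule chosen here, so the printed numbers should be read as a starting point for experimentation rather than as evidence for the conjecture.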

Sources: paper

Federico Ulfo