AI Socratic

Sakana AI and NVIDIA's ICML 2026 paper introduces TwELL, a new sparse format for LLM feedforward layers. It achieves more than 95% unstructured sparsity (via ReLU plus a light L1 penalty) while staying fully compatible with fast tiled matrix multiplies on GPUs. The reported result is 20%+ faster inference and training, with lower memory and energy use on billion-scale models and minimal accuracy loss. The CUDA kernels are open source.
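To make the "ReLU + light L1" idea concrete, here is a minimal pure-Python sketch of why zeros in a feedforward layer's hidden activations save work. It is not the TwELL format or the paper's kernels: every name below (`ffn_hidden`, `l1_penalty`, `down_sparse`, the layer sizes) is hypothetical, and the weights are random rather than trained. It only shows that ReLU produces exact zeros, that an L1 term on the activations is the quantity a training loss would penalize to encourage more of them, and that skipping zero entries in the down-projection gives the same output as the dense computation.

```python
import random

def relu(v):
    return v if v > 0.0 else 0.0

def ffn_hidden(x, w_in):
    """Hidden activations h = ReLU(W_in x) for one token (hypothetical layer)."""
    return [relu(sum(wi * xi for wi, xi in zip(row, x))) for row in w_in]

def l1_penalty(h, lam):
    """L1 term lam * sum|h_j| that would be added to the training loss."""
    return lam * sum(abs(v) for v in h)

def sparsity(h):
    """Fraction of hidden units that are exactly zero."""
    return sum(1 for v in h if v == 0.0) / len(h)

def down_dense(h, w_out, d_model):
    """Dense down-projection: every hidden unit contributes, zeros included."""
    out = [0.0] * d_model
    for j, hj in enumerate(h):
        for k in range(d_model):
            out[k] += hj * w_out[j][k]
    return out

def down_sparse(h, w_out, d_model):
    """Same projection, but rows of w_out for zero activations are skipped."""
    out = [0.0] * d_model
    for j, hj in enumerate(h):
        if hj == 0.0:
            continue  # a zero activation contributes nothing
        for k in range(d_model):
            out[k] += hj * w_out[j][k]
    return out

# Assumed toy sizes and random (untrained) weights, for illustration only.
rng = random.Random(0)
d_model, d_hidden = 8, 32
x = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
w_in = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(d_hidden)]
w_out = [[rng.gauss(0.0, 1.0) for _ in range(d_model)] for _ in range(d_hidden)]

h = ffn_hidden(x, w_in)
dense = down_dense(h, w_out, d_model)
sparse = down_sparse(h, w_out, d_model)

print("fraction of zero activations:", sparsity(h))
print("L1 penalty with lam=0.01:", l1_penalty(h, 0.01))
print("max |dense - sparse|:", max(abs(a - b) for a, b in zip(dense, sparse)))
```

With random weights only part of the hidden units are zero. The post's claim is that training with ReLU and a light L1 penalty pushes this fraction above 95%, and that TwELL stores the result in a layout that tiled GPU matrix multiplies can still use efficiently.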


Source: tweet, blog, paper
