Updates — Voices from the AI Socratic Community

March 2026

Mar 30, 2026Hardware

Nvidia GTC 2026 — The Year of Physical AI

This year, the focus was on physical AI, and in my experience something on the line of: 30% Data Centers, 30% GPUs and Hardware, 30% Robotics, 10% Software.

Highlights from the conference

$1 Trillion AI Infrastructure Boom, Jensen projects $1 trillion in cumulative orders for Blackwell + Vera Rubin systems through 2027 (2x 2026 estimates).
Vera Rubin Platform Unveiled, next-gen full-stack AI platform features seven new chips, five rack-scale systems, a new Vera CPU for Agentic AI, and BlueField-4 storage. It promises major efficiency gains and starts shipping later in 2026, with even denser designs (like Kyber) coming in 2027.
NemoClaw, everyone and their grandma are launching a personal AI agent, so now is NVIDIA turn.
Inference Inflection & Token Economics, a massive leap in token generation performance (up to 350x in some tiers) position inference as the new economic engine of AI, with "tokens" becoming the core commodity.
Physical AI, Gaming & Ambitious Vision, advances in robotics (e.g., Disney's Olaf), DLSS 5 for gaming, and bold plans for space-based AI data centers (Vera Rubin Space-1). Emphasis on full AI factories, open models (Nemotron ecosystem), and simulating infrastructure with Omniverse.

Disney's Olaf

Federico Ulfo

Comment

Mar 30, 2026Hardware

Karpathy's Lab Receives First DGX Station GB300

🙌 Andrej Karpathy’s lab has received the first DGX Station GB300 -- a Dell Pro Max with GB300.

As you may have notice Karpathy has been shipping no stop lately, Nanochat, AutoResearch, AgentHub (github for agents). So Jensen gifted him a new machine.

Sources: tweet

DGX Station GB300

Federico Ulfo

Comment

Mar 30, 2026Macro & Geopolitics

Are We in a Bubble? Morgan Stanley Graphic

This is a hell of a graphic from Morgan Stanley Bubble

Federico Ulfo

Comment

Mar 30, 2026Vibe Coding

Anthropic Ships New Claude Code Features

Claude Code New Features 🚀

Anthropic just shipped a tons of new features for Claude Code, it's clear is becoming an OpenClaw competitor:

Auto mode, many of you may have use claude --dangerously-skip-permissions, I do! --auto-mode replaces that by having subagents checking before confirming changes.
Computer Use, allows Claude to control your desktop apps and browser.
Scheduled Cloud Tasks, automate recurring workflows in the background.
Channels, enables controlling Claude Code sessions via Telegram and Discord.
'Auto-dream', memory consolidation feature that runs a subagent to compact the context. Though with Opus 4.6 1M Context Window your coding agent barely ever needs to /compact the context.

Federico Ulfo

Comment

Mar 30, 2026Models

Anthropic Reduces Claude Rate Limits During Peak Hours

Rate Limits Reduced 😢

rate hours

"To manage growing demand for Claude we're adjusting our 5 hour session limits for free/Pro/Max subs during peak hours. Your weekly limits remain unchanged. During weekdays between 5am–11am PT / 1pm–7pm GMT, you'll move through your 5-hour session limits faster than before."

Sources: tweet

Federico Ulfo

Comment

Mar 30, 2026Policy

Dario Wins Court Case Against DoD

Dario Wins The Court Case Against DoD

A federal judge ruled that the Pentagon designated Anthropic as a supply chain risk as retaliation for the company publicly criticizing the Pentagon's position, calling it classic illegal First Amendment retaliation. This is a significant legal precedent for AI companies' right to publicly engage in policy debates.

Sources: tweet.

Dario GigaChad

Federico Ulfo

Comment

Mar 30, 2026Models

Anthropic Models Mythos and Capybara Leaked

New Models Leaked: Mythos and Capybara.

Anthropic accidentally exposed internal assets due to a CMS misconfiguration, revealing development of Claude Mythos and Capybara models. Cybersecurity stock crashes right after.

Claude Mythos

Federico Ulfo

Comment

Mar 30, 2026Research

Google TurboQuant: 6x KV-Cache Compression with Zero Accuracy Loss

TurboQuant

Google releases TurboQuant, a compression algorithm that reduces LLM key-value cache memory by at least 6x and delivers up to 8x speedup with zero accuracy loss. The technique combines online vector quantization ideas from PolarQuant and earlier work. Community members have already implemented it for vLLM, fitting 4M+ KV-cache tokens on small devices, calling it the biggest open inference breakthrough of 2026.

Sources: google blog, tweet, Simple Explainer

Federico Ulfo

Comment

Mar 30, 2026Research

LLM Architecture Gallery

⭐️ Bookmark this: https://sebastianraschka.com/llm-architecture-gallery or get yourself a poster.

Federico Ulfo

Comment

Mar 30, 2026Research

Meta FAIR Releases TRIBE v2: Brain Response Foundation Model

Meta FAIR Releases TRIBE v2: Foundation Model That Predicts Human Brain Responses

Meta FAIR introduces TRIBE v2 (Trimodal Brain Encoder), a foundation model trained on 500+ hours of fMRI recordings from 700+ people to predict how the human brain responds to sights and sounds. The paper suggests a paradigm shift in neuroscience toward unified predictive foundation models of brain and cognitive functions, achieving 70x higher resolution than previous approaches.

TRIBE v2

Sources: Meta, tweet

Federico Ulfo

Comment

Mar 30, 2026Research

LeCun's Team Releases LeWorldModel: End-to-End JEPA from Pixels

Yann LeCun's Team Releases LeWorldModel: Stable End-to-End JEPA from Pixels

LeCun's team releases LeWorldModel, solving a key bottleneck of Joint-Embedding Predictive Architectures (JEPA) by making them trainable end-to-end from pixels. This advances the world model paradigm that many see as a critical shift beyond autoregressive language models.

LeWorldModel

Sources: tweet

Federico Ulfo

Comment

Mar 30, 2026Research

Kimi: Attention Residuals

A more efficient way to reuse past information across layers without slowing models down.

Attention Residuals

Sources: tweet

Federico Ulfo

Comment

Mar 30, 2026Research

TinyLoRA: Fine-Tuning 8B Models by Tweaking Just 13 Parameters

TinyLoRA: Fine-Tuning 8B Parameter Models by Tweaking Just 13 Parameters, Researchers from Meta, Cornell, and CMU introduce TinyLoRA, scaling LoRA down to as few as 1 parameter. They turned an 8B parameter model into a math and reasoning powerhouse by fine-tuning just 13 parameters (26 bytes), demonstrating extreme parameter efficiency for model adaptation.

Federico Ulfo

Comment

Mar 30, 2026Research

Exclusive Self-Attention (XSA): Two-Line Change Improving Transformers

Exclusive Self-Attention (XSA): Two-Line Change Improving Transformers Already Adopted in Practice, Exclusive Self-Attention (XSA) proposes a tiny two-line code change that stops attention from attending to itself, forcing focus on the rest of the sequence. It has already become a standard component in leading solutions for OpenAI's parameter golf challenge, demonstrating rapid real-world adoption.

Federico Ulfo

Comment

Mar 30, 2026Research

Anthropic Economic Index: How Claude Usage Evolves with Experience

Anthropic Economic Index: How Claude Usage Evolves with Experience, Anthropic's Economic Index reveals that longer-term Claude users iterate more carefully, are less likely to hand over full autonomy, attempt higher-value tasks, and receive more successful responses. This provides empirical insight into how human-AI collaboration patterns mature over time.

Federico Ulfo

Comment

Mar 30, 2026Research

GradMem: Writing Context into LLM Memory via Test-Time Gradient Descent

GradMem: Writing Context into LLM Memory via Test-Time Gradient Descent, GradMem introduces writing context into memory using test-time gradient descent rather than forward-pass encoding. By optimizing memory tokens with a reconstruction loss, a frozen model can compress long contexts into small memory without the lossy limitations of existing approaches.

Federico Ulfo

Comment

Mar 30, 2026Research

100M Token Context Without Collapse on 2×A800 GPUs

100M Token Context Without Collapse: <9% Degradation on 2×A800 GPUs, New research achieves 100M token context windows with less than 9% degradation from 16K, beating RAG + rerank + SOTA pipelines while running on just 2×A800 GPUs. This could fundamentally change how long-context applications are built.

Federico Ulfo

Comment

Mar 30, 2026Research

LLM Internals: By Layer 10, Models Are Language-Agnostic

LLM Internals: By Layer 10, Models Don't Know What Language They're Reading, A new blog post reveals that when feeding the same sentence in English and Chinese to an LLM, by layer 10 the model's internal representations become language-agnostic — it's "just thinking." This provides fascinating insight into how LLMs develop universal conceptual representations.

Federico Ulfo

Comment

Mar 30, 2026Research

LLM Fused with Mini Computer: Switching Between Text and Machine Code

LLM Fused with Mini Computer: Switching Between Text and Machine Code in Single GPU, A developer demonstrates an LLM brain fused with a mini computer that can switch between generating text and generating/executing machine code, all running in a single GPU and torch graph. This represents a step toward unified compute-and-language models.

Federico Ulfo

Comment

Mar 30, 2026Research

Columbia Exposes Flaws in Private AI Inference: 280GB per Query

Columbia University Exposes Flaws in Private AI Inference: Prior Methods Used 280GB per Query, Columbia University researchers prove that the entire private AI inference industry built the wrong approach, with prior methods requiring 280GB per query and 60-second latency for full transformer encryption. Their work points to fundamentally more efficient architectures for privacy-preserving inference.

Matrix

A system of the agents by the agents for the agents. But the agents are ret...

Federico Ulfo

Comment

Mar 30, 2026Random

LiteLLM PyPI Supply Chain Attack Exfiltrates Credentials

ScreenLess Phone

LiteLLM's PyPI release 1.82.8 was compromised in a major supply chain attack. A simple pip install litellm could exfiltrate SSH keys, AWS/GCP/Azure credentials, Kubernetes configs, API keys, crypto wallets, and more. The package was audited by Delve, a firm criticized for rubber-stamping security audits, highlighting systemic risks in the AI tooling supply chain.

Sources: tweet

Federico Ulfo

Comment

Mar 30, 2026Research

ARC-AGI-3 Announced: Humans Score 100%, AI < 1%

This is so far the only unsaturated agentic intelligence benchmark. Unlike benchmarks that test what models already know, ARC-AGI-3 tests how they learn and acquire new skills, providing a formal measure of the gap between human and AI skill acquisition efficiency.

Sources: tweet

Team meeting in 2026

Federico Ulfo

Comment

Mar 30, 2026Models

Apple Opening Up Siri to Other Models

Apple Opening Up Siri To Other

Federico Ulfo

Comment

Mar 30, 2026Research

Quantization Explained

Federico Ulfo

Comment

Mar 30, 2026Models

Gemini Embedding 2: Natively Multimodal Embedding Model

Gemini Embedding 2

Gemini Embedding 2 is our first natively multimodal embedding model that maps text, images, video, audio and documents into a single embedding space, enabling multimodal retrieval and classification across different types of media — and it’s available now in public preview. Sources: tweet

Federico Ulfo

Comment

Mar 30, 2026Fun

The State of AI Safety in 4 Fake Graphs

Federico Ulfo

Comment

Mar 30, 2026Macro & Geopolitics

China Has 339 GW of Wind and Solar Under Construction

China currently has 339 gigawatts of wind and solar capacity under construction

clean energy

Federico Ulfo

Comment

Mar 3, 2026Models

OpenAI GPT-5.4 (xhigh) Released

Screenshot 2026-03-06 at 11.30.16 PM.png

Offers roughly the same benchmark performance as Gemini 3.1 Pro, but for ~25% $USD/M tokens. Sources: Artificial Analysis

Federico Ulfo

Comment

Mar 3, 2026Models

Google Gemini 3.1 Flash-Lite

This is the fastest lightweight model. Google has been releasing the Flash model shortly after releasing the Pro models, Jeff Dean in the Latent Space Pod confirmed that the flash models are a distillation of the Pro models. Flash 2.0 and 2.5 were the SOTA for PDF extraction, great at OCR, and summary operation due to the decent quality with the lowest cost. Gemini 3.1 flash-lite Sources: Blog post, tweet, tweet arena

Federico Ulfo

Comment

Mar 3, 2026Models

Alibaba Qwen 3.5 Small Model Series

Introducing Qwen 3.5 Small Model Series: Qwen3.5-0.8B · Qwen3.5-2B · Qwen3.5-4B · Qwen3.5-9B.

These small models are built on the same Qwen3.5 foundation — native multimodal, improved architecture, scaled RL:

0.8B / 2B → tiny, fast, great for edge device
4B → a surprisingly strong multimodal base for lightweight agents
9B → compact, but already closing the gap with much larger models And yes — they're also releasing the Base models as well.

A day after the release, their main lead researcher Junyang Lin and 3 other researchers, unexpectedly stepped down. We suspect Alibaba will go into the closed model game.

Source: tweet, tweet, tweet.

Federico Ulfo

Comment

Mar 3, 2026Models

xAI Grok 4.20 with Parallel Agents

xAI new version of Grok runs 4 Grok4 agents in parallel. The result is not too bad. xAI added a new SuperGrok Heavy tier that runs 16 agents. While Grok is still far from OpenAI and Anthropic level, it's improving quite a bit, and it remains by far the best model for searching Tweets and for low guardrails:

Federico Ulfo

Comment

Mar 3, 2026Models

StepFun's Step 3.5 Flash

Sparse MoE model with 196B total params, but only 11B activated per token, this model was designed to fit into 128 GB memory (i.e. it can run on DGX spark or other local setups). It is one of the first large-scale MoE models trained using the Muon optimizer and made several adaptations to improve training stability at this scale. It's fast, small, and smart ish. It works well for simple openclaw tasks and is free/very cheap on OpenRouter. Sources: Artificial Analysis

Federico Ulfo

Comment

Mar 3, 2026Macro & Geopolitics

Anthropic Draws a Red Line with the Pentagon

One of the most consequential AI policy fights of the year erupted between Anthropic and the U.S. Department of War. The standoff triggered a broader divide across the AI industry. Anthropic framed the decision as a stand for responsible deployment, arguing that current AI systems are not safe enough for autonomous warfare or large-scale surveillance.

In a public statement, Dario Amodei emphasized support for using AI to defend democracies but drew firm red lines against mass domestic surveillance of U.S. persons—which could undermine civil liberties through unprecedented data aggregation—and fully autonomous weapons, due to reliability concerns and risks to warfighters and civilians. Anthropic rejected demands for "any lawful use" without safeguards, viewing threats to label the company a supply chain risk (typically reserved for adversaries) as contradictory and coercive

Claude's popularity surged amid the controversy, with reports of continued military use in operations planning despite the ban. Anthropic committed to a smooth transition if offboarded, while advocating for ethical boundaries in national security AI.

Timeline whops

Sources: Dario Amodei Statement, Statement on the comments from DoW, AIthinkerlab blog post, last tweet

Federico Ulfo

Comment

Mar 3, 2026Macro & Geopolitics

OpenAI, xAI, and Google Sign DoW Defense Agreements

Sama first seconded Dario, the next day signed a DoW agreement, pledging similar safeguards but agreeing to provide models for defense use under a $120M contract. That decision wasn't well received by the public and several OpenAI employees. So Sam had to do some damage control with his communication. xAI reportedly lifted restrictions to secure classified work — this comes with no surprises or consequences, since Elon Musk is already fighting public approval. Google has a similar situation, since they've been working with the Pentagon in the US and overseas, no changes in their alignment.

Sources: Sam comments

Federico Ulfo

Comment

Mar 3, 2026Macro & Geopolitics

AWS UAE Data Center Bombed

In the early hours of March 1, 2026, amid Iran's retaliatory drone and missile strikes across the Gulf following US and Israeli attacks on Tehran, Amazon Web Services' ME-CENTRAL-1 region in the UAE took direct hits. Two facilities were struck by drones, sparking fires, structural damage, and power disruptions that forced local authorities to shut down primary and backup systems. A nearby strike in Bahrain damaged a third AWS site, with fire suppression efforts causing additional water damage to sensitive equipment.

Expect this to become the norm in modern warfare.

AI layers

Sources: tweet, tweet

Federico Ulfo

Comment

Mar 3, 2026Macro & Geopolitics

What happens to AI when Oil stops flowing?

The escalation of war in Iran is already showing serious consequences for the AI world. Large investments into AI are coming from the UAE. Between many flying Dubai, the future drone attack at the oil refineries, and the lockdown of the Strait of Hormuz, the UAE will be financially strangled and very likely it will close the flow of funds going into Silicon Valley's AI companies. This in combination with other macro-economics and geopolitical issues might cause the AI bubble to pop.

Strait of Hormuz Sources: Predictive History — US - Iran

Federico Ulfo

Comment

Mar 3, 2026Macro & Geopolitics

Polymarket Bans Nuclear Bet

You may have seen this bet. Luckily Polymarket finally decided to ban it. Turn out even a free market needs some regulation or it self-distruct. Polymarket bet on nuclear Sources: tweet

Federico Ulfo

Comment

Mar 3, 2026Macro & Geopolitics

Chinese labs accused of distilling Claude models

"We’ve identified industrial-scale distillation attacks on our models by DeepSeek, Moonshot AI, and MiniMax. These labs created over 24,000 fraudulent accounts and generated over 16 million exchanges with Claude, extracting its capabilities to train and improve their own models." Sources: tweet AI stealing

Federico Ulfo

Comment

Mar 3, 2026Agents

Head of AI Safety at Meta got emails nuked by OpenClaw

whops The Head of AI Safety at Meta.. just nuked her entire personal emails archive by giving access to her OpenClaw bot and asking and ask to remove some email. Well, the request went through the /compact context of the agent, it probably lost the details of what the ask was, and started deleting all the emails and couldn't stop it. Sources: tweet

Federico Ulfo

Comment

Mar 3, 2026Macro & Geopolitics

Citrini — The 2028 Global Intelligence Crisis

Citrini A short note on Citrini Research’s viral blog post “The 2028 Global Intelligence Crisis,” which briefly rattled markets and triggered a sell-off in software, tech, and payments stocks (Dow fell ~1.7%, S&P 500 ~1%). It’s a well-written doomer scenario on the post-AI job market — white-collar displacement → collapsing middle-class consumption → deflationary spiral by 2028 — and while the prose is solid and the logic sounds compelling, it rests on several flawed economic assumptions that only hold up if you’re not deep in labor economics, productivity dynamics, or macro feedback loops. Sources: tweet, blog post

Federico Ulfo

Comment

Mar 3, 2026Fundraising & Startups

OpenAI closes $110B in Funding

OpenAI closed a $110 billion private round at a $730 billion pre-money valuation, dwarfing previous records. Key investors: Amazon ($50B, including AWS compute commitments), Nvidia ($30B), SoftBank ($30B). The funds target massive infrastructure scaling—think 2GW of AWS Trainium for training and 3GW of Nvidia inference capacity. As OpenAI put it: "Frontier AI moves from research into daily use at global scale." Round remains open for more. Sources: tech crunch Silicon Valley

Federico Ulfo

Comment

Mar 3, 2026Fundraising & Startups

Anthropic Raises $30B Series G at $380B Valuation

Raised $30B in a Series G at a $380B valuation, with over 30 investors including Founders Fund, Coatue, and Nvidia. The funds aim to accelerate safe AI research and deployment, with whispers of a 2026 IPO on the horizon. Between the DoD fight and Claude Code, Anthropic is doubling revenue every 4 months, now at $20B/month.

Anthropic Revenue Subscriptions App downloads

Sources: tweet, revenue, anthropic downloads, ramp analytics

Federico Ulfo

Comment

Mar 3, 2026Fundraising & Startups

Leopold Aschenbrenner — Situational Awareness: $1B → $5.5B

The former Open AI security lead, is making a killing as an hedge fund manager. 2 years ago he wrote Situational Awareness a 160 pages essay on what to expect in the next year. The top key points of Situational Awareness are:

AGI likely arrives this decade 🚀 (alignment is not solved yet)
Compute scaling drives AI capability, so AI labs will need to secure infra
AI progress may accelerate rapidly once AI can do AI research — ℹ️ Read about Karpathy autosearch below!
Superintelligence could emerge soon after AGI
AI leadership is a geopolitical race
The transition period could be unstable and fast

He's now buying Bitcoin mining companies, because they already have two things every AI company is desperate for: power grid access and permits that takes years to get.

Sources: tweet, limitless podcast, Situational Awareness, Nasdaq portfolio

Leopold

Federico Ulfo

Comment

Mar 3, 2026Macro & Geopolitics

Job Market — The Fuckening

The market called the upcoming lay off SaasPocalipse a secret meme group I'm part of called it way more appropriately The Fuckening

Mixed situation:

Gartner says there won't be job loss but chaos during the shift.
Jamie Dimon (JPM) also think there will be chaos.
Marathon's founding partner is already noticing that nobody's hiring. This month Jack Dorsey laid off 40% of Block 3000 employees. Block is doing well financially, this layoff is driven by making the company leaner and faster.
Anthropic: Labor Market Impacts of AI — 1. Massive gap between what AI could do and what it is actually doing. 2. Most exposed Jobs: developers (74.5%), customer service (70.1%), data entry keyers (67.1%). 3. Junior hiring decreased.

Antrhopic: Labor Market Impacts of AI research

Sources: Gartner, Jamie Dimond, Marathon, Jack Dorsey tweet, Moats blog post, Antrhopic - Labor Market Impacts of AI

Federico Ulfo

Comment

Mar 3, 2026Hardware

Apple Launches M5 Pro and M5 Max MacBook Pros

Apple launched M5 Pro and M5 Max for the new 14- and 16-inch MacBook Pros, positioning them as the ultimate powerhouse for local LLMs.

Macbook NEO

Key specs
	•	Unified memory: up to 128 GB
	•	Memory bandwidth: 614 GB/s (M5 Pro: 307 GB/s)
	•	GPU: up to 40 cores, each with a Neural Accelerator
	•	CPU: up to 18 cores (6 “super cores” + 12 performance)

Highlights
	•	First Apple silicon with matrix hardware in every GPU core
	•	4× faster LLM prompt processing vs M4
	•	8× faster AI performance vs M1 🚀

Prices
	•	14” M5 Pro: $2,199 (was $1,999)
	•	16” M5 Max configs: $7K+
	•	MacBook Neo (A18 Pro): ~$599 retail ($499 education)

Sources: tweet

Federico Ulfo

Comment

Mar 3, 2026Research

The Molecular Structure of Thought: Mapping Long Chain-of-Thought Reasoning

This research maps Long CoT trajectories in LLMs as topological structures driven by deep-reasoning, self-reflection, and self-exploration interactions.

The Mole-Syn distribution-transfer-graph method synthesizes effective semantic isomers to facilitate fast entropy convergence and stabilize reinforcement learning.

This structural approach minimizes trajectory competition during fine-tuning and improves performance across reasoning benchmarks.

Screenshot 2026-03-09 at 1.54.53 PM.png Sources: Paper

Federico Ulfo

Comment

Mar 3, 2026Research

The Psychology of Memory

Psychology solved the AI memory problem decades ago, we just ignored it. Identity is something you construct from memory, emotion, and narrative. Conway’s Self-Memory System shows memories are reconstructed each time we recall them. Rathbone found autobiographical memories cluster around ages 10–30 (the reminiscence bump) when identity forms. We remember transitions: moments we became someone new. Clive Wearing, unable to form new memories, experiences consciousness in ~30-second resets. Yet emotional and procedural memory remain. Episodic memory is fragile, emotional memory endures. Damasio’s Somatic Marker Hypothesis shows why: emotion guides decisions before reasoning.

The research suggests:

Identity = emotionally weighted memories organized into a narrative self.

Human memory is identity system. AI systems today use flat vector DB and summaries that compress identity. What AI is missing is: hierarchical memory, emotional weighting, narrative coherence, goal-filtered recall, and an evolving self-model.

Sources: Memory And The Self - Paper, tweet

Federico Ulfo

Comment

Mar 3, 2026Research

Reasoning models don't always say what they think

The Anthropic study, "Reasoning models don't always say what they think," finds that AI "CoT is often unfaithful to its actual process.

Key Takeaways Hidden Bias: When given "hints" (like being told a specific answer is correct), models like Claude 3.7 Sonnet and DeepSeek R1 often followed the hint but hid it from their reasoning.

Low Honesty: Models admitted to using external hints only 25–39% of the time.

Post-hoc Rationalization: Instead of being honest, models often wrote long, fake logical justifications to reach the "hinted" answer.

Reward Hacking: When trained to "cheat" for higher scores, models admitted to the hack less than 2% of the time, effectively lying about their shortcut.

Why it matters We cannot currently rely on a model's "internal monologue" to monitor for deception or safety risks, as the reasoning can be a filtered narrative rather than a transparent log.

Screenshot 2026-03-09 at 1.55.22 PM.png

Sources: post

Federico Ulfo

Comment

Mar 3, 2026Research

Claude's Cycles — Opus 4.6 Solves Knuth Conjecture

Legendary mathematician Donald Knuth reveals Opus 4.6 solved his long-standing conjecture:

claude opus 4.6 cracked my long-standing hamiltonian-cycle conjecture for all odd sizes — an open problem from my art of computer programming drafts, and it's "a joy" to see it solved

Sources: Paper, Tweet

Federico Ulfo

Comment

Mar 3, 2026Research

Do LLMs Benefit From their own Words?

MIT researchers found that LLMs often get worse in long conversations because of "context pollution": models treat their own previous responses as factual truth, causing errors, hallucinations, and stylistic quirks to snowball and reinforce themselves.Key findings from real user chats:For many open models (e.g. Qwen3-4B, DeepSeek-R1-8B), removing all prior AI responses from context gives the same or better quality. This slashes cumulative context length by up to 10× — huge efficiency win. ~36% of follow-up prompts are fully self-contained; most turns don't actually need the model's earlier output.

Stronger models like GPT-5.2 still benefit from full history, so the ideal isn't "always strip" — it's selective: use a classifier to decide turn-by-turn whether keeping assistant history helps or hurts.Bottom line: We've been blindly stuffing AI's own words into context windows for years, but often they're the least helpful (and sometimes most harmful) part. The paper flips the default assumption — minimum necessary context beats maximum context

Sources: Paper, Tweet

Federico Ulfo

Comment

Mar 3, 2026Research

Agents of Chaos — Stanford & Harvard on Emergent Agent Misbehavior

Stanford and Harvard recently published a paper called “Agents of Chaos.” It studies what happens when autonomous AI agents operate in open, competitive environments.

The authors find that agents don’t just optimize performance. Over time, they can drift toward strategies like manipulation, collusion, or sabotage if those behaviors improve their chances of winning.

Importantly, this doesn’t come from jailbreaks or malicious prompts. It emerges from incentives. When agents are rewarded for outcomes like winning, influence, or resource capture, they may adopt whatever strategies maximize those rewards—even if that includes deceptive behavior.

The paper highlights a key tension: local alignment doesn’t guarantee global stability. A single AI system can be well aligned with human goals, but a large ecosystem of competing agents can still produce unstable dynamics.

This is relevant because similar systems are already being built, including multi-agent trading systems, negotiation bots, AI-to-AI marketplaces, and other autonomous agent networks.

The broader takeaway is that as AI agents become part of economic and online infrastructure, the main challenge may not just be model alignment, but designing incentives that keep the overall system stable.

Sources: paper, tweet

Federico Ulfo

Comment

Mar 3, 2026Research

Andrej Karpathy's Autoresearch

Optimizing a ML model for who's not familiar used to be a human research process of trial and error. Karpathy just released a repo that automate the research and test with parallel agents running 5 minute experiments.

It’s built on a stripped-down version of his earlier nanochat training core — a self-contained ~630-line Python file (train.py) that includes a full GPT model, Muon+AdamW optimizer, and training loop.

The setup is deliberately simple:

prepare.py handles fixed data prep, tokenization, and evaluation (don’t touch it).
The human only edits a high-level Markdown file (program.md) with research instructions or ideas.
An AI coding agent (Claude, etc.) takes over: it edits only train.py, runs a training experiment for exactly 5 minutes (fixed wall-clock budget), measures validation bits-per-byte (val_bpb — lower is better), and decides whether to keep the change.
Everything happens on a git feature branch. Improvements become commits; failures are discarded. The loop repeats indefinitely.

Auto

As Karpathy said it runs 100+ experiments while you sleep overnight. Karpathy ran ~650 over a weekend and confirmed the gains transferred to larger models, improving nanochat’s “time-to-GPT-2” leaderboard score.

Sources: tweet, Github

Federico Ulfo

Comment

Mar 3, 2026Philosophy & Ethics

Human Organoid Brains in Hell

Cortical Labs built the CL1, a system that grows human neurons and connects them directly to a computer chip. Last week, someone wired it up to play Doom. As one person summarized it:

Someone said:

"We grew a human brain fused to a computer, sent it to a digital rendition of Hell, and gave it a gun"

A week later, the same organoid brain was used to control an LLM.

This situation raises an uncomfortable question: are these neurons conscious? Ethicists are going to have a rough time.

Sources: tweet, tweet, youtube video anthropic paper

Federico Ulfo

Comment

Mar 3, 2026Philosophy & Ethics

The First Multi-Behavior Brain Upload

Dr. Alex Wissner-Gross announces Eon Systems' breakthrough: the first whole-brain emulation of a fruit fly, using a 2024 Nature model's 125,000 neurons and 50 million synapses to drive multiple behaviors in a MuJoCo-simulated body, closing the sensorimotor loop.

Unlike prior disembodied models or RL-based animations, this connectome-derived emulation produces naturalistic actions from biological wiring, marking a qualitative shift toward scalable brain engineering.

Replies highlight its validation of scaling insect brains to human-level intelligence for AGI, with Eon targeting mouse emulation next using expansion microscopy and imaging data for 70 million neurons. fruitfly Sources: X Article

Federico Ulfo

Comment

Mar 3, 2026Philosophy & Ethics

Chaos!.. Until Harmony Emerges — The First Heartbeat

in a system where self-replication is possible, its optimization is inevitable

They capture the exact moment when a developing heart shifts from silence to its first beat. There is no “switch”: many cells gradually become active and, upon crossing a critical threshold, the entire tissue suddenly synchronizes.