StepFun's Step 3.5 Flash

March 3, 2026
Posted by Federico Ulfo

A sparse MoE model with 196B total parameters but only 11B activated per token. It was designed to fit into 128 GB of memory, so it can run on a DGX Spark or other local setups. It is one of the first large-scale MoE models trained with the Muon optimizer, and StepFun made several adaptations to improve training stability at this scale. It's fast, small, and smart-ish. It works well for simple OpenClaw tasks and is free or very cheap on OpenRouter.

Source: Artificial Analysis
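As a rough sanity check on the 128 GB figure, the back-of-envelope sketch below estimates weight storage only. It assumes a uniform bit width for every parameter and 1 GB = 10^9 bytes, and it ignores the KV cache, activations, and runtime overhead. It makes no claim about which quantization StepFun or any local runtime actually uses; the helper name `weight_gb` is purely illustrative.

```python
# Back-of-envelope estimate of weight memory only (illustrative assumptions):
# - every parameter stored at the same bit width
# - 1 GB = 1e9 bytes
# - KV cache, activations, and runtime overhead are ignored

TOTAL_PARAMS = 196e9   # total parameters (from the post)
ACTIVE_PARAMS = 11e9   # parameters activated per token (from the post)
BUDGET_GB = 128        # memory budget mentioned in the post


def weight_gb(params: float, bits_per_param: float) -> float:
    """Return the storage size of `params` weights in gigabytes."""
    return params * bits_per_param / 8 / 1e9


def estimate() -> dict:
    """Print and return weight memory at a few common bit widths."""
    results = {}
    for bits in (16, 8, 4):
        gb = weight_gb(TOTAL_PARAMS, bits)
        results[bits] = gb
        verdict = "fits" if gb <= BUDGET_GB else "does not fit"
        print(f"{bits:>2}-bit weights: {gb:6.1f} GB ({verdict} in {BUDGET_GB} GB)")
    # Fraction of the weights that participate in computing each token.
    share = ACTIVE_PARAMS / TOTAL_PARAMS
    print(f"Active share per token: {share:.1%}")
    return results


results = estimate()
```

Under these assumptions, 16-bit weights need about 392 GB and 8-bit weights about 196 GB, while 4-bit weights come to about 98 GB. That suggests something close to 4-bit quantization is what lets the full model fit in 128 GB with some headroom left for the KV cache. Only about 5.6% of the weights are used per token, which is why it can still be fast once loaded.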