Skip to main content
AI Socratic

DeepSeek just dropped V4 (preview) — two open-weights MoE models that push the frontier on cost-effective 1M-token context.

DeepSeek-V4-Pro: 1.6T total params (49B active) — flagship performance rivaling top closed models in reasoning, math, and agentic coding. DeepSeek-V4-Flash: 284B total (13B active) — faster, cheaper, and highly efficient for everyday/agent tasks.

image.png

Both feature a new hybrid attention architecture (Compressed Sparse Attention + Heavily Compressed Attention) that makes million-token contexts dramatically more practical (much lower FLOPs and KV cache than V3). MIT license, available on Hugging Face (base + instruct), and live on the DeepSeek API today.

The community is already praising the efficiency gains, strong coding/agent results (e.g., high LiveCodeBench / SWE-Bench scores), and rock-bottom pricing — especially with the ongoing Pro discount.

Quick Highlights (as of early May 2026)

  • Release date: April 24, 2026 (preview)
  • Context: Native 1M tokens (with practical efficiency improvements for real agent/document workflows)
  • Reasoning modes: Non-think (fast), Think High, Think Max (deeper, higher quality on hard tasks) — all from the same weights
  • API pricing (highly competitive): Flash is extremely cheap; Pro has a big temporary discount (extended to ~May 31 in some updates) + major input cache price drop (1/10th)
  • Strengths: Coding/agentic tasks, long-context efficiency, price/performance. Text-only for now (multimodal planned later).
  • Availability: Chat at chat.deepseek.com (Expert/Instant modes), API (OpenAI/Anthropic compatible), open weights on HF/ModelScope.

Sources: Official announcement, Hugging Face collection, Tech Report, tweet discount extended

React:

Comments

Sign in as a member to join the conversation.

Loading comments…

Stay Updated

Get the latest AI insights delivered to your inbox. No spam, unsubscribe anytime.

Search

Search across events, members, and blog posts