DeepSeek V4 Flash
All models
DeepSeekDeepSeekDeepSeekReleased 2026-04-24

DeepSeek V4 Flash

1.0M context$0.140/M input$0.280/M output284B total, 13B activated

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model with 284 billion total parameters and 13 billion activated parameters. It supports a 1 million-token context window and is designed for fast inference and high-throughput workloads. The model features hybrid attention for efficient long-context processing and configurable reasoning modes, making it suitable for coding assistants, chat systems, and agent workflows.

What is DeepSeek V4 Flash?

DeepSeek V4 Flash is an AI model from DeepSeek that Agent Mag tracks for pricing, context window, modalities, benchmarks, and API compatibility. Builders can use this page to compare DeepSeek V4 Flash against other models for agent workflows and production deployments.

Model ID

DeepSeek V4 Flash is an efficiency-optimized Mixture-of-Experts model with 284 billion total parameters and 13 billion activated parameters. It supports a 1 million-token context window and is designed for fast inference and high-throughput workloads. The model features hybrid attention for efficient long-context processing and configurable reasoning modes, making it suitable for coding assistants, chat systems, and agent workflows.

Architecture & Specifications
Architecture
Mixture of Experts (MoE)
Parameters
284B total, 13B activated
Tokenizer
DeepSeek
Released
2026-04-24
Modalities
Input
text
Output
text
Supported Parameters
frequency_penaltyinclude_reasoninglogprobsmax_tokenspresence_penaltyreasoningresponse_formatstoptemperaturetool_choicetoolstop_logprobstop_p
Strengths
  • Supports a 1 million-token context window
  • Optimized for fast inference and high-throughput workloads
  • Hybrid attention for efficient long-context processing
  • Configurable reasoning modes
  • Strong reasoning and coding performance
Limitations
  • Limited information on training data sources
  • Hallucination rate of 4.2% in knowledge benchmarks
  • Low performance in research-level physics reasoning (CritPt: 7.1%)
Recommended Use Cases
Coding assistants
Chat systems
Agent workflows
Long-context processing tasks
High-throughput applications requiring cost efficiency

Related content

Data enriched Apr 24, 2026. Pricing from OpenRouter API.