Mercury 2
Inception · Released 2026-03-04


128K context · $0.250/M input · $0.750/M output

Mercury 2 is a reasoning diffusion large language model (dLLM) that generates and refines tokens in parallel for high-speed output, achieving over 1,000 tokens per second on standard GPUs. It supports tunable reasoning levels, a 128K context length, native tool use, and schema-aligned JSON outputs, making it suitable for coding workflows, real-time voice and search applications, and agent loops. Mercury 2 is notable for being 5x faster than leading speed-optimized models such as Claude 4.5 Haiku and GPT 5 Mini while maintaining cost efficiency.

What is Mercury 2?

Mercury 2 is an AI model from Inception that Agent Mag tracks for pricing, context window, modalities, benchmarks, and API compatibility. Builders can use this page to compare Mercury 2 against other models for agent workflows and production deployments.
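For builders who want to try the model directly, a minimal request sketch against an OpenRouter-style, OpenAI-compatible chat-completions endpoint follows. The `inception/mercury-2` model ID, the endpoint URL, and the exact shape of the `reasoning` field are illustrative assumptions, not confirmed identifiers:

```python
import json

MODEL_ID = "inception/mercury-2"  # assumed model ID; check your provider's docs
API_URL = "https://openrouter.ai/api/v1/chat/completions"  # OpenRouter-style endpoint

def build_chat_request(prompt: str, reasoning_effort: str = "medium") -> dict:
    """Build a chat-completions payload with a tunable reasoning level."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        # Mercury 2 lists a `reasoning` parameter; the effort-level shape
        # used here is an assumption for illustration.
        "reasoning": {"effort": reasoning_effort},
    }

payload = build_chat_request("Summarize this diff in one sentence.",
                             reasoning_effort="low")
print(json.dumps(payload, indent=2))
```

You would POST this body to the endpoint with your provider API key in an `Authorization: Bearer` header; the low-effort setting trades reasoning depth for the latency that the model is optimized for.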


Architecture & Specifications
Architecture
Reasoning Diffusion LLM (dLLM)
Tokenizer
Other
License
Proprietary
Released
2026-03-04
Modalities
Input
text
Output
text
Supported Parameters
include_reasoning, max_tokens, reasoning, response_format, stop, structured_outputs, temperature, tool_choice, tools
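These parameters map onto a familiar chat-completions request body. A hedged sketch of a tool-calling payload combining several of them (the `web_search` tool, its schema, and the model ID are invented for illustration):

```python
def build_tool_request(question: str) -> dict:
    """Payload combining a tool definition with sampling controls
    from the supported-parameters list above."""
    web_search = {  # illustrative tool definition, not part of the model itself
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return top results.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }
    return {
        "model": "inception/mercury-2",  # assumed model ID
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.2,
        "max_tokens": 1024,
        "stop": ["</answer>"],
        "tools": [web_search],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

req = build_tool_request("What changed in the latest Mercury release?")
print(sorted(req.keys()))
```

With `tool_choice` set to `"auto"`, the model returns either a normal completion or a tool call whose arguments conform to the declared parameter schema.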
Strengths
  • Extremely fast token generation (>1,000 tokens/sec)
  • Supports 128K context length
  • Tunable reasoning levels
  • Native tool use and schema-aligned JSON outputs
  • Cost-efficient compared to leading models
Limitations
  • Lower performance in research-level physics reasoning (CritPt: 0.8%)
  • Moderate accuracy in knowledge-based tasks (AA-Omniscience Accuracy: 20.5%)
  • Higher structured output error rate (2.44%) compared to some models
  • Tool call error rate of 4.82%
  • Limited economic task performance (GDPval-AA: 23.0%)
Recommended Use Cases
  • Coding workflows with low latency requirements
  • Real-time voice and search applications
  • Agent loops for autonomous systems
  • Schema-aligned JSON output generation
  • High-speed reasoning tasks
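For the schema-aligned JSON use case, the `response_format` parameter conventionally takes a JSON Schema in OpenAI-compatible APIs; a sketch under that assumption (the ticket schema and the sample reply are invented for illustration):

```python
import json

# Illustrative JSON Schema for a structured "ticket" output.
ticket_schema = {
    "name": "ticket",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        },
        "required": ["title", "priority"],
        "additionalProperties": False,
    },
}

request = {
    "model": "inception/mercury-2",  # assumed model ID
    "messages": [{"role": "user", "content": "File a ticket: login page is down."}],
    "response_format": {"type": "json_schema", "json_schema": ticket_schema},
}

# A schema-aligned reply should parse cleanly against the schema, e.g.:
reply = '{"title": "Login page down", "priority": "high"}'  # example shape, not real output
parsed = json.loads(reply)
print(parsed["priority"])  # prints "high"
```

Because the output is constrained to the schema, downstream agent code can consume it without defensive parsing, which matters at the request volumes a 1,000-tokens-per-second model invites.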


Data enriched Apr 24, 2026. Pricing from OpenRouter API.