Mercury 2
Inception · Released 2026-03-04


128K context · $0.250/M input · $0.750/M output

Mercury 2 is a reasoning diffusion large language model (dLLM) that generates and refines tokens in parallel for high-speed output, achieving over 1,000 tokens per second on standard GPUs. It supports tunable reasoning levels, a 128K context length, native tool use, and schema-aligned JSON outputs, making it suitable for coding workflows, real-time voice and search applications, and agent loops. Mercury 2 is notable for being 5x faster than leading speed-optimized models such as Claude 4.5 Haiku and GPT 5 Mini while maintaining cost efficiency.

What is Mercury 2?

Mercury 2 is an AI model from Inception that Agent Mag tracks for pricing, context window, modalities, benchmarks, and API compatibility. Builders can use this page to compare Mercury 2 against other models for agent workflows and production deployments.
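For builders who want to try the model directly, a minimal request sketch against an OpenRouter-style, OpenAI-compatible chat-completions endpoint follows. The `inception/mercury-2` model ID, the endpoint URL, and the exact shape of the `reasoning` field are illustrative assumptions, not confirmed identifiers:

```python
import json

MODEL_ID = "inception/mercury-2"  # assumed model ID; check your provider's docs
API_URL = "https://openrouter.ai/api/v1/chat/completions"  # OpenRouter-style endpoint

def build_chat_request(prompt: str, reasoning_effort: str = "medium") -> dict:
    """Build a chat-completions payload with a tunable reasoning level."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
        # Mercury 2 lists a `reasoning` parameter; the effort-level shape
        # used here is an assumption for illustration.
        "reasoning": {"effort": reasoning_effort},
    }

payload = build_chat_request("Summarize this diff in one sentence.",
                             reasoning_effort="low")
print(json.dumps(payload, indent=2))
```

You would POST this body to the endpoint with your provider API key in an `Authorization: Bearer` header; the low-effort setting trades reasoning depth for the latency that the model is optimized for.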


Architecture & Specifications
Architecture
Reasoning Diffusion LLM (dLLM)
Tokenizer
Other
License
Proprietary
Released
2026-03-04
Modalities
Input
text
Output
text
Supported Parameters
include_reasoning, max_tokens, reasoning, response_format, stop, structured_outputs, temperature, tool_choice, tools
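These parameters map onto a familiar chat-completions request body. A hedged sketch of a tool-calling payload combining several of them (the `web_search` tool, its schema, and the model ID are invented for illustration):

```python
def build_tool_request(question: str) -> dict:
    """Payload combining a tool definition with sampling controls
    from the supported-parameters list above."""
    web_search = {  # illustrative tool definition, not part of the model itself
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web and return top results.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }
    return {
        "model": "inception/mercury-2",  # assumed model ID
        "messages": [{"role": "user", "content": question}],
        "temperature": 0.2,
        "max_tokens": 1024,
        "stop": ["</answer>"],
        "tools": [web_search],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }

req = build_tool_request("What changed in the latest Mercury release?")
print(sorted(req.keys()))
```

With `tool_choice` set to `"auto"`, the model returns either a normal completion or a tool call whose arguments conform to the declared parameter schema.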
Strengths
  • Extremely fast token generation (>1,000 tokens/sec)
  • Supports 128K context length
  • Tunable reasoning levels
  • Native tool use and schema-aligned JSON outputs
  • Cost-efficient compared to leading models
Limitations
  • Lower performance in research-level physics reasoning (CritPt: 0.8%)
  • Moderate accuracy in knowledge-based tasks (AA-Omniscience Accuracy: 20.5%)
  • Higher structured output error rate (2.44%) compared to some models
  • Tool call error rate of 4.82%
  • Limited economic task performance (GDPval-AA: 23.0%)
Recommended Use Cases
  • Coding workflows with low latency requirements
  • Real-time voice and search applications
  • Agent loops for autonomous systems
  • Schema-aligned JSON output generation
  • High-speed reasoning tasks
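For the schema-aligned JSON use case, the `response_format` parameter conventionally takes a JSON Schema in OpenAI-compatible APIs; a sketch under that assumption (the ticket schema and the sample reply are invented for illustration):

```python
import json

# Illustrative JSON Schema for a structured "ticket" output.
ticket_schema = {
    "name": "ticket",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string"},
            "priority": {"type": "string", "enum": ["low", "medium", "high"]},
        },
        "required": ["title", "priority"],
        "additionalProperties": False,
    },
}

request = {
    "model": "inception/mercury-2",  # assumed model ID
    "messages": [{"role": "user", "content": "File a ticket: login page is down."}],
    "response_format": {"type": "json_schema", "json_schema": ticket_schema},
}

# A schema-aligned reply should parse cleanly against the schema, e.g.:
reply = '{"title": "Login page down", "priority": "high"}'  # example shape, not real output
parsed = json.loads(reply)
print(parsed["priority"])  # prints "high"
```

Because the output is constrained to the schema, downstream agent code can consume it without defensive parsing, which matters at the request volumes a 1,000-tokens-per-second model invites.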


Data enriched Apr 24, 2026. Pricing from OpenRouter API.