Mercury 2
Mercury 2 is a reasoning diffusion large language model (dLLM) that generates and refines tokens in parallel rather than one at a time, achieving over 1,000 tokens per second on standard GPUs. It supports tunable reasoning levels, a 128K context window, native tool use, and schema-aligned JSON outputs, making it well suited to coding workflows, real-time voice and search applications, and agent loops. Mercury 2 is 5x faster than leading speed-optimized models such as Claude 4.5 Haiku and GPT-5 Mini, while remaining cost-efficient.
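As a sketch of how the schema-aligned JSON output feature might be used through an OpenAI-compatible chat completions endpoint (such as OpenRouter's): the model ID `inception/mercury-2` and the exact request shape are assumptions for illustration, not confirmed identifiers.

```python
import json

# Hypothetical request payload for an OpenAI-compatible /chat/completions
# endpoint. The model ID "inception/mercury-2" is an illustrative assumption;
# check the provider's model list for the real identifier.
payload = {
    "model": "inception/mercury-2",
    "messages": [
        {"role": "user", "content": "Extract the city from: 'Flights to Paris'"}
    ],
    # Ask the model to constrain its output to a JSON schema, so the response
    # can be parsed directly instead of scraped from free text.
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "city_extraction",
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
}

print(json.dumps(payload, indent=2))
```

A schema-constrained response can then be handed to `json.loads` and validated against the same schema, which is what makes this pattern useful in agent loops.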
Data enriched Apr 24, 2026. Pricing from OpenRouter API.