MiMo-V2.5
All models
XiaomiXiaomiMiMoReleased 2026-04-22

MiMo-V2.5

1.0M context$0.400/M input$2.00/M output

MiMo-V2.5 is Xiaomi's omnimodal AI model designed for multimodal perception tasks, including image and video understanding. It features a 1M token context window, enabling it to handle complete documents, extended conversations, and complex task contexts in a single pass. The model is optimized for cost-efficient inference while delivering strong reasoning and perception capabilities, making it suitable for integration with agent frameworks.

What is MiMo-V2.5?

MiMo-V2.5 is an AI model from Xiaomi that Agent Mag tracks for pricing, context window, modalities, benchmarks, and API compatibility. Builders can use this page to compare MiMo-V2.5 against other models for agent workflows and production deployments.

Model ID

MiMo-V2.5 is Xiaomi's omnimodal AI model designed for multimodal perception tasks, including image and video understanding. It features a 1M token context window, enabling it to handle complete documents, extended conversations, and complex task contexts in a single pass. The model is optimized for cost-efficient inference while delivering strong reasoning and perception capabilities, making it suitable for integration with agent frameworks.

Architecture & Specifications
Architecture
Omnimodal
Tokenizer
Other
Released
2026-04-22
Modalities
Input
textaudioimagevideo
Output
text
Supported Parameters
frequency_penaltyinclude_reasoningmax_tokenspresence_penaltyreasoningresponse_formatstoptemperaturetool_choicetoolstop_p
Strengths
  • Supports a 1M token context window for extended tasks
  • Optimized for cost-efficient inference
  • Strong multimodal perception across image and video tasks
  • Ideal for integration with agent frameworks
Recommended Use Cases
Extended conversations
Complex task contexts
Image and video understanding
Integration with agent frameworks

Related content

Data enriched Apr 24, 2026. Pricing from OpenRouter API.