MiMo-V2-Omni
All models
XiaomiXiaomiMiMo

MiMo-V2-Omni

262K context$0.400/M input$2.00/M output

MiMo-V2-Omni is an AI model from Xiaomi built for agent workflows, with support for text, audio, image, video input and text output. MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

What is MiMo-V2-Omni?

MiMo-V2-Omni is an AI model from Xiaomi that Agent Mag tracks for pricing, context window, modalities, benchmarks, and API compatibility. Builders can use this page to compare MiMo-V2-Omni against other models for agent workflows and production deployments.

Model ID

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...

Modalities
Input
textaudioimagevideo
Output
text
Supported Parameters
frequency_penaltyinclude_reasoningmax_tokenspresence_penaltyreasoningresponse_formatstoptemperaturetool_choicetoolstop_p

Related content