Qwen3.5-Flash
All models
AlibabaAlibabaQwenReleased February 25, 2026

Qwen3.5-Flash

1M context$0.065/M input$0.260/M output

Qwen3.5-Flash is a vision-language model built on a hybrid architecture that combines linear attention mechanisms with a sparse mixture-of-experts model. It is designed for efficient inference and excels in both pure text and multimodal tasks, offering fast response times while maintaining a balance between speed and performance. The model represents a significant improvement over the Qwen3 series in terms of capabilities and efficiency.

What is Qwen3.5-Flash?

Qwen3.5-Flash is an AI model from Alibaba that Agent Mag tracks for pricing, context window, modalities, benchmarks, and API compatibility. Builders can use this page to compare Qwen3.5-Flash against other models for agent workflows and production deployments.

Model ID

Qwen3.5-Flash is a vision-language model built on a hybrid architecture that combines linear attention mechanisms with a sparse mixture-of-experts model. It is designed for efficient inference and excels in both pure text and multimodal tasks, offering fast response times while maintaining a balance between speed and performance. The model represents a significant improvement over the Qwen3 series in terms of capabilities and efficiency.

Architecture & Specifications
Architecture
Hybrid architecture with linear attention and sparse mixture-of-experts
Tokenizer
Qwen3
Released
February 25, 2026
Modalities
Input
textimagevideo
Output
text
Supported Parameters
include_reasoningmax_tokenspresence_penaltyreasoningresponse_formatseedstructured_outputstemperaturetool_choicetoolstop_p
Strengths
  • Efficient inference with hybrid architecture
  • Strong performance in both text and multimodal tasks
  • Fast response times
  • Improved capabilities over the Qwen3 series
Recommended Use Cases
Academia
Finance
Health
Legal
Marketing

Related content

Data enriched Apr 24, 2026. Pricing from OpenRouter API.