Gemma 4 26B A4B (free)
Google · Gemma · Free · Released 2026-04-03

262K context · Free · 25.2B total parameters, 3.8B active

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model developed by Google DeepMind. It features 25.2 billion total parameters, with only 3.8 billion activated per token during inference, enabling high-quality outputs comparable to models with 31 billion parameters while optimizing compute efficiency. The model supports multimodal inputs, including text, images, and video, and offers advanced capabilities such as a 256K token context window, native function calling, configurable reasoning modes, and structured output generation.
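Since the model accepts text, image, and video inputs through a chat interface, a request can be sketched with any OpenAI-compatible client. The model slug, endpoint, and image URL below are hypothetical placeholders, not confirmed identifiers; substitute the real values from your provider's model list.

```python
# Sketch of an OpenAI-compatible chat-completions payload mixing a text part
# and an image part. MODEL_ID is a hypothetical slug, not a documented one.
import json

MODEL_ID = "google/gemma-4-26b-a4b"  # hypothetical placeholder slug

def build_multimodal_request(question: str, image_url: str) -> dict:
    """Assemble a chat payload with one text part and one image part."""
    return {
        "model": MODEL_ID,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": question},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": 512,
    }

payload = build_multimodal_request(
    "What is shown in this diagram?",
    "https://example.com/diagram.png",
)
print(json.dumps(payload, indent=2))
```

The payload is only assembled here, not sent; in practice it would be POSTed to the provider's chat-completions endpoint with an API key.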

What is Gemma 4 26B A4B (free)?

Gemma 4 26B A4B (free) is an AI model from Google that Agent Mag tracks for pricing, context window, modalities, benchmarks, and API compatibility. Builders can use this page to compare Gemma 4 26B A4B (free) against other models for agent workflows and production deployments.

Architecture & Specifications
Architecture
Mixture of Experts (MoE)
Parameters
25.2B total, 3.8B active
Tokenizer
Gemma
License
Apache 2.0
Released
2026-04-03
Modalities
Input
image, text, video
Output
text
Supported Parameters
include_reasoning, max_tokens, reasoning, response_format, seed, temperature, tool_choice, tools, top_p
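A minimal sketch of how a client might sanitize sampling options against this supported-parameter list before sending a request. The parameter names come from the list above; the option values are illustrative, not documented defaults.

```python
# Filter a request's options down to the parameters this model card lists
# as supported. Anything outside the set is silently dropped.
SUPPORTED_PARAMS = {
    "include_reasoning", "max_tokens", "reasoning", "response_format",
    "seed", "temperature", "tool_choice", "tools", "top_p",
}

def filter_supported(options: dict) -> dict:
    """Keep only options the model advertises support for."""
    return {k: v for k, v in options.items() if k in SUPPORTED_PARAMS}

opts = filter_supported({
    "temperature": 0.7,
    "top_p": 0.95,
    "seed": 42,
    "frequency_penalty": 0.2,  # not in the supported list, so dropped
})
print(sorted(opts))  # → ['seed', 'temperature', 'top_p']
```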
Strengths
  • Supports multimodal inputs including text, images, and video
  • Efficient inference with only 3.8B active parameters per token
  • 256K token context window for handling large inputs
  • Native function calling and structured output generation
  • Configurable reasoning modes for tailored responses
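The function-calling and structured-output strengths above map onto two request fields, `tools` and `response_format`. As a sketch, here is what those fields could look like; the `get_weather` tool and its schema are hypothetical examples, not part of the model's API.

```python
# Sketch: a tool definition plus a JSON response format, the two
# structured-output mechanisms listed among the model's strengths.
import json

weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

request_extras = {
    "tools": [weather_tool],
    "tool_choice": "auto",          # let the model decide when to call
    "response_format": {"type": "json_object"},
}
print(json.dumps(request_extras, indent=2))
```

These fields would be merged into a chat-completions payload alongside `model` and `messages`.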
Limitations
  • Lower performance on certain benchmarks like CritPt (0.0%)
  • Limited coding capabilities as indicated by Terminal-Bench Hard (13.6%)
  • High hallucination rate in knowledge benchmarks (19.1%)
  • Relatively low accuracy in knowledge-based tasks (18.1%)
  • Performance variability across different benchmarks
Recommended Use Cases
  • Complex reasoning tasks with large context requirements
  • Multimodal applications involving text, images, and video
  • Instruction-following scenarios with structured outputs
  • Agentic coding and terminal use in IDE environments
  • Interactive roleplay and storytelling applications
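For the large-context use cases above, a quick budget check helps: the advertised 262K window is 262,144 tokens, and output plus system-prompt tokens must be reserved out of it. The reserve sizes below are illustrative estimates, not recommended values.

```python
# Sketch: rough token budgeting against the advertised 262,144-token
# context window. Reserve sizes are illustrative, not documented defaults.
CONTEXT_WINDOW = 262_144

def remaining_input_budget(output_reserve: int, system_prompt_tokens: int) -> int:
    """Tokens left for document/user input after the reserves are set aside."""
    return CONTEXT_WINDOW - output_reserve - system_prompt_tokens

budget = remaining_input_budget(output_reserve=4_096, system_prompt_tokens=500)
print(budget)  # → 257548
```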

Data enriched Apr 24, 2026. Pricing from OpenRouter API.