Gemma 4 26B A4B
Google · Gemma · Released April 3, 2026


262K context · $0.060/M input tokens · $0.330/M output tokens · 25.2B total parameters, 3.8B active

Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model developed by Google DeepMind. It features 25.2 billion total parameters, with only 3.8 billion activating per token during inference, enabling high-quality outputs at reduced computational cost. The model supports multimodal inputs, including text, images, and video, and offers a 256K token context window, native function calling, configurable reasoning modes, and structured output capabilities. It is released under the Apache 2.0 license.
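Since the page tracks API compatibility and OpenRouter pricing, here is a minimal sketch of building an OpenAI-style chat-completions request that exercises the native function calling the card advertises. The model ID, endpoint URL, and the `get_weather` tool are assumptions for illustration, not confirmed identifiers.

```python
import json

# Hypothetical model ID and endpoint -- check your provider's catalog for the real values.
MODEL_ID = "google/gemma-4-26b-a4b-it"
API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload with one tool definition."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [{
            "type": "function",
            "function": {
                "name": "get_weather",  # illustrative tool, not part of the model
                "description": "Look up current weather for a city.",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }],
        "tool_choice": "auto",
    }

payload = build_chat_request("What's the weather in Lisbon?")
print(json.dumps(payload, indent=2))
```

The payload would then be POSTed to the endpoint with an API key; a model that supports native function calling can reply with a `tool_calls` entry instead of plain text.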

What is Gemma 4 26B A4B?

Gemma 4 26B A4B is an AI model from Google that Agent Mag tracks for pricing, context window, modalities, benchmarks, and API compatibility. Builders can use this page to compare Gemma 4 26B A4B against other models for agent workflows and production deployments.


Architecture & Specifications
Architecture
Mixture of Experts (MoE)
Parameters
25.2B total, 3.8B active
Tokenizer
Gemma
License
Apache 2.0
Released
April 3, 2026
Modalities
Input
image, text, video
Output
text
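The 25.2B total / 3.8B active split above is what drives the MoE efficiency claim. A back-of-envelope sketch, assuming the common rule of thumb of roughly 2 FLOPs per active parameter per generated token:

```python
total_params = 25.2e9   # all experts, from the spec table
active_params = 3.8e9   # parameters routed per token

# Rough decode-cost rule of thumb: ~2 FLOPs per active parameter per token.
flops_per_token_moe = 2 * active_params
flops_per_token_dense = 2 * total_params  # a dense model of the same size

ratio = active_params / total_params
print(f"Active fraction: {ratio:.1%}")          # ~15.1%
print(f"MoE decode FLOPs/token:   {flops_per_token_moe:.2e}")
print(f"Dense decode FLOPs/token: {flops_per_token_dense:.2e}")
```

Under that assumption, each token costs about 15% of what an equally sized dense model would, while memory still has to hold all 25.2B parameters.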
Supported Parameters
frequency_penalty, include_reasoning, logit_bias, logprobs, max_tokens, min_p, presence_penalty, reasoning, repetition_penalty, response_format, seed, stop, structured_outputs, temperature, tool_choice, tools, top_k, top_logprobs, top_p
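As a sketch of how the sampling parameters above map onto a request body: the parameter names are taken from the supported list, while every value (and the model ID) is illustrative.

```python
# Illustrative values for a subset of the supported sampling parameters.
sampling_params = {
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "min_p": 0.05,
    "repetition_penalty": 1.1,
    "presence_penalty": 0.0,
    "frequency_penalty": 0.0,
    "max_tokens": 1024,
    "seed": 42,              # for reproducible sampling
    "stop": ["</answer>"],   # illustrative stop sequence
}

# Merged into an OpenAI-style request body (model ID is hypothetical):
request_body = {
    "model": "google/gemma-4-26b-a4b-it",
    "messages": [{"role": "user", "content": "Summarize MoE routing in two sentences."}],
    **sampling_params,
}
print(sorted(request_body))
```

Providers that expose this model through an OpenAI-compatible API typically accept these fields at the top level of the chat-completions body.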
Strengths
  • Supports multimodal input including text, images, and video
  • Efficient inference with only 3.8B active parameters per token
  • 256K token context window for handling large inputs
  • Native function calling and configurable reasoning modes
  • Structured output support for complex tasks
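The structured-output strength above pairs with the `response_format` parameter from the supported list. A hedged sketch of a JSON-schema request plus a local check of a reply, with the schema, names, and the mocked reply all illustrative:

```python
import json

# JSON Schema we want the model's output to conform to (illustrative).
paper_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "year": {"type": "integer"},
    },
    "required": ["title", "year"],
}

# OpenAI-style response_format payload requesting schema-constrained output.
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "paper", "strict": True, "schema": paper_schema},
}

# A reply shaped the way a structured-output call should return it (mocked here,
# since no API call is made).
raw_reply = '{"title": "Mixture-of-Experts Scaling", "year": 2026}'
parsed = json.loads(raw_reply)

# Minimal local validation: required keys present, types as declared.
assert set(paper_schema["required"]) <= parsed.keys()
assert isinstance(parsed["year"], int)
print(parsed["title"])
```

In a real call, `response_format` would sit alongside `model` and `messages` in the request body, and the model's message content would be the schema-conforming JSON string.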
Limitations
  • Lower performance on certain benchmarks like CritPt (0.0%)
  • Hallucination rate of 19.1% in knowledge tasks
  • Limited coding capabilities with scores like 13.6% on Terminal-Bench Hard
  • Moderate performance in economically valuable tasks (25.7%)
  • Not explicitly trained for prompt retention or logging
Recommended Use Cases
  • Academia and research tasks
  • Health-related applications
  • Marketing and SEO optimization
  • Roleplay and conversational AI
  • Long-context reasoning tasks

Data enriched Apr 24, 2026. Pricing from OpenRouter API.