Llama 3.2 11B Vision Instruct
Llama 3.2 11B Vision Instruct is an AI model from Meta built for agent workflows; it accepts text and image input and produces text output.
What is Llama 3.2 11B Vision Instruct?
Llama 3.2 11B Vision Instruct is an AI model from Meta that Agent Mag tracks for pricing, context window, modalities, benchmarks, and API compatibility. Builders can use this page to compare Llama 3.2 11B Vision Instruct against other models for agent workflows and production deployments.
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks that combine visual and textual data. It excels in tasks such as image captioning and visual question answering.
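Many providers expose this model through an OpenAI-compatible chat completions API. The sketch below shows what a mixed text-and-image request might look like under that assumption; the base URL, API key, and exact model identifier are illustrative placeholders, not values from this page.

```python
# A minimal sketch of a multimodal request, assuming an OpenAI-compatible
# endpoint. base_url, api_key, and the model id are placeholder assumptions.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-provider.example/v1",  # hypothetical endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.2-11B-Vision-Instruct",  # id varies by provider
    messages=[
        {
            "role": "user",
            "content": [
                # Text and image parts travel together in one user turn;
                # the model returns text only.
                {"type": "text", "text": "Caption this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```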
More from Meta
Llama Guard 4 is a Llama 4 Scout-derived multimodal pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and LLM responses (response classification).
Llama 4 Maverick 17B Instruct (128E) is a high-capacity multimodal language model from Meta, built on a mixture-of-experts (MoE) architecture with 128 experts and 17 billion active parameters per forward pass.
Llama 4 Scout 17B Instruct (16E) is a mixture-of-experts (MoE) language model developed by Meta, activating 17 billion parameters out of a total of 109B. It supports native multimodal input across text and images.
Llama Guard 3 is a Llama-3.1-8B pretrained model, fine-tuned for content safety classification. Similar to previous versions, it can be used to classify content in both LLM inputs (prompt classification) and LLM responses (response classification); a minimal usage sketch follows this list.
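The Llama Guard models behave as chat models that emit a short safety verdict rather than free-form text. As a rough illustration (not taken from this page), a prompt-classification call via Hugging Face transformers might look like the following; the checkpoint is gated and requires access approval from Meta.

```python
# A minimal prompt-classification sketch for Llama Guard 3, assuming access
# to the gated meta-llama/Llama-Guard-3-8B checkpoint on Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Prompt classification: pass only the user turn. For response
# classification, append the assistant turn to the conversation as well.
chat = [{"role": "user", "content": "How do I pick a lock?"}]
input_ids = tokenizer.apply_chat_template(chat, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=20, do_sample=False)
# The model replies "safe", or "unsafe" plus the violated category codes.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```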
Related content
Compare pricing, local installs, context windows, and modality filters across the full model catalog.
Find frameworks, SDKs, and infrastructure tools that pair with this model in production workflows.
See Agent Mag coverage of model benchmarks, agent frameworks, and deployment patterns.