Vision

Name: Vision
Rating: 4.7 (445 reviews)
Author: anthropic

v1.3.0

anthropic · Multimodal

13.6K installs · 445 stars · Apache 2.0 · Updated 2026-04-08

Analyze images, extract text, describe visual content, and compare screenshots for visual QA.

vision · ocr · images · multimodal · screenshots

npx agentmag add vision

About

The Vision skill gives your agent eyes. It can analyze images to describe content, extract text (OCR), detect objects, read charts, and compare screenshots for visual regression testing.

Powered by multimodal models, it goes beyond simple image recognition — it can interpret complex diagrams, read handwritten notes, understand UI layouts, and provide detailed visual descriptions that other skills can act on.

Capabilities

Natural language image description and analysis
OCR for printed and handwritten text
Chart and diagram interpretation
Screenshot comparison for visual regression
Object detection and counting
UI element identification and layout analysis
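
Screenshot comparison for visual regression can be pictured as a diff between a baseline capture and a new one. The sketch below is a generic illustration of a plain pixel diff, not the Vision skill's implementation (the listing does not say how it compares screenshots). It assumes both screenshots are already decoded into same-size RGBA byte buffers, and the names `diff_ratio`, `is_regression`, `tolerance`, and `threshold` are hypothetical.

```python
from typing import Sequence


def diff_ratio(before: Sequence[int], after: Sequence[int],
               width: int, height: int,
               channels: int = 4, tolerance: int = 0) -> float:
    """Fraction of pixels where any channel differs by more than `tolerance`.

    Hypothetical helper: assumes both buffers hold width*height*channels
    values in row-major order (e.g. RGBA bytes).
    """
    expected = width * height * channels
    if len(before) != expected or len(after) != expected:
        raise ValueError("buffers must both hold width*height*channels values")
    changed = 0
    for p in range(width * height):
        start = p * channels
        a = before[start:start + channels]
        b = after[start:start + channels]
        # A pixel counts as changed if any channel moved past the tolerance.
        if any(abs(x - y) > tolerance for x, y in zip(a, b)):
            changed += 1
    return changed / (width * height)


def is_regression(before: Sequence[int], after: Sequence[int],
                  width: int, height: int,
                  threshold: float = 0.001, **kwargs) -> bool:
    """Flag a regression when the changed-pixel fraction exceeds `threshold`."""
    return diff_ratio(before, after, width, height, **kwargs) > threshold


# Usage: a 2x2 all-white RGBA baseline vs. a capture with one pixel turned black.
baseline = bytes([255, 255, 255, 255] * 4)
candidate = bytearray(baseline)
candidate[0:4] = bytes([0, 0, 0, 255])
print(diff_ratio(baseline, candidate, 2, 2))  # 0.25
```

A raw pixel diff is brittle against anti-aliasing and font-rendering noise; the `tolerance` and `threshold` knobs are one simple way to absorb that. A multimodal model can go further and describe what changed rather than only how many pixels moved.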

Compatible agents

Claude Code · Cursor · Windsurf

Add Vision to your agent

One command to install. Works with the compatible agents listed above.
