anthropic · Multimodal
Vision
v1.3.0 · 13.6K installs · 445 stars · Apache 2.0 · Updated 2026-04-08
Analyze images, extract text, describe visual content, and compare screenshots for visual QA.
vision · ocr · images · multimodal · screenshots
npx agentmag add vision
About
The Vision skill gives your agent eyes. It can analyze images to describe content, extract text (OCR), detect objects, read charts, and compare screenshots for visual regression testing.
Powered by multimodal models, it goes beyond simple image recognition: it can interpret complex diagrams, read handwritten notes, understand UI layouts, and produce detailed visual descriptions that other skills can act on.
Capabilities
- Natural language image description and analysis
- OCR for printed and handwritten text
- Chart and diagram interpretation
- Screenshot comparison for visual regression
- Object detection and counting
- UI element identification and layout analysis
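
To make "screenshot comparison for visual regression" concrete, here is a minimal Python sketch of a pixel-level diff. It is not the Vision skill's implementation or API, which this page does not document. The function name `changed_fraction`, the nested-list image representation, and the `tolerance` parameter are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical in-memory image format for this sketch: rows of (R, G, B) tuples.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def changed_fraction(baseline: Image, candidate: Image, tolerance: int = 0) -> float:
    """Return the fraction of pixels where any channel differs by more than `tolerance`."""
    same_shape = len(baseline) == len(candidate) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(baseline, candidate)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")

    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0

    changed = 0
    for row_a, row_b in zip(baseline, candidate):
        for pixel_a, pixel_b in zip(row_a, row_b):
            # Largest per-channel difference decides whether the pixel counts as changed.
            if max(abs(a - b) for a, b in zip(pixel_a, pixel_b)) > tolerance:
                changed += 1
    return changed / total


# Usage: a 2x2 white baseline versus a candidate with one red pixel.
white = (255, 255, 255)
baseline = [[white, white], [white, white]]
candidate = [[white, white], [white, (255, 0, 0)]]
print(changed_fraction(baseline, candidate))  # prints 0.25
```

A visual regression check would typically fail the build when this fraction exceeds a chosen threshold. A model-based comparison, as the skill describes, could additionally explain what changed, not only how much.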
Compatible agents
Claude Code · Cursor · Windsurf