openai · Multimodal
Audio Transcription
v1.0.213.4K installs · 401 stars · MIT · Updated 2026-04-13
Transcribe audio and video with Whisper — multi-language, timestamps, and speaker diarization.
audio, transcription, whisper, speech-to-text, diarization
npx agentmag add audio-transcription
About
The Audio Transcription skill uses OpenAI's Whisper model to transcribe audio and video files with high accuracy. It supports 99+ languages, provides word-level timestamps, and can identify different speakers (diarization).
It handles common formats, including MP3, WAV, M4A, MP4, and WebM. Typical uses are meeting transcription, podcast processing, video subtitling, and voice note conversion.
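As a rough illustration of the subtitling use case, the sketch below turns timestamped segments into SRT text. It assumes the open-source `openai-whisper` Python package (and ffmpeg) rather than whatever the skill runs internally, which this page does not document. The helper names `transcribe`, `srt_timestamp`, and `segments_to_srt` are hypothetical.

```python
from typing import Iterable


def transcribe(path: str, model_name: str = "base", translate: bool = False) -> dict:
    """Run Whisper on an audio or video file and return its result dict."""
    # Assumption: `pip install openai-whisper` and ffmpeg available on PATH.
    # Imported lazily so the formatting helpers below work without it.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(
        path,
        task="translate" if translate else "transcribe",  # translate = output in English
        word_timestamps=True,  # adds a "words" list to each segment
    )


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[dict]) -> str:
    """Convert segments with "start", "end", and "text" keys into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)
```

With the package installed, `segments_to_srt(transcribe("meeting.mp4")["segments"])` would produce subtitle text for a file named `meeting.mp4` (a placeholder path).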
Capabilities
- High-accuracy speech-to-text with Whisper
- 99+ language support with auto-detection
- Word-level and segment-level timestamps
- Speaker diarization (who said what)
- Support for MP3, WAV, M4A, MP4, WebM
- Translation to English from any supported language
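The "who said what" capability can be pictured as matching transcript segments against speaker turns. The sketch below is one simple way to do that, not the skill's documented method: it assumes speaker turns arrive as a list of dicts with `start`, `end`, and `speaker` keys (a hypothetical format) and labels each segment with the speaker whose turn overlaps it most.

```python
def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_speakers(segments: list, turns: list) -> list:
    """Attach a "speaker" key to each segment, chosen by largest time overlap.

    Assumption: `turns` is a hypothetical diarization output, e.g.
    [{"start": 0.0, "end": 2.2, "speaker": "A"}, ...].
    Segments that overlap no turn get speaker None.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

Choosing by maximum overlap keeps the logic short, but a segment that spans a speaker change still gets a single label; splitting on word-level timestamps would be the finer-grained alternative.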
Compatible agents
Claude Code, Cursor, Windsurf
Add Audio Transcription to your agent
One command to install. Works with all major coding agents.