Audio Transcription

v1.0.2
openai · Multimodal
13.4K installs · 401 stars · MIT · Updated 2026-04-13

Transcribe audio and video with Whisper — multi-language, timestamps, and speaker diarization.

audio · transcription · whisper · speech-to-text · diarization
npx agentmag add audio-transcription

About

The Audio Transcription skill uses OpenAI's Whisper model to transcribe audio and video files with high accuracy. It supports 99+ languages, provides word-level timestamps, and can identify different speakers (diarization).

It handles common formats, including MP3, WAV, M4A, MP4, and WebM, and is well suited to meeting transcription, podcast processing, video subtitling, and voice-note conversion.
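
For the video subtitling use case, timestamped segments can be turned into an SRT subtitle file. The sketch below is illustrative only: this page does not document the skill's output format, so the segment shape (a list of dicts with `start` and `end` in seconds plus `text`) and the helper names `format_srt_timestamp` and `segments_to_srt` are assumptions.

```python
from typing import Dict, List


def format_srt_timestamp(seconds: float) -> str:
    """Convert a time in seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: List[Dict]) -> str:
    """Render segments as SRT text.

    Assumed (hypothetical) segment shape:
    {"start": float, "end": float, "text": str}
    """
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_srt_timestamp(seg["start"])
        end = format_srt_timestamp(seg["end"])
        text = seg["text"].strip()
        # Each SRT cue: counter, time range, text, then a blank line.
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Hello there."},
        {"start": 2.5, "end": 4.0, "text": " Hi."},
    ]
    print(segments_to_srt(demo))
```

Timestamps are rounded to whole milliseconds before being split into fields, which avoids floating-point drift in the formatted values.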

Capabilities

  • High-accuracy speech-to-text with Whisper
  • 99+ language support with auto-detection
  • Word-level and segment-level timestamps
  • Speaker diarization (who said what)
  • Support for MP3, WAV, M4A, MP4, WebM
  • Translation to English from any supported language
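
One common way to produce "who said what" output is to combine timestamped transcript segments with separately computed speaker turns, giving each segment the speaker whose turn overlaps it most. The sketch below shows that general technique under stated assumptions; it is not a description of how this skill implements diarization. The input shapes and the names `overlap` and `assign_speakers` are hypothetical.

```python
from typing import Dict, List


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Dict], turns: List[Dict], unknown: str = "UNKNOWN"
) -> List[Dict]:
    """Label each segment with the speaker whose turn overlaps it most.

    Assumed (hypothetical) shapes:
      segment: {"start": float, "end": float, "text": str}
      turn:    {"start": float, "end": float, "speaker": str}
    Segments that overlap no turn get the `unknown` label.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            shared = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if shared > best_overlap:
                best_speaker, best_overlap = turn["speaker"], shared
        # Copy the segment so the input list is not mutated.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 2.0, "text": "Shall we start?"},
        {"start": 2.0, "end": 5.0, "text": "Yes, go ahead."},
    ]
    turns = [
        {"start": 0.0, "end": 2.2, "speaker": "A"},
        {"start": 2.2, "end": 6.0, "speaker": "B"},
    ]
    for seg in assign_speakers(segments, turns):
        print(f'{seg["speaker"]}: {seg["text"]}')
```

Choosing the largest overlap rather than the turn active at the segment start makes the labeling tolerant of small boundary mismatches between the two timelines.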

Compatible agents

Claude Code · Cursor · Windsurf

Add Audio Transcription to your agent

One command to install. Works with all major coding agents.