Audio Transcription

Name: Audio Transcription
Rating: 4.7 (401 reviews)
Author: openai

v1.0.2

openai · Multimodal

13.4K installs · 401 stars · MIT · Updated 2026-04-13

Transcribe audio and video with Whisper: multi-language support, timestamps, and speaker diarization.

audio · transcription · whisper · speech-to-text · diarization

npx agentmag add audio-transcription

About

The Audio Transcription skill uses OpenAI's Whisper model to transcribe audio and video files with high accuracy. It supports 99+ languages, provides word-level timestamps, and can identify different speakers (diarization).

It handles common formats, including MP3, WAV, M4A, MP4, and WebM. Typical uses include meeting transcription, podcast processing, video subtitling, and converting voice notes to text.
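For video subtitling, segment-level timestamps map directly onto subtitle cues. The page does not document the skill's output format, so the sketch below assumes segments shaped like the `segments` entries in the Whisper API's `verbose_json` response (dicts with `start` and `end` in seconds and a `text` string). The helpers `srt_timestamp` and `segments_to_srt` are hypothetical names used for illustration, not part of the skill.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format an offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Render segments with 'start', 'end' (seconds) and 'text' keys as SRT cues.

    Assumption: segments follow the Whisper verbose_json shape described above.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # SRT separates consecutive cues with a blank line.
    return "\n".join(cues)


segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 61.04, "text": " Welcome back."},
]
print(segments_to_srt(segments))
```

Rounding to whole milliseconds keeps floating-point offsets such as 61.04 s from drifting to 61.039 s in the cue.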

Capabilities

High-accuracy speech-to-text with Whisper
99+ language support with auto-detection
Word-level and segment-level timestamps
Speaker diarization (who said what)
Support for MP3, WAV, M4A, MP4, WebM
Translation to English from any supported language
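Whisper's own output carries timestamps but no speaker labels, so "who said what" is commonly produced by pairing the transcript with speaker turns from a separate diarization model and assigning each segment to the speaker whose turns overlap it most. The page does not say how this skill implements diarization; the sketch below only illustrates that alignment step, and the `Segment`/`SpeakerTurn` types, the `label_speakers` helper, and the `SPEAKER_00`-style labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


@dataclass
class SpeakerTurn:
    start: float  # seconds
    end: float    # seconds
    speaker: str  # hypothetical label, e.g. "SPEAKER_00"


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_speakers(segments, turns, unknown="UNKNOWN"):
    """Return (speaker, text) pairs, choosing the speaker with the most overlap."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            o = overlap(seg.start, seg.end, turn.start, turn.end)
            if o > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + o
        # Segments with no overlapping turn (e.g. music, silence) get a fallback label.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg.text))
    return labeled


segments = [
    Segment(0.0, 4.0, "Hi, thanks for joining."),
    Segment(4.0, 9.0, "Happy to be here."),
    Segment(20.0, 22.0, "[music]"),
]
turns = [SpeakerTurn(0.0, 4.5, "SPEAKER_00"), SpeakerTurn(4.5, 9.0, "SPEAKER_01")]
for speaker, text in label_speakers(segments, turns):
    print(f"{speaker}: {text}")
```

Summing overlap per speaker, rather than taking the first matching turn, keeps a segment that straddles a turn boundary with the speaker who talks for most of it.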

Compatible agents

Claude Code · Cursor · Windsurf
