transcribe_video

Transcribe the spoken content from a video using one of four supported audio providers. Supports paragraph-level timestamps, speaker diarization, subtitle export (SRT/VTT), and speech-to-English translation.

| Parameter | Type | Required | Description |
|---|---|---|---|
| videoPath | string | Yes | Local file path, file:// URI, or remote URL of the video |
| provider | string | | Audio provider: deepgram, assemblyai, groq, or gemini |
| language | string | | Language code (e.g., en, es, fr, ja, zh) |
| diarize | boolean | No | Identify speakers. Default: false |
| translate | boolean | No | Translate speech to English. Default: false |
| outputFormat | string | No | Output format: text (default), srt, vtt, or json |
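
A minimal TypeScript sketch of the parameter table, assuming the tool receives its arguments as a plain JSON object. `TranscribeVideoParams`, `withDefaults`, and `classifyVideoPath` are hypothetical names for illustration, not the tool's actual source. The defaults mirror the ones listed above, and the path check is a simple prefix test, which is an assumption about how the three accepted `videoPath` forms are told apart.

```typescript
// Hypothetical types mirroring the parameter table; not the tool's real source.
type Provider = "deepgram" | "assemblyai" | "groq" | "gemini";
type OutputFormat = "text" | "srt" | "vtt" | "json";

interface TranscribeVideoParams {
  videoPath: string; // local path, file:// URI, or remote URL
  provider?: Provider; // whether this is required is not stated; optional here
  language?: string; // e.g. "en", "es", "fr", "ja", "zh"
  diarize?: boolean;
  translate?: boolean;
  outputFormat?: OutputFormat;
}

// Same params, but with the three defaulted flags guaranteed to be present.
type ResolvedParams = TranscribeVideoParams &
  Required<Pick<TranscribeVideoParams, "diarize" | "translate" | "outputFormat">>;

// Fill in the documented defaults for any flag the caller left out.
function withDefaults(params: TranscribeVideoParams): ResolvedParams {
  return {
    ...params,
    diarize: params.diarize ?? false,
    translate: params.translate ?? false,
    outputFormat: params.outputFormat ?? "text",
  };
}

type PathKind = "local" | "file-uri" | "remote";

// Decide which of the three accepted videoPath forms a string is,
// using a simple prefix check (assumption).
function classifyVideoPath(videoPath: string): PathKind {
  if (videoPath.startsWith("file://")) return "file-uri";
  if (/^https?:\/\//i.test(videoPath)) return "remote";
  return "local";
}
```

For example, `withDefaults({ videoPath: "./interview.mp4" })` yields `diarize: false`, `translate: false`, and `outputFormat: "text"`, while any value the caller supplies is kept as-is.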

| Format | Description | Best for |
|---|---|---|
| text | Continuous text with [MM:SS] paragraph timestamps | Reading, analysis |
| srt | SubRip subtitle format (numbered blocks with timecodes) | Video players, editors |
| vtt | WebVTT subtitle format | Web video players (`<video>` element) |
| json | Structured JSON with segment-level timing data | Processing, editing tools |
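
To make the difference between the formats concrete, here is a sketch that renders the same timed segments as SRT, WebVTT, and the [MM:SS] text style. The `Segment` shape is a hypothetical stand-in for the segment data in the json output, whose exact fields this page does not specify. The timecode rules themselves are standard: SRT uses `HH:MM:SS,mmm` with numbered blocks, and WebVTT uses `HH:MM:SS.mmm` after a `WEBVTT` header.

```typescript
// Hypothetical segment shape; the tool's real JSON fields may differ.
interface Segment {
  start: number; // seconds
  end: number; // seconds
  text: string;
}

// Format seconds as HH:MM:SS plus milliseconds.
// SRT separates milliseconds with a comma, WebVTT with a period.
function timecode(seconds: number, msSep: "," | "."): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}${msSep}${pad(ms, 3)}`;
}

// SRT: numbered blocks (index, timing line, text), separated by blank lines.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) => `${i + 1}\n${timecode(seg.start, ",")} --> ${timecode(seg.end, ",")}\n${seg.text}\n`)
    .join("\n");
}

// WebVTT: "WEBVTT" header, then cues; cue identifiers are optional and omitted here.
function toVtt(segments: Segment[]): string {
  const cues = segments.map(
    (seg) => `${timecode(seg.start, ".")} --> ${timecode(seg.end, ".")}\n${seg.text}\n`,
  );
  return ["WEBVTT\n", ...cues].join("\n");
}

// Text: each paragraph (here, each segment) prefixed with [MM:SS] of its start.
// Minutes are not wrapped into hours in this sketch.
function toText(segments: Segment[]): string {
  return segments
    .map((seg) => {
      const total = Math.floor(seg.start);
      const mm = String(Math.floor(total / 60)).padStart(2, "0");
      const ss = String(total % 60).padStart(2, "0");
      return `[${mm}:${ss}] ${seg.text}`;
    })
    .join("\n\n");
}
```

With a segment starting at 65.25 seconds, SRT shows `00:01:05,250`, WebVTT shows `00:01:05.250`, and the text format shows `[01:05]`.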

Basic transcription:

“Transcribe ./interview.mp4”

French video:

“Transcribe this French presentation: ./presentation.mp4” (pass language: "fr")

Speaker diarization:

“Transcribe ./interview.mp4 with speaker diarization using AssemblyAI”

Translate speech to English:

“Transcribe this Spanish video and translate it to English: ./video.mp4”

Generate SRT subtitles:

“Generate SRT subtitles for ./interview.mp4”

Generate WebVTT subtitles:

“Export a WebVTT subtitle file from ./talk.mp4”

Transcribe a YouTube video:

“Transcribe the audio from https://www.youtube.com/watch?v=abc123”
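
The prompts above are natural-language requests that the assistant turns into transcribe_video calls. The sketch below lists one plausible argument object per prompt, assuming the assistant fills in only what each prompt implies and leaves the rest to the defaults. These are illustrations, not recorded tool calls; `ExampleCall` is a hypothetical type, and the provider picks for the diarization and translation cases simply follow the recommendations on this page.

```typescript
// Hypothetical argument shape; field names follow the parameter table.
interface ExampleCall {
  videoPath: string;
  provider?: "deepgram" | "assemblyai" | "groq" | "gemini";
  language?: string;
  diarize?: boolean;
  translate?: boolean;
  outputFormat?: "text" | "srt" | "vtt" | "json";
}

// One plausible call per example prompt; omitted fields fall back to defaults.
const exampleCalls: Record<string, ExampleCall> = {
  basic: { videoPath: "./interview.mp4" },
  french: { videoPath: "./presentation.mp4", language: "fr" },
  diarization: { videoPath: "./interview.mp4", provider: "assemblyai", diarize: true },
  // Spanish source audio, English output: needs a translation-capable provider.
  translate: { videoPath: "./video.mp4", provider: "groq", language: "es", translate: true },
  srt: { videoPath: "./interview.mp4", outputFormat: "srt" },
  vtt: { videoPath: "./talk.mp4", outputFormat: "vtt" },
  // Remote URLs go in videoPath just like local paths.
  youtube: { videoPath: "https://www.youtube.com/watch?v=abc123" },
};
```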

Diarization assigns a label to each speaker (e.g., “Speaker 0”, “Speaker 1”). It is supported by Deepgram and AssemblyAI only:

| Provider | Diarization |
|---|---|
| Deepgram | ✓ |
| AssemblyAI | ✓ (highest quality) |
| Groq/Whisper | ✗ |
| Gemini | ✗ |

Use provider: "assemblyai" for the best diarization results. See Audio Providers for a full comparison.
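
A client-side sketch of how the support matrix above might be enforced, plus one way to present diarized output. The capability map is taken from the table; `checkDiarization`, `formatTurns`, and the `SpeakerSegment` shape (a numeric speaker index per segment) are hypothetical and may not match the tool's actual output. Merging consecutive segments from the same speaker into a single turn is a common readability choice, not necessarily what the tool does.

```typescript
type Provider = "deepgram" | "assemblyai" | "groq" | "gemini";

// Capability matrix copied from the diarization table.
const SUPPORTS_DIARIZATION: Record<Provider, boolean> = {
  deepgram: true,
  assemblyai: true,
  groq: false,
  gemini: false,
};

// Reject a diarization request early if the provider cannot label speakers.
function checkDiarization(provider: Provider, diarize: boolean): void {
  if (diarize && !SUPPORTS_DIARIZATION[provider]) {
    throw new Error(`${provider} does not support diarization; use "assemblyai" or "deepgram"`);
  }
}

// Hypothetical diarized segment: a speaker index plus the spoken text.
interface SpeakerSegment {
  speaker: number;
  text: string;
}

// Merge consecutive segments from the same speaker into one "Speaker N:" turn.
function formatTurns(segments: SpeakerSegment[]): string {
  const turns: SpeakerSegment[] = [];
  for (const seg of segments) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === seg.speaker) {
      last.text += " " + seg.text;
    } else {
      turns.push({ ...seg }); // copy so the caller's segments are not mutated
    }
  }
  return turns.map((t) => `Speaker ${t.speaker}: ${t.text}`).join("\n");
}
```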

Setting translate: true converts non-English speech into English text in the output. This is separate from multilingual transcription: Deepgram and AssemblyAI transcribe many languages natively, but their output always stays in the source language.

| Provider | Translation |
|---|---|
| Deepgram | ✗ |
| AssemblyAI | ✗ |
| Groq/Whisper | ✓ |
| Gemini | ✓ |

Use provider: "groq" or provider: "gemini" when you need English output from non-English audio.
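
One way a caller might apply this rule is to check the requested provider against the translation matrix and switch to a capable one when needed. In this sketch, `resolveProvider` and its default fallback of `"groq"` are assumptions for illustration; the tool itself may instead reject the request or choose differently.

```typescript
type Provider = "deepgram" | "assemblyai" | "groq" | "gemini";

// Capability matrix copied from the translation table.
const SUPPORTS_TRANSLATION: Record<Provider, boolean> = {
  deepgram: false,
  assemblyai: false,
  groq: true,
  gemini: true,
};

// Keep the requested provider unless translation is needed and it cannot
// translate; in that case switch to the fallback (assumed default: groq).
function resolveProvider(requested: Provider, translate: boolean, fallback: Provider = "groq"): Provider {
  if (!translate || SUPPORTS_TRANSLATION[requested]) return requested;
  if (!SUPPORTS_TRANSLATION[fallback]) {
    throw new Error(`fallback provider ${fallback} cannot translate to English`);
  }
  return fallback;
}
```

Silently switching providers keeps simple requests working. A stricter design would throw whenever the requested provider cannot translate, so the caller always knows which provider produced the transcript.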