transcribe_video
Transcribe the spoken content from a video using one of four supported audio providers. Supports paragraph-level timestamps, speaker diarization, subtitle export (SRT/VTT), and speech-to-English translation.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `videoPath` | string | Yes | Path to the video, `file://` URI, or remote URL |
| `provider` | string | No | Audio provider: `deepgram`, `assemblyai`, `groq`, or `gemini` |
| `language` | string | No | Language code (e.g., `en`, `es`, `fr`, `ja`, `zh`) |
| `diarize` | boolean | No | Identify speakers (default: `false`) |
| `translate` | boolean | No | Translate speech to English (default: `false`) |
| `outputFormat` | string | No | Output format: `text` (default), `srt`, `vtt`, or `json` |
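As a rough illustration of how these parameters fit together, the sketch below assembles and checks an argument object before it is handed to the tool. The helper `build_transcribe_args` is hypothetical and not part of the tool; the allowed values are copied from the table above, and how a particular client actually submits the arguments is outside the scope of this sketch.

```python
from typing import Any, Optional

# Allowed values copied from the parameter table; the helper itself is hypothetical.
PROVIDERS = {"deepgram", "assemblyai", "groq", "gemini"}
OUTPUT_FORMATS = {"text", "srt", "vtt", "json"}


def build_transcribe_args(
    video_path: str,
    provider: Optional[str] = None,
    language: Optional[str] = None,
    diarize: bool = False,
    translate: bool = False,
    output_format: str = "text",
) -> dict[str, Any]:
    """Return a transcribe_video argument dict, omitting fields left at their defaults."""
    if not video_path:
        raise ValueError("videoPath is required")
    if provider is not None and provider not in PROVIDERS:
        raise ValueError(f"unknown provider: {provider}")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unknown outputFormat: {output_format}")

    args: dict[str, Any] = {"videoPath": video_path}
    if provider is not None:
        args["provider"] = provider
    if language is not None:
        args["language"] = language
    if diarize:
        args["diarize"] = True
    if translate:
        args["translate"] = True
    if output_format != "text":
        args["outputFormat"] = output_format
    return args


# Arguments for a French presentation, as in the "French video" usage example.
print(build_transcribe_args("./presentation.mp4", language="fr"))
```

Leaving out fields that match the documented defaults keeps the payload small; sending them explicitly should be equivalent if the defaults are as listed.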
Output Formats
| Format | Description | Best For |
|---|---|---|
| `text` | Continuous text with `[MM:SS]` paragraph timestamps | Reading, analysis |
| `srt` | SubRip subtitle format (numbered blocks with timecodes) | Video players, editors |
| `vtt` | WebVTT subtitle format | Web video players (`<video>` element) |
| `json` | Structured JSON with segment-level timing data | Processing, editing tools |
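To make the difference between the subtitle formats concrete, here is a minimal sketch of how timestamps are written in each one. SubRip uses a comma before the milliseconds and numbers each block, while WebVTT uses a period and the file starts with a `WEBVTT` header line. The function names are illustrative, and the way `text_stamp` handles videos longer than an hour (minutes simply keep counting) is an assumption, not documented behavior of the tool.

```python
def _split(seconds: float) -> tuple[int, int, int, int]:
    """Split a non-negative duration into hours, minutes, seconds, milliseconds."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return hours, minutes, secs, millis


def srt_timecode(seconds: float) -> str:
    """SubRip timecode: HH:MM:SS,mmm (comma before milliseconds)."""
    h, m, s, ms = _split(seconds)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def vtt_timecode(seconds: float) -> str:
    """WebVTT timecode: HH:MM:SS.mmm (period before milliseconds)."""
    h, m, s, ms = _split(seconds)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def text_stamp(seconds: float) -> str:
    """[MM:SS] paragraph marker; minutes are not capped at 59 (assumption)."""
    total = int(seconds)
    return f"[{total // 60:02d}:{total % 60:02d}]"


def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """One numbered SubRip block."""
    return f"{index}\n{srt_timecode(start)} --> {srt_timecode(end)}\n{text}\n"


def vtt_cue(start: float, end: float, text: str) -> str:
    """One WebVTT cue (the file itself must begin with a WEBVTT line)."""
    return f"{vtt_timecode(start)} --> {vtt_timecode(end)}\n{text}\n"


print(srt_cue(1, 0.0, 2.5, "Hello and welcome."))
print("WEBVTT\n")
print(vtt_cue(0.0, 2.5, "Hello and welcome."))
print(text_stamp(75.5), "Hello and welcome.")
```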
Usage Examples
Basic transcription:
“Transcribe `./interview.mp4`”
French video:
“Transcribe this French presentation: `./presentation.mp4`” (pass `language: "fr"`)
Speaker diarization:
“Transcribe `./interview.mp4` with speaker diarization using AssemblyAI”
Translate speech to English:
“Transcribe this Spanish video and translate it to English: `./video.mp4`”
Generate SRT subtitles:
“Generate SRT subtitles for `./interview.mp4`”
Generate WebVTT subtitles:
“Export a WebVTT subtitle file from `./talk.mp4`”
Transcribe a YouTube video:
“Transcribe the audio from `https://www.youtube.com/watch?v=abc123`”
Speaker Diarization
Diarization assigns a label to each speaker (e.g., “Speaker 0”, “Speaker 1”). It is supported by Deepgram and AssemblyAI only:
| Provider | Diarization |
|---|---|
| Deepgram | ✓ |
| AssemblyAI | ✓ (highest quality) |
| Groq/Whisper | ✗ |
| Gemini | ✗ |
Use `provider: "assemblyai"` for the best diarization results. See Audio Providers for a full comparison.
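The exact shape of diarized output is not specified on this page. Purely as an illustration, and assuming segments arrive as `(speaker index, text)` pairs, the sketch below merges consecutive segments from the same speaker into labeled turns using the “Speaker N” convention described above. Both the input shape and the `format_turns` helper are hypothetical.

```python
from itertools import groupby


def format_turns(segments: list[tuple[int, str]]) -> list[str]:
    """Merge consecutive segments by the same speaker into 'Speaker N: ...' lines."""
    lines = []
    # groupby only groups adjacent items, so a speaker who returns later starts a new turn.
    for speaker, group in groupby(segments, key=lambda seg: seg[0]):
        text = " ".join(seg[1].strip() for seg in group)
        lines.append(f"Speaker {speaker}: {text}")
    return lines


# Made-up sample data in the assumed (speaker index, text) shape.
sample = [
    (0, "Thanks for joining."),
    (0, "Shall we start?"),
    (1, "Sure."),
    (0, "Great."),
]
for line in format_turns(sample):
    print(line)
```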
Translation
Setting `translate: true` converts speech from any language into English in the output. This is separate from multilingual transcription: Deepgram and AssemblyAI transcribe many languages natively but always output in the source language.
| Provider | Translation |
|---|---|
| Deepgram | ✗ |
| AssemblyAI | ✗ |
| Groq/Whisper | ✓ |
| Gemini | ✓ |
Use `provider: "groq"` or `provider: "gemini"` when you need English output from non-English audio.
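The diarization and translation tables on this page can be combined into a single lookup. The sketch below is one way to pick candidate providers for a request; the `CAPABILITIES` mapping is transcribed from those two tables, and the `providers_for` helper is hypothetical rather than something the tool exposes.

```python
# Feature support transcribed from the diarization and translation tables.
CAPABILITIES: dict[str, set[str]] = {
    "deepgram": {"diarize"},
    "assemblyai": {"diarize"},
    "groq": {"translate"},
    "gemini": {"translate"},
}


def providers_for(diarize: bool = False, translate: bool = False) -> list[str]:
    """Return the providers that support every requested feature."""
    needed: set[str] = set()
    if diarize:
        needed.add("diarize")
    if translate:
        needed.add("translate")
    return [name for name, caps in CAPABILITIES.items() if needed <= caps]


print(providers_for(diarize=True))
print(providers_for(translate=True))
print(providers_for(diarize=True, translate=True))
```

Going by the two tables alone, no single provider covers both diarization and translation, so the last call returns an empty list. A request that needs both would have to be handled some other way, for example by diarizing first and translating the resulting text separately.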