transcribe_video

Transcribe the spoken content from a video using one of four supported audio providers. Supports paragraph-level timestamps, speaker diarization, subtitle export (SRT/VTT), and speech-to-English translation.

Parameters

Parameter	Type	Required	Description
`videoPath`	string	✓	Path to the video, `file://` URI, or remote URL
`provider`	string	—	Audio provider: `deepgram`, `assemblyai`, `groq`, or `gemini`
`language`	string	—	Language code (e.g., `en`, `es`, `fr`, `ja`, `zh`)
`diarize`	boolean	—	Identify speakers (Deepgram and AssemblyAI only). Default: `false`
`translate`	boolean	—	Translate speech to English (Groq and Gemini only). Default: `false`
`outputFormat`	string	—	Output format: `text` (default), `srt`, `vtt`, or `json`
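The parameter table can be mirrored in a small client-side check before the tool is called. This is a minimal Python sketch, not the tool's own implementation: the `TranscribeArgs` class, its `payload()` method, and the `source_kind` helper are hypothetical. `provider` is left as `None` when omitted because the table documents no default.

```python
from dataclasses import asdict, dataclass
from typing import Optional
from urllib.parse import urlparse

# Allowed values copied from the parameter table.
PROVIDERS = {"deepgram", "assemblyai", "groq", "gemini"}
OUTPUT_FORMATS = {"text", "srt", "vtt", "json"}


@dataclass
class TranscribeArgs:
    """Hypothetical client-side mirror of the transcribe_video parameters.

    Field names keep the tool's camelCase so the payload matches the table.
    """
    videoPath: str
    provider: Optional[str] = None  # no documented default, so left unset
    language: Optional[str] = None  # e.g. "en", "es", "fr", "ja", "zh"
    diarize: bool = False
    translate: bool = False
    outputFormat: str = "text"

    def __post_init__(self) -> None:
        # Reject values the table does not list.
        if not self.videoPath:
            raise ValueError("videoPath is required")
        if self.provider is not None and self.provider not in PROVIDERS:
            raise ValueError(f"unknown provider: {self.provider!r}")
        if self.outputFormat not in OUTPUT_FORMATS:
            raise ValueError(f"unknown outputFormat: {self.outputFormat!r}")

    def payload(self) -> dict:
        # Drop unset optional fields instead of sending nulls.
        return {k: v for k, v in asdict(self).items() if v is not None}


def source_kind(video_path: str) -> str:
    """Classify videoPath as a file:// URI, a remote http(s) URL, or a local path."""
    scheme = urlparse(video_path).scheme.lower()
    if scheme == "file":
        return "file-uri"
    if scheme in ("http", "https"):
        return "remote-url"
    return "local-path"


args = TranscribeArgs(videoPath="./interview.mp4", outputFormat="srt")
print(args.payload(), source_kind(args.videoPath))
```

Validating locally only catches typos in enum values early; the tool itself remains the authority on what it accepts.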

Output Formats

Format	Description	Best For
`text`	Continuous text with `[MM:SS]` paragraph timestamps	Reading, analysis
`srt`	SubRip subtitle format (numbered blocks with timecodes)	Video players, editors
`vtt`	WebVTT subtitle format	Web video players (`<video>` element)
`json`	Structured JSON with segment-level timing data	Processing, editing tools

Usage Examples

Basic transcription:

“Transcribe ./interview.mp4”

French video:

“Transcribe this French presentation: ./presentation.mp4” (pass `language: "fr"`)

Speaker diarization:

“Transcribe ./interview.mp4 with speaker diarization using AssemblyAI”

Translate speech to English:

“Transcribe this Spanish video and translate it to English: ./video.mp4”

Generate SRT subtitles:

“Generate SRT subtitles for ./interview.mp4”

Generate WebVTT subtitles:

“Export a WebVTT subtitle file from ./talk.mp4”

Transcribe a YouTube video:

“Transcribe the audio from https://www.youtube.com/watch?v=abc123”
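The prompts above are written for an assistant that calls the tool on your behalf. When calling it directly, each prompt roughly corresponds to an argument object like the ones below. The mapping is illustrative, and wrapping the arguments in a JSON-RPC `tools/call` request assumes the tool is exposed through the Model Context Protocol, which this page does not state. The provider choices for the diarization and translation examples follow the Speaker Diarization and Translation sections.

```python
import json

# Illustrative argument objects for the example prompts (hypothetical mapping).
EXAMPLES = {
    "basic": {"videoPath": "./interview.mp4"},
    "french": {"videoPath": "./presentation.mp4", "language": "fr"},
    "diarization": {"videoPath": "./interview.mp4", "diarize": True, "provider": "assemblyai"},
    "translate": {"videoPath": "./video.mp4", "translate": True, "provider": "groq"},
    "srt": {"videoPath": "./interview.mp4", "outputFormat": "srt"},
    "vtt": {"videoPath": "./talk.mp4", "outputFormat": "vtt"},
    "youtube": {"videoPath": "https://www.youtube.com/watch?v=abc123"},
}


def tools_call(request_id: int, arguments: dict) -> dict:
    """Wrap arguments in an MCP-style JSON-RPC tools/call request (assumed transport)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "transcribe_video", "arguments": arguments},
    }


for i, (label, arguments) in enumerate(EXAMPLES.items(), start=1):
    print(label, json.dumps(tools_call(i, arguments)))
```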

Speaker Diarization

Diarization assigns a label to each speaker (e.g., “Speaker 0”, “Speaker 1”). It is supported by Deepgram and AssemblyAI only:

Provider	Diarization
Deepgram	✓
AssemblyAI	✓ (highest quality)
Groq/Whisper	✗
Gemini	✗

Use `provider: "assemblyai"` for the best diarization results. See Audio Providers for a full comparison.
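Diarized output attaches a speaker label to each piece of speech. A common post-processing step is merging consecutive segments from the same speaker into turns. This sketch assumes a hypothetical segment shape with an integer `speaker` field; the structure returned by Deepgram or AssemblyAI through this tool may differ.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class SpeakerSegment:
    """Hypothetical diarized segment: a speaker index and what they said."""
    speaker: int
    text: str


def speaker_turns(segments: list[SpeakerSegment]) -> list[str]:
    """Merge runs of consecutive segments by the same speaker into one labeled turn."""
    turns = []
    # groupby only groups adjacent items, so a speaker who returns later starts a new turn.
    for speaker, group in groupby(segments, key=lambda seg: seg.speaker):
        text = " ".join(seg.text for seg in group)
        turns.append(f"Speaker {speaker}: {text}")
    return turns


segments = [
    SpeakerSegment(0, "Hi."),
    SpeakerSegment(0, "Thanks for coming."),
    SpeakerSegment(1, "Glad to be here."),
    SpeakerSegment(0, "Let's start."),
]
print("\n".join(speaker_turns(segments)))
```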

Translation

Setting `translate: true` produces English output from non-English speech. This is separate from multilingual transcription: Deepgram and AssemblyAI transcribe many languages natively, but their output is always in the source language.

Provider	Translation
Deepgram	✗
AssemblyAI	✗
Groq/Whisper	✓
Gemini	✓

Use `provider: "groq"` or `provider: "gemini"` when you need English output from non-English audio.
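Read together, the diarization and translation tables determine which providers can serve a request. This hypothetical helper encodes both tables as data and filters on the requested features. One consequence is visible immediately: no single provider supports both `diarize` and `translate`.

```python
# Capability matrix copied from the Diarization and Translation tables.
CAPABILITIES = {
    "deepgram": {"diarize": True, "translate": False},
    "assemblyai": {"diarize": True, "translate": False},
    "groq": {"diarize": False, "translate": True},
    "gemini": {"diarize": False, "translate": True},
}


def compatible_providers(diarize: bool = False, translate: bool = False) -> list[str]:
    """Return the providers that support every requested feature (hypothetical helper)."""
    return [
        name
        for name, caps in CAPABILITIES.items()
        if (not diarize or caps["diarize"]) and (not translate or caps["translate"])
    ]


print(compatible_providers(diarize=True))
print(compatible_providers(translate=True))
print(compatible_providers(diarize=True, translate=True))
```

An empty result means the combination cannot be served in one call; one option is to make a diarized call and a separate translated call.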