Audio Providers
Video Context MCP supports four audio providers for the transcribe_video tool and for audio-enhanced video analysis.
Provider Comparison
| Feature | Deepgram | AssemblyAI | Groq/Whisper | Gemini |
|---|---|---|---|---|
| Price | Paid ($200 free credits) | Paid ($50 free credits) | Free tier | Free tier |
| Speaker diarization | ✓ | ✓ (highest quality) | ✗ | ✗ |
| English translation | ✗ | ✗ | ✓ | ✓ |
| Multilingual transcription | ✓ | ✓ | ✓ | ✓ |
| Best for | Default — fast, accurate | High-quality diarization | Free / cost-conscious | Users already using Gemini |
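
The comparison can also be read as a small capability lookup. The following is a minimal sketch that encodes the table rows as data. The `Capabilities` shape and the `providersWith` helper are hypothetical names introduced for illustration and are not part of Video Context MCP.

```typescript
// Illustrative only: the table above encoded as data.
type AudioProvider = "deepgram" | "assemblyai" | "groq" | "gemini";

interface Capabilities {
  diarization: boolean;        // speaker diarization
  englishTranslation: boolean; // translate non-English audio to English
  freeTier: boolean;           // free tier, as opposed to paid with trial credits
}

const CAPABILITIES: Record<AudioProvider, Capabilities> = {
  deepgram:   { diarization: true,  englishTranslation: false, freeTier: false },
  assemblyai: { diarization: true,  englishTranslation: false, freeTier: false },
  groq:       { diarization: false, englishTranslation: true,  freeTier: true },
  gemini:     { diarization: false, englishTranslation: true,  freeTier: true },
};

// Hypothetical helper: list the providers that offer every requested feature.
function providersWith(required: Partial<Capabilities>): AudioProvider[] {
  const features = Object.keys(required) as (keyof Capabilities)[];
  return (Object.keys(CAPABILITIES) as AudioProvider[]).filter((name) =>
    // A feature only constrains the result when it is requested as `true`.
    features.every((feature) => !required[feature] || CAPABILITIES[name][feature]),
  );
}
```

According to the table, no single provider offers both diarization and English translation, so `providersWith({ diarization: true, englishTranslation: true })` returns an empty list.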
Deepgram — Default
The default audio provider: fast, accurate, and reliable. New accounts receive $200 in free credits.
Environment variable: DEEPGRAM_API_KEY
Get API key →
AssemblyAI
Best-in-class speaker diarization. New accounts receive $50 in free credits. Use this when identifying who said what is important.
Environment variable: ASSEMBLYAI_API_KEY
Get API key →
Groq/Whisper
Free tier available. Based on OpenAI Whisper. One of the two free-tier providers (along with Gemini) that support translation to English. Use this when you need an English transcript of non-English audio.
Environment variable: GROQ_API_KEY
Get API key →
Gemini
Free tier available. Reuses the same key as the Gemini video provider and also supports English translation. A good option if you already have a Gemini key configured.
Environment variable: GEMINI_API_KEY (same key as video)
Fallback Chain
When a provider fails or its key is missing, the server automatically tries the next one in order. With the default (Deepgram):
Deepgram → AssemblyAI → Groq → Gemini
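
The fallback behaviour can be sketched roughly as follows. This is an illustration under stated assumptions, not the server's actual code: `orderedChain`, `transcribeWithFallback`, and the `Transcribe` callback are hypothetical names, and the ordering used when a non-Deepgram provider is preferred (preferred first, the rest in the documented order) is an assumption, since this page only documents the chain for the default.

```typescript
// Illustrative sketch only; the real server's internals may differ.
const CHAIN = ["deepgram", "assemblyai", "groq", "gemini"] as const;
type AudioProvider = (typeof CHAIN)[number];

// Environment variable holding each provider's API key (as listed on this page).
const KEY_VARS: Record<AudioProvider, string> = {
  deepgram: "DEEPGRAM_API_KEY",
  assemblyai: "ASSEMBLYAI_API_KEY",
  groq: "GROQ_API_KEY",
  gemini: "GEMINI_API_KEY",
};

type Env = Record<string, string | undefined>;

// Hypothetical callback standing in for a real provider request.
type Transcribe = (provider: AudioProvider, apiKey: string) => Promise<string>;

// ASSUMPTION: a non-default preferred provider moves to the front and the
// remaining providers keep their documented relative order.
function orderedChain(preferred: AudioProvider): AudioProvider[] {
  return [preferred, ...CHAIN.filter((p) => p !== preferred)];
}

async function transcribeWithFallback(
  preferred: AudioProvider,
  env: Env,
  transcribe: Transcribe,
): Promise<{ provider: AudioProvider; text: string }> {
  const skipped: string[] = [];
  for (const provider of orderedChain(preferred)) {
    const apiKey = env[KEY_VARS[provider]];
    if (!apiKey) {
      // Missing key: move on to the next provider.
      skipped.push(`${provider}: missing ${KEY_VARS[provider]}`);
      continue;
    }
    try {
      return { provider, text: await transcribe(provider, apiKey) };
    } catch (err) {
      // Provider failed: record why and try the next one.
      skipped.push(`${provider}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`No audio provider succeeded (${skipped.join("; ")})`);
}
```

In this sketch a missing key and a failed request are handled the same way: the reason is recorded and the loop moves on, so the final error explains why every provider was skipped.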
Change the Default Audio Provider
Set AUDIO_MCP_DEFAULT_PROVIDER in your MCP environment config:
"AUDIO_MCP_DEFAULT_PROVIDER": "assemblyai"
Valid values: deepgram, assemblyai, groq, gemini