
Audio Providers

Video Context MCP supports four audio providers for the transcribe_video tool and for audio-enhanced video analysis.

| Feature | Deepgram | AssemblyAI | Groq/Whisper | Gemini |
| --- | --- | --- | --- | --- |
| Price | Paid ($200 free credits) | Paid ($50 free credits) | Free tier | Free tier |
| Speaker diarization | | ✓ (highest quality) | | |
| English translation | | | | |
| Multilingual transcription | | | | |
| Best for | Default: fast, accurate | High-quality diarization | Free / cost-conscious | Users already using Gemini |

Deepgram is fast, accurate, and highly reliable, and offers $200 in free credits to new accounts. It is the default audio provider.

Environment variable: DEEPGRAM_API_KEY

AssemblyAI offers best-in-class speaker diarization and $50 in free credits to new accounts. Use it when identifying who said what is important.

Environment variable: ASSEMBLYAI_API_KEY

Groq/Whisper has a free tier and is based on OpenAI Whisper. It supports translation to English, so use it when you need an English transcript of non-English audio.

Environment variable: GROQ_API_KEY

Gemini has a free tier and reuses the same key as the Gemini video provider. It also supports English translation. A good option for users who already have a Gemini key configured.

Environment variable: GEMINI_API_KEY (same key as video)
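
To check which providers are usable before starting the server, the environment variables listed above can be inspected directly. The following is a minimal sketch, not the server's own logic: the `configured_providers` helper is hypothetical, and treating a blank value as "missing" is an assumption.

```python
import os
from typing import Mapping

# Provider name -> environment variable, as listed on this page.
PROVIDER_ENV_VARS = {
    "deepgram": "DEEPGRAM_API_KEY",
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the providers whose API key is set to a non-empty value.

    Hypothetical helper: blank or whitespace-only values are treated as
    missing, which is an assumption rather than documented behavior.
    """
    return [
        name
        for name, var in PROVIDER_ENV_VARS.items()
        if env.get(var, "").strip()
    ]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured audio providers:", ", ".join(found) if found else "none")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to run against a config file's `env` block before it is deployed.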

When a provider fails or its key is missing, the server automatically tries the next one in order. With the default provider (Deepgram), the fallback order is:

Deepgram → AssemblyAI → Groq → Gemini
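
The fallback behavior can be pictured as a simple loop over the chain. This is an illustrative sketch only: the `transcribe_with_fallback` function and the `Backend` callable type are hypothetical stand-ins, and the server's real implementation and error handling may differ.

```python
from typing import Callable, Mapping, Sequence

# Order documented above for the default configuration.
DEFAULT_ORDER = ("deepgram", "assemblyai", "groq", "gemini")

# Hypothetical backend signature: takes a media path, returns a transcript.
Backend = Callable[[str], str]


def transcribe_with_fallback(
    path: str,
    backends: Mapping[str, Backend],
    order: Sequence[str] = DEFAULT_ORDER,
) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, transcript).

    A provider absent from `backends` stands in for a missing API key.
    """
    errors: list[str] = []
    for name in order:
        backend = backends.get(name)
        if backend is None:
            errors.append(f"{name}: not configured")
            continue
        try:
            return name, backend(path)
        except Exception as exc:  # any failure moves on to the next provider
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all audio providers failed: " + "; ".join(errors))
```

Collecting the per-provider errors means that when every provider fails, the final message shows why each one was skipped.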

To change the default provider, set AUDIO_MCP_DEFAULT_PROVIDER in your MCP environment config:

"AUDIO_MCP_DEFAULT_PROVIDER": "assemblyai"

Valid values: deepgram, assemblyai, groq, gemini
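
A typo in this variable is easy to miss, so it can help to validate the value before launching the server. The sketch below assumes that an unset variable means Deepgram (the documented default) and that an unrecognized value should be rejected. How the server itself treats invalid values is not documented here, and `resolve_default_provider` is a hypothetical helper.

```python
import os
from typing import Mapping

# Valid values as listed above.
VALID_PROVIDERS = ("deepgram", "assemblyai", "groq", "gemini")


def resolve_default_provider(env: Mapping[str, str] = os.environ) -> str:
    """Read AUDIO_MCP_DEFAULT_PROVIDER, using deepgram when it is unset."""
    raw = env.get("AUDIO_MCP_DEFAULT_PROVIDER", "").strip()
    if not raw:
        return "deepgram"
    if raw not in VALID_PROVIDERS:
        # Assumption: fail loudly instead of silently using the default.
        raise ValueError(
            f"AUDIO_MCP_DEFAULT_PROVIDER={raw!r} is not one of "
            + ", ".join(VALID_PROVIDERS)
        )
    return raw
```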