Audio Providers

Video Context MCP supports four audio providers for the transcribe_video tool and for audio-enhanced video analysis.

Provider Comparison

Feature	Deepgram	AssemblyAI	Groq/Whisper	Gemini
Price	Paid ($200 free credits)	Paid ($50 free credits)	Free tier	Free tier
Speaker diarization	✓	✓ (highest quality)	✗	✗
English translation	✗	✗	✓	✓
Multilingual transcription	✓	✓	✓	✓
Best for	Default — fast, accurate	High-quality diarization	Free / cost-conscious	Users already using Gemini
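
As a rough illustration of how the comparison above can drive a choice, the sketch below encodes the table as a Python dictionary and filters it by required features. The `FEATURES` mapping and the `providers_with` helper are hypothetical names introduced here for illustration; they are not part of Video Context MCP.

```python
# Feature matrix transcribed from the comparison table above.
# "free_tier" is False for providers listed as paid with trial credits.
FEATURES = {
    "deepgram":   {"free_tier": False, "diarization": True,  "translation": False},
    "assemblyai": {"free_tier": False, "diarization": True,  "translation": False},
    "groq":       {"free_tier": True,  "diarization": False, "translation": True},
    "gemini":     {"free_tier": True,  "diarization": False, "translation": True},
}


def providers_with(**required):
    """Return provider names whose features match every keyword given."""
    return [
        name
        for name, feats in FEATURES.items()
        if all(feats[key] == value for key, value in required.items())
    ]


if __name__ == "__main__":
    print(providers_with(diarization=True))   # ['deepgram', 'assemblyai']
    print(providers_with(translation=True))   # ['groq', 'gemini']
```

Per the table, no single provider offers both diarization and English translation, so `providers_with(diarization=True, translation=True)` returns an empty list.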

Deepgram — Default

The default audio provider: fast, accurate, and reliable. New accounts receive $200 in free credits.

Environment variable: DEEPGRAM_API_KEY
Get API key →

AssemblyAI

Best-in-class speaker diarization. New accounts receive $50 in free credits. Use this when identifying who said what is important.

Environment variable: ASSEMBLYAI_API_KEY
Get API key →

Groq/Whisper

Free tier available. Based on OpenAI Whisper. One of the two free-tier providers (along with Gemini) that support translation to English; use it when you need an English transcript of non-English audio.

Environment variable: GROQ_API_KEY
Get API key →

Gemini

Free tier available. Reuses the same key as the Gemini video provider. Also supports English translation. A good option for users who already have a Gemini key configured.

Environment variable: GEMINI_API_KEY (same key as video)

Fallback Chain

When a provider fails or its key is missing, the server automatically tries the next one in order. With Deepgram as the default, the order is:

Deepgram → AssemblyAI → Groq → Gemini
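
The chain can be pictured with a small Python sketch that lists which providers would be attempted given the API keys present. This is not the server's implementation: the `fallback_chain` helper is hypothetical, it models only the "key is missing" case (not runtime failures), and the ordering for a non-Deepgram default is an assumption, since only the Deepgram order is documented here.

```python
import os

# Order documented above for the default provider (Deepgram).
DOCUMENTED_ORDER = ["deepgram", "assemblyai", "groq", "gemini"]

# Environment variable holding each provider's API key.
KEY_VARS = {
    "deepgram": "DEEPGRAM_API_KEY",
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def fallback_chain(default="deepgram", env=None):
    """Return the providers that would be tried, in order, skipping missing keys."""
    env = os.environ if env is None else env
    if default not in DOCUMENTED_ORDER:
        raise ValueError(f"unknown provider: {default!r}")
    # ASSUMPTION: a non-default choice moves to the front and the remaining
    # providers keep their documented relative order.
    ordered = [default] + [p for p in DOCUMENTED_ORDER if p != default]
    return [p for p in ordered if env.get(KEY_VARS[p])]


if __name__ == "__main__":
    keys = {"GROQ_API_KEY": "example", "GEMINI_API_KEY": "example"}
    print(fallback_chain(env=keys))  # ['groq', 'gemini']
```

With only Groq and Gemini keys configured, Deepgram and AssemblyAI are skipped and Groq is the first provider tried.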

Change the Default Audio Provider

Set AUDIO_MCP_DEFAULT_PROVIDER in your MCP environment config:

"AUDIO_MCP_DEFAULT_PROVIDER": "assemblyai"

Valid values: deepgram, assemblyai, groq, gemini
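
Before restarting the server, it can help to sanity-check the environment block. The sketch below is a hypothetical pre-flight check written in Python: `check_audio_env` is not part of Video Context MCP, and it takes the `env` mapping as a plain dictionary, because the surrounding structure of the MCP config file is not described here.

```python
# Valid provider names mapped to the API-key variable each one needs.
PROVIDER_KEYS = {
    "deepgram": "DEEPGRAM_API_KEY",
    "assemblyai": "ASSEMBLYAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "gemini": "GEMINI_API_KEY",
}


def check_audio_env(env):
    """Return a list of problems found in an MCP env mapping (empty if none)."""
    problems = []
    # Deepgram is the documented default when the variable is unset.
    provider = env.get("AUDIO_MCP_DEFAULT_PROVIDER", "deepgram")
    if provider not in PROVIDER_KEYS:
        valid = ", ".join(PROVIDER_KEYS)
        problems.append(
            f"AUDIO_MCP_DEFAULT_PROVIDER={provider!r} is not one of: {valid}"
        )
    elif not env.get(PROVIDER_KEYS[provider]):
        # A missing key means the server would fall through to the next provider.
        problems.append(
            f"{PROVIDER_KEYS[provider]} is not set, so {provider} would be skipped"
        )
    return problems


if __name__ == "__main__":
    env = {"AUDIO_MCP_DEFAULT_PROVIDER": "assemblyai", "ASSEMBLYAI_API_KEY": "example"}
    print(check_audio_env(env))  # []
```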

To ensure the LLM respects your configured default provider, add the following to your Copilot instructions (.github/copilot-instructions.md) or AGENTS.md:

If AUDIO_MCP_DEFAULT_PROVIDER is set in the MCP config, respect that value as the default provider for audio tools. Do not specify a provider parameter unless the user explicitly requests one.

Audio-Enhanced Video Analysis

When the AUDIO_ENHANCE_VIDEO_ANALYSIS variable is set to auto (the default), the server automatically transcribes audio and injects the transcript into AI prompts for analyze_video and summarize_video. This produces richer answers by giving the AI both visual and spoken context.

Value	Behavior
`auto` (default)	Transcribe when an audio track is detected
`true`	Always attempt transcription
`false`	Disable audio enhancement entirely
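
The three values reduce to a small decision rule, sketched below in Python. The `should_transcribe` function and its `has_audio_track` parameter are hypothetical names used only to illustrate the table; how the server actually detects an audio track is not covered here, and rejecting unknown values is an assumption of this sketch.

```python
def should_transcribe(mode="auto", has_audio_track=False):
    """Decide whether to attempt transcription for a given setting."""
    if mode == "true":
        return True              # always attempt transcription
    if mode == "false":
        return False             # audio enhancement disabled entirely
    if mode == "auto":
        return has_audio_track   # only when an audio track is detected
    # ASSUMPTION: unknown values are rejected in this sketch.
    raise ValueError(f"unsupported AUDIO_ENHANCE_VIDEO_ANALYSIS value: {mode!r}")


if __name__ == "__main__":
    print(should_transcribe("auto", has_audio_track=True))   # True
    print(should_transcribe("auto", has_audio_track=False))  # False
    print(should_transcribe("true", has_audio_track=False))  # True
```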