analyze_video
Ask any question about a video in natural language. The server extracts frames and optionally transcribes the audio, then sends everything to an AI provider to produce an answer grounded in the actual video.
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `videoPath` | string | Yes | Path to the video, `file://` URI, or remote URL |
| `question` | string | Yes | The question to ask about the video |
| `provider` | string | No | Override the AI provider: `glm`, `gemini`, `qwen`, `kimi`, or `mimo` |
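As a rough illustration, the parameter table can be expressed as a TypeScript argument shape with a light validity check. This is a sketch, not the server's code: the names `AnalyzeVideoArgs`, `classifyVideoPath`, and `validateArgs` are hypothetical, and the prefix rules used to tell a local path from a `file://` URI or a remote URL are assumptions.

```typescript
// Hypothetical client-side sketch of the analyze_video arguments.
// AnalyzeVideoArgs, classifyVideoPath, and validateArgs are illustrative names,
// not part of the server's published API.

type Provider = "glm" | "gemini" | "qwen" | "kimi" | "mimo";
const PROVIDERS: readonly string[] = ["glm", "gemini", "qwen", "kimi", "mimo"];

interface AnalyzeVideoArgs {
  videoPath: string;   // local path, file:// URI, or remote URL
  question: string;    // natural-language question about the video
  provider?: Provider; // optional override of the default provider
}

type VideoSource = "local" | "file-uri" | "remote";

// Assumed prefix rules; the server may classify inputs differently.
function classifyVideoPath(videoPath: string): VideoSource {
  if (/^file:\/\//i.test(videoPath)) return "file-uri";
  if (/^https?:\/\//i.test(videoPath)) return "remote";
  return "local";
}

// Accepts loosely typed input (e.g. parsed JSON) and returns a list of
// problems; an empty list means the arguments look well-formed.
function validateArgs(args: { videoPath?: string; question?: string; provider?: string }): string[] {
  const errors: string[] = [];
  if (!args.videoPath?.trim()) errors.push("videoPath is required");
  if (!args.question?.trim()) errors.push("question is required");
  if (args.provider !== undefined && PROVIDERS.indexOf(args.provider) === -1) {
    errors.push(`unknown provider: ${args.provider}`);
  }
  return errors;
}

// The Kimi usage example expressed as tool arguments.
const example: AnalyzeVideoArgs = {
  videoPath: "./video.mp4",
  question: "Describe each scene in detail",
  provider: "kimi",
};
console.log(classifyVideoPath(example.videoPath), validateArgs(example));
```

The example object mirrors the Kimi prompt from the usage examples. A client that assembles tool arguments programmatically could run a check like this before calling the tool, leaving the authoritative validation to the server.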
Usage Examples
General question:
“Analyze the video at `./demo.mp4` and tell me what happens in it”
Specific detail:
“What programming language is being demonstrated in `./tutorial.mp4`?”
Remote video:
“Analyze this video: https://example.com/product-demo.mp4”
YouTube video:
“What are the main topics covered in https://www.youtube.com/watch?v=abc123?”
With a specific provider:
“Analyze `./video.mp4` using Kimi and describe each scene in detail”
Audio-Enhanced Analysis
When the video contains an audio track, the server can automatically transcribe it and inject the transcript into the AI prompt, so the answer can draw on what is said as well as what is shown. This is controlled by the `AUDIO_ENHANCE_VIDEO_ANALYSIS` environment variable:
| Value | Behavior |
|---|---|
| `auto` (default) | Transcribe when an audio track is detected |
| `true` | Always attempt transcription |
| `false` | Disable audio enhancement |
Audio enhancement works with GLM, Kimi, Qwen, and MiMo. Gemini uploads the full video natively (audio included) and does not need this enhancement.
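The `AUDIO_ENHANCE_VIDEO_ANALYSIS` table can be read as a small decision function. The sketch below is illustrative only and assumes three things this page does not state: unrecognized values fall back to `auto`, transcription is simply skipped for Gemini, and the transcript is placed before the question in a plain-text prompt. The names `parseAudioMode`, `shouldTranscribe`, and `buildPrompt` are hypothetical.

```typescript
// Hypothetical sketch of the AUDIO_ENHANCE_VIDEO_ANALYSIS logic; not the server's code.
type AudioMode = "auto" | "true" | "false";
type Provider = "glm" | "gemini" | "qwen" | "kimi" | "mimo";

// Unset values use the documented default, "auto". Treating unrecognized
// values as "auto" as well is an assumption.
function parseAudioMode(raw: string | undefined): AudioMode {
  const value = (raw ?? "").trim().toLowerCase();
  if (value === "true") return "true";
  if (value === "false") return "false";
  return "auto";
}

// Gemini receives the full video with its audio, so this sketch assumes
// transcription is skipped for it regardless of the mode.
function shouldTranscribe(mode: AudioMode, provider: Provider, hasAudioTrack: boolean): boolean {
  if (provider === "gemini") return false;
  if (mode === "false") return false; // enhancement disabled
  if (mode === "true") return true;   // always attempt transcription
  return hasAudioTrack;               // "auto": only when a track is detected
}

// Hypothetical prompt layout: the transcript, when present, precedes the question.
function buildPrompt(question: string, transcript?: string): string {
  if (!transcript) return question;
  return `Audio transcript:\n${transcript}\n\nQuestion: ${question}`;
}

// In a server, the raw value would come from the environment variable.
const mode = parseAudioMode(undefined);
console.log(mode, shouldTranscribe(mode, "kimi", true));
```

Keeping the mode parsing separate from the per-request decision means the environment variable is read once, while `hasAudioTrack` can vary with each video.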