analyze_video

Ask any question about a video in natural language. The server extracts frames and optionally transcribes the audio, then sends the frames and any transcript to an AI provider along with your question, producing an answer grounded in the actual video.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| videoPath | string | Yes | Path to the video, file:// URI, or remote URL |
| question | string | Yes | The question to ask about the video |
| provider | string | No | Override the AI provider: glm, gemini, qwen, kimi, mimo |
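
The parameters above can be assembled and sanity-checked on the client side before a call is made. The following is a minimal sketch, not the server's own validation: `classify_video_source` and `build_analyze_video_args` are hypothetical helper names, and the sketch assumes that videoPath and question must be non-empty and that provider names are matched case-insensitively.

```python
from typing import Optional
from urllib.parse import urlparse

# Provider names copied from the parameter table above.
KNOWN_PROVIDERS = {"glm", "gemini", "qwen", "kimi", "mimo"}


def classify_video_source(video_path: str) -> str:
    """Label a videoPath value as 'remote', 'file_uri', or 'local' (hypothetical helper)."""
    scheme = urlparse(video_path).scheme.lower()
    if scheme in ("http", "https"):
        return "remote"
    if scheme == "file":
        return "file_uri"
    # Anything else (no scheme, or a drive letter) is treated as a local path.
    return "local"


def build_analyze_video_args(
    video_path: str, question: str, provider: Optional[str] = None
) -> dict:
    """Assemble an arguments object for analyze_video (hypothetical helper)."""
    # Assumption: videoPath and question must both be non-empty.
    if not video_path.strip():
        raise ValueError("videoPath must not be empty")
    if not question.strip():
        raise ValueError("question must not be empty")

    args = {"videoPath": video_path, "question": question}

    # provider is only included when the caller wants to override the default.
    if provider is not None:
        name = provider.lower()
        if name not in KNOWN_PROVIDERS:
            raise ValueError(f"unknown provider: {provider!r}")
        args["provider"] = name
    return args
```

For example, `build_analyze_video_args("./video.mp4", "Describe each scene in detail", "Kimi")` yields an arguments object with provider set to `kimi`, while omitting the third argument leaves the provider choice to the server.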

General question:

“Analyze the video at ./demo.mp4 and tell me what happens in it”

Specific detail:

“What programming language is being demonstrated in ./tutorial.mp4?”

Remote video:

“Analyze this video: https://example.com/product-demo.mp4”

YouTube video:

“What are the main topics covered in https://www.youtube.com/watch?v=abc123?”

With a specific provider:

“Analyze ./video.mp4 using Kimi — describe each scene in detail”

When the video contains an audio track, the server can automatically transcribe it and inject the transcript into the AI prompt for richer answers. This is controlled by the AUDIO_ENHANCE_VIDEO_ANALYSIS environment variable:

| Value | Behavior |
| --- | --- |
| auto (default) | Transcribe when an audio track is detected |
| true | Always attempt transcription |
| false | Disable audio enhancement |

Audio enhancement works with GLM, Kimi, Qwen, and MiMo. Gemini uploads the full video natively (audio included) and does not need this enhancement.
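
The rules above can be read as a small decision function. The sketch below only illustrates the documented behavior and is not the server's actual code: `should_transcribe` and `mode_from_env` are hypothetical names, and it assumes that Gemini skips transcription regardless of the setting and that an unrecognized value is an error.

```python
import os
from typing import Mapping, Optional

# Per the text above, Gemini receives the full video (audio included).
NATIVE_AUDIO_PROVIDERS = {"gemini"}


def mode_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Read AUDIO_ENHANCE_VIDEO_ANALYSIS, falling back to the documented default."""
    env = os.environ if env is None else env
    return env.get("AUDIO_ENHANCE_VIDEO_ANALYSIS", "auto")


def should_transcribe(provider: str, has_audio_track: bool, mode: str = "auto") -> bool:
    """Decide whether to transcribe the audio track (hypothetical helper)."""
    # Assumption: providers that ingest audio natively never need a transcript.
    if provider.lower() in NATIVE_AUDIO_PROVIDERS:
        return False

    mode = mode.strip().lower()
    if mode == "false":
        return False            # audio enhancement disabled
    if mode == "true":
        return True             # always attempt transcription
    if mode == "auto":
        return has_audio_track  # transcribe only when audio is detected
    # Assumption: unknown values are rejected rather than silently ignored.
    raise ValueError(f"unsupported AUDIO_ENHANCE_VIDEO_ANALYSIS value: {mode!r}")
```

A caller might combine the two as `should_transcribe("kimi", has_audio_track=True, mode=mode_from_env())`.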