text_to_speech
Convert any text into natural-sounding speech using the MiniMax TTS API. The generated audio is returned inline in the MCP response and also saved to disk.
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | ✓ | — | Text to synthesize (max 10,000 characters) |
| model | string | — | speech-02-hd | TTS model: speech-2.8-hd, speech-2.8-turbo, speech-02-hd, speech-02-turbo |
| voice_id | string | — | female-shaonv | Voice ID. See the MiniMax voice list. |
| speed | number | — | 1.0 | Speech speed multiplier, range [0.5, 2.0] |
| vol | number | — | 1.0 | Volume, range (0, 10] |
| pitch | integer | — | 0 | Pitch adjustment, range [-12, 12] |
| emotion | string | — | — | Emotional tone: happy, sad, angry, fearful, disgusted, surprised, calm |
| format | string | — | mp3 | Output format: mp3, wav, flac, pcm |
| sample_rate | integer | — | 32000 | Sample rate in Hz: 8000, 16000, 22050, 24000, 32000, 44100 |
| bitrate | integer | — | 128000 | Audio bitrate in bps: 32000, 64000, 128000, 256000 |
| channel | integer | — | 1 | Channel count: 1 = mono, 2 = stereo |
| language_boost | string | — | — | Language-specific enhancement, e.g. English, Chinese, auto |
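The ranges in the table can be checked on the client before a call is made. The sketch below is a hypothetical helper (`build_tts_args` is not part of the tool or of the MiniMax API). It assumes the tool accepts its arguments as a flat JSON object keyed by the parameter names above, and it leaves out `emotion` and `language_boost` unless they are set, since neither has a default. `language_boost` is not validated because the table lists its values only as examples.

```python
from typing import Any, Optional

# Allowed values copied from the Parameters table.
MODELS = {"speech-2.8-hd", "speech-2.8-turbo", "speech-02-hd", "speech-02-turbo"}
EMOTIONS = {"happy", "sad", "angry", "fearful", "disgusted", "surprised", "calm"}
FORMATS = {"mp3", "wav", "flac", "pcm"}
SAMPLE_RATES = {8000, 16000, 22050, 24000, 32000, 44100}
BITRATES = {32000, 64000, 128000, 256000}
MAX_TEXT_CHARS = 10_000


def build_tts_args(
    text: str,
    *,
    model: str = "speech-02-hd",
    voice_id: str = "female-shaonv",
    speed: float = 1.0,
    vol: float = 1.0,
    pitch: int = 0,
    emotion: Optional[str] = None,
    audio_format: str = "mp3",  # sent as "format"; renamed to avoid shadowing the builtin
    sample_rate: int = 32000,
    bitrate: int = 128000,
    channel: int = 1,
    language_boost: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: validate values against the documented ranges
    and return an arguments object for the text_to_speech tool."""
    if not 1 <= len(text) <= MAX_TEXT_CHARS:
        raise ValueError("text must be 1 to 10,000 characters")
    if model not in MODELS:
        raise ValueError(f"unsupported model: {model}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be in [0.5, 2.0]")
    if not 0 < vol <= 10:
        raise ValueError("vol must be in (0, 10]")
    if not isinstance(pitch, int) or not -12 <= pitch <= 12:
        raise ValueError("pitch must be an integer in [-12, 12]")
    if emotion is not None and emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion}")
    if audio_format not in FORMATS:
        raise ValueError(f"unsupported format: {audio_format}")
    if sample_rate not in SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if bitrate not in BITRATES:
        raise ValueError(f"unsupported bitrate: {bitrate}")
    if channel not in (1, 2):
        raise ValueError("channel must be 1 (mono) or 2 (stereo)")

    args: dict[str, Any] = {
        "text": text,
        "model": model,
        "voice_id": voice_id,
        "speed": speed,
        "vol": vol,
        "pitch": pitch,
        "format": audio_format,
        "sample_rate": sample_rate,
        "bitrate": bitrate,
        "channel": channel,
    }
    # Parameters without a documented default are only sent when set.
    if emotion is not None:
        args["emotion"] = emotion
    if language_boost is not None:
        args["language_boost"] = language_boost
    return args
```

For example, `build_tts_args("Welcome to our annual product review.", emotion="calm")` returns the defaults plus `"emotion": "calm"`, while `build_tts_args("Hi", vol=0)` raises `ValueError` because the volume range excludes 0.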
Usage Examples
Basic narration:
“Read this paragraph aloud: ‘In the depths of the ocean, creatures of light dance through the darkness.’”
Custom voice and emotion:
“Convert this text to speech using a calm male voice: ‘Welcome to our annual product review.’”
High-quality with specific settings:
“Generate speech from this script at 0.9× speed, stereo, 44100 Hz sample rate.”
Non-English text:
“Narrate this Chinese text with `language_boost` set to Chinese.”
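Behind each prompt, the MCP client sends a `tools/call` request with concrete arguments. The sketch below builds that JSON-RPC 2.0 message following the Model Context Protocol specification. The mapping from prompt to arguments is illustrative only (the model may choose differently), and the request ids and sample texts are placeholders.

```python
import json
from typing import Any


def tools_call_request(request_id: int, arguments: dict[str, Any]) -> str:
    """Serialize an MCP JSON-RPC 2.0 tools/call request for text_to_speech."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": "text_to_speech", "arguments": arguments},
        }
    )


# "Generate speech from this script at 0.9x speed, stereo, 44100 Hz sample rate."
high_quality = tools_call_request(
    1,
    {
        "text": "In the depths of the ocean, creatures of light dance through the darkness.",
        "speed": 0.9,
        "channel": 2,
        "sample_rate": 44100,
    },
)

# "Narrate this Chinese text with language_boost set to Chinese."
chinese = tools_call_request(
    2,
    {
        "text": "你好，欢迎收听。",  # placeholder text: "Hello, and welcome."
        "language_boost": "Chinese",
    },
)

# json.dumps escapes non-ASCII by default, so both payloads print safely.
print(high_quality)
print(chinese)
```

Parameters that are not mentioned in a prompt are simply omitted, so the tool falls back to the defaults in the Parameters table.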
Response
Returns an audio content part (base64-encoded) in the chosen format, plus a text part containing the saved file path, the duration (in ms), the format, and the sample rate.
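A client that wants its own copy of the audio can decode the inline part instead of reading the server's saved file. This sketch assumes the result follows the MCP `CallToolResult` shape, with `isError` set on failure and an audio part of the form `{"type": "audio", "data": <base64>, "mimeType": ...}`. The MIME-type-to-extension table is an assumption, and the text part is returned unparsed because its exact layout is not documented here. `save_tts_result` is a hypothetical name.

```python
import base64
from pathlib import Path
from typing import Any

# Assumed MIME types per output format; anything else is saved as .bin.
EXTENSIONS = {
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/flac": ".flac",
}


def save_tts_result(result: dict[str, Any], out_dir: Path) -> tuple[list[Path], list[str]]:
    """Write each audio part of a tools/call result to out_dir.

    Returns (paths of the written audio files, raw text of any text parts).
    """
    parts = result.get("content", [])
    if result.get("isError"):
        # On failure, MCP servers report the error message in text parts.
        messages = [p.get("text", "") for p in parts if p.get("type") == "text"]
        raise RuntimeError("text_to_speech failed: " + " ".join(messages))

    out_dir.mkdir(parents=True, exist_ok=True)
    paths: list[Path] = []
    texts: list[str] = []
    for i, part in enumerate(parts):
        if part.get("type") == "audio":
            ext = EXTENSIONS.get(part.get("mimeType", ""), ".bin")
            path = out_dir / f"speech_{i}{ext}"
            path.write_bytes(base64.b64decode(part["data"]))
            paths.append(path)
        elif part.get("type") == "text":
            texts.append(part["text"])
    return paths, texts
```

Keeping a decoded copy means the client does not depend on the server's temporary file, which may be removed under your cache settings.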
- The `speech-2.8-hd` and `speech-2.8-turbo` models are the latest generation; `speech-02-hd` and `speech-02-turbo` are previous-generation alternatives.
- Not all emotions are supported by every voice. If an unsupported combination is requested, the API ignores the `emotion` parameter.
- Generated files are saved to a temporary directory and cleaned up according to your cache settings.