Text-to-Speech

Convert any text into natural, lifelike speech using multiple providers, custom voices, and emotion enhancement. Generate high-quality audio for your AI agents, content, or applications.

Overview

Dotclone's Text-to-Speech (TTS) engine lets you convert text into human-like speech instantly. Whether you're building voice agents, creating audio content, or adding speech to your app, TTS provides:

Multiple Providers

Choose from ElevenLabs, MiniMax, OpenAI, and more

Voice Cloning

Clone any voice with 10-30 seconds of audio

Emotion & Enhancement

Add emotions, whispers, laughs, and more

50+ Languages

Generate speech in multiple languages and accents

Figure: The Text-to-Speech interface in Dotclone

How to Generate Speech

Follow these steps to generate your first speech audio:

1. Enter Your Text

Type or paste the text you want to convert.

Navigate to Text-to-Speech in the sidebar. Enter your text in the large input area. You can enter anything from a single sentence to multiple paragraphs.

Figure: Enter your text in the input area

2. Select a Provider

Choose your preferred TTS provider.

Click on the provider dropdown and select from the available options. Each provider has unique voice qualities and features.

Figure: Choose your TTS provider

3. Choose a Voice

Select a voice that fits your needs.

Browse the available voices for your selected provider. You can preview voices before selecting. Each provider has its own set of built-in voices.

Figure: Browse and select a voice
Figure: Preview a voice before selecting

4. Generate & Download

Create your audio and save it.

Click the Generate button. Once complete, you can play the audio, download it, or share it.

Figure: Click Generate to create your audio
Figure: Download or share your generated audio

Writing Better Text for TTS

The quality of your output depends heavily on how you write your input text. Here are tips for better results:

Use Punctuation

Proper punctuation creates natural pauses. Use commas, periods, and ellipses for pacing.

✓ "Hello, welcome to Dotclone. How can I help you today?"

Break Long Sentences

Short sentences sound more natural. Break complex sentences into smaller parts.

✗ "Our company which was founded in 2020 provides AI voice solutions for businesses of all sizes."
✓ "Our company was founded in 2020. We provide AI voice solutions for businesses of all sizes."
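One way to catch overly long sentences before generating audio is a small pre-check script. The sketch below is illustrative only: the function name long_sentences and the 12-word threshold are assumptions, not part of Dotclone, and the sentence splitting is deliberately naive.

```python
import re
from typing import List


def long_sentences(text: str, max_words: int = 12) -> List[str]:
    """Return the sentences in text that contain more than max_words words.

    The 12-word default is an arbitrary threshold chosen for illustration.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]


before = ("Our company which was founded in 2020 provides AI voice "
          "solutions for businesses of all sizes.")
after = ("Our company was founded in 2020. "
         "We provide AI voice solutions for businesses of all sizes.")

print(long_sentences(before))  # the single long sentence is flagged
print(long_sentences(after))   # []
```

Any sentence the check returns is a candidate for splitting into two shorter ones.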

Spell Out Abbreviations

Write out abbreviations and acronyms for proper pronunciation.

✗ "Call our HQ ASAP"
✓ "Call our headquarters as soon as possible"
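If your text comes from an automated source, abbreviations can be expanded with a lookup table before the text is sent to TTS. This is a minimal sketch: the ABBREVIATIONS mapping and the expand_abbreviations helper are hypothetical and would need to be extended for your own content.

```python
import re
from typing import Dict

# Illustrative mapping only; add the abbreviations your content actually uses.
ABBREVIATIONS: Dict[str, str] = {
    "HQ": "headquarters",
    "ASAP": "as soon as possible",
}


def expand_abbreviations(text: str, mapping: Dict[str, str] = ABBREVIATIONS) -> str:
    """Replace whole-word abbreviations with their spoken form."""
    if not mapping:
        return text
    # \b keeps the match to whole words, so "HQ" inside "HQX" is left alone.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda match: mapping[match.group(1)], text)


print(expand_abbreviations("Call our HQ ASAP"))
# Call our headquarters as soon as possible
```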

Use Phonetic Spelling

For unusual names or words, spell them phonetically.

✓ "Welcome to Dotclone (dot-clone)"
Figure: Example of well-formatted text for TTS

Providers

Dotclone integrates with multiple TTS providers. Each provider has its own voices, models, and settings. Choose based on your quality, speed, and cost requirements.

| Provider | Models | Best For | Features |
| --- | --- | --- | --- |
| ElevenLabs (Recommended) | eleven_v3, eleven_multilingual_v2, eleven_turbo_v2_5, eleven_flash_v2_5 | Highest quality, expressive voices | Emotion enhancement, voice cloning |
| MiniMax | speech-2.8-turbo, speech-2.8-hd, speech-2.6-turbo, speech-2.6-hd | Emotion brackets, sound effects | Background noise, emotion tags |
| OpenAI | tts-1, tts-1-hd, gpt-4o-mini-tts | Fast, reliable, good quality | Speed control, multiple voices |
Figure: Select a provider from the dropdown
Figure: Each provider has multiple models to choose from
Provider-Specific Voices
Each provider has its own built-in voices. When you switch providers, the voice list will update to show only voices available for that provider.
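When working with several providers in code, it can help to keep the provider and model pairings in one place and check a combination before making a request. The sketch below copies the model names from the providers table; the lowercase keys "minimax" and "openai" are assumptions ("elevenlabs" is the identifier used in the API examples), and the helper is hypothetical rather than part of the Dotclone SDK.

```python
from typing import Dict, List

# Model names are taken from the providers table. The provider keys
# "minimax" and "openai" are assumed identifiers.
PROVIDER_MODELS: Dict[str, List[str]] = {
    "elevenlabs": [
        "eleven_v3",
        "eleven_multilingual_v2",
        "eleven_turbo_v2_5",
        "eleven_flash_v2_5",
    ],
    "minimax": [
        "speech-2.8-turbo",
        "speech-2.8-hd",
        "speech-2.6-turbo",
        "speech-2.6-hd",
    ],
    "openai": ["tts-1", "tts-1-hd", "gpt-4o-mini-tts"],
}


def is_valid_combination(provider: str, model: str) -> bool:
    """Return True if the model is listed for the given provider."""
    return model in PROVIDER_MODELS.get(provider.lower(), [])


print(is_valid_combination("elevenlabs", "eleven_v3"))  # True
print(is_valid_combination("openai", "eleven_v3"))      # False
```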

Voices

Each provider offers a unique set of built-in voices. Browse, preview, and select the voice that best fits your use case.

Built-in Voices

Built-in voices are pre-trained voices provided by each TTS provider. They cover various genders, accents, ages, and speaking styles.

Figure: Browse available voices for your selected provider
Figure: View voice details including language and style

Selecting a Voice

Click on any voice to select it. You can preview the voice by clicking the play button next to each voice name before making your selection.

Figure: Click to select a voice

Voice Cloning

Clone any voice with just 10-30 seconds of audio. Cloned voices can be used with any TTS provider in Dotclone.

Cross-Provider Compatibility
Once you clone a voice, it becomes available across all providers. Select any provider, and your cloned voice will appear in the voice list alongside built-in voices.

How to Clone a Voice

1. Go to Voice Cloning

Navigate to the Voice Cloning section.

Figure: Click Voice Cloning in the sidebar

2. Upload Audio Sample

Upload 10-30 seconds of clear audio.

For best results, use audio that is:

  • Clear speech without background noise
  • Single speaker only
  • 10-30 seconds in length
  • Natural, conversational pace (not read aloud from a script)
Figure: Upload your audio sample

3. Name Your Voice

Give your cloned voice a descriptive name.

Figure: Enter a name for your cloned voice

4. Clone & Use

Process the voice and start using it.

Click Clone Voice to process. Once complete, your cloned voice will appear in the voice dropdown when using TTS.

Figure: Your cloned voice is ready to use
Figure: Select your cloned voice in the TTS interface
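The sample requirements in step 2 include a length of 10-30 seconds, which can be checked locally before uploading. This is a minimal sketch using Python's standard wave module, so it assumes the sample is a WAV file (the accepted upload formats are not stated here). The function names are hypothetical and not part of any Dotclone SDK.

```python
import wave

# Length limits from the audio sample guidelines (10-30 seconds).
MIN_SECONDS = 10.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_sample_length(path: str) -> bool:
    """Return True if the sample is between 10 and 30 seconds long."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

For example, is_valid_sample_length("my_sample.wav") returns False for a 5-second clip. The check covers length only; background noise and speaker count still need to be judged by ear.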

Deleting a Cloned Voice

To delete a cloned voice, go to Voice Cloning, find the voice you want to remove, and click the Delete button.

Figure: Click the delete button to remove a cloned voice
Deletion is Permanent
Once deleted, a cloned voice cannot be recovered. You'll need to upload a new audio sample to create it again.

Enhance & Emotion

Add emotions, expressions, and style to your generated speech. Make your audio more engaging and natural-sounding.

Supported Models
Emotion enhancement is only supported by these models:
  • ElevenLabs: eleven_v3
  • MiniMax: speech-2.8-turbo, speech-2.8-hd

Using Emotion Brackets

Add emotion tags in square brackets directly in your text. The TTS engine will interpret these and apply the corresponding emotion to the speech.

Syntax

[emotion] Your text here

Examples

[excited] Wow, this is amazing! [laughs] I can't believe it worked! [whispers] Don't tell anyone...

[sad] I'm sorry to hear that happened.

[angry] This is completely unacceptable!

[cheerful] Good morning! How can I help you today?

Available Emotion Tags

[excited] [sad] [angry] [cheerful] [whispers] [laughs] [sighs] [surprised] [nervous] [calm] [serious] [friendly]
Figure: Add emotion brackets directly in your text
Figure: Use the Enhance button for automatic emotion suggestions
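Because a misspelled tag or an unsupported model is easy to miss, a quick local check can be useful before generating audio. The sketch below uses the tag list and supported models from this section; the helper names and the lowercase "minimax" provider key are assumptions, and how the engine treats an unrecognized tag is not specified here.

```python
import re
from typing import List

# Tags listed under "Available Emotion Tags".
EMOTION_TAGS = {
    "excited", "sad", "angry", "cheerful", "whispers", "laughs",
    "sighs", "surprised", "nervous", "calm", "serious", "friendly",
}

# (provider, model) pairs listed under "Supported Models".
EMOTION_MODELS = {
    ("elevenlabs", "eleven_v3"),
    ("minimax", "speech-2.8-turbo"),
    ("minimax", "speech-2.8-hd"),
}

# Matches the content of any [bracketed] span.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def unknown_tags(text: str) -> List[str]:
    """Return bracketed tags in text that are not in the documented list."""
    return [tag for tag in TAG_PATTERN.findall(text)
            if tag.strip().lower() not in EMOTION_TAGS]


def supports_emotion(provider: str, model: str) -> bool:
    """Return True if the provider and model pair supports emotion tags."""
    return (provider.lower(), model) in EMOTION_MODELS


print(unknown_tags("[excited] Wow! [laughs] I love it! [sleepy] Good night."))
# ['sleepy']
print(supports_emotion("openai", "tts-1"))  # False
```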

Provider Settings

Each provider and model has specific settings you can adjust to fine-tune your audio output.

ElevenLabs Settings

eleven_v3

| Setting | Range | Description |
| --- | --- | --- |
| Stability | 0% - 100% | Higher = more consistent, lower = more expressive |

eleven_multilingual_v2, eleven_turbo_v2_5, eleven_flash_v2_5

| Setting | Range | Description |
| --- | --- | --- |
| Speed | 0.5x - 2.0x | Playback speed of the generated audio |
| Stability | 0% - 100% | Voice consistency vs. expressiveness |
| Clarity | 0% - 100% | Clarity and enhancement of the voice |

MiniMax & OpenAI Settings

speech-2.8-turbo, speech-2.8-hd, speech-2.6-turbo, speech-2.6-hd, tts-1, tts-1-hd, gpt-4o-mini-tts

| Setting | Range | Description |
| --- | --- | --- |
| Speed | 0.5x - 2.0x | Playback speed |
| Pitch | -12 to +12 | Voice pitch (higher or lower) |
| Intensity | 0% - 100% | Emotional intensity |
| Timbre | 0% - 100% | Voice warmth and tone color |
Figure: Adjust provider-specific settings in the settings panel
Figure: Use sliders to fine-tune your audio output
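The ranges above can be encoded as a lookup table to catch out-of-range or unsupported settings before a request is made. This is a sketch under stated assumptions: only the "stability" key appears in the API examples, so the other key names (speed, clarity, pitch, intensity, timbre) are guesses at how the settings might be spelled, and validate_settings is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Range = Tuple[float, float]

# Ranges copied from the settings tables; percentages are plain 0-100 numbers.
_ELEVEN_V2_FAMILY: Dict[str, Range] = {
    "speed": (0.5, 2.0), "stability": (0, 100), "clarity": (0, 100),
}
_MINIMAX_OPENAI: Dict[str, Range] = {
    "speed": (0.5, 2.0), "pitch": (-12, 12),
    "intensity": (0, 100), "timbre": (0, 100),
}

SETTING_RANGES: Dict[str, Dict[str, Range]] = {
    "eleven_v3": {"stability": (0, 100)},
}
for _model in ("eleven_multilingual_v2", "eleven_turbo_v2_5", "eleven_flash_v2_5"):
    SETTING_RANGES[_model] = _ELEVEN_V2_FAMILY
for _model in ("speech-2.8-turbo", "speech-2.8-hd", "speech-2.6-turbo",
               "speech-2.6-hd", "tts-1", "tts-1-hd", "gpt-4o-mini-tts"):
    SETTING_RANGES[_model] = _MINIMAX_OPENAI


def validate_settings(model: str, settings: Dict[str, float]) -> List[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    allowed = SETTING_RANGES.get(model)
    if allowed is None:
        return [f"unknown model: {model}"]
    problems = []
    for name, value in settings.items():
        if name not in allowed:
            problems.append(f"{name} is not available for {model}")
            continue
        low, high = allowed[name]
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside {low} to {high}")
    return problems


print(validate_settings("eleven_v3", {"stability": 75}))  # []
print(validate_settings("tts-1", {"pitch": 20}))          # one problem reported
```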

Sound Effects & Background Noise

Add ambient sounds and background noise to make your audio more immersive. This feature is available only with the MiniMax provider.

MiniMax Only
Background noise and sound effects are only available when using MiniMax as your TTS provider.

Available Background Sounds

Choose from a variety of ambient sounds to mix with your generated speech:

🏢 Office Ambience
☕ Coffee Shop
🚗 Traffic / Street
🌧️ Rain
🌊 Ocean Waves
🌲 Forest / Nature
✈️ Airplane Cabin
🏠 Home Interior
Figure: Select a background sound from the dropdown
Figure: Adjust the mix level between speech and background

Supported Languages

Dotclone TTS supports over 50 languages and accents across all providers. Language availability may vary by provider and voice.

Popular Languages

  • English (US, UK, AU)
  • Spanish (ES, MX)
  • French (FR, CA)
  • German
  • Italian
  • Portuguese (BR, PT)
  • Japanese
  • Korean
  • Chinese (Mandarin)
  • Arabic

European Languages

  • Dutch
  • Polish
  • Russian
  • Swedish
  • Norwegian
  • Danish
  • Finnish
  • Greek
  • Czech
  • Romanian

Asian Languages

  • Hindi
  • Thai
  • Vietnamese
  • Indonesian
  • Malay
  • Filipino
  • Tamil
  • Bengali
  • Urdu
  • Turkish
Figure: Select language when choosing a voice

Output Formats

Choose your preferred audio format when generating or downloading speech.

| Format | Extension | Best For | Quality |
| --- | --- | --- | --- |
| MP3 | .mp3 | General use, web, sharing | Good (compressed) |
| WAV | .wav | Professional audio, editing | Highest (uncompressed) |
| OGG | .ogg | Web applications, games | Good (compressed) |
| FLAC | .flac | Archival, lossless storage | Highest (lossless) |
Figure: Select your preferred output format before downloading
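When saving files from a script, the format can be derived from the target filename and checked against the table above. The helpers below are a hypothetical sketch; they only inspect the file extension and make no claim about how the Dotclone API selects a format.

```python
from pathlib import Path

# Formats and extensions from the output formats table.
SUPPORTED_FORMATS = {"mp3", "wav", "ogg", "flac"}
# WAV (uncompressed) and FLAC (lossless) are the "Highest" quality rows.
LOSSLESS_FORMATS = {"wav", "flac"}


def format_from_filename(filename: str) -> str:
    """Return the output format implied by the filename's extension."""
    extension = Path(filename).suffix.lower().lstrip(".")
    if extension not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported output format: {filename!r}")
    return extension


def is_lossless(filename: str) -> bool:
    """Return True if the filename implies WAV or FLAC output."""
    return format_from_filename(filename) in LOSSLESS_FORMATS


print(format_from_filename("welcome.mp3"))  # mp3
print(is_lossless("master.flac"))           # True
```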

TTS History

Every speech you generate is automatically saved to your history. Access, replay, or download any previously generated audio.

Accessing Your History

Click the History tab in the TTS interface to view all your previously generated audio files.

Figure: Click the History tab to view past generations
Figure: Browse your generated audio history

History Features

  • Replay: Listen to any previous generation
  • Download: Download the audio file again
  • Copy Text: Copy the original text used
  • Regenerate: Generate again with same or different settings
  • Delete: Remove from history
Figure: Actions available for each history item
Storage Limits
History storage depends on your plan. Free accounts store the last 50 generations. Paid plans have higher or unlimited storage.

API Usage

Generate speech programmatically using the Dotclone API or SDK.

Python

Basic TTS Generation

from dotclone import Dotclone

client = Dotclone()

# Generate speech
audio = client.tts.generate(
    text="Hello! Welcome to Dotclone.",
    provider="elevenlabs",
    model="eleven_v3",
    voice="emma",
    settings={
        "stability": 75
    }
)

# Save the audio
audio.save("welcome.mp3")

# Or get the URL
print(audio.url)

With Emotion Enhancement

# Using emotion brackets
audio = client.tts.generate(
    text="[excited] Wow, this is amazing! [laughs] I love it!",
    provider="elevenlabs",
    model="eleven_v3",
    voice="emma"
)

audio.save("excited_speech.mp3")

Using a Cloned Voice

# Use your cloned voice
audio = client.tts.generate(
    text="This is my cloned voice speaking.",
    provider="elevenlabs",
    voice="my-cloned-voice-id"  # Your cloned voice ID
)

audio.save("cloned_voice.mp3")

JavaScript

Basic TTS Generation

import Dotclone from 'dotclone';

const client = new Dotclone();

// Generate speech
const audio = await client.tts.generate({
  text: "Hello! Welcome to Dotclone.",
  provider: "elevenlabs",
  model: "eleven_v3",
  voice: "emma",
  settings: {
    stability: 75
  }
});

// Get the audio URL
console.log(audio.url);

// Or download
await audio.download("welcome.mp3");

With Emotion Enhancement

// Using emotion brackets
const audio = await client.tts.generate({
  text: "[excited] Wow, this is amazing! [laughs] I love it!",
  provider: "elevenlabs",
  model: "eleven_v3",
  voice: "emma"
});

console.log(audio.url);

cURL

Basic TTS Generation

curl -X POST "https://api.dotclone.com/v1/tts/generate" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! Welcome to Dotclone.",
    "provider": "elevenlabs",
    "model": "eleven_v3",
    "voice": "emma",
    "settings": {
      "stability": 75
    }
  }'

Response

{
  "id": "tts_abc123",
  "url": "https://cdn.dotclone.com/audio/tts_abc123.mp3",
  "duration": 2.5,
  "format": "mp3",
  "created_at": "2025-01-15T10:30:00Z"
}
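For environments where the SDK is not installed, the cURL request and JSON response above can be mirrored with Python's standard library. This is a sketch, not the SDK's implementation: build_request, TTSResult, and parse_response are hypothetical names, while the endpoint, headers, and field names are copied from the examples above.

```python
import json
import urllib.request
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

API_URL = "https://api.dotclone.com/v1/tts/generate"  # from the cURL example


def build_request(api_key: str, text: str, provider: str, model: str,
                  voice: str, settings: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the cURL example."""
    payload = {"text": text, "provider": provider, "model": model, "voice": voice}
    if settings:
        payload["settings"] = settings
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


@dataclass
class TTSResult:
    id: str
    url: str
    duration: float
    format: str
    created_at: datetime


def parse_response(body: str) -> TTSResult:
    """Parse the JSON response body into a typed result."""
    data = json.loads(body)
    # Replace the trailing "Z" so datetime.fromisoformat accepts the timestamp.
    created = datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
    return TTSResult(data["id"], data["url"], float(data["duration"]),
                     data["format"], created)


sample = ('{"id": "tts_abc123", '
          '"url": "https://cdn.dotclone.com/audio/tts_abc123.mp3", '
          '"duration": 2.5, "format": "mp3", '
          '"created_at": "2025-01-15T10:30:00Z"}')
result = parse_response(sample)
print(result.id, result.format, result.created_at.year)  # tts_abc123 mp3 2025
```

To send the request, pass the object returned by build_request to urllib.request.urlopen and hand the decoded response body to parse_response.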

For complete API documentation, see the TTS API Reference.