Voice Selection & Cloning
Your assistant's voice shapes how callers perceive your brand. Caller AI offers a large library of high-quality pre-made voices, and you can also clone your own voice for a fully personalized experience.
Selecting a Standard Voice
Caller AI integrates with top-tier voice providers to give you access to hundreds of voices across a range of accents, ages, and tones. To select a voice:
- Navigate to your Assistant Settings.
- Click on the Voice tab.
- Use the filters to search by Language, Gender, or Accent (e.g., American, British, Australian).
- Click the "Play" icon to preview the voice before saving.
Instant Voice Cloning
You can clone your own voice or the voice of a team member to create a digital replica. This is ideal for founders, sales leaders, or specific brand personas.
How to Clone a Voice
- Step 1: Record Samples. You will need 1 to 3 minutes of high-quality audio. The speaker should talk naturally, avoiding monotone reading.
- Step 2: Upload to Caller AI. Go to Voices -> Add New -> Instant Clone.
- Step 3: Verification. You may be asked to record a specific verification phrase to confirm that you have the right to use this voice.
Do not upload audio that contains background music, heavy echo, or other people talking. Noisy source audio produces a clone that sounds "robotic" or "glitchy".
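Before uploading, it can help to confirm that a recording falls within the 1 to 3 minute range from Step 1. The sketch below is one way to do that locally for WAV files using Python's standard `wave` module. The function names and the two constants are illustrative and not part of Caller AI. The check covers length only, so it cannot detect background noise or echo.

```python
import wave

# Length bounds taken from Step 1 above (1 to 3 minutes).
MIN_SECONDS = 60
MAX_SECONDS = 180


def wav_duration_seconds(source) -> float:
    """Return the duration of a WAV file in seconds.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_sample_length(seconds: float) -> bool:
    """True if the recording length is within the recommended range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

For example, `is_valid_sample_length(wav_duration_seconds("sample.wav"))` would return `False` for a 30-second recording, where `sample.wav` is a placeholder file name.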
Fine-Tuning Voice Settings
Once you have selected or cloned a voice, you can tweak its performance characteristics to better suit your use case.
| Setting | Description |
|---|---|
| Stability | Determines how consistent the voice sounds from one sentence to the next. High: very consistent, but can sound monotone. Low: more expressive, but less predictable. |
| Similarity Boost | (Cloned voices only) How closely the AI tries to match the original audio sample. Note: Setting this too high can introduce audio artifacts. |
| Speed | Adjusts the speaking rate. 0.9x - 1.0x is usually best for support. 1.1x works well for high-energy sales. |
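If you keep these settings in your own scripts or notes, a small validated structure can catch out-of-range values early. The following is a minimal sketch only: the `VoiceSettings` class, its field names, the default values, and the 0.0 to 1.0 scale for stability and similarity boost are all assumptions made for illustration, since this guide does not specify numeric ranges or an API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceSettings:
    """Hypothetical container for the three settings in the table above."""

    stability: float = 0.5          # assumed 0.0-1.0 scale (not specified by the guide)
    similarity_boost: float = 0.75  # assumed 0.0-1.0 scale; applies to cloned voices only
    speed: float = 1.0              # rate multiplier, e.g. 1.1 means 1.1x

    def __post_init__(self) -> None:
        # Reject values outside the assumed 0.0-1.0 scale.
        for name in ("stability", "similarity_boost"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
        # A speaking rate of zero or less is meaningless.
        if self.speed <= 0:
            raise ValueError(f"speed must be positive, got {self.speed}")
```

For example, `VoiceSettings(stability=0.3, speed=1.1)` constructs normally, while `VoiceSettings(stability=1.5)` raises a `ValueError`.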
Matching Voice to Use Case
For Cold Outbound: Use a casual, slightly faster voice with lower stability (more expression) to sound like a real person calling on a cell phone.
For Inbound Support: Use a clear, deep, or authoritative voice with higher stability to convey trust and professionalism.
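The two recommendations above can be captured as reusable presets. This is a sketch under stated assumptions: the preset names and the `settings_for` helper are hypothetical, and the stability numbers assume a 0.0 to 1.0 scale, which this guide does not specify. Only the relative ordering (lower stability and faster speed for outbound, higher stability and normal speed for support) comes from the guidance itself.

```python
# Hypothetical presets. Stability values assume a 0.0-1.0 scale;
# speed values follow the ranges suggested in the settings table.
PRESETS = {
    "cold_outbound": {"stability": 0.35, "speed": 1.1},
    "inbound_support": {"stability": 0.8, "speed": 1.0},
}


def settings_for(use_case: str) -> dict:
    """Return a copy of the preset for a use case, or raise ValueError."""
    try:
        return dict(PRESETS[use_case])
    except KeyError:
        raise ValueError(f"unknown use case: {use_case!r}") from None
```

Returning a copy means a caller can adjust one assistant's values without changing the shared preset.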