Assistant Modes
Caller AI assistants can operate in three voice processing modes, each with different latency, voice options, and ideal use cases.
1. Pipeline Mode
High Logic
How it works: Speech-to-text → LLM → text-to-speech
Latency: ~800 to 1500 ms
Strengths
- Handles complex reasoning
- Supports long, multi-sentence replies
- Full access to all voices, including custom cloned voices
- Clean variable use and structured conversation control
Best For
- Deep support conversations
- Detailed explanations
- Brand-specific voice requirements
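Because the three stages run one after another, the end-to-end latency of Pipeline Mode is roughly the sum of the stage latencies. The Python sketch below only illustrates that arithmetic. The per-stage timings are made-up placeholders rather than measurements, and `pipeline_latency_ms` and `within_documented_range` are hypothetical helpers, not part of any product API.

```python
# Documented end-to-end range for Pipeline Mode, taken from the text above.
PIPELINE_RANGE_MS = (800, 1500)


def pipeline_latency_ms(stt_ms: float, llm_ms: float, tts_ms: float) -> float:
    """Total latency of sequential stages: each stage waits for the previous one."""
    return stt_ms + llm_ms + tts_ms


def within_documented_range(total_ms: float) -> bool:
    """Check a total against the documented ~800 to 1500 ms range."""
    low, high = PIPELINE_RANGE_MS
    return low <= total_ms <= high


# Placeholder stage timings, assumed purely for illustration.
total = pipeline_latency_ms(stt_ms=250, llm_ms=500, tts_ms=300)
print(total, within_documented_range(total))
```

The point of the sketch is that shaving time off any single stage (for example, a faster LLM response) reduces the total by the same amount, since nothing runs in parallel.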
2. Speech-to-Speech Mode
Fastest
How it works: Direct speech input → multimodal model → speech output (no intermediate text stage)
Latency: ~300 to 600 ms (very fast)
Strengths
- Most natural, humanlike conversational flow
- Ultra-low response time
- Great for short, reactive interactions
- More expressive intonation
Best For
- Fast-paced sales calls
- Appointment confirmations
- Short back-and-forth conversations
3. Dualplex Mode
Recommended
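How it works: Multimodal processing handles the input, and ElevenLabs TTS produces the output, combining the speed of speech-to-speech with the voice selection of a dedicated TTS engine.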
Latency: Low, but varies based on selected voice.
Strengths
- Fast responses, similar to speech-to-speech
- Higher audio quality using ElevenLabs voices
- Supports custom voice cloning
- More expressive, natural delivery
Best For
- Premium or cloned voices with low latency
- Most general business use cases
- Sales and support where both quality and speed matter
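The "Best For" lists for the three modes can be read as a small decision rule. The Python sketch below is one possible encoding of that guidance. `Mode` and `suggest_mode` are hypothetical names introduced here for illustration, and the rule itself is an interpretation of the lists above, not an official selection algorithm.

```python
from enum import Enum


class Mode(Enum):
    PIPELINE = "pipeline"
    SPEECH_TO_SPEECH = "speech-to-speech"
    DUALPLEX = "dualplex"


def suggest_mode(complex_reasoning: bool = False,
                 cloned_voice: bool = False,
                 lowest_latency: bool = False) -> Mode:
    """Map a few yes/no requirements to a mode, following the 'Best For' lists."""
    if complex_reasoning:
        # Pipeline Mode is the one described as handling complex reasoning.
        return Mode.PIPELINE
    if lowest_latency and not cloned_voice:
        # Speech-to-speech is listed as the fastest mode. Cloned voices are
        # mentioned only under Pipeline and Dualplex, so they are excluded here.
        return Mode.SPEECH_TO_SPEECH
    # Dualplex is the recommended default and covers cloned voices with low latency.
    return Mode.DUALPLEX


print(suggest_mode(lowest_latency=True, cloned_voice=True).value)
```

Putting the reasoning check first is a design choice: it treats answer quality as more important than speed when the two conflict. Reorder the checks if speed matters more for your calls.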
Switching Modes
You can switch modes at any time under Assistant → Settings → Voice Engine.
Tip: Run the same call in different modes to compare speed and tone before choosing.
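If you want numbers as well as a listening impression, the comparison in the tip can be timed with a small harness. The Python sketch below is a minimal example under stated assumptions: `run_turn` stands for whatever you use to play one scripted turn in a given mode (no such function is provided by the product as far as this page states), and `fake_turn` is a stand-in that simply sleeps for arbitrary durations so the example runs on its own.

```python
import statistics
import time
from typing import Callable, Dict, List


def compare_modes(run_turn: Callable[[str], None],
                  modes: List[str],
                  repeats: int = 5) -> Dict[str, float]:
    """Return the median wall-clock time in milliseconds for one turn in each mode."""
    results: Dict[str, float] = {}
    for mode in modes:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_turn(mode)  # hypothetical: plays one scripted turn in this mode
            samples.append((time.perf_counter() - start) * 1000)
        results[mode] = statistics.median(samples)
    return results


def fake_turn(mode: str) -> None:
    # Stand-in for a real call; the sleep durations are made up for the demo.
    time.sleep({"pipeline": 0.03, "speech-to-speech": 0.01, "dualplex": 0.02}[mode])


if __name__ == "__main__":
    timings = compare_modes(fake_turn, ["pipeline", "speech-to-speech", "dualplex"])
    for mode, ms in timings.items():
        print(f"{mode}: {ms:.0f} ms")
```

The median is used instead of the mean so that a single slow turn does not skew the comparison. Use the same script and the same voice in each mode where possible, since latency in Dualplex Mode varies with the selected voice.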