Assistant Modes

Caller AI assistants can operate in three voice processing modes, each with different latency, voice options, and ideal use cases.

1. Pipeline Mode (High Logic)

How it works: Speech-to-text → LLM → text-to-speech
Latency: ~800 to 1500 ms
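Pipeline Mode has the highest latency of the three because its stages run one after another: the caller hears nothing until speech-to-text, the LLM, and text-to-speech have all finished. The following is a minimal Python sketch of that sequencing. It assumes hypothetical stand-in functions (speech_to_text, llm_reply, text_to_speech) that are not Caller's real services; it only shows that the end-to-end delay is the sum of the per-stage delays.

```python
import time
from typing import Callable, Dict, List, Tuple

# Hypothetical placeholder stages. They are NOT Caller's real API; they only
# stand in for a speech-to-text service, an LLM, and a text-to-speech engine.
def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")  # pretend the audio is already a transcript

def llm_reply(transcript: str) -> str:
    return f"You said: {transcript}"  # stand-in for the LLM's reasoning step

def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")  # stand-in for synthesized audio

# The three stages in the order Pipeline Mode runs them.
STAGES: List[Tuple[str, Callable]] = [
    ("stt", speech_to_text),
    ("llm", llm_reply),
    ("tts", text_to_speech),
]

def run_pipeline(audio: bytes) -> Tuple[bytes, Dict[str, float]]:
    """Run each stage in sequence and record how long each one took (ms)."""
    data = audio
    timings: Dict[str, float] = {}
    for name, stage in STAGES:
        start = time.perf_counter()
        data = stage(data)  # each stage waits for the previous stage's output
        timings[name] = (time.perf_counter() - start) * 1000
    return data, timings

reply, timings = run_pipeline(b"hello")
total_ms = sum(timings.values())  # end-to-end delay is the sum of stage delays
```

Because no stage can start before the previous one returns, shaving time off any single stage lowers the total, but the total can never drop below the slowest stage plus the other two.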

Strengths

Handles complex reasoning
Supports long, multi-sentence replies
Full access to all voices (including custom cloned voices)
Reliable variable handling and structured control over the conversation flow

Best For

Deep support conversations
Detailed explanations
Brand-specific voice requirements

2. Speech-to-Speech Mode (Fastest)

How it works: Direct speech input → multimodal model → speech output (no intermediate text stage)
Latency: ~300 to 600 ms (very fast)

Strengths

Most natural, humanlike conversational flow
Ultra-low response time
Great for short, reactive interactions
More expressive intonation

Best For

Fast-paced sales calls
Appointment confirmations
Short back-and-forth conversations

3. Dualplex Mode (Recommended)

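How it works: Multimodal processing, with ElevenLabs TTS generating the spoken output. This combines the low latency of Speech-to-Speech with the high-quality and custom voices available in Pipeline Mode.
Latency: Low, but varies depending on the selected voice.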

Strengths

Fast response times, similar to Speech-to-Speech
Higher audio quality using ElevenLabs voices
Supports custom voice cloning
More expressive, natural delivery

Best For

Combining a premium or cloned voice with low latency
Most general business use cases
Sales and support calls where both quality and speed matter
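The three "Best For" lists can be condensed into a rough rule of thumb. The Python sketch below is a hypothetical helper, not a Caller feature or API; the function name, its boolean inputs, and the assumption that a custom or cloned voice rules out Speech-to-Speech are illustrative choices, so treat its output as a starting point for the side-by-side test, not a final answer.

```python
from enum import Enum

class Mode(Enum):
    PIPELINE = "Pipeline"
    SPEECH_TO_SPEECH = "Speech-to-Speech"
    DUALPLEX = "Dualplex"

def suggest_mode(complex_reasoning: bool, needs_custom_voice: bool,
                 short_exchanges: bool) -> Mode:
    """Hypothetical rule of thumb distilled from the 'Best For' lists."""
    if complex_reasoning:
        # Deep support conversations and detailed explanations favor the
        # full speech-to-text -> LLM -> text-to-speech chain.
        return Mode.PIPELINE
    if short_exchanges and not needs_custom_voice:
        # Quick, reactive calls (confirmations, fast sales) gain the most
        # from the lowest latency. Assumption: cloned voices need another mode.
        return Mode.SPEECH_TO_SPEECH
    # Everything else, including premium or cloned voices with low latency,
    # falls back to the recommended default.
    return Mode.DUALPLEX
```

For example, an appointment-confirmation assistant with a stock voice would map to Speech-to-Speech, while the same assistant using a cloned brand voice would map to Dualplex.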

Switching Modes

You can switch modes at any time under Assistant → Settings → Voice Engine.

Tip: Run the same call in different modes to compare speed and tone before choosing.