Assistant Modes

Caller AI assistants can operate in three voice processing modes, each with different latency, voice options, and ideal use cases.

1. Pipeline Mode (High Logic)

How it works: Speech-to-text → LLM → text-to-speech
Latency: ~800 to 1500 ms

Strengths

  • Handles complex reasoning
  • Supports long, multi-sentence replies
  • Full access to all voices (including custom cloned)
  • Clean variable use and structured conversation control

Best For

  • Deep support conversations
  • Detailed explanations
  • Brand-specific voice requirements
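
The three-stage flow described above can be sketched in a few lines. This is only an illustration of why the stages add up to a higher latency: each one waits for the previous one to finish. The functions speech_to_text, llm_reply, and text_to_speech are hypothetical stand-ins, not part of any Caller API.

```python
import time
from typing import List, Tuple

# Hypothetical stand-ins for the three stages; real services would replace these.
def speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")

def llm_reply(text: str) -> str:
    return f"You said: {text}"

def text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")

def run_pipeline(audio: bytes) -> Tuple[bytes, List[Tuple[str, float]]]:
    """Run the stages in order and record how long each one took, in ms."""
    timings: List[Tuple[str, float]] = []
    value = audio
    stages = (("stt", speech_to_text), ("llm", llm_reply), ("tts", text_to_speech))
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)  # each stage consumes the previous stage's output
        timings.append((name, (time.perf_counter() - start) * 1000))
    return value, timings

reply_audio, stage_timings = run_pipeline(b"hello")
```

Because the stages run one after another, the total response time is roughly the sum of the three stage timings.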

2. Speech-to-Speech (Fastest)

How it works: Direct speech input → multimodal model → speech output (no intermediate text stage)
Latency: ~300 to 600 ms (very fast)

Strengths

  • Most natural, humanlike flow
  • Ultra-low response time
  • Great for short, reactive interactions
  • More expressive intonation

Best For

  • Fast-paced sales calls
  • Appointment confirmations
  • Short back-and-forth conversations

3. Dualplex Mode (Recommended)

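How it works: A multimodal model processes the speech input, and ElevenLabs TTS generates the spoken output. This combines the speed of speech-to-speech with the voice options of pipeline mode.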
Latency: Low, but varies based on selected voice.

Strengths

  • Fast, like speech-to-speech
  • Higher audio quality using ElevenLabs voices
  • Supports custom voice cloning
  • More expressive, natural delivery

Best For

  • Premium or cloned voices with low latency
  • Most general business use cases
  • Sales and support where both quality and speed matter
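
As a rough way to summarize the comparison above, the three modes can be written as a small lookup table with a helper that suggests a mode from a few requirements. This is a minimal sketch, not an official selection rule: the names ModeProfile, MODES, and pick_mode are made up for illustration, the Dualplex latency is left empty because no figure is given, and marking speech-to-speech as lacking custom voices is an assumption based only on that mode not listing them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class ModeProfile:
    name: str
    latency_ms: Optional[Tuple[int, int]]  # None where no figure is given
    custom_voices: bool

MODES = {
    "pipeline": ModeProfile("Pipeline", (800, 1500), True),
    # Assumption: custom voices are not listed for this mode, so treat as False.
    "speech-to-speech": ModeProfile("Speech-to-Speech", (300, 600), False),
    "dualplex": ModeProfile("Dualplex", None, True),
}

def pick_mode(complex_reasoning: bool = False,
              custom_voice: bool = False,
              short_reactive: bool = False) -> str:
    """Suggest a mode key; a hypothetical rule of thumb, not a product rule."""
    if complex_reasoning:
        return "pipeline"          # long, multi-step replies
    if custom_voice:
        return "dualplex"          # cloned voice with low latency
    if short_reactive:
        return "speech-to-speech"  # quickest back-and-forth
    return "dualplex"              # the recommended general-purpose default

suggested = pick_mode(short_reactive=True)
```

For example, pick_mode(custom_voice=True) suggests "dualplex", while pick_mode(complex_reasoning=True, custom_voice=True) suggests "pipeline", since both of those modes support cloned voices and the pipeline handles the heavier reasoning.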

Switching Modes

You can switch modes anytime under Assistant → Settings → Voice Engine.

Tip: Run the same call in different modes to compare speed and tone before choosing.
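
One possible way to make that comparison concrete is to note the response times you observe in each mode and rank the modes by median latency. The sketch below assumes you collect the timings yourself (for example, by hand from test calls); the function name rank_modes_by_latency is hypothetical, and the sample numbers are placeholders for illustration only, not real measurements.

```python
from statistics import median
from typing import Dict, List, Tuple

def rank_modes_by_latency(samples_ms: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Return (mode, median latency in ms) pairs, fastest first."""
    medians = {mode: median(values) for mode, values in samples_ms.items() if values}
    return sorted(medians.items(), key=lambda item: item[1])

# Placeholder timings for illustration only; replace with your own observations.
samples = {
    "pipeline": [900.0, 1100.0, 1300.0],
    "speech-to-speech": [350.0, 400.0, 500.0],
}
ranking = rank_modes_by_latency(samples)
```

Add a "dualplex" entry with your own timings to include it in the ranking. Speed is only half of the comparison, so listen to the recordings for tone as well.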