Voice Messages for AI Chatbots

Overview

Transform your AI chatbot into a voice assistant. Send voice messages and receive spoken responses, all through Telegram or Discord. Your VPS routes the audio to speech-to-text and text-to-speech APIs, so you can talk to your AI hands-free.

How It Works

Voice Message (Telegram/Discord)
    ↓
[Your VPS]
    ↓
[Whisper API - Speech to Text]
    ↓
[AI Processes Request]
    ↓
[ElevenLabs - Text to Speech]
    ↓
Voice Response Sent Back
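
The flow above can be sketched as a single async function. This is a minimal illustration, not OpenClaw's actual code: `handleVoiceMessage` and the injected steps (`transcribe`, `askAI`, `synthesize`) are hypothetical names standing in for the Whisper call, the AI call and the ElevenLabs call.

```javascript
// Hypothetical sketch of the voice pipeline; not OpenClaw's real implementation.
// Each step is injected so the flow can be exercised without any API keys.
async function handleVoiceMessage(audio, steps) {
  const text = await steps.transcribe(audio);    // speech to text (e.g. Whisper)
  const answer = await steps.askAI(text);        // the chatbot produces a reply
  const speech = await steps.synthesize(answer); // text to speech (e.g. ElevenLabs)
  return { text, answer, speech };
}

// Stub steps standing in for the real services.
const stubSteps = {
  transcribe: async (audio) => `transcript of ${audio}`,
  askAI: async (text) => `reply to: ${text}`,
  synthesize: async (answer) => `audio(${answer})`,
};

handleVoiceMessage('voice.ogg', stubSteps).then((result) => {
  console.log(result.answer);
});
```

Injecting the steps keeps the order of operations in one place and makes it easy to swap Groq for OpenAI, or ElevenLabs for another TTS provider.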

Why Voice?

Hands-free: Use while driving, cooking, exercising
Faster input: Speak faster than you type
Natural interaction: Feels like a real assistant
Accessibility: Easier for some users
Multilingual: Speak in any language Whisper supports

Setup Guide

Prerequisites

OpenClaw installed, either locally or on a VPS
Telegram or Discord configured
API keys for speech services

Step 1: Get API Keys

For Speech-to-Text (Whisper):

Option A: Groq - Fast and affordable
Option B: OpenAI - Original Whisper

For Text-to-Speech:

ElevenLabs - Best quality voices
Alternative: OpenAI TTS (simpler, lower quality)

Step 2: Configure Environment

# Speech-to-Text (Transcription)
VOICE_INPUT_ENABLED=true
WHISPER_PROVIDER=groq  # or 'openai'
GROQ_API_KEY=your-groq-key
# or
OPENAI_API_KEY=your-openai-key

# Text-to-Speech (Voice Output)
VOICE_OUTPUT_ENABLED=true
TTS_PROVIDER=elevenlabs
ELEVENLABS_API_KEY=your-elevenlabs-key
ELEVENLABS_VOICE_ID=your-chosen-voice

# Voice Settings
VOICE_RESPONSE_THRESHOLD=50  # Respond with voice if input was voice
AUTO_VOICE_REPLY=true  # Voice input = voice output
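
A missing key is the most common reason voice features fail silently, so it can help to check the environment at startup. The sketch below is an assumption about how such a check might look; `checkVoiceConfig` is a hypothetical helper, and OpenClaw may validate its configuration differently.

```javascript
// Hypothetical validator for the variables above.
// Returns the names of required variables that are missing.
function checkVoiceConfig(env) {
  const missing = [];
  const need = (name) => { if (!env[name]) missing.push(name); };

  if (env.VOICE_INPUT_ENABLED === 'true') {
    // The transcription key depends on the chosen provider
    // (assumed here to default to Groq when unset).
    need(env.WHISPER_PROVIDER === 'openai' ? 'OPENAI_API_KEY' : 'GROQ_API_KEY');
  }
  if (env.VOICE_OUTPUT_ENABLED === 'true' && env.TTS_PROVIDER === 'elevenlabs') {
    need('ELEVENLABS_API_KEY');
    need('ELEVENLABS_VOICE_ID');
  }
  return missing;
}

// In a real process you would pass process.env instead of a literal.
const missing = checkVoiceConfig({
  VOICE_INPUT_ENABLED: 'true',
  WHISPER_PROVIDER: 'groq',
  GROQ_API_KEY: 'your-groq-key',
  VOICE_OUTPUT_ENABLED: 'true',
  TTS_PROVIDER: 'elevenlabs',
  ELEVENLABS_API_KEY: 'your-elevenlabs-key',
});
console.log(missing); // [ 'ELEVENLABS_VOICE_ID' ]
```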

Step 3: Choose a Voice

ElevenLabs offers many voices. Find your voice ID:

Go to ElevenLabs
Browse Voice Library
Click a voice → copy Voice ID

Popular choices:

Rachel: Warm, professional female
Adam: Clear, friendly male
Bella: Expressive, natural female

ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM  # Rachel

Step 4: Test Voice Features

Send a voice message in Telegram: 🎤 "What's the weather like in London?"

The bot should reply with a voice message containing the answer.

Voice Configuration Options

Smart Voice Detection

Reply with voice only when the user sends voice:

# Match input format
AUTO_VOICE_REPLY=true

# Or always use text
AUTO_VOICE_REPLY=false
PREFER_TEXT_RESPONSE=true
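
The decision these two settings describe can be sketched as a small function. `chooseReplyFormat` is a hypothetical helper, and the precedence (`PREFER_TEXT_RESPONSE` overriding `AUTO_VOICE_REPLY`) is an assumption of this sketch rather than documented behaviour.

```javascript
// Hypothetical helper mirroring the two settings above.
// inputType is 'voice' or 'text'; env holds the settings as strings.
function chooseReplyFormat(inputType, env) {
  // Assumption: an explicit text preference overrides voice matching.
  if (env.PREFER_TEXT_RESPONSE === 'true') return 'text';
  if (env.AUTO_VOICE_REPLY === 'true' && inputType === 'voice') return 'voice';
  return 'text';
}

console.log(chooseReplyFormat('voice', { AUTO_VOICE_REPLY: 'true' })); // voice
console.log(chooseReplyFormat('text', { AUTO_VOICE_REPLY: 'true' }));  // text
```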

Voice Quality Settings

# ElevenLabs model
ELEVENLABS_MODEL=eleven_turbo_v2_5  # Fast
# or
ELEVENLABS_MODEL=eleven_multilingual_v2  # Best quality

# Voice settings
VOICE_STABILITY=0.5
VOICE_SIMILARITY_BOOST=0.75
VOICE_STYLE=0.5
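
For orientation, these values correspond to fields in the JSON body of an ElevenLabs text-to-speech request (`model_id` and `voice_settings`). The sketch below shows one plausible mapping; `buildTtsBody` is a hypothetical helper, the fallback values are simply the ones used in this guide, and the clamp assumes each setting ranges from 0 to 1.

```javascript
// Builds a request body for ElevenLabs' text-to-speech endpoint from the
// variables above. The env-to-field mapping is an assumption of this sketch.
function buildTtsBody(text, env) {
  const num = (value, fallback) => {
    const n = Number.parseFloat(value);
    if (Number.isNaN(n)) return fallback;
    return Math.min(1, Math.max(0, n)); // clamp to the assumed 0..1 range
  };
  return {
    text,
    model_id: env.ELEVENLABS_MODEL || 'eleven_multilingual_v2',
    voice_settings: {
      stability: num(env.VOICE_STABILITY, 0.5),
      similarity_boost: num(env.VOICE_SIMILARITY_BOOST, 0.75),
      style: num(env.VOICE_STYLE, 0.5),
    },
  };
}

const body = buildTtsBody('Hello world', {
  ELEVENLABS_MODEL: 'eleven_turbo_v2_5',
  VOICE_STABILITY: '0.5',
  VOICE_SIMILARITY_BOOST: '0.75',
});
console.log(JSON.stringify(body, null, 2));
```

As a rule of thumb, lower stability gives a more expressive but less predictable delivery, while higher stability sounds more even.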

Language Support

Whisper and ElevenLabs support multiple languages:

# Auto-detect language (recommended)
WHISPER_LANGUAGE=auto

# Or force specific language
WHISPER_LANGUAGE=en
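
Both OpenAI's and Groq's transcription endpoints accept an optional `language` parameter and detect the language themselves when it is omitted. A sketch of how `WHISPER_LANGUAGE` might be translated into request parameters follows; `buildTranscriptionParams` is a hypothetical helper, and treating `auto` as "omit the parameter" is an assumption about how the setting is applied.

```javascript
// Hypothetical mapping from the settings above to transcription parameters.
function buildTranscriptionParams(env) {
  const provider = env.WHISPER_PROVIDER || 'groq';
  const params = {
    // Whisper model names as published by each provider; check current docs.
    model: provider === 'openai' ? 'whisper-1' : 'whisper-large-v3',
  };
  const lang = env.WHISPER_LANGUAGE;
  // Leaving `language` out lets Whisper detect the spoken language.
  if (lang && lang !== 'auto') params.language = lang;
  return params;
}

console.log(buildTranscriptionParams({ WHISPER_PROVIDER: 'groq', WHISPER_LANGUAGE: 'auto' }));
console.log(buildTranscriptionParams({ WHISPER_PROVIDER: 'openai', WHISPER_LANGUAGE: 'en' }));
```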

Multilingual conversations:

Speak in Italian, get response in Italian
Mix languages in the same conversation
No manual language switching when auto-detection is enabled

Creating Custom Voices

Clone Your Own Voice

ElevenLabs allows voice cloning:

Record 1-5 minutes of clear speech
Upload to ElevenLabs
Use the cloned voice ID

ELEVENLABS_VOICE_ID=your-cloned-voice-id

Use cases:

Bot speaks in your voice
Create branded voice for business
Fun personalized assistant

Voice Personas

Different voices for different contexts:

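// In bot configuration: map each context to an ElevenLabs voice ID
// (the values below are placeholders, not real IDs)
const voices = {
  default: 'rachel_voice_id',
  formal: 'professional_voice_id',
  casual: 'friendly_voice_id',
  alerts: 'urgent_voice_id'
};

// Pick a voice for the current message; alerts take priority.
// The context flags (isAlert, isBusinessHours, isCasual) are illustrative.
function selectVoice(context) {
  if (context.isAlert) return voices.alerts;
  if (context.isBusinessHours) return voices.formal;
  if (context.isCasual) return voices.casual;
  return voices.default;
}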

Use Cases

Morning Briefing

Wake up to a spoken summary:

MORNING_BRIEFING_VOICE=true
MORNING_BRIEFING_TIME=07:00

Bot sends audio at 7 AM: 🎤 "Good morning! Today is Monday, January 27th. You have 3 meetings: team standup at 10, client call at 2, and dentist at 4:30. Weather is 8 degrees and cloudy. Have a great day!"
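
Scheduling the briefing comes down to working out how long to wait until the next occurrence of `MORNING_BRIEFING_TIME`. The following is a sketch of that calculation only; `msUntilNext` is a hypothetical helper, it uses the server's local time zone, and OpenClaw's own scheduler may work differently.

```javascript
// Milliseconds from `now` until the next occurrence of "HH:MM" (local time).
function msUntilNext(timeStr, now = new Date()) {
  const [hours, minutes] = timeStr.split(':').map(Number);
  const next = new Date(now);
  next.setHours(hours, minutes, 0, 0);
  // If today's slot has already passed, schedule for tomorrow.
  if (next <= now) next.setDate(next.getDate() + 1);
  return next - now;
}

// At 06:30, the 07:00 briefing is 30 minutes away.
const wait = msUntilNext('07:00', new Date(2025, 0, 27, 6, 30));
console.log(wait / 60000); // 30
```

The result can be handed to `setTimeout`, recomputing it after each briefing rather than using a fixed 24-hour interval so that clock changes do not shift the time.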

Voice-Controlled Home

Speak to control your home:

🎤 "Turn off all the lights and set the thermostat to 20 degrees"

Bot responds with voice confirmation: 🎤 "Done! All lights are off and thermostat set to 20 degrees."

Hands-Free Tasks

While cooking: 🎤 "Set a timer for 15 minutes"

While driving: 🎤 "Read my last 3 emails"

While exercising: 🎤 "What's next on my todo list?"

Language Learning

Practice conversations: 🎤 "Let's practice French. Ask me questions about my day."

Bot responds in French with pronunciation you can hear.

Cost Analysis

Speech-to-Text (Whisper)

| Provider | Cost per Hour |
|----------|---------------|
| Groq     | ~£0.05        |
| OpenAI   | ~£0.36        |

Typical usage: 5-10 minutes/day is about 2.5-5 hours/month, so roughly £0.13-0.25/month on Groq or £0.90-1.80/month on OpenAI

Text-to-Speech (ElevenLabs)

| Plan    | Characters/month | Cost |
|---------|------------------|------|
| Free    | 10,000           | £0   |
| Starter | 30,000           | ~£4  |
| Creator | 100,000          | ~£18 |

Typical usage: 500-1000 chars/response × 50 responses/month = 25,000-50,000 chars/month

Total Voice Costs

Light usage: £5-10/month
Heavy usage: £15-25/month
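
To estimate your own bill, the arithmetic behind the tables can be written out as a small calculator. This is a rough sketch: `estimateMonthlyCost` is a hypothetical helper, and the prices are this guide's approximate figures in GBP, not live provider pricing.

```javascript
// Approximate prices taken from the tables in this guide (GBP).
const STT_PRICE_PER_HOUR = { groq: 0.05, openai: 0.36 };
const TTS_PLANS = [
  { name: 'Free', chars: 10000, price: 0 },
  { name: 'Starter', chars: 30000, price: 4 },
  { name: 'Creator', chars: 100000, price: 18 },
];

function estimateMonthlyCost({ provider, minutesPerDay, charsPerResponse, responsesPerMonth }) {
  const sttHours = (minutesPerDay * 30) / 60; // assumes a 30-day month
  const sttCost = sttHours * STT_PRICE_PER_HOUR[provider];
  const ttsChars = charsPerResponse * responsesPerMonth;
  // Smallest plan whose allowance covers the month; null if none does.
  const plan = TTS_PLANS.find((p) => p.chars >= ttsChars) || null;
  return { sttHours, sttCost, ttsChars, plan };
}

const estimate = estimateMonthlyCost({
  provider: 'groq',
  minutesPerDay: 10,
  charsPerResponse: 500,
  responsesPerMonth: 50,
});
console.log(estimate.sttHours, estimate.ttsChars, estimate.plan.name); // 5 25000 Starter
```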

Performance Optimization

Reduce Latency

# Use fastest models
ELEVENLABS_MODEL=eleven_turbo_v2_5
WHISPER_PROVIDER=groq  # Groq is faster

# Stream responses (if supported)
VOICE_STREAMING=true

Cache Common Responses

# Cache frequently used phrases
VOICE_CACHE_ENABLED=true
VOICE_CACHE_SIZE=100

Greetings, confirmations, and common responses are cached to avoid regeneration.
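
A cache like this can be sketched as a bounded map from response text to generated audio. The version below evicts the least recently used entry when full; that eviction policy and the `VoiceCache` class are assumptions made for illustration, since the guide does not say how the built-in cache chooses what to drop.

```javascript
// Minimal least-recently-used cache keyed by response text.
// A Map iterates in insertion order, so its first key is the oldest entry.
class VoiceCache {
  constructor(maxSize = 100) {
    this.maxSize = maxSize;
    this.entries = new Map();
  }

  get(text) {
    if (!this.entries.has(text)) return undefined;
    const audio = this.entries.get(text);
    // Re-insert to mark this entry as most recently used.
    this.entries.delete(text);
    this.entries.set(text, audio);
    return audio;
  }

  set(text, audio) {
    this.entries.delete(text);
    this.entries.set(text, audio);
    if (this.entries.size > this.maxSize) {
      const oldest = this.entries.keys().next().value;
      this.entries.delete(oldest);
    }
  }
}

const cache = new VoiceCache(2);
cache.set('Done!', 'audio-1');
cache.set('Good morning!', 'audio-2');
cache.get('Done!');                  // touch 'Done!' so it is kept
cache.set('Timer set.', 'audio-3');  // evicts 'Good morning!'
console.log(cache.get('Good morning!')); // undefined
```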

Troubleshooting

No voice response

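# Check the logs for voice or ElevenLabs errors (for example, a rejected API key)
pm2 logs openclaw | grep -i "elevenlabs\|voice"

# Test ElevenLabs directly; a working key and voice ID save audio to test.mp3
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/YOUR_VOICE_ID" \
  -H "xi-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}' \
  --output test.mp3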

Poor transcription quality

Speak clearly and not too fast
Reduce background noise
Check WHISPER_LANGUAGE setting

Voice sounds robotic

Try different ElevenLabs voices
Adjust stability and similarity settings
Use the eleven_multilingual_v2 model for better quality

High latency

Switch to Groq for Whisper (faster)
Use the eleven_turbo_v2_5 model for TTS
Ensure VPS has good network to APIs

Security Considerations

Voice Data Privacy

# Don't store voice files permanently
VOICE_RETENTION_MINUTES=5

# Process and delete
DELETE_VOICE_AFTER_TRANSCRIPTION=true
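
The retention rule amounts to comparing each file's age against `VOICE_RETENTION_MINUTES`. A sketch of that selection step follows; `selectExpired` and the `{ path, createdAtMs }` record shape are hypothetical, and the actual deletion (for example with Node's `fs.unlink`) is left out so the logic can be checked on its own.

```javascript
// Returns the paths of voice files older than the retention window.
// `files` is a list of { path, createdAtMs } records (hypothetical shape).
function selectExpired(files, nowMs, retentionMinutes) {
  const cutoff = nowMs - retentionMinutes * 60 * 1000;
  return files.filter((file) => file.createdAtMs <= cutoff).map((file) => file.path);
}

const now = Date.now();
const expired = selectExpired(
  [
    { path: '/tmp/voice/old.ogg', createdAtMs: now - 10 * 60 * 1000 }, // 10 minutes old
    { path: '/tmp/voice/new.ogg', createdAtMs: now - 1 * 60 * 1000 },  // 1 minute old
  ],
  now,
  5
);
console.log(expired); // [ '/tmp/voice/old.ogg' ]
```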

Rate Limiting

# Prevent API abuse
VOICE_MESSAGES_PER_MINUTE=5
VOICE_MESSAGES_PER_DAY=100
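
Limits like these can be enforced with a sliding window of timestamps. The sketch below is one possible approach, not OpenClaw's implementation; `VoiceRateLimiter` is a hypothetical class, and counting per user (rather than globally) is an assumption.

```javascript
// Sliding-window limiter: at most `perMinute` messages in any 60 seconds
// and `perDay` messages in any 24 hours, tracked per user.
class VoiceRateLimiter {
  constructor(perMinute = 5, perDay = 100) {
    this.perMinute = perMinute;
    this.perDay = perDay;
    this.timestamps = new Map(); // userId -> times of accepted messages (ms)
  }

  allow(userId, nowMs = Date.now()) {
    const dayAgo = nowMs - 24 * 60 * 60 * 1000;
    const minuteAgo = nowMs - 60 * 1000;
    // Drop history older than 24 hours, then count what remains.
    const recent = (this.timestamps.get(userId) || []).filter((t) => t > dayAgo);
    const inLastMinute = recent.filter((t) => t > minuteAgo).length;
    const allowed = inLastMinute < this.perMinute && recent.length < this.perDay;
    if (allowed) recent.push(nowMs);
    this.timestamps.set(userId, recent);
    return allowed;
  }
}

const limiter = new VoiceRateLimiter(2, 3);
console.log(limiter.allow('alice', 0));    // true
console.log(limiter.allow('alice', 1000)); // true
console.log(limiter.allow('alice', 2000)); // false (third message within a minute)
```

Rejected messages are not recorded, so a user who keeps sending while blocked does not extend their own lockout.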

Alternative TTS Options

OpenAI TTS

Simpler setup, lower quality:

TTS_PROVIDER=openai
OPENAI_API_KEY=your-key
OPENAI_TTS_MODEL=tts-1
OPENAI_TTS_VOICE=alloy  # alloy, echo, fable, onyx, nova, shimmer

Local TTS (Free)

For privacy-focused setups:

TTS_PROVIDER=local
LOCAL_TTS_ENGINE=piper  # or espeak

Lower quality but no API costs.

Need Help?

Voice integration involves multiple APIs and careful configuration. Our premium setup service includes voice features fully configured and tested.