Voice Messages for AI Chatbots
Add voice input and output to your AI bot. Set up speech-to-text and text-to-speech for OpenClaw on your VPS.
Overview
Turn your AI chatbot into a voice assistant. Send voice messages and receive spoken replies, all through Telegram or Discord. Your VPS receives the audio, passes it to a speech-to-text service, and sends the AI's reply to a text-to-speech service before returning the audio, so you can talk to your AI hands-free.
How It Works
Voice Message (Telegram/Discord)
↓
[Your VPS]
↓
[Whisper API - Speech to Text]
↓
[AI Processes Request]
↓
[ElevenLabs - Text to Speech]
↓
Voice Response Sent Back
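The flow above can be sketched as one async handler. This is a minimal illustration, not OpenClaw's actual code: `handleVoiceMessage`, `transcribe`, `generateReply`, and `synthesize` are hypothetical names, and each stage is passed in so you can swap providers or stub them out when testing.

```javascript
// Minimal sketch of the voice pipeline (hypothetical; not OpenClaw's internals).
// Each stage is injected so providers can be swapped or stubbed.
async function handleVoiceMessage(audio, { transcribe, generateReply, synthesize }) {
  // 1. Speech to text (e.g. a Whisper API call)
  const text = await transcribe(audio);
  // 2. The AI model produces a text reply
  const reply = await generateReply(text);
  // 3. Text to speech (e.g. ElevenLabs); the text reply is kept as a fallback
  const replyAudio = await synthesize(reply);
  return { text, reply, replyAudio };
}

// Usage with stub stages
const stubs = {
  transcribe: async (audio) => `transcript of ${audio.length} bytes`,
  generateReply: async (text) => `You said: ${text}`,
  synthesize: async (reply) => Buffer.from(reply, 'utf8'),
};

handleVoiceMessage(Buffer.alloc(4), stubs).then((result) => {
  console.log(result.reply); // You said: transcript of 4 bytes
});
```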
Why Voice?
- Hands-free: Use while driving, cooking, exercising
- Faster input: Speak faster than you type
- Natural interaction: Feels like a real assistant
- Accessibility: Easier for some users
- Multilingual: Speak in any of the many languages Whisper supports
Setup Guide
Prerequisites
- OpenClaw running on a VPS
- Telegram or Discord configured
- API keys for speech services
Step 1: Get API Keys
For Speech-to-Text (Whisper):
- Groq - Fast, low-cost hosted Whisper
- Alternative: OpenAI Whisper API
For Text-to-Speech:
- ElevenLabs - Best quality voices
- Alternative: OpenAI TTS (simpler, lower quality)
Step 2: Configure Environment
# Speech-to-Text (Transcription)
VOICE_INPUT_ENABLED=true
WHISPER_PROVIDER=groq # or 'openai'
GROQ_API_KEY=your-groq-key
# or
OPENAI_API_KEY=your-openai-key
# Text-to-Speech (Voice Output)
VOICE_OUTPUT_ENABLED=true
TTS_PROVIDER=elevenlabs
ELEVENLABS_API_KEY=your-elevenlabs-key
ELEVENLABS_VOICE_ID=your-chosen-voice
# Voice Settings
VOICE_RESPONSE_THRESHOLD=50 # Respond with voice if input was voice
AUTO_VOICE_REPLY=true # Voice input = voice output
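A minimal sketch of how a bot might read and validate these settings at startup, assuming the .env file has already been loaded into `process.env` (for example with the dotenv package). `loadVoiceConfig` and its validation rules are hypothetical; only the variable names come from the configuration above.

```javascript
// Hypothetical loader: variable names match the .env above, but the
// validation rules are assumptions, not OpenClaw's actual behavior.
const STT_KEYS = { groq: ['GROQ_API_KEY'], openai: ['OPENAI_API_KEY'] };
const TTS_KEYS = {
  elevenlabs: ['ELEVENLABS_API_KEY', 'ELEVENLABS_VOICE_ID'],
  openai: ['OPENAI_API_KEY'],
  local: [], // local engines need no API key
};

function loadVoiceConfig(env = process.env) {
  const config = {
    voiceInput: env.VOICE_INPUT_ENABLED === 'true',
    voiceOutput: env.VOICE_OUTPUT_ENABLED === 'true',
    whisperProvider: env.WHISPER_PROVIDER || 'groq',
    ttsProvider: env.TTS_PROVIDER || 'elevenlabs',
    autoVoiceReply: env.AUTO_VOICE_REPLY === 'true',
  };

  // Collect the variables each enabled provider needs
  const required = [];
  if (config.voiceInput) {
    const keys = STT_KEYS[config.whisperProvider];
    if (!keys) throw new Error(`Unknown WHISPER_PROVIDER: ${config.whisperProvider}`);
    required.push(...keys);
  }
  if (config.voiceOutput) {
    const keys = TTS_KEYS[config.ttsProvider];
    if (!keys) throw new Error(`Unknown TTS_PROVIDER: ${config.ttsProvider}`);
    required.push(...keys);
  }

  // Fail fast if anything is missing
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return config;
}
```

Failing at startup surfaces a missing key immediately instead of when the first voice message arrives.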
Step 3: Choose a Voice
ElevenLabs offers many voices. Find your voice ID:
- Go to ElevenLabs
- Browse Voice Library
- Click a voice → copy Voice ID
Popular choices:
- Rachel: Warm, professional female
- Adam: Clear, friendly male
- Bella: Expressive, natural female
ELEVENLABS_VOICE_ID=21m00Tcm4TlvDq8ikWAM # Rachel
Step 4: Test Voice Features
Send a voice message in Telegram: 🎤 "What's the weather like in London?"
Bot should reply with a voice message containing the answer.
Voice Configuration Options
Smart Voice Detection
Reply with voice only when the user sends voice, or always reply in text:
# Match input format
AUTO_VOICE_REPLY=true
# Or always use text
AUTO_VOICE_REPLY=false
PREFER_TEXT_RESPONSE=true
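The decision between the two modes is small enough to write out. This sketch assumes PREFER_TEXT_RESPONSE takes precedence over AUTO_VOICE_REPLY; `chooseReplyFormat` is a hypothetical name, and your bot's actual precedence may differ.

```javascript
// Hypothetical reply-format decision. Assumption: PREFER_TEXT_RESPONSE
// overrides AUTO_VOICE_REPLY when both are set.
function chooseReplyFormat(inputType, { autoVoiceReply, preferTextResponse }) {
  if (preferTextResponse) return 'text';
  // Match the input format: voice in, voice out
  if (autoVoiceReply && inputType === 'voice') return 'voice';
  return 'text';
}

console.log(chooseReplyFormat('voice', { autoVoiceReply: true, preferTextResponse: false })); // voice
console.log(chooseReplyFormat('text', { autoVoiceReply: true, preferTextResponse: false }));  // text
```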
Voice Quality Settings
# ElevenLabs model
ELEVENLABS_MODEL=eleven_turbo_v2_5 # Fast
# or
ELEVENLABS_MODEL=eleven_multilingual_v2 # Best quality
# Voice settings
VOICE_STABILITY=0.5
VOICE_SIMILARITY_BOOST=0.75
VOICE_STYLE=0.5
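To see where these settings end up, here is a sketch that builds an ElevenLabs text-to-speech request. The endpoint, the xi-api-key header, and the `model_id`/`voice_settings` body fields follow the public ElevenLabs API; `buildElevenLabsRequest`, the clamping to 0-1, and the fallback defaults (taken from the values above) are assumptions for illustration.

```javascript
// Hypothetical request builder for ElevenLabs text-to-speech.
// Fallbacks mirror the example values in this guide (an assumption).
function buildElevenLabsRequest(text, env) {
  // Keep each voice setting inside the 0-1 range
  const clamp = (value, fallback) => {
    const n = Number.parseFloat(value);
    return Number.isFinite(n) ? Math.min(1, Math.max(0, n)) : fallback;
  };
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${env.ELEVENLABS_VOICE_ID}`,
    options: {
      method: 'POST',
      headers: {
        'xi-api-key': env.ELEVENLABS_API_KEY,
        'Content-Type': 'application/json',
        Accept: 'audio/mpeg',
      },
      body: JSON.stringify({
        text,
        model_id: env.ELEVENLABS_MODEL || 'eleven_multilingual_v2',
        voice_settings: {
          stability: clamp(env.VOICE_STABILITY, 0.5),
          similarity_boost: clamp(env.VOICE_SIMILARITY_BOOST, 0.75),
          style: clamp(env.VOICE_STYLE, 0.5),
        },
      }),
    },
  };
}

// Usage (Node 18+ has a global fetch):
// const { url, options } = buildElevenLabsRequest('Hello', process.env);
// const audio = Buffer.from(await (await fetch(url, options)).arrayBuffer());
```

Building the request separately from sending it keeps the function easy to test without spending API credits.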
Language Support
Whisper and ElevenLabs support multiple languages:
# Auto-detect language (recommended)
WHISPER_LANGUAGE=auto
# Or force specific language
WHISPER_LANGUAGE=en
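With WHISPER_LANGUAGE=auto, the usual approach is to leave the `language` field out of the transcription request so Whisper detects it. The sketch below builds such a request for either provider using the OpenAI-compatible transcription endpoint that both Groq and OpenAI expose. `buildTranscriptionRequest` is hypothetical, the model names are examples to check against each provider's current model list, and it needs Node 18+ for the global `FormData` and `Blob`.

```javascript
// Hypothetical Whisper request builder. Model names are examples only.
const WHISPER_ENDPOINTS = {
  groq: { url: 'https://api.groq.com/openai/v1/audio/transcriptions', model: 'whisper-large-v3-turbo' },
  openai: { url: 'https://api.openai.com/v1/audio/transcriptions', model: 'whisper-1' },
};

function buildTranscriptionRequest(audioBuffer, env) {
  const providerName = env.WHISPER_PROVIDER || 'groq';
  const provider = WHISPER_ENDPOINTS[providerName];
  if (!provider) throw new Error(`Unknown WHISPER_PROVIDER: ${providerName}`);
  const apiKey = providerName === 'openai' ? env.OPENAI_API_KEY : env.GROQ_API_KEY;

  const form = new FormData();
  // Telegram voice notes are OGG/Opus, hence the filename and MIME type
  form.append('file', new Blob([audioBuffer], { type: 'audio/ogg' }), 'voice.ogg');
  form.append('model', provider.model);
  // "auto" means: omit the field and let Whisper detect the language
  const language = env.WHISPER_LANGUAGE || 'auto';
  if (language !== 'auto') form.append('language', language); // ISO-639-1, e.g. "en"

  return {
    url: provider.url,
    options: { method: 'POST', headers: { Authorization: `Bearer ${apiKey}` }, body: form },
  };
}
```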
Multilingual conversations:
- Speak in Italian, get response in Italian
- Mix languages in the same conversation
- No need to change a language setting between messages when WHISPER_LANGUAGE=auto
Creating Custom Voices
Clone Your Own Voice
ElevenLabs allows voice cloning:
- Record 1-5 minutes of clear speech
- Upload to ElevenLabs
- Use the cloned voice ID
ELEVENLABS_VOICE_ID=your-cloned-voice-id
Use cases:
- Bot speaks in your voice
- Create branded voice for business
- Fun personalized assistant
Voice Personas
Different voices for different contexts:
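// In bot configuration (the voice IDs below are placeholders;
// replace them with real ElevenLabs voice IDs)
const voices = {
  default: 'rachel_voice_id',
  formal: 'professional_voice_id',
  casual: 'friendly_voice_id',
  alerts: 'urgent_voice_id'
};

// Return the voice ID for a reply, checked in priority order
function selectVoice(context) {
  if (context.isAlert) return voices.alerts;          // urgent notifications first
  if (context.isBusinessHours) return voices.formal;  // work hours
  if (context.isCasual) return voices.casual;         // example flag; define "casual" for your bot
  return voices.default;
}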
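`selectVoice` expects a context object with flags such as `isBusinessHours`. One way to derive that flag is shown below; the helper and the 09:00-18:00 weekday window are assumptions, so adjust them to your own schedule and timezone.

```javascript
// Hypothetical helper for the context passed to selectVoice.
// Assumes business hours are 09:00-17:59, Monday to Friday, in server local time.
function isBusinessHours(date = new Date()) {
  const day = date.getDay(); // 0 = Sunday ... 6 = Saturday
  const hour = date.getHours();
  return day >= 1 && day <= 5 && hour >= 9 && hour < 18;
}

// Example context (isAlert would come from your own alerting logic)
const context = { isAlert: false, isBusinessHours: isBusinessHours() };
console.log(context);
```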
Use Cases
Morning Briefing
Wake up to a spoken summary:
MORNING_BRIEFING_VOICE=true
MORNING_BRIEFING_TIME=07:00
Bot sends audio at 7 AM: 🎤 "Good morning! Today is Monday, January 27th. You have 3 meetings: team standup at 10, client call at 2, and dentist at 4:30. Weather is 8 degrees and cloudy. Have a great day!"
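A scheduler needs to know how long to wait until the next MORNING_BRIEFING_TIME. This sketch computes that delay in server local time; `msUntilNextRun` and `sendMorningBriefing` are hypothetical names, and a production bot might use a cron library instead.

```javascript
// Sketch: milliseconds until the next occurrence of an "HH:MM" time,
// in the server's local timezone.
function msUntilNextRun(timeString, now = new Date()) {
  const [hours, minutes] = timeString.split(':').map(Number);
  const next = new Date(now);
  next.setHours(hours, minutes, 0, 0);
  // If today's slot has already passed (or is right now), use tomorrow's
  if (next <= now) next.setDate(next.getDate() + 1);
  return next.getTime() - now.getTime();
}

// Usage (sendMorningBriefing is a hypothetical function that builds and sends the audio):
// setTimeout(sendMorningBriefing, msUntilNextRun(process.env.MORNING_BRIEFING_TIME || '07:00'));
```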
Voice-Controlled Home
Speak to control your home:
🎤 "Turn off all the lights and set the thermostat to 20 degrees"
Bot responds with voice confirmation: 🎤 "Done! All lights are off and thermostat set to 20 degrees."
Hands-Free Tasks
While cooking: 🎤 "Set a timer for 15 minutes"
While driving: 🎤 "Read my last 3 emails"
While exercising: 🎤 "What's next on my todo list?"
Language Learning
Practice conversations: 🎤 "Let's practice French. Ask me questions about my day."
Bot responds in French with pronunciation you can hear.
Cost Analysis
Speech-to-Text (Whisper)
| Provider | Cost per Hour |
|----------|---------------|
| Groq | ~£0.05 |
| OpenAI | ~£0.36 |
Typical usage: 5-10 minutes/day (2.5-5 hours/month) = roughly £0.13-0.25/month on Groq or £0.90-1.80/month on OpenAI
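Applying the per-hour rates above to 5-10 minutes of audio a day, assuming a 30-day month:

```javascript
// Worked arithmetic: monthly transcription cost from minutes of audio per day.
// Rates are this guide's approximate figures; 30-day month assumed.
const COST_PER_HOUR_GBP = { groq: 0.05, openai: 0.36 };

function monthlyTranscriptionCost(minutesPerDay, provider, days = 30) {
  const hours = (minutesPerDay * days) / 60;
  return hours * COST_PER_HOUR_GBP[provider];
}

for (const provider of ['groq', 'openai']) {
  const low = monthlyTranscriptionCost(5, provider).toFixed(2);
  const high = monthlyTranscriptionCost(10, provider).toFixed(2);
  console.log(`${provider}: £${low}-£${high}/month`);
}
// groq: £0.13-£0.25/month
// openai: £0.90-£1.80/month
```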
Text-to-Speech (ElevenLabs)
| Plan | Characters/month | Cost |
|------|------------------|------|
| Free | 10,000 | £0 |
| Starter | 30,000 | ~£4 |
| Creator | 100,000 | ~£18 |
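Typical usage: 500-1,000 characters per response × 50 responses per month = 25,000-50,000 characters/month, which fits the Starter plan at the low end and needs Creator at the high end.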
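To check which plan a given usage pattern needs, this sketch picks the cheapest plan from the table above whose character allowance covers the month. The plan figures are this guide's approximate prices; `pickPlan` is a hypothetical helper.

```javascript
// Sketch: cheapest plan that covers a month's character usage.
// Plans are listed in ascending price order, so the first fit is the cheapest.
const PLANS = [
  { name: 'Free', chars: 10000, gbp: 0 },
  { name: 'Starter', chars: 30000, gbp: 4 },
  { name: 'Creator', chars: 100000, gbp: 18 },
];

function pickPlan(charsPerResponse, responsesPerMonth) {
  const needed = charsPerResponse * responsesPerMonth;
  const plan = PLANS.find((p) => p.chars >= needed);
  return { needed, plan: plan ? plan.name : 'higher tier needed' };
}

console.log(pickPlan(500, 50));  // { needed: 25000, plan: 'Starter' }
console.log(pickPlan(1000, 50)); // { needed: 50000, plan: 'Creator' }
```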
Total Voice Costs
Light usage: £5-10/month
Heavy usage: £15-25/month
Performance Optimization
Reduce Latency
# Use fastest models
ELEVENLABS_MODEL=eleven_turbo_v2_5
WHISPER_PROVIDER=groq # Groq is faster
# Stream responses (if supported)
VOICE_STREAMING=true
Cache Common Responses
# Cache frequently used phrases
VOICE_CACHE_ENABLED=true
VOICE_CACHE_SIZE=100
Greetings, confirmations, and common responses are cached to avoid regeneration.
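One way to implement such a cache is a small least-recently-used (LRU) map keyed by voice ID and text, with VOICE_CACHE_SIZE as the capacity. This is a sketch under that assumption; `VoiceCache` is a hypothetical class, not part of OpenClaw.

```javascript
// Minimal LRU cache sketch for synthesized audio, keyed by voice ID + text.
class VoiceCache {
  constructor(maxEntries = 100) {
    this.maxEntries = maxEntries;
    this.entries = new Map(); // Map keeps insertion order: oldest first
  }

  key(voiceId, text) {
    return `${voiceId}:${text.trim()}`;
  }

  get(voiceId, text) {
    const k = this.key(voiceId, text);
    if (!this.entries.has(k)) return undefined;
    const audio = this.entries.get(k);
    // Re-insert to mark as most recently used
    this.entries.delete(k);
    this.entries.set(k, audio);
    return audio;
  }

  set(voiceId, text, audio) {
    const k = this.key(voiceId, text);
    this.entries.delete(k);
    this.entries.set(k, audio);
    // Evict the least recently used entry when over capacity
    if (this.entries.size > this.maxEntries) {
      this.entries.delete(this.entries.keys().next().value);
    }
  }
}

// Usage: only call the TTS API on a cache miss
// let audio = cache.get(voiceId, reply);
// if (!audio) { audio = await synthesize(reply); cache.set(voiceId, reply, audio); }
```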
Troubleshooting
No voice response
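# Look for ElevenLabs or voice errors (e.g. an invalid API key) in the bot logs
pm2 logs openclaw --lines 200 --nostream | grep -i "elevenlabs\|voice"
# Test ElevenLabs directly; a working key and voice ID save playable audio to test.mp3
# (if test.mp3 contains JSON instead, it is an error message from the API)
curl -X POST "https://api.elevenlabs.io/v1/text-to-speech/YOUR_VOICE_ID" \
-H "xi-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{"text": "Hello world"}' \
-o test.mp3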
Poor transcription quality
- Speak clearly and not too fast
- Reduce background noise
- Check the WHISPER_LANGUAGE setting
Voice sounds robotic
- Try different ElevenLabs voices
- Adjust stability and similarity settings
- Use multilingual model for better quality
High latency
- Switch to Groq for Whisper (faster)
- Use the eleven_turbo_v2_5 model for TTS
- Ensure the VPS has a fast network connection to the APIs
Security Considerations
Voice Data Privacy
# Don't store voice files permanently
VOICE_RETENTION_MINUTES=5
# Process and delete
DELETE_VOICE_AFTER_TRANSCRIPTION=true
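The process-and-delete idea can be enforced with a try/finally block, so the audio file is removed even when transcription throws. A minimal sketch, assuming the bot writes each voice note to a temporary file; `transcribeAndDelete` and the `transcribeFile` callback are hypothetical.

```javascript
const fs = require('node:fs/promises');
const os = require('node:os');
const path = require('node:path');

// Sketch of DELETE_VOICE_AFTER_TRANSCRIPTION: write the voice note to a
// temp file, transcribe it, and always delete it afterwards.
async function transcribeAndDelete(audioBuffer, transcribeFile) {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), 'voice-'));
  const filePath = path.join(dir, 'voice.ogg');
  try {
    await fs.writeFile(filePath, audioBuffer);
    return await transcribeFile(filePath);
  } finally {
    // Runs on success and on failure: remove the audio and its temp directory
    await fs.rm(dir, { recursive: true, force: true });
  }
}
```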
Rate Limiting
# Prevent API abuse
VOICE_MESSAGES_PER_MINUTE=5
VOICE_MESSAGES_PER_DAY=100
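A sketch of how the two limits might be enforced per user with sliding windows. `VoiceRateLimiter` is hypothetical and keeps its state in memory, so counts reset when the bot restarts; running several bot processes would need shared storage instead.

```javascript
// Hypothetical per-user limiter for VOICE_MESSAGES_PER_MINUTE / _PER_DAY.
class VoiceRateLimiter {
  constructor(perMinute = 5, perDay = 100) {
    this.windows = [
      { ms: 60 * 1000, max: perMinute },
      { ms: 24 * 60 * 60 * 1000, max: perDay },
    ];
    this.history = new Map(); // userId -> timestamps (ms) of accepted messages
  }

  // Returns true and records the message if the user is under both limits
  tryAcquire(userId, now = Date.now()) {
    const longest = Math.max(...this.windows.map((w) => w.ms));
    // Forget timestamps older than the longest window
    const times = (this.history.get(userId) || []).filter((t) => now - t < longest);
    this.history.set(userId, times);
    for (const { ms, max } of this.windows) {
      const recent = times.filter((t) => now - t < ms).length;
      if (recent >= max) return false;
    }
    times.push(now);
    return true;
  }
}
```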
Alternative TTS Options
OpenAI TTS
Simpler setup, lower quality:
TTS_PROVIDER=openai
OPENAI_API_KEY=your-key
OPENAI_TTS_MODEL=tts-1
OPENAI_TTS_VOICE=alloy # alloy, echo, fable, onyx, nova, shimmer
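For comparison with ElevenLabs, this sketch builds the equivalent OpenAI call. The /v1/audio/speech endpoint and its `model`, `voice`, `input`, and `response_format` fields are part of OpenAI's API; `buildOpenAITtsRequest` is hypothetical, and requesting `opus` output assumes the bot sends replies as Telegram-style voice notes.

```javascript
// Hypothetical request builder for OpenAI text-to-speech.
function buildOpenAITtsRequest(text, env) {
  return {
    url: 'https://api.openai.com/v1/audio/speech',
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${env.OPENAI_API_KEY}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model: env.OPENAI_TTS_MODEL || 'tts-1',
        voice: env.OPENAI_TTS_VOICE || 'alloy',
        input: text,
        // Assumption: Opus output suits voice-note delivery; verify for your platform
        response_format: 'opus',
      }),
    },
  };
}
```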
Local TTS (Free)
For privacy-focused setups:
TTS_PROVIDER=local
LOCAL_TTS_ENGINE=piper # or espeak
Lower quality but no API costs.
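Piper is a command-line tool that reads text on standard input and writes a WAV file. This sketch calls it from Node, assuming the `piper` binary is on PATH and a voice model (.onnx file) has been downloaded; `synthesizeWithPiper` and the model path are placeholders.

```javascript
const { spawn } = require('node:child_process');

// Arguments for the Piper CLI: model file in, WAV file out
function buildPiperArgs(modelPath, outputFile) {
  return ['--model', modelPath, '--output_file', outputFile];
}

// Sketch: pipe text into piper and resolve with the output path
function synthesizeWithPiper(text, modelPath, outputFile) {
  return new Promise((resolve, reject) => {
    const proc = spawn('piper', buildPiperArgs(modelPath, outputFile));
    proc.on('error', reject);       // e.g. piper not installed
    proc.stdin.on('error', reject); // e.g. process exited before reading input
    proc.on('close', (code) =>
      code === 0 ? resolve(outputFile) : reject(new Error(`piper exited with code ${code}`)));
    proc.stdin.end(text);
  });
}

// Usage (placeholder model path):
// await synthesizeWithPiper('Hello from your VPS', '/opt/piper/en_US-lessac-medium.onnx', '/tmp/reply.wav');
```

Telegram voice notes expect OGG/Opus, so the WAV output would likely need converting (for example with ffmpeg) before sending it as a voice message.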
Need Help?
Voice integration involves multiple APIs and careful configuration. Our premium setup service includes voice features fully configured and tested.
Need a VPS for Your Bot?
We recommend Hostinger KVM 2 VPS - reliable, fast, and perfect for AI chatbots. Get started with our recommended setup.
Need Help With Setup?
Got your VPS? Let us handle the technical work. Professional setup and maintenance for OpenClaw (formerly Clawd.bot).