Self-Hosted Speech with Echoline
Echoline provides fully self-hosted speech-to-text (STT) and text-to-speech (TTS) for the vowel stack. It's an optional component that eliminates external speech API dependencies.
Overview
Echoline is an OpenAI-compatible audio server that runs locally:
- STT: Uses faster-whisper for speech recognition
- TTS: Uses Kokoro for high-quality speech synthesis
- API: OpenAI-compatible /v1/audio/* endpoints
- No external dependencies: Runs entirely within your infrastructure
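Because the endpoints follow the OpenAI audio API shape, any HTTP client can talk to Echoline. The following is a minimal sketch using only Python's standard library. It assumes Echoline is published on localhost:8000 (the Quick Start default) and that /v1/audio/speech accepts the OpenAI-style model, voice, and input fields; the helper names and the output file name are illustrative, not part of Echoline.

```python
import json
import urllib.request

# Assumed host port from the Quick Start; adjust to your ECHOLINE_HOST_PORT.
BASE_URL = "http://localhost:8000/v1"


def build_speech_request(text, voice="af_heart",
                         model="onnx-community/Kokoro-82M-v1.0-ONNX",
                         base_url=BASE_URL):
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    payload = {"model": model, "voice": voice, "input": text}
    return urllib.request.Request(
        f"{base_url}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, out_path="speech.audio"):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(build_speech_request(text), timeout=60) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return len(audio)
```

Calling synthesize("Hello from Echoline") needs a running container; build_speech_request alone can be inspected offline to see exactly what would be sent.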
When to Use Echoline
Choose Echoline when:
- Data privacy requires audio to stay on your infrastructure
- You want to eliminate external API dependencies
- Lower latency is critical (local processing)
- You need offline/air-gapped operation
Use Deepgram when:
- You prefer managed, high-quality speech APIs
- GPU resources are limited or unavailable
- Simplicity of setup is prioritized over data locality
Architecture
When running with Echoline, the stack looks like this:
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Browser   │────▶│     Core     │────▶│   Engine    │
│  (Client)   │     │ (Token/API)  │     │ (Realtime)  │
└─────────────┘     └──────────────┘     └──────┬──────┘
                                                │
                    ┌──────────────┐            │ OpenAI-compatible
                    │   Echoline   │◀───────────┘ audio API
                    │ (STT + TTS)  │
                    └──────────────┘
Key points:
- The engine treats Echoline as an openai-compatible audio provider
- Echoline runs as a separate container in the Docker Compose stack
- Core and engine remain unchanged - only the audio backend switches
- Echoline itself can use the engine for LLM completions (circular dependency for Realtime API features)
Quick Start
1. Prerequisites
GPU Mode (Recommended):
- NVIDIA GPU with 8GB+ VRAM
- NVIDIA drivers installed
- NVIDIA Container Toolkit
CPU Mode (Development Only):
- No special requirements
- Significantly slower transcription
2. Configure Environment
Edit your stack.env:
# Switch audio providers
STT_PROVIDER=openai-compatible
TTS_PROVIDER=openai-compatible
# Point to echoline container
OPENAI_COMPATIBLE_BASE_URL=http://echoline:8000/v1
# Echoline model configuration
ECHOLINE_STT_MODEL=Systran/faster-whisper-tiny
ECHOLINE_TTS_MODEL=onnx-community/Kokoro-82M-v1.0-ONNX
ECHOLINE_TTS_VOICE=af_heart
DEFAULT_VOICE=af_heart
# Echoline container settings
ECHOLINE_HOST_PORT=8000
ECHOLINE_CHAT_COMPLETION_BASE_URL=http://host.docker.internal:8787/v1
3. Start the Stack with Echoline
# From workspace root
cp stack/stack.env.example stack.env
# Edit stack.env with your configuration
docker compose --profile echoline up
4. Verify Setup
# Check echoline health
curl http://localhost:8000/health
# Test STT
curl -X POST http://localhost:8000/v1/audio/transcriptions \
-F file=@test.wav \
-F model=Systran/faster-whisper-tiny
# Run smoke test
bun run stack:test
Configuration Reference
Engine Configuration (stack.env)
| Variable | Description | Example |
|---|---|---|
| STT_PROVIDER | Set to openai-compatible | openai-compatible |
| TTS_PROVIDER | Set to openai-compatible | openai-compatible |
| OPENAI_COMPATIBLE_BASE_URL | Echoline URL (Docker internal) | http://echoline:8000/v1 |
| ECHOLINE_STT_MODEL | Whisper model name | Systran/faster-whisper-tiny |
| ECHOLINE_TTS_MODEL | Kokoro model name | onnx-community/Kokoro-82M-v1.0-ONNX |
| ECHOLINE_TTS_VOICE | Default TTS voice | af_heart |
| DEFAULT_VOICE | Engine default voice | af_heart |
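A quick pre-flight check of stack.env can catch the most common misconfigurations before the stack starts. The sketch below is one possible approach, not part of the vowel tooling: parse_env and check_echoline_env are hypothetical helpers, and the parser is deliberately simple (it does not handle quoting or inline comments).

```python
from urllib.parse import urlparse

# Values the engine needs in order to route audio to Echoline.
REQUIRED = {
    "STT_PROVIDER": "openai-compatible",
    "TTS_PROVIDER": "openai-compatible",
}


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env


def check_echoline_env(env):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for key, expected in REQUIRED.items():
        if env.get(key) != expected:
            problems.append(f"{key} should be {expected!r}, got {env.get(key)!r}")
    host = urlparse(env.get("OPENAI_COMPATIBLE_BASE_URL", "")).hostname
    if not host:
        problems.append("OPENAI_COMPATIBLE_BASE_URL is missing or not a URL")
    elif host in ("localhost", "127.0.0.1"):
        # Inside the engine container, localhost is the engine itself.
        problems.append("OPENAI_COMPATIBLE_BASE_URL must use the container name, not localhost")
    for key in ("ECHOLINE_STT_MODEL", "ECHOLINE_TTS_MODEL", "ECHOLINE_TTS_VOICE"):
        if not env.get(key):
            problems.append(f"{key} is not set")
    return problems
```

A typical use would be check_echoline_env(parse_env(open("stack.env").read())), printing each returned problem and exiting non-zero if the list is not empty.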
Echoline Container Configuration
| Variable | Description | Default |
|---|---|---|
| ECHOLINE_HOST_PORT | Host port mapping | 8000 |
| ECHOLINE_CHAT_COMPLETION_BASE_URL | LLM backend URL | http://host.docker.internal:8787/v1 |
| ECHOLINE_CHAT_COMPLETION_API_KEY | API key for LLM | ${ENGINE_API_KEY} |
| HF_TOKEN | HuggingFace token (optional) | - |
| ECHOLINE_LOG_LEVEL | Logging verbosity | INFO |
Model Selection
STT Models (faster-whisper)
| Model | Size | VRAM Required | Quality | Use Case |
|---|---|---|---|---|
| tiny | ~400MB | 2GB | Good | Development, testing |
| small | ~900MB | 3GB | Better | Balanced quality/speed |
| base | ~1.5GB | 4GB | Good | - |
| medium | ~5GB | 8GB | Best | Production quality |
| large-v3 | ~6GB | 10GB | Excellent | Maximum accuracy |
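To turn the table into a decision, one option is to pick the highest-quality model whose VRAM requirement fits the GPU. The sketch below copies the VRAM and Quality columns as-is and ranks the quality labels in the order the table implies (Good < Better < Best < Excellent). The pick_stt_model helper is hypothetical, and the Systran/faster-whisper-{name} naming is assumed to extend from the tiny and small examples to the other rows.

```python
# Ranking of the table's Quality labels, lowest to highest (an assumption
# based on the order in which the table uses them).
QUALITY_RANK = {"Good": 1, "Better": 2, "Best": 3, "Excellent": 4}

# (short name, VRAM required in GB, quality label), copied from the table.
STT_MODELS = [
    ("tiny", 2, "Good"),
    ("small", 3, "Better"),
    ("base", 4, "Good"),
    ("medium", 8, "Best"),
    ("large-v3", 10, "Excellent"),
]


def pick_stt_model(free_vram_gb):
    """Return an ECHOLINE_STT_MODEL value for the best model that fits."""
    fitting = [m for m in STT_MODELS if m[1] <= free_vram_gb]
    if not fitting:
        raise ValueError(f"no model fits in {free_vram_gb} GB of VRAM")
    # Highest quality wins; on a tie, prefer the model needing less VRAM.
    name, _, _ = max(fitting, key=lambda m: (QUALITY_RANK[m[2]], -m[1]))
    return f"Systran/faster-whisper-{name}"
```

Leave headroom when TTS or other services share the same GPU: pass the VRAM that is actually free, not the card's total.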
Set in stack.env:
ECHOLINE_STT_MODEL=Systran/faster-whisper-small
TTS Voices (Kokoro)
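Kokoro voice IDs use the format {language}{gender}_{name}: the first letter is the language or accent (a for American English), the second is the gender (f or m), and the name follows the underscore: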
| Voice | Gender | Description |
|---|---|---|
| af_heart | Female | Default, warm |
| af_bella | Female | Clear, professional |
| af_nicole | Female | Natural, conversational |
| am_adam | Male | Deep, authoritative |
| am_michael | Male | Warm, friendly |
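Since an invalid voice name is a common source of TTS errors, it can help to validate the ID before writing it to stack.env. The sketch below checks against the voices in the table only and reads the gender from the second letter of the prefix, which matches the Gender column (f for Female, m for Male). describe_voice is a hypothetical helper; extend KNOWN_VOICES if your Kokoro model provides other voices.

```python
# Voice IDs from the table above; not an exhaustive list of Kokoro voices.
KNOWN_VOICES = {"af_heart", "af_bella", "af_nicole", "am_adam", "am_michael"}
GENDER_BY_LETTER = {"f": "Female", "m": "Male"}


def describe_voice(voice_id):
    """Validate a voice ID against the table and report its name and gender."""
    if voice_id not in KNOWN_VOICES:
        raise ValueError(
            f"unknown voice {voice_id!r}; expected one of {sorted(KNOWN_VOICES)}"
        )
    prefix, _, name = voice_id.partition("_")
    # The second prefix letter encodes the gender shown in the table.
    return {"name": name, "gender": GENDER_BY_LETTER[prefix[1]]}
```

The match is exact and case-sensitive, so describe_voice("Af_Heart") raises rather than silently falling back to another voice.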
Set in stack.env:
ECHOLINE_TTS_VOICE=af_heart
DEFAULT_VOICE=af_heart
Docker Compose Profiles
The stack supports multiple deployment profiles:
Default (Core + Engine + Deepgram)
docker compose up
Uses hosted Deepgram for STT/TTS. Requires DEEPGRAM_API_KEY.
Echoline Profile (Fully Self-Hosted)
docker compose --profile echoline up
Includes the Echoline container for local STT/TTS. Requires GPU or CPU mode.
Full Self-Hosted Profile
docker compose --profile full-self-hosted up
Alias for the full self-hosted deployment.
CPU-Only Mode
For development without a GPU:
- Edit docker-compose.yml or create docker-compose.override.yml:
services:
  echoline:
    image: ghcr.io/vowel/echoline:latest-cpu
    deploy:
      resources: {} # Remove GPU reservation
- Use a smaller model:
ECHOLINE_STT_MODEL=Systran/faster-whisper-tiny
Warning: CPU mode is ~10x slower than GPU for real-time transcription. Not recommended for production voice interactions.
Troubleshooting
Echoline Won't Start
Check NVIDIA setup:
# Verify drivers
nvidia-smi
# Test Docker GPU access
docker run --rm --gpus all ubuntu nvidia-smi
If no GPU, switch to CPU image:
image: ghcr.io/vowel/echoline:latest-cpu
Slow Transcription
- GPU: Check nvidia-smi to confirm the GPU is actually being used
- Model: Use a smaller model (tiny vs small)
- CPU mode: Expected to be slow; upgrade to GPU for production
Model Download Fails
# Check disk space
docker system df
# Check echoline logs
docker logs vowel-echoline
# Set HF_TOKEN for gated models
HF_TOKEN=your_huggingface_token
Audio Quality Issues
- Check audio format: Echoline expects PCM16, 16kHz for STT
- Verify voice name is valid: use exact Kokoro voice ID
- Test directly: curl http://localhost:8000/v1/audio/speech ...
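The format requirement above (PCM16, 16kHz) can be checked locally before a file is sent for transcription. A minimal sketch using Python's standard wave module follows; check_stt_wav is a hypothetical helper, and it only inspects WAV containers.

```python
import io
import wave


def check_stt_wav(data, expected_rate=16000):
    """Return a list of format problems for WAV bytes; empty means OK."""
    problems = []
    with wave.open(io.BytesIO(data), "rb") as wav:
        # Sample width is in bytes: 2 bytes per sample means 16-bit PCM.
        if wav.getsampwidth() != 2:
            problems.append(
                f"sample width is {wav.getsampwidth() * 8}-bit, expected 16-bit PCM"
            )
        if wav.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected {expected_rate} Hz"
            )
    return problems
```

For a file on disk, pass open("test.wav", "rb").read(). The wave module raises wave.Error for input it cannot parse, such as compressed or non-WAV audio, which is itself a sign the file needs converting first.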
Engine Can't Connect to Echoline
# Check both services are on same network
docker network ls
docker network inspect vowel-self-hosted_default
# Test from engine container
docker exec -it vowel-engine wget -qO- http://echoline:8000/health
# Verify OPENAI_COMPATIBLE_BASE_URL uses container name, not localhost
OPENAI_COMPATIBLE_BASE_URL=http://echoline:8000/v1 # Correct
Performance Tuning
GPU Optimization
- Use CUDA 12.x for best compatibility
- Ensure models fit in GPU memory (watch nvidia-smi)
- Share GPU between echoline and other services if VRAM permits
Model Warmup
First transcription is slower due to model loading. Keep Echoline running for consistent performance.
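One way to absorb the first-request cost is to send a throwaway transcription right after startup. The sketch below builds a one-second silent WAV in memory and posts it with the same file and model form fields as the curl test in Verify Setup. It assumes Echoline is reachable on localhost:8000 and that the first request is what triggers model loading; the helper names are hypothetical.

```python
import io
import urllib.request
import uuid
import wave


def silence_wav(seconds=1.0, rate=16000):
    """Return an in-memory mono 16-bit PCM WAV containing silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


def build_warmup_request(model, base_url="http://localhost:8000/v1"):
    """Build a multipart/form-data transcription request (not sent yet)."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="model"\r\n\r\n',
        model.encode(), b"\r\n",
        f"--{boundary}\r\n".encode(),
        b'Content-Disposition: form-data; name="file"; filename="warmup.wav"\r\n',
        b"Content-Type: audio/wav\r\n\r\n",
        silence_wav(), b"\r\n",
        f"--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/audio/transcriptions",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def warm_up(model="Systran/faster-whisper-tiny"):
    """Send the warmup request; the generous timeout allows for model load."""
    with urllib.request.urlopen(build_warmup_request(model), timeout=300) as resp:
        return resp.status
```

Pass the same model name that ECHOLINE_STT_MODEL is set to, so the warmup loads the model the engine will actually request.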
Caching
Models are cached in the echoline-cache Docker volume:
# View cache location
docker volume inspect vowel-self-hosted_echoline-cache
# To reset models (if corrupted)
docker compose down -v # WARNING: deletes all data
Comparison: Echoline vs Deepgram
| Feature | Echoline | Deepgram |
|---|---|---|
| Setup Complexity | Higher (GPU/Docker) | Lower (API key only) |
| Latency | Lower (local) | Higher (network) |
| Data Privacy | Complete (on-prem) | Hosted (transmission) |
| Cost | Infrastructure | Per-usage |
| Quality | Good (Whisper/Kokoro) | Excellent (Nova/Aura) |
| Offline Operation | Yes | No |
| Maintenance | Self-managed | Managed |
Migration: Deepgram to Echoline
- Set STT_PROVIDER=openai-compatible and TTS_PROVIDER=openai-compatible
- Add OPENAI_COMPATIBLE_BASE_URL=http://echoline:8000/v1
- Configure ECHOLINE_* models and voices
- Remove DEEPGRAM_API_KEY (optional; keep it if you want Deepgram available as a fallback)
- Start with --profile echoline
- Test: bun run stack:test