Self-Hosted

Self-hosted deployment lets you run vowel on infrastructure you control.

Beta Release

This open-source release is in beta. You may encounter rough edges, incomplete features, or breaking changes. We are actively reviewing and merging community PRs, but please expect some instability as we iterate toward a stable release. Your feedback and contributions are welcome.

Who This Is For

Choose self-hosted when you want:

Your own deployment boundary
Your own token issuance path
Custom networking, auth, or backend policy
Operator control over runtime configuration
Data privacy with fully offline operation (optional)

What The Self-Hosted Stack Includes

The self-hosted stack includes these services:

Service	Default URL	Purpose
Core	http://localhost:3000	Token issuance, app management, Web UI
Realtime Engine	ws://localhost:8787/v1/realtime	Voice AI WebSocket (OpenAI-compatible)
Echoline (optional)	http://localhost:8000	Self-hosted STT/TTS with faster-whisper + Kokoro (no external APIs)

Your application typically talks to Core or your own backend to get a token, then connects to the realtime engine with that token.
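Before wiring up the token flow, it can help to confirm that both default endpoints accept connections. The following is a minimal sketch, not part of the stack: the function names and the `CORE_URL` / `REALTIME_URL` override variables are assumptions made for illustration, and the port probe relies on bash's `/dev/tcp` redirection, so it needs bash rather than a plain POSIX shell.

```bash
#!/usr/bin/env bash
# Hypothetical helper: none of these function names ship with the stack.

# Extract "host:port" from a URL such as ws://localhost:8787/v1/realtime.
# Assumes the URL carries an explicit port, as the default URLs do.
url_host_port() {
  local rest="${1#*://}"   # drop the scheme
  echo "${rest%%/*}"       # drop the path, if any
}

# Try to open a TCP connection using bash's /dev/tcp redirection.
# Returns 0 when the port accepts a connection, non-zero otherwise.
port_open() {
  local host="${1%%:*}" port="${1##*:}"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Probe Core and the Realtime Engine at their default URLs.
# CORE_URL and REALTIME_URL are assumed override variables for this sketch.
check_stack() {
  local url status=0
  for url in "${CORE_URL:-http://localhost:3000}" \
             "${REALTIME_URL:-ws://localhost:8787/v1/realtime}"; do
    if port_open "$(url_host_port "$url")"; then
      echo "ok    $url"
    else
      echo "down  $url"
      status=1
    fi
  done
  return "$status"
}
```

Calling `check_stack` after starting the stack prints one line per endpoint and returns non-zero if either port is closed. It only shows that something is listening; it does not validate tokens or the WebSocket handshake.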

Deployment Options

Option 1: Deepgram-Powered (Default - Recommended)

Uses hosted STT/TTS from Deepgram. Works on all machines (no GPU required).

Pros: Fast setup, professional-grade quality, no model downloads
Cons: Requires Deepgram API key, ongoing API costs
Requirements: Deepgram API key + LLM provider key (Groq or OpenRouter)
Command: bun run stack:up

Option 2: Fully Self-Hosted with Echoline

Runs speech processing locally with faster-whisper (STT) and Kokoro (TTS). Requires an NVIDIA GPU.

Pros: No external APIs, data privacy, works offline, no API costs
Cons: Requires GPU, ~5GB disk space, slower initial startup
Requirements: NVIDIA GPU with 8GB+ VRAM
Command: bun run stack:up:full

Option 3: GPU-Accelerated (NVIDIA GPU Only)

Runs voice activity detection (VAD) on the GPU for lower latency, while STT/TTS quality still comes from Deepgram.

Requirements: NVIDIA GPU + Container Toolkit
Command: bun run stack:up:gpu
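The choice between the three options can be sketched as a small helper script. This is an illustration under stated assumptions, not tooling from the stack: the option labels (`deepgram`, `echoline`, `gpu`) and function names are invented here, and the presence of a working `nvidia-smi` is used only as a rough proxy for "an NVIDIA GPU is available". It does not check VRAM or the NVIDIA Container Toolkit.

```bash
#!/usr/bin/env bash
# Hypothetical helper: option labels and function names are illustrative only.

# Map a deployment option to the command documented for it.
stack_command() {
  case "$1" in
    deepgram) echo "bun run stack:up" ;;        # Option 1 (default)
    echoline) echo "bun run stack:up:full" ;;   # Option 2
    gpu)      echo "bun run stack:up:gpu" ;;    # Option 3
    *)        echo "unknown option: $1" >&2; return 1 ;;
  esac
}

# Assumption: nvidia-smi on PATH that can list at least one device
# means an NVIDIA GPU is usable on this host.
has_nvidia_gpu() {
  command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1
}

# Fall back to the default option when a GPU-dependent option is
# requested on a machine without a detectable NVIDIA GPU.
resolve_option() {
  local wanted="${1:-deepgram}"
  if [ "$wanted" != "deepgram" ] && ! has_nvidia_gpu; then
    echo "no NVIDIA GPU detected; falling back to deepgram" >&2
    echo "deepgram"
  else
    echo "$wanted"
  fi
}
```

For example, `stack_command "$(resolve_option echoline)"` prints the command to run rather than running it, so the result can be reviewed first.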

Command Reference

Common stack management commands:

Command	Description
`bun run stack:up`	Start CPU stack (default)
`bun run stack:up:gpu`	Start with GPU acceleration
`bun run stack:up:full`	Start with Echoline (self-hosted STT/TTS)
`bun run stack:down`	Stop and remove containers
`bun run stack:logs`	View service logs
`bun run stack:build`	Rebuild container images
`bun run stack:test`	Run smoke tests
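These commands are often chained: start a variant, then run the smoke tests. The sketch below shows one way to do that. It assumes that the `stack:up` scripts return once the containers have started (detached mode), which this page does not state. `STACK_RUNNER` is a hypothetical override invented for this example so the sequence can be dry-run; it is not a setting of the stack.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: not part of the stack's own tooling.

# Run one stack script, e.g. `stack up:gpu` runs `bun run stack:up:gpu`.
# STACK_RUNNER is an assumed override for dry runs (STACK_RUNNER=echo).
stack() {
  ${STACK_RUNNER:-bun run} "stack:$1"
}

# Start the chosen variant, then run the smoke tests.
# Assumes stack:up returns after the containers have been started.
up_and_verify() {
  local variant="${1:-up}"   # up | up:gpu | up:full
  stack "$variant" || return 1
  if ! stack test; then
    echo "smoke tests failed; inspect with: bun run stack:logs" >&2
    return 1
  fi
}
```

Running `STACK_RUNNER=echo up_and_verify up:full` prints the scripts that would be invoked without starting any containers.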

Hosted Vs Self-Hosted

Use the hosted platform if you want:

The fastest path to integration
Managed app configuration
Platform-managed setup with appId

Use self-hosted if you want:

Infrastructure control
Your own token and networking boundaries
Custom backend mediation for session access

Required API Keys

Before deploying the default Deepgram-powered stack, obtain API keys from:

Provider	Purpose	Where to Get
Deepgram	Speech-to-text and text-to-speech	deepgram.com
Groq or OpenRouter	LLM for AI responses	groq.com or openrouter.ai
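A preflight check can catch missing keys before the containers start. The sketch below assumes the keys are exposed as the environment variables `DEEPGRAM_API_KEY`, `GROQ_API_KEY`, and `OPENROUTER_API_KEY`. Those names are a guess based on common convention, so check the Configuration page for the names the stack actually reads.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check. The variable names below are assumptions;
# the Configuration page is the authority on the real names.

# Returns 0 when a Deepgram key and at least one LLM provider key are set,
# 1 otherwise. Missing keys are reported on stderr.
check_required_keys() {
  local missing=0
  if [ -z "${DEEPGRAM_API_KEY:-}" ]; then
    echo "missing: DEEPGRAM_API_KEY" >&2
    missing=1
  fi
  # Either LLM provider is enough: Groq or OpenRouter.
  if [ -z "${GROQ_API_KEY:-}" ] && [ -z "${OPENROUTER_API_KEY:-}" ]; then
    echo "missing: GROQ_API_KEY or OPENROUTER_API_KEY" >&2
    missing=1
  fi
  return "$missing"
}
```

A typical use would be `check_required_keys && bun run stack:up`, so the stack only starts when the keys are present.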

Fully Self-Hosted (No External Speech APIs)

If you run Echoline for self-hosted STT/TTS, you can eliminate the Deepgram dependency:

```bash
docker compose --profile echoline up
```

Echoline requires a GPU for real-time performance. See Self-Hosted Speech for setup instructions.

Echoline does not require any external API keys.

Testing

The stack includes multiple testing capabilities:

Smoke Test (`bun run stack:test`): A quick health check that verifies the Core and Engine services are running, validates that token generation works, and confirms that WebSocket connections can be established.

Test Harness Framework: An LLM-powered automated testing system in the Engine that simulates human users conducting conversations with your voice agent. It validates that the agent correctly uses tools, handles multi-turn conversations, and maintains context across interactions. The framework includes built-in test scenarios for weather lookups, calculations, multi-tool conversations, and context retention.

Custom Test Scenarios: Create your own test cases by defining conversation objectives, expected tool calls with validation logic, and mock return data. Tests can be run against different LLM providers and generate detailed Markdown logs of each run.

See the Testing page for detailed documentation on the Test Harness, creating custom scenarios, and CI/CD integration.

Documentation

Deployment - Docker Compose setup, prerequisites, production deployment
Configuration - Complete environment variable reference
Core - Token service API, bootstrap process, Web UI
Realtime Engine - WebSocket API, events, runtime config
Self-Hosted Speech (Echoline) - Run local STT/TTS without external APIs
Architecture - How components fit together
Testing - Smoke tests, Test Harness framework, custom scenarios
Troubleshooting - Debug common issues, logs, health checks

Source Repository

The self-hosted stack is open source at github.com/usevowel/stack.

Individual components:

Core: github.com/usevowel/core
Engine: github.com/usevowel/engine
