
Self-Hosted

Self-hosted deployment lets you run vowel on infrastructure you control.

Beta Release

This open-source release is in beta. You may encounter rough edges, incomplete features, or breaking changes. We are actively reviewing and merging community PRs, but please expect some instability as we iterate toward a stable release. Your feedback and contributions are welcome.

Who This Is For

Choose self-hosted when you want:

  • Your own deployment boundary
  • Your own token issuance path
  • Custom networking, auth, or backend policy
  • Operator control over runtime configuration
  • Data privacy with fully offline operation (optional)

Self-Hosted Stack Overview

What The Self-Hosted Stack Includes

The self-hosted stack includes these services:

| Service | Default URL | Purpose |
| --- | --- | --- |
| Core | http://localhost:3000 | Token issuance, app management, Web UI |
| Realtime Engine | ws://localhost:8787/v1/realtime | Voice AI WebSocket (OpenAI-compatible) |
| Echoline (optional) | http://localhost:8000 | Self-hosted STT/TTS with faster-whisper + Kokoro |

Your application typically talks to Core or your own backend to get a token, then connects to the realtime engine with that token.
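The two-step flow above can be sketched from the command line. This is a minimal illustration, not the stack's documented API: the /token path and the ?token= query-parameter convention are assumptions, so adjust both to match your Core configuration and your engine's auth scheme.

```bash
# Hypothetical helpers -- the /token endpoint and the ?token= query
# parameter are assumptions; adjust to your deployment.
fetch_token() {
  # Ask Core (or your own backend) for a short-lived session token.
  curl -sf "http://localhost:3000/token" | jq -r '.token'
}

build_realtime_url() {
  # Append the token to the engine URL as a query parameter.
  printf '%s?token=%s' "$1" "$2"
}

# Usage (stack must be running), e.g. with wscat:
#   wscat -c "$(build_realtime_url ws://localhost:8787/v1/realtime "$(fetch_token)")"
```

In production you would typically mint the token from your own backend rather than exposing Core directly to clients.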

Deployment Options

Option 1: CPU Stack with Deepgram (Default)

Uses hosted STT/TTS from Deepgram. Works on all machines (no GPU required).

  • Pros: Fast setup, professional-grade quality, no model downloads
  • Cons: Requires Deepgram API key, ongoing API costs
  • Requirements: Deepgram API key + LLM provider key (Groq or OpenRouter)
  • Command: bun run stack:up

Option 2: Fully Self-Hosted with Echoline

Local speech processing with faster-whisper + Kokoro. Requires NVIDIA GPU.

  • Pros: No external APIs, data privacy, works offline, no API costs
  • Cons: Requires GPU, ~5GB disk space, slower initial startup
  • Requirements: NVIDIA GPU with 8GB+ VRAM
  • Command: bun run stack:up:full

Option 3: GPU-Accelerated (NVIDIA GPU Only)

Runs VAD on the GPU for lower latency while still using Deepgram for STT/TTS quality.

  • Requirements: NVIDIA GPU + Container Toolkit
  • Command: bun run stack:up:gpu

Command Reference

Common stack management commands:

| Command | Description |
| --- | --- |
| bun run stack:up | Start CPU stack (default) |
| bun run stack:up:gpu | Start with GPU acceleration |
| bun run stack:up:full | Start with Echoline (self-hosted STT/TTS) |
| bun run stack:down | Stop and remove containers |
| bun run stack:logs | View service logs |
| bun run stack:build | Rebuild container images |
| bun run stack:test | Run smoke tests |

Hosted Vs Self-Hosted

Use the hosted platform if you want:

  • The fastest path to integration
  • Managed app configuration
  • Platform-managed setup with appId

Use self-hosted if you want:

  • Infrastructure control
  • Your own token and networking boundaries
  • Custom backend mediation for session access

Required API Keys

Before deploying, obtain API keys from:

| Provider | Purpose | Where to Get |
| --- | --- | --- |
| Deepgram | Speech-to-text and text-to-speech | deepgram.com |
| Groq or OpenRouter | LLM for AI responses | groq.com or openrouter.ai |
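One common layout is to place the keys in an .env file that the stack reads at startup. The variable names below are illustrative assumptions, not the stack's documented configuration; check the repository for the names it actually expects.

```bash
# Illustrative .env sketch -- the variable names are assumptions,
# not the stack's documented configuration.
DEEPGRAM_API_KEY=dg_xxxxxxxxxxxx   # from deepgram.com
GROQ_API_KEY=gsk_xxxxxxxxxxxx      # from groq.com (or an OpenRouter key instead)
```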

Fully Self-Hosted (No External Speech APIs)

If you run Echoline for self-hosted STT/TTS, you can eliminate the Deepgram dependency:

```bash
docker compose --profile echoline up
```

Echoline requires a GPU for real-time performance. See Self-Hosted Speech for setup instructions.

Echoline (self-hosted STT/TTS option) does not require external API keys.

Testing

The stack includes multiple testing capabilities:

Smoke Test (bun run stack:test)

A quick health check that verifies the Core and Engine services are running, validates that token generation works, and confirms WebSocket connections can be established.

Test Harness Framework

An LLM-powered automated testing system in the Engine that simulates human users conducting conversations with your voice agent. It validates that the agent correctly uses tools, handles multi-turn conversations, and maintains context across interactions. The framework includes built-in test scenarios for weather lookups, calculations, multi-tool conversations, and context retention.

Custom Test Scenarios

Create your own test cases by defining conversation objectives, expected tool calls with validation logic, and mock return data. Tests can be run against different LLM providers and generate detailed Markdown logs of each run.

See the Testing page for detailed documentation on the Test Harness, creating custom scenarios, and CI/CD integration.
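As a manual complement to the smoke test, you can probe individual services by hand. The /health paths below are assumptions for illustration; substitute whatever endpoints your deployment actually exposes.

```bash
# Returns success if the URL answers (2xx/3xx HTTP status, or readable file).
check_service() {
  curl -sf -o /dev/null --max-time 5 "$1"
}

# Examples (stack must be running; /health paths are illustrative):
#   check_service http://localhost:3000/health && echo "Core is up"
#   check_service http://localhost:8787/health && echo "Engine is up"
```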

Documentation

Source Repository

The self-hosted stack is open source at github.com/usevowel/stack.

Individual components: