How Avatarium Works

Understanding Avatarium's architecture helps you build better experiences.

The Conversation Flow

When a user interacts with an avatar, here's what happens:

┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│              │     │                 │     │              │
│    User      │────▶│   Avatarium     │────▶│  AI Provider │
│              │     │   Gateway       │     │              │
│              │◀────│                 │◀────│              │
│              │     │                 │     │              │
└──────────────┘     └─────────────────┘     └──────────────┘
      │                      │
      │                      ▼
      │              ┌─────────────────┐
      │              │                 │
      │              │   TTS Layer     │
      │              │                 │
      │              └─────────────────┘
      │                      │
      ▼                      ▼
┌─────────────────────────────────────────────────┐
│                                                 │
│              3D Avatar Renderer                 │
│         (Lip sync, gestures, emotions)          │
│                                                 │
└─────────────────────────────────────────────────┘

Step by Step:

User Input – User types or speaks a message
Gateway Processing – Avatarium receives and validates the request
AI Processing – Message is sent to your configured AI provider
Response Generation – AI generates a response based on personality
Text-to-Speech – Response is converted to audio
Phoneme Extraction – Audio is analyzed for lip sync data
Avatar Animation – 3D avatar speaks with synced lips and gestures
Delivery – Response streams back to the user in real-time

Key Components

Avatars

The visual representation – a 3D character that talks and emotes. Avatars can be:

Pre-built – Choose from our library of ready-made characters
Custom – Upload your own 3D models (GLB format)
Ready Player Me – Import from Ready Player Me

Learn more about Avatars →

AI Providers

The brain behind the conversation. Avatarium supports:

Managed AI – GPT-4o Mini, Claude Haiku 3.5, Gemini 3.1 Flash, Grok 3 Mini (billed per minute from your credits)
Premium AI – GPT-5.4, Claude Sonnet 4.6 (billed per minute, or free via BYOK)
BYOK – Bring your own OpenAI, Anthropic, Google, or xAI keys
Custom – Any OpenAI-compatible API endpoint

Learn more about AI Providers →

TTS Layer

The voice of your avatar. Three quality tiers:

Basic – Fast, suitable for high-volume use
Standard – Natural-sounding, low latency
Premium – Ultra-realistic, human-like voices (billed at the Premium per-minute rate)

Learn more about Voice Configuration →

Conversations

A session between a user and avatar. Conversations:

Maintain context across messages
Can be saved and retrieved
Generate events for analytics
Expire after 30 minutes of inactivity

Learn more about Conversations →

Edge Architecture

Avatarium runs on a global edge network for minimal latency:

45+ edge locations worldwide
Sub-200ms average response time
WebSocket connections for real-time streaming
Automatic failover for reliability

Security Model

Your data is protected at every step:

API Key Authentication – Secure key-based access
HTTPS/WSS Only – All traffic encrypted
Data Isolation – Your data is never shared with other users

Your First Avatar Avatars