How Avatarium Works
Understanding Avatarium's architecture helps you build better experiences.
The Conversation Flow
When a user interacts with an avatar, here's what happens:
┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│              │     │                 │     │              │
│     User     │────▶│    Avatarium    │────▶│  AI Provider │
│              │     │     Gateway     │     │              │
│              │◀────│                 │◀────│              │
│              │     │                 │     │              │
└──────────────┘     └─────────────────┘     └──────────────┘
       │                      │
       │                      ▼
       │             ┌─────────────────┐
       │             │                 │
       │             │    TTS Layer    │
       │             │                 │
       │             └─────────────────┘
       │                      │
       ▼                      ▼
┌───────────────────────────────────────────────────────────┐
│                                                           │
│                    3D Avatar Renderer                     │
│              (Lip sync, gestures, emotions)               │
│                                                           │
└───────────────────────────────────────────────────────────┘
Step by Step:
- User Input – User types or speaks a message
- Gateway Processing – Avatarium receives and validates the request
- AI Processing – Message is sent to your configured AI provider
- Response Generation – The AI generates a response shaped by the avatar's personality
- Text-to-Speech – Response is converted to audio
- Phoneme Extraction – Audio is analyzed for lip sync data
- Avatar Animation – 3D avatar speaks with synced lips and gestures
- Delivery – The response streams back to the user in real time
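The steps above can be pictured as a chain of async stages, each feeding the next. The TypeScript below is a minimal, illustrative sketch, not Avatarium's implementation: every type and function name (`handleMessage`, `callAiProvider`, `synthesizeSpeech`, `extractPhonemes`) is hypothetical, and each stage is a stub so the data flow is easy to follow.

```typescript
// Hypothetical shapes; Avatarium's real request/response types are not shown in this guide.
interface UserMessage {
  conversationId: string;
  text: string; // stage 1: what the user typed or said
}

interface AvatarFrame {
  text: string;       // the AI's reply
  audio: Uint8Array;  // synthesized speech
  phonemes: string[]; // lip-sync units for the renderer
}

// Stage 2: gateway validation (the rules here are assumptions for illustration).
function validate(msg: UserMessage): void {
  if (msg.conversationId.length === 0) throw new Error("missing conversationId");
  if (msg.text.trim().length === 0) throw new Error("empty message");
}

// Stages 3-4: stand-in for the configured AI provider.
async function callAiProvider(text: string, personality: string): Promise<string> {
  return `[${personality}] You said: ${text}`;
}

// Stage 5: stand-in TTS that returns one placeholder byte per character.
async function synthesizeSpeech(text: string): Promise<Uint8Array> {
  return new Uint8Array(text.length);
}

// Stage 6: placeholder. Real phoneme extraction analyzes the audio;
// this stub just splits the reply into words so something flows downstream.
function extractPhonemes(_audio: Uint8Array, text: string): string[] {
  return text.split(/\s+/).filter((w) => w.length > 0);
}

// Stages 2-6 wired together. The renderer (stage 7) and delivery (stage 8)
// consume the returned frame.
async function handleMessage(msg: UserMessage, personality: string): Promise<AvatarFrame> {
  validate(msg);
  const reply = await callAiProvider(msg.text, personality);
  const audio = await synthesizeSpeech(reply);
  const phonemes = extractPhonemes(audio, reply);
  return { text: reply, audio, phonemes };
}
```

A production pipeline would likely run TTS and phoneme extraction on streamed chunks of the reply rather than waiting for the whole text, which is what allows delivery to start in real time.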
Key Components
Avatars
The visual representation – a 3D character that talks and emotes. Avatars can be:
- Pre-built – Choose from our library of ready-made characters
- Custom – Upload your own 3D models (GLB format)
- Ready Player Me – Import avatars you've created with Ready Player Me
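Since custom avatars must be GLB files, it can help to reject other files before uploading them. This sketch checks the 12-byte GLB header defined by the glTF 2.0 specification: the ASCII magic `glTF`, container version 2, and a declared length equal to the file size. The function name `looksLikeGlb` is hypothetical, and passing this check does not guarantee that Avatarium will accept the model.

```typescript
// GLB header layout (glTF 2.0 spec), all fields little-endian uint32:
//   bytes 0-3   magic   = 0x46546C67 ("glTF")
//   bytes 4-7   version = 2
//   bytes 8-11  length  = total file size in bytes
const GLB_MAGIC = 0x46546c67;

// Checks only the header; chunk contents (JSON, binary buffers) are not validated.
function looksLikeGlb(bytes: Uint8Array): boolean {
  if (bytes.byteLength < 12) return false;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const magic = view.getUint32(0, true);
  const version = view.getUint32(4, true);
  const length = view.getUint32(8, true);
  return magic === GLB_MAGIC && version === 2 && length === bytes.byteLength;
}
```

In a browser you could obtain the bytes from a selected file with `new Uint8Array(await file.arrayBuffer())` before calling the check.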
AI Providers
The brain behind the conversation. Avatarium supports:
- Managed AI – GPT-4o Mini, Claude Haiku 3.5, Gemini 3.1 Flash, Grok 3 Mini (included in all plans)
- Premium AI – GPT-5.4, Claude Sonnet 4.6 (Premium/Elite plans)
- BYOK (Bring Your Own Key) – Use your own OpenAI, Anthropic, Google, or xAI API keys
- Custom – Any OpenAI-compatible API endpoint
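A custom endpoint only needs to speak the OpenAI Chat Completions wire format. To show roughly what that means, the function below builds (but does not send) the JSON body for `POST {baseUrl}/chat/completions`, with the avatar's personality placed in a `system` message. How Avatarium actually maps personalities to prompts is not documented here, so `buildChatRequest` and its `personality` parameter are hypothetical.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; messages: ChatMessage[]; stream: boolean };
}

// Builds an OpenAI-style Chat Completions request.
// `baseUrl` is assumed to include the version prefix, e.g. "https://llm.example.com/v1".
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  personality: string, // assumption: personality is sent as the system prompt
  history: ChatMessage[],
  userText: string,
): ChatRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`, // strip trailing slashes
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: {
      model,
      messages: [
        { role: "system", content: personality },
        ...history, // earlier turns keep the conversation's context
        { role: "user", content: userText },
      ],
      stream: true, // streamed tokens let downstream stages start early
    },
  };
}
```

To send it, you could call `fetch(req.url, { method: "POST", headers: req.headers, body: JSON.stringify(req.body) })`.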
Learn more about AI Providers →
TTS Layer
The voice of your avatar. Three quality tiers:
- Basic – Fast, suitable for high-volume use
- Standard – Natural-sounding, low latency
- Premium – Ultra-realistic, human-like voices (Premium/Elite plans)
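Because the Premium tier is limited to Premium and Elite plans, a client may want to resolve the requested tier against the account's plan before starting a session. This is a hypothetical sketch: the lowercase plan identifiers are placeholders, and falling back to Standard is an assumed policy (the platform might instead reject the request).

```typescript
type TtsTier = "basic" | "standard" | "premium";

// Premium voices require a Premium or Elite plan (per the tier list).
// Plan identifiers are placeholders, not Avatarium's actual values.
const PLANS_WITH_PREMIUM_VOICES = new Set(["premium", "elite"]);

// Returns the requested tier if the plan allows it; otherwise falls back to
// "standard" (an assumed policy chosen for illustration).
function resolveTtsTier(requested: TtsTier, plan: string): TtsTier {
  if (requested === "premium" && !PLANS_WITH_PREMIUM_VOICES.has(plan.toLowerCase())) {
    return "standard";
  }
  return requested;
}
```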
Learn more about Voice Configuration →
Conversations
A session between a user and an avatar. Conversations:
- Maintain context across messages
- Can be saved and retrieved
- Generate events for analytics
- Expire after 30 minutes of inactivity
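The context and expiry rules above can be modeled with a small in-memory store. This is a hypothetical sketch, not how Avatarium stores conversations: `ConversationStore` and its methods are invented names, and the current time is passed in explicitly so the 30-minute inactivity window is easy to test.

```typescript
interface Turn {
  role: "user" | "assistant";
  content: string;
}

interface Conversation {
  id: string;
  turns: Turn[];        // context carried across messages
  lastActiveMs: number; // timestamp of the most recent turn
}

const INACTIVITY_LIMIT_MS = 30 * 60 * 1000; // 30 minutes, as documented

class ConversationStore {
  private readonly conversations = new Map<string, Conversation>();

  // Returns the live conversation, or undefined if it never existed or has expired.
  // Reading does not count as activity.
  get(id: string, nowMs: number): Conversation | undefined {
    const convo = this.conversations.get(id);
    if (convo === undefined) return undefined;
    if (nowMs - convo.lastActiveMs >= INACTIVITY_LIMIT_MS) {
      this.conversations.delete(id); // expired: its context is dropped
      return undefined;
    }
    return convo;
  }

  // Appends a turn, starting a fresh conversation if none is live, and refreshes activity.
  append(id: string, turn: Turn, nowMs: number): Conversation {
    const convo = this.get(id, nowMs) ?? { id, turns: [], lastActiveMs: nowMs };
    convo.turns.push(turn);
    convo.lastActiveMs = nowMs;
    this.conversations.set(id, convo);
    return convo;
  }
}
```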
Learn more about Conversations →
Edge Architecture
Avatarium runs on a global edge network for minimal latency:
- 45+ edge locations worldwide
- Sub-200ms average response time
- WebSocket connections for real-time streaming
- Automatic failover for reliability
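Real-time streaming runs over WebSockets, so a client should be prepared for dropped connections. Assuming the client is responsible for reconnecting, a common approach is capped exponential backoff with full jitter, sketched below. The function name and the default base and cap values are illustrative, not Avatarium settings.

```typescript
// Delay before reconnect attempt `attempt` (0-based), using capped exponential
// backoff with full jitter: a random value in [0, min(capMs, baseMs * 2^attempt)).
// `random` is injectable so the behavior can be tested deterministically.
function reconnectDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 30000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

In a browser client, you would call this from the WebSocket `close` handler, wait the returned delay (for example with `setTimeout`) before opening a new connection, and reset `attempt` to 0 once a connection opens successfully. The jitter spreads reconnects out so many clients don't retry at the same instant.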
Security Model
Your data is protected at every step:
- API Key Authentication – Access requires a valid API key
- HTTPS/WSS Only – All traffic encrypted
- Data Isolation – Your data is never shared with other users
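The first two rules can also be enforced in your own code before a request goes out. This sketch refuses plain `http:` and `ws:` URLs and attaches the API key as a bearer token. The helper name `authorize` is hypothetical, and the `Authorization: Bearer` header is an assumption; check Avatarium's API reference for the header it actually expects.

```typescript
// Only encrypted transports are allowed, mirroring the HTTPS/WSS-only rule.
const SECURE_PROTOCOLS = new Set(["https:", "wss:"]);

interface AuthedRequest {
  url: string;
  headers: Record<string, string>;
}

// Validates the URL scheme and adds the API key header.
// The header name and "Bearer" scheme are assumptions, not a documented contract.
function authorize(rawUrl: string, apiKey: string): AuthedRequest {
  const url = new URL(rawUrl); // throws TypeError on malformed URLs
  if (!SECURE_PROTOCOLS.has(url.protocol)) {
    throw new Error(`Refusing insecure protocol ${url.protocol}; use https: or wss:`);
  }
  if (apiKey.trim() === "") {
    throw new Error("Missing API key");
  }
  return { url: url.toString(), headers: { Authorization: `Bearer ${apiKey}` } };
}
```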