How Avatarium Works

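Understanding Avatarium's architecture helps you build better experiences. This page follows a single message from the user to your AI provider and back, then introduces each component involved along the way.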

The Conversation Flow

When a user interacts with an avatar, here's what happens:

┌──────────────┐     ┌─────────────────┐     ┌──────────────┐
│              │     │                 │     │              │
│    User      │────▶│   Avatarium     │────▶│  AI Provider │
│              │     │   Gateway       │     │              │
│              │◀────│                 │◀────│              │
│              │     │                 │     │              │
└──────────────┘     └─────────────────┘     └──────────────┘
      │                      │
      │                      ▼
      │              ┌─────────────────┐
      │              │                 │
      │              │   TTS Layer     │
      │              │                 │
      │              └─────────────────┘
      │                      │
      ▼                      ▼
┌─────────────────────────────────────────────────┐
│                                                 │
│              3D Avatar Renderer                 │
│         (Lip sync, gestures, emotions)          │
│                                                 │
└─────────────────────────────────────────────────┘
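
In the diagram, the TTS Layer feeds the 3D Avatar Renderer, which handles lip sync. A common way to drive lip sync is to map each phoneme in the synthesized audio to a viseme (a mouth shape) and animate blend-shape weights over time. The TypeScript sketch below illustrates that mapping only. The `PhonemeTiming` input shape and the ARPAbet-style phoneme symbols are assumptions, the viseme names are loosely modeled on the Oculus (Meta) lip sync viseme set, and the table is deliberately incomplete. It is not Avatarium's lip sync implementation.

```typescript
// Hypothetical shape of one timed phoneme produced by phoneme extraction.
interface PhonemeTiming {
  phoneme: string; // ARPAbet-style symbol such as "AA" or "M" (assumption)
  startMs: number;
  endMs: number;
}

// A keyframe the renderer could blend toward.
interface VisemeKeyframe {
  viseme: string;
  timeMs: number;
  weight: number; // blend-shape weight in the range 0..1
}

// Simplified phoneme-to-viseme table; unknown phonemes fall back to "sil".
const PHONEME_TO_VISEME: Record<string, string> = {
  P: "PP", B: "PP", M: "PP",
  F: "FF", V: "FF",
  TH: "TH", DH: "TH",
  T: "DD", D: "DD",
  K: "kk", G: "kk",
  CH: "CH", JH: "CH", SH: "CH",
  S: "SS", Z: "SS",
  N: "nn", L: "nn",
  R: "RR",
  AA: "aa", AE: "aa",
  EH: "E", EY: "E",
  IH: "ih", IY: "ih",
  AO: "oh", OW: "oh",
  UH: "ou", UW: "ou",
};

// Emits a full-weight keyframe at each phoneme's start and a zero-weight
// keyframe at its end, then orders all keyframes by time (stable sort).
function toVisemeKeyframes(phonemes: PhonemeTiming[]): VisemeKeyframe[] {
  const frames: VisemeKeyframe[] = [];
  for (const p of phonemes) {
    const viseme = PHONEME_TO_VISEME[p.phoneme.toUpperCase()] ?? "sil";
    frames.push({ viseme, timeMs: p.startMs, weight: 1 });
    frames.push({ viseme, timeMs: p.endMs, weight: 0 });
  }
  return frames.sort((a, b) => a.timeMs - b.timeMs);
}
```

Dropping to zero weight at the end of each phoneme keeps the mouth from freezing on the last shape; a real renderer would also interpolate between neighboring keyframes.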

Step by Step:

  1. User Input – The user types or speaks a message.
  2. Gateway Processing – The Avatarium Gateway receives and validates the request.
  3. AI Processing – The message is sent to your configured AI provider.
  4. Response Generation – The AI generates a response based on the avatar's personality.
  5. Text-to-Speech – The TTS Layer converts the response to audio.
  6. Phoneme Extraction – The audio is analyzed to extract lip sync data.
  7. Avatar Animation – The 3D avatar speaks, with lip movements and gestures synced to the audio.
  8. Delivery – The response streams back to the user in real time.
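
The eight steps form a chain in which each stage consumes the previous stage's output. The TypeScript sketch below models that chain with each external service injected as a function. `PipelineDeps`, `handleMessage`, and the other names are hypothetical rather than part of an Avatarium SDK, the calls are synchronous for brevity, and streaming delivery (step 8) is reduced to returning a single result.

```typescript
// Hypothetical stage signatures; real services would be asynchronous and streamed.
interface PipelineDeps {
  generateReply: (personality: string, message: string) => string; // steps 3-4
  synthesize: (text: string) => Uint8Array; // step 5: text to audio bytes
  extractPhonemes: (audio: Uint8Array) => string[]; // step 6
}

interface AvatarResponse {
  text: string;
  audio: Uint8Array;
  phonemes: string[]; // consumed by the renderer for lip sync (step 7)
}

// Step 2: minimal validation; a real gateway would also check auth, limits, etc.
function validateMessage(message: string): string {
  const trimmed = message.trim();
  if (trimmed.length === 0) throw new Error("Empty message");
  return trimmed;
}

// Runs steps 2-6 in order; step 8 is the caller sending the result back.
function handleMessage(deps: PipelineDeps, personality: string, message: string): AvatarResponse {
  const input = validateMessage(message);
  const text = deps.generateReply(personality, input);
  const audio = deps.synthesize(text);
  const phonemes = deps.extractPhonemes(audio);
  return { text, audio, phonemes };
}
```

Injecting the stages keeps the ordering testable without a live AI provider or TTS service; a production version would make each stage asynchronous and stream partial results.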

Key Components

Avatars

The visual representation – a 3D character that talks and emotes. Avatars can be:

  • Pre-built – Choose from our library of ready-made characters
  • Custom – Upload your own 3D models (GLB format)
  • Ready Player Me – Import an existing Ready Player Me avatar

Learn more about Avatars →
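
Custom models must be GLB files. Before uploading, a client can cheaply reject files that are not binary glTF by checking the 12-byte header defined by the glTF 2.0 specification: the ASCII magic `glTF`, a version number, and the total file length, each a little-endian 32-bit unsigned integer. The sketch below performs only that header check; `checkGlbHeader` is a hypothetical helper, not Avatarium's own validation, which may impose further requirements.

```typescript
const GLB_MAGIC = 0x46546c67; // the bytes "glTF" read as a little-endian uint32

interface GlbHeaderCheck {
  ok: boolean;
  reason?: string;
}

// Checks the 12-byte GLB header: magic, version 2, and declared total length.
function checkGlbHeader(bytes: Uint8Array): GlbHeaderCheck {
  if (bytes.byteLength < 12) return { ok: false, reason: "File too small for a GLB header" };
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  if (view.getUint32(0, true) !== GLB_MAGIC) return { ok: false, reason: "Missing glTF magic" };
  const version = view.getUint32(4, true);
  if (version !== 2) return { ok: false, reason: `Unsupported glTF version ${version}` };
  const declaredLength = view.getUint32(8, true);
  if (declaredLength !== bytes.byteLength) {
    return { ok: false, reason: "Declared length does not match file size" };
  }
  return { ok: true };
}
```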

AI Providers

The brain behind the conversation. Avatarium supports:

  • Managed AI – GPT-4o mini, Claude 3.5 Haiku, Gemini 3.1 Flash, Grok 3 Mini (included in all plans)
  • Premium AI – GPT-5.4, Claude Sonnet 4.6 (Premium/Elite plans)
  • BYOK (bring your own key) – Use your own OpenAI, Anthropic, Google, or xAI API keys
  • Custom – Any OpenAI-compatible API endpoint

Learn more about AI Providers →
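
A Custom provider must accept requests in the OpenAI Chat Completions format. The sketch below builds such a request, placing the avatar's personality in a `system` message ahead of the conversation history. How Avatarium actually injects personality is an assumption here, as are the `ChatTurn` type and the `buildChatRequest` helper; the `/chat/completions` path, the `model`, `messages`, and `stream` fields, and bearer-token authorization follow OpenAI's public API.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatTurn {
  role: Role;
  content: string;
}

interface ChatRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; messages: ChatTurn[]; stream: boolean };
}

// Builds an OpenAI-compatible chat completion request. `baseUrl` is the custom
// endpoint's base, e.g. "https://llm.example.com/v1"; trailing slashes are removed.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  personality: string,
  history: ChatTurn[],
  userMessage: string,
): ChatRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/chat/completions`,
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: {
      model,
      messages: [
        { role: "system", content: personality }, // personality as system prompt (assumption)
        ...history,
        { role: "user", content: userMessage },
      ],
      stream: true, // request streamed tokens to support real-time delivery
    },
  };
}
```

The result can be sent with `fetch(req.url, { method: "POST", headers: req.headers, body: JSON.stringify(req.body) })`.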

TTS Layer

The voice of your avatar, available in three quality tiers:

  • Basic – Fast, suitable for high-volume use
  • Standard – Natural-sounding, low latency
  • Premium – Ultra-realistic, human-like voices (Premium/Elite plans)

Learn more about Voice Configuration →
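
Because the Premium voice tier is limited to Premium and Elite plans, a client may want to resolve the requested tier against the account's plan before starting a session. In the sketch below, the lowercase plan identifiers and the `resolveVoiceTier` helper are hypothetical, and falling back to Standard is an illustrative policy; Avatarium may instead reject the request.

```typescript
type VoiceTier = "basic" | "standard" | "premium";

// Plans that unlock the Premium voice tier, per the list above.
// Plan identifiers are lowercase strings by assumption.
const PREMIUM_VOICE_PLANS = new Set(["premium", "elite"]);

interface TierResolution {
  tier: VoiceTier;
  downgraded: boolean; // true when the requested tier was not available on the plan
}

// Hypothetical policy: fall back to Standard rather than failing the request.
function resolveVoiceTier(requested: VoiceTier, plan: string): TierResolution {
  if (requested === "premium" && !PREMIUM_VOICE_PLANS.has(plan.toLowerCase())) {
    return { tier: "standard", downgraded: true };
  }
  return { tier: requested, downgraded: false };
}
```

Returning a `downgraded` flag lets the UI explain why the voice differs from what was requested.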

Conversations

A session between a user and an avatar. Conversations:

  • Maintain context across messages
  • Can be saved and retrieved
  • Generate events for analytics
  • Expire after 30 minutes of inactivity

Learn more about Conversations →
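
The 30-minute inactivity rule behaves like a sliding window: every new message pushes the expiry forward. A minimal sketch of that bookkeeping follows, with the clock injected so the rule can be tested. The `ConversationStore` class and its methods are hypothetical and only illustrate the rule; they do not describe how Avatarium stores conversations.

```typescript
const INACTIVITY_LIMIT_MS = 30 * 60 * 1000; // 30 minutes, per the rule above

interface Message {
  role: "user" | "avatar";
  text: string;
}

interface Conversation {
  id: string;
  messages: Message[]; // context retained across turns
  lastActivityMs: number;
}

class ConversationStore {
  private readonly conversations = new Map<string, Conversation>();
  private readonly now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Returns the live conversation, or undefined if it never existed or has expired.
  get(id: string): Conversation | undefined {
    const convo = this.conversations.get(id);
    if (!convo) return undefined;
    if (this.now() - convo.lastActivityMs >= INACTIVITY_LIMIT_MS) {
      this.conversations.delete(id); // expired: discard its context
      return undefined;
    }
    return convo;
  }

  // Appends a message, starting a new conversation if needed, and resets the timer.
  append(id: string, message: Message): Conversation {
    const convo: Conversation = this.get(id) ?? { id, messages: [], lastActivityMs: 0 };
    convo.messages.push(message);
    convo.lastActivityMs = this.now();
    this.conversations.set(id, convo);
    return convo;
  }
}
```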

Edge Architecture

Avatarium runs on a global edge network for minimal latency:

  • 45+ edge locations worldwide
  • Sub-200ms average response time
  • WebSocket connections for real-time streaming
  • Automatic failover for reliability
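
As an illustration of the failover pattern in general (not a description of how Avatarium implements it), a client that knows more than one endpoint can try each in order and keep the first that connects. The sketch below is generic over the connection type so it can wrap a WebSocket or any other transport; `connectWithFailover`, the endpoint URLs, and the `connect` callback are hypothetical placeholders, not documented Avatarium addresses.

```typescript
// Tries each endpoint in order and resolves with the first successful connection.
// Rejects with every collected error if all endpoints fail.
async function connectWithFailover<T>(
  endpoints: string[],
  connect: (url: string) => Promise<T>,
): Promise<{ url: string; connection: T }> {
  const errors: string[] = [];
  for (const url of endpoints) {
    try {
      const connection = await connect(url);
      return { url, connection };
    } catch (err) {
      errors.push(`${url}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All endpoints failed:\n${errors.join("\n")}`);
}
```

In a browser, `connect` could wrap `new WebSocket(url)`, resolving on the socket's `open` event and rejecting on its `error` event.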

Security Model

Your data is protected at every step:

  • API Key Authentication – Access is authenticated with your API key
  • HTTPS/WSS Only – All traffic is encrypted in transit
  • Data Isolation – Your data is never shared with other users
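
On the client side, the first two points suggest a simple habit: never attach the API key to a request over an unencrypted scheme. The sketch below refuses anything other than `https:` or `wss:` before building headers. The `authHeaders` helper is hypothetical, and sending the key as an `Authorization: Bearer` header is an assumption for illustration; the header Avatarium expects may differ.

```typescript
const SECURE_SCHEMES = ["https:", "wss:"];

// Returns the lowercase scheme (e.g. "https:") of an absolute URL, or "" if none.
function schemeOf(url: string): string {
  const match = /^([a-z][a-z0-9+.-]*:)/i.exec(url.trim());
  return match ? match[1].toLowerCase() : "";
}

// Builds auth headers only for encrypted endpoints; throws for http:, ws:, or no scheme.
function authHeaders(url: string, apiKey: string): Record<string, string> {
  const scheme = schemeOf(url);
  if (!SECURE_SCHEMES.includes(scheme)) {
    throw new Error(`Refusing to send API key over insecure scheme "${scheme || "none"}"`);
  }
  if (apiKey.trim() === "") throw new Error("API key is empty");
  return { Authorization: `Bearer ${apiKey}` }; // header format is an assumption
}
```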