Voice Mode API

Control voice interactions with the avatar through the postMessage API.

The Voice Mode API lets the parent frame control the embedded avatar's voice features. It suits custom applications such as desktop companions and kiosk displays.

Voice Modes

| Mode | Behavior | Use Case |
| --- | --- | --- |
| continuous | Always listening with VAD (Voice Activity Detection) | Hands-free conversation |
| pushToTalk | Manual recording control | Noisy environments, precise control |

postMessage Commands

Send commands to the avatar iframe:

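const iframe = document.getElementById('avatar-frame');

// Send a command to the avatar iframe.
// '*' delivers the message whatever origin the iframe currently has;
// in production, pass the embed's origin as the second argument instead.
function sendToAvatar(type, data = {}) {
  iframe.contentWindow?.postMessage({ type, ...data }, '*');
}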

Available Commands

| Command | Parameters | Description |
| --- | --- | --- |
| setVoiceMode | { mode: 'continuous' \| 'pushToTalk' } | Switch between voice modes |
| startVoice | { mode?: string } | Start voice mode |
| stopVoice | - | Stop voice mode |
| startRecording | - | Start PTT recording |
| stopRecording | - | Stop PTT recording and send |
| stopSpeaking | - | Interrupt avatar speech |
| getState | - | Request current state |

Examples

// Start continuous listening
sendToAvatar('startVoice', { mode: 'continuous' });
 
// Stop listening
sendToAvatar('stopVoice');
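
The command table can be mirrored in a small validator so that a misspelled command or mode fails in the parent page instead of being posted to the iframe. This is only a sketch: `VOICE_MODES`, `COMMANDS`, and `buildCommand` are hypothetical names, not part of the Avatarium API, and the message shape (`{ type, ...data }`) is taken from `sendToAvatar` above.

```javascript
// Hypothetical helper, not part of the Avatarium API.
const VOICE_MODES = ['continuous', 'pushToTalk'];

// Command name -> how the `mode` parameter is treated, per the table above.
const COMMANDS = {
  setVoiceMode: 'required',
  startVoice: 'optional',
  stopVoice: 'none',
  startRecording: 'none',
  stopRecording: 'none',
  stopSpeaking: 'none',
  getState: 'none',
};

// Returns the message object to post, or throws on an unknown command or mode.
function buildCommand(type, data = {}) {
  if (!Object.prototype.hasOwnProperty.call(COMMANDS, type)) {
    throw new Error(`Unknown command: ${type}`);
  }
  const modeRule = COMMANDS[type];
  if (modeRule === 'none') {
    return { type }; // commands without parameters: drop anything extra
  }
  if (data.mode === undefined) {
    if (modeRule === 'required') {
      throw new Error(`${type} requires a mode`);
    }
    return { type };
  }
  if (!VOICE_MODES.includes(data.mode)) {
    throw new Error(`Invalid mode: ${data.mode}`);
  }
  return { type, mode: data.mode };
}
```

The returned object can be passed straight to `iframe.contentWindow.postMessage`.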

Event Listening

Listen for events from the avatar:

window.addEventListener('message', (event) => {
  if (event.data?.source !== 'avatarium') return;
  
  const { type, state, mode } = event.data;
  
  switch (type) {
    case 'ready':
      console.log('Avatar ready');
      break;
    case 'stateChanged':
      console.log('State:', state);
      // States: IDLE, LISTENING, RECORDING, PROCESSING, THINKING, SPEAKING
      break;
    case 'modeChanged':
      console.log('Mode:', mode);
      // Modes: continuous, pushToTalk
      break;
  }
});
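
The listener above filters on `event.data.source` only. Any frame can post a message carrying that field, so checking `event.origin` as well is common practice with `postMessage`. The sketch below assumes the embed is served from `https://avatarium.ai`, the origin of the embed URL used in the complete example on this page; `AVATAR_ORIGIN` and `isAvatarEvent` are hypothetical names.

```javascript
// Assumption: the embed is served from this origin (taken from the
// embed URL used on this page). Adjust it if your deployment differs.
const AVATAR_ORIGIN = 'https://avatarium.ai';

// Hypothetical helper: true only for messages that come from the expected
// origin and carry the `source: 'avatarium'` marker.
function isAvatarEvent(event, expectedOrigin = AVATAR_ORIGIN) {
  if (event.origin !== expectedOrigin) return false;
  const data = event.data;
  return typeof data === 'object' && data !== null && data.source === 'avatarium';
}
```

In the listener, the first line would then become `if (!isAvatarEvent(event)) return;`.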

Event Types

| Event | Data | Description |
| --- | --- | --- |
| ready | { avatarId, capabilities } | Avatar loaded and ready |
| stateChanged | { state } | Voice state changed |
| modeChanged | { mode } | Voice mode changed |
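
The `ready` event suggests that commands posted before the avatar has loaded may not be handled; the page does not say so explicitly, so treat that as an assumption. One way to guard against it is to buffer commands until `ready` arrives. `createCommandQueue` is a hypothetical helper, and `post` stands for any function that delivers one message to the iframe.

```javascript
// Hypothetical helper: holds commands until the avatar reports `ready`.
// `post` is any function that delivers one message, for example a
// wrapper around iframe.contentWindow.postMessage.
function createCommandQueue(post) {
  let ready = false;
  const pending = [];

  return {
    // Send immediately once ready, otherwise keep the command for later.
    send(type, data = {}) {
      const message = { type, ...data };
      if (ready) {
        post(message);
      } else {
        pending.push(message);
      }
    },
    // Call this from the message listener when `type === 'ready'`.
    markReady() {
      ready = true;
      while (pending.length > 0) {
        post(pending.shift());
      }
    },
  };
}
```

Buffered commands are flushed in the order they were sent, so a `setVoiceMode` followed by `startVoice` keeps its meaning.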

Voice States

IDLE → LISTENING → RECORDING → PROCESSING → THINKING → SPEAKING → IDLE
         ↑                                                    ↓
         └────────────────────────────────────────────────────┘

| State | Description |
| --- | --- |
| IDLE | Waiting for input |
| LISTENING | VAD active, waiting for speech |
| RECORDING | Capturing user audio |
| PROCESSING | Transcribing audio (STT) |
| THINKING | AI generating response |
| SPEAKING | Avatar speaking response |
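
For a status display, the state names can be mapped to the descriptions listed here. A minimal sketch; `STATE_DESCRIPTIONS` and `describeState` are hypothetical names, and the strings are copied from the state list above.

```javascript
// Descriptions copied from the voice state list above.
const STATE_DESCRIPTIONS = {
  IDLE: 'Waiting for input',
  LISTENING: 'VAD active, waiting for speech',
  RECORDING: 'Capturing user audio',
  PROCESSING: 'Transcribing audio (STT)',
  THINKING: 'AI generating response',
  SPEAKING: 'Avatar speaking response',
};

// Hypothetical helper: returns display text for a state, falling back to
// the raw value so an unrecognized state is still visible to the user.
function describeState(state) {
  return Object.prototype.hasOwnProperty.call(STATE_DESCRIPTIONS, state)
    ? STATE_DESCRIPTIONS[state]
    : String(state);
}
```

In a `stateChanged` handler this could feed a status element, for example `status.textContent = describeState(event.data.state)`.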

Complete Example

<!DOCTYPE html>
<html>
<head>
  <title>Voice Mode Demo</title>
</head>
<body>
  <iframe 
    id="avatar-frame" 
    src="https://avatarium.ai/embed/YOUR_AVATAR_ID"
    allow="microphone"
    style="width: 400px; height: 600px; border: none;"
  ></iframe>
  
  <div id="controls">
    <button id="voice-btn">🎤 Talk</button>
    <label>
      <input type="checkbox" id="mode-toggle" checked>
      Always-on mode
    </label>
    <span id="status">IDLE</span>
  </div>
 
  <script>
    const iframe = document.getElementById('avatar-frame');
    const voiceBtn = document.getElementById('voice-btn');
    const modeToggle = document.getElementById('mode-toggle');
    const status = document.getElementById('status');
    
    let voiceMode = 'continuous';
    let isRecording = false;
    
    function sendToAvatar(type, data = {}) {
      iframe.contentWindow?.postMessage({ type, ...data }, '*');
    }
    
    // Listen for avatar events
    window.addEventListener('message', (event) => {
      if (event.data?.source !== 'avatarium') return;
      
      if (event.data.type === 'stateChanged') {
        status.textContent = event.data.state;
      }
      if (event.data.type === 'modeChanged') {
        voiceMode = event.data.mode;
        modeToggle.checked = voiceMode === 'continuous';
      }
    });
    
    // Mode toggle
    modeToggle.addEventListener('change', (e) => {
      voiceMode = e.target.checked ? 'continuous' : 'pushToTalk';
      sendToAvatar('setVoiceMode', { mode: voiceMode });
    });
    
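    // Voice button - continuous mode: click toggles listening; PTT mode: hold to record
    voiceBtn.addEventListener('click', () => {
      if (voiceMode !== 'continuous') return;
      // Assumes the avatar reports IDLE whenever voice mode is not running
      if (status.textContent === 'IDLE') {
        sendToAvatar('startVoice', { mode: 'continuous' });
      } else {
        sendToAvatar('stopVoice');
      }
    });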
    
    voiceBtn.addEventListener('mousedown', () => {
      if (voiceMode === 'pushToTalk') {
        sendToAvatar('startRecording');
        isRecording = true;
      }
    });
    
    voiceBtn.addEventListener('mouseup', () => {
      if (voiceMode === 'pushToTalk' && isRecording) {
        sendToAvatar('stopRecording');
        isRecording = false;
      }
    });
    
    // Keyboard shortcuts
    document.addEventListener('keydown', (e) => {
      if (e.code === 'Space' && voiceMode === 'pushToTalk') {
        e.preventDefault();
        // A held key fires repeated keydown events; start recording only once
        if (!e.repeat) voiceBtn.dispatchEvent(new MouseEvent('mousedown'));
      }
      if (e.code === 'Escape') {
        sendToAvatar('stopSpeaking');
      }
    });
    
    document.addEventListener('keyup', (e) => {
      if (e.code === 'Space' && voiceMode === 'pushToTalk') {
        e.preventDefault();
        voiceBtn.dispatchEvent(new MouseEvent('mouseup'));
      }
    });
  </script>
</body>
</html>

Best Practices

Push-to-Talk for noisy environments: Use PTT mode when background noise could trigger false activations in continuous mode.

Keyboard Shortcuts

Recommended keyboard bindings:

| Key | Action |
| --- | --- |
| Space | Hold for PTT, or toggle continuous |
| Escape | Interrupt/stop avatar |
| M | Toggle mode |
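
These bindings can be expressed as one pure function that maps a key event to the command to send, which keeps the key handling separate from the DOM wiring. A sketch under stated assumptions: `commandForKey` is a hypothetical helper, `phase` is `'down'` or `'up'`, and `active` is the caller's own flag for whether continuous listening is currently on.

```javascript
// Hypothetical helper: maps a key to the command message to send, following
// the recommended bindings. Returns null when the key has no binding.
//   code   - KeyboardEvent.code, e.g. 'Space', 'Escape', 'KeyM'
//   phase  - 'down' for keydown, 'up' for keyup
//   mode   - current voice mode: 'continuous' or 'pushToTalk'
//   active - whether continuous listening is currently running
function commandForKey(code, phase, mode, active = false) {
  if (phase === 'down') {
    if (code === 'Escape') {
      return { type: 'stopSpeaking' };
    }
    if (code === 'KeyM') {
      const next = mode === 'continuous' ? 'pushToTalk' : 'continuous';
      return { type: 'setVoiceMode', mode: next };
    }
    if (code === 'Space') {
      if (mode === 'pushToTalk') return { type: 'startRecording' };
      return active
        ? { type: 'stopVoice' }
        : { type: 'startVoice', mode: 'continuous' };
    }
  }
  if (phase === 'up' && code === 'Space' && mode === 'pushToTalk') {
    return { type: 'stopRecording' };
  }
  return null;
}
```

When wiring this to `keydown`, skip events where `e.repeat` is true so that holding a key does not resend the command.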

Visual Feedback

Always show clear visual feedback for voice state:

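.voice-btn.listening { background: #4CAF50; }
.voice-btn.recording { background: #f44336; animation: pulse 1s infinite; }
.voice-btn.processing { background: #FF9800; }
.voice-btn.speaking { background: #2196F3; }

/* The recording rule needs a "pulse" animation; this is a minimal placeholder definition */
@keyframes pulse { 50% { opacity: 0.6; } }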
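
These rules only take effect if the button carries the `voice-btn` class and the matching state class. Note that the complete example on this page gives the button `id="voice-btn"` but no class, so `class="voice-btn"` would need to be added there. The sketch below keeps exactly one state class on an element; `STATE_CLASSES` and `applyStateClass` are hypothetical names, and IDLE and THINKING get no class because the stylesheet defines no rule for them.

```javascript
// State classes that the stylesheet above defines rules for.
const STATE_CLASSES = ['listening', 'recording', 'processing', 'speaking'];

// Hypothetical helper: removes any previous state class, then adds the
// class for the new state if the stylesheet has a rule for it.
function applyStateClass(element, state) {
  const next = String(state).toLowerCase();
  element.classList.remove(...STATE_CLASSES);
  if (STATE_CLASSES.includes(next)) {
    element.classList.add(next);
  }
}
```

A `stateChanged` handler could call `applyStateClass(voiceBtn, event.data.state)` each time the state changes.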

Error Handling

Handle microphone permission errors gracefully:

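navigator.mediaDevices.getUserMedia({ audio: true })
  .then((stream) => {
    // Permission granted; release this probe stream so the microphone is not held open here
    stream.getTracks().forEach((track) => track.stop());
    sendToAvatar('startVoice');
  })
  .catch((err) => {
    if (err.name === 'NotAllowedError') {
      alert('Microphone access denied. Please enable it in your browser settings.');
    } else {
      console.error('Microphone unavailable:', err);
    }
  });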
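
`getUserMedia` can reject for reasons other than a denied permission, such as no microphone being present or the device being in use. A small mapping from error name to message keeps the `catch` branch short. `micErrorMessage` is a hypothetical helper and the message wording is illustrative; the error names are the standard `DOMException` names that `getUserMedia` rejects with.

```javascript
// Hypothetical helper: turns a getUserMedia rejection into a user-facing
// message. Unknown or missing error names fall through to a generic message.
function micErrorMessage(err) {
  switch (err && err.name) {
    case 'NotAllowedError':
      return 'Microphone access denied. Please enable it in your browser settings.';
    case 'NotFoundError':
      return 'No microphone was found. Please connect one and try again.';
    case 'NotReadableError':
      return 'The microphone could not be started. It may be in use by another application.';
    default:
      return 'Could not access the microphone.';
  }
}
```

The `catch` handler can then show `micErrorMessage(err)` in the page's own UI rather than in an `alert`.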
