# Vision & Camera
Enable your avatar to see and respond to what users show via their camera.
## What is Vision?
Vision allows your avatar to:
- See the user's camera feed
- Analyze images shared in chat
- Respond to visual context
- Provide feedback on what it sees
Common use cases:
- Product support – "Show me the error on your screen"
- Education – "Let me see your work and I'll help"
- Retail – "Show me the item you're looking at"
- Technical support – "Point your camera at the device"
## Enabling Vision

### Dashboard
- Go to Dashboard > Avatars > [Your Avatar] > Settings
- Enable Vision capabilities
- Choose an AI provider (must support vision):
  - GPT-4o (recommended)
  - Claude 3.5 Sonnet
  - Gemini Pro Vision
- Save changes
### SDK
```tsx
<Avatar
  model="scarlett"
  vision={{
    enabled: true,
    mode: 'camera', // 'camera' | 'upload' | 'both'
    aiProvider: 'gpt-4o'
  }}
/>
```

## Vision Modes
### Camera Mode
Real-time camera feed analysis:
```tsx
<Avatar
  vision={{
    enabled: true,
    mode: 'camera',
    captureInterval: 5000, // Analyze every 5 seconds
    autoCapture: false // Require the user to click "capture"
  }}
/>
```

### Image Upload Mode
Users can share images from their device:
```tsx
<Avatar
  vision={{
    enabled: true,
    mode: 'upload',
    maxFileSize: 10 * 1024 * 1024, // 10 MB
    acceptedTypes: ['image/jpeg', 'image/png', 'image/webp']
  }}
/>
```

### Combined Mode
Both camera and upload available:
```tsx
<Avatar
  vision={{
    enabled: true,
    mode: 'both'
  }}
/>
```

## Personality Configuration
Update your prompt to handle visual input:
```ts
const PERSONALITY = `
You are a technical support agent with vision capabilities.

When the user shares an image or shows their camera:
1. Acknowledge what you see clearly
2. Ask clarifying questions if the image is unclear
3. Provide specific, actionable guidance

Example responses:
- "I can see the error message on your screen. It says..."
- "I see you're pointing at the power button. Try holding it for 10 seconds."
- "The image is a bit blurry. Could you hold the camera steadier?"
`;
```

## Privacy Considerations
Vision involves sensitive user data. Handle responsibly:
### User Consent
Always inform users about vision capabilities:
```tsx
<Avatar
  vision={{
    enabled: true,
    consentRequired: true,
    consentMessage: "This assistant can see images you share. Your camera feed is processed in real-time but not stored."
  }}
/>
```

### Data Handling
| Data | Stored? | Details |
|---|---|---|
| Camera frames | No | Processed in memory, not saved |
| Uploaded images | Optional | Configure retention in settings |
| AI descriptions | Yes | Part of conversation transcript |
### Disable Recording
Prevent any image storage:
```tsx
<Avatar
  vision={{
    enabled: true,
    storeImages: false,
    storeDescriptions: true // Keep the AI's text description
  }}
/>
```

## Events
Listen for vision events:
```tsx
<Avatar
  onImageCapture={(image) => {
    console.log('Image captured:', image.size);
  }}
  onVisionAnalysis={(result) => {
    console.log('AI saw:', result.description);
  }}
/>
```

## Limits
| Plan | Vision Analyses/Month |
|---|---|
| Free | 100 |
| Creator | 1,000 |
| Pro | 10,000 |
| Enterprise | Unlimited |
Note: Vision analyses use more tokens than text-only conversations.
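If you want to avoid surprise overruns, per-plan limits can also be enforced client-side before triggering an analysis. A minimal sketch, assuming the plan names and monthly limits from the table above; the `canAnalyze` helper and its signature are illustrative, not part of the SDK:

```typescript
// Hypothetical quota guard mirroring the plan limits table above.
type Plan = 'free' | 'creator' | 'pro' | 'enterprise';

const MONTHLY_VISION_LIMITS: Record<Plan, number> = {
  free: 100,
  creator: 1_000,
  pro: 10_000,
  enterprise: Number.POSITIVE_INFINITY, // Unlimited
};

// Returns true if another vision analysis fits within this month's quota.
function canAnalyze(plan: Plan, usedThisMonth: number): boolean {
  return usedThisMonth < MONTHLY_VISION_LIMITS[plan];
}

console.log(canAnalyze('free', 99));  // true
console.log(canAnalyze('free', 100)); // false
```

You would track `usedThisMonth` yourself (for example, by counting `onVisionAnalysis` events) and fall back to text-only responses once the quota is exhausted.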