# Vision & Camera
Enable your avatar to see and respond to what users show via their camera.
## What is Vision?
Vision allows your avatar to:
- See the user's camera feed
- Analyze images shared in chat
- Respond to visual context
- Provide feedback on what it sees
Common use cases:
- Product support – "Show me the error on your screen"
- Education – "Let me see your work and I'll help"
- Retail – "Show me the item you're looking at"
- Technical support – "Point your camera at the device"
## Enabling Vision

### Dashboard
- Go to Dashboard > Avatars > [Your Avatar] > Settings
- Enable Vision capabilities
- Choose an AI provider (must support vision):
  - GPT-4o (recommended)
  - Claude 3.5 Sonnet
  - Gemini Pro Vision
- Save changes
### SDK
```tsx
<Avatar
  model="scarlett"
  vision={{
    enabled: true,
    mode: 'camera', // 'camera' | 'upload' | 'both'
    aiProvider: 'gpt-4o'
  }}
/>
```

## Vision Modes
### Camera Mode
Real-time camera feed analysis:
```tsx
<Avatar
  vision={{
    enabled: true,
    mode: 'camera',
    captureInterval: 5000, // Analyze every 5 seconds
    autoCapture: false // Require the user to click "capture"
  }}
/>
```

### Image Upload Mode
Users can share images from their device:
```tsx
<Avatar
  vision={{
    enabled: true,
    mode: 'upload',
    maxFileSize: 10 * 1024 * 1024, // 10 MB
    acceptedTypes: ['image/jpeg', 'image/png', 'image/webp']
  }}
/>
```

### Combined Mode
Both camera and upload available:
```tsx
<Avatar
  vision={{
    enabled: true,
    mode: 'both'
  }}
/>
```

## Personality Configuration
Update your prompt to handle visual input:
```ts
const PERSONALITY = `
You are a technical support agent with vision capabilities.

When the user shares an image or shows their camera:
1. Acknowledge what you see clearly
2. Ask clarifying questions if the image is unclear
3. Provide specific, actionable guidance

Example responses:
- "I can see the error message on your screen. It says..."
- "I see you're pointing at the power button. Try holding it for 10 seconds."
- "The image is a bit blurry. Could you hold the camera steadier?"
`;
```

## Privacy Considerations
Vision involves sensitive user data. Handle responsibly:
### User Consent
Always inform users about vision capabilities:
```tsx
<Avatar
  vision={{
    enabled: true,
    consentRequired: true,
    consentMessage: "This assistant can see images you share. Your camera feed is processed in real-time but not stored."
  }}
/>
```

### Data Handling
| Data | Stored? | Details |
|---|---|---|
| Camera frames | No | Processed in memory, not saved |
| Uploaded images | Optional | Configure retention in settings |
| AI descriptions | Yes | Part of conversation transcript |
### Disable Recording
Prevent any image storage:
```tsx
<Avatar
  vision={{
    enabled: true,
    storeImages: false,
    storeDescriptions: true // Keep the AI's text description
  }}
/>
```

## Events
Listen for vision events:
```tsx
<Avatar
  onImageCapture={(image) => {
    console.log('Image captured:', image.size);
  }}
  onVisionAnalysis={(result) => {
    console.log('AI saw:', result.description);
  }}
/>
```

## Limits
| Plan | Vision Analyses/Month |
|---|---|
| Free | 100 |
| Creator | 1,000 |
| Pro | 10,000 |
| Enterprise | Unlimited |
Note: Vision analyses use more tokens than text-only conversations.
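If you want to avoid surprise overruns, per-plan limits can also be enforced client-side before triggering an analysis. A minimal sketch, assuming the plan names and monthly limits from the table above; the `canAnalyze` helper and its signature are illustrative, not part of the SDK:

```typescript
// Hypothetical quota guard mirroring the plan limits table above.
type Plan = 'free' | 'creator' | 'pro' | 'enterprise';

const MONTHLY_VISION_LIMITS: Record<Plan, number> = {
  free: 100,
  creator: 1_000,
  pro: 10_000,
  enterprise: Number.POSITIVE_INFINITY, // Unlimited
};

// Returns true if another vision analysis fits within this month's quota.
function canAnalyze(plan: Plan, usedThisMonth: number): boolean {
  return usedThisMonth < MONTHLY_VISION_LIMITS[plan];
}

console.log(canAnalyze('free', 99));  // true
console.log(canAnalyze('free', 100)); // false
```

You would track `usedThisMonth` yourself (for example, by counting `onVisionAnalysis` events) and fall back to text-only responses once the quota is exhausted.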