Safety & Content Controls
Configure guardrails to ensure your avatar stays on-topic, provides appropriate responses, and knows when to escalate to humans.
Why Safety Controls Matter
AI can occasionally:
- Go off-topic or make things up
- Provide advice outside its expertise
- Miss signals that a human is needed
- Discuss topics you'd rather avoid
Safety controls let you prevent these issues proactively.
Topic Boundaries
Define topics your avatar should avoid:
Dashboard Configuration
- Go to Dashboard > Avatars > [Your Avatar] > Safety
- Under Topics to Avoid, add entries like:
- "competitor pricing"
- "legal advice"
- "medical recommendations"
- "political opinions"
- Save changes
When users ask about avoided topics, the avatar responds with a polite redirect:
"I'm not able to help with that topic, but I can assist you with [relevant alternative]."
Via Personality Prompt
Include boundaries in your personality:
const PERSONALITY = `
You are a customer support agent for TechCorp.
## Topics to Avoid
Never discuss:
- Competitor products or pricing
- Legal advice or liability questions
- Medical or health recommendations
- Political or religious topics
If asked about these, say: "I'm not the best resource for that topic.
For [topic], I'd recommend consulting a [appropriate professional]."
`;Human Escalation
Configure when to transfer to a human agent:
Automatic Triggers
Enable in Dashboard > Safety > Human Escalation:
| Trigger | Description |
|---|---|
| User requests human | "Can I speak to a real person?" |
| Negative sentiment | Frustration or anger detected |
| Repeated failures | Avatar can't answer after 3 attempts |
| Sensitive topics | Certain keywords detected |
Escalation Response
When triggered, the avatar says:
"I think you'd be better served by one of my human colleagues. Let me connect you with our team."
Then either:
- Shows a contact form
- Provides support email/phone
- Integrates with your live chat (Enterprise)
Custom Escalation Logic
<Avatar
onEscalate={(reason, conversation) => {
// Your custom escalation handling
openLiveChat(conversation.id);
}}
escalationTriggers={[
{ type: 'keyword', values: ['refund', 'cancel', 'lawyer'] },
{ type: 'sentiment', threshold: -0.5 },
{ type: 'intent', values: ['speak_to_human'] }
]}
/>Safety Filters
Built-in filters that are always active:
- PII Protection – Avatar won't repeat back sensitive info (SSN, credit cards)
- Harmful Content – Blocks generation of harmful or illegal content
- Jailbreak Prevention – Resists prompt injection attempts
Custom Filters (Pro)
Add your own content filters:
POST /avatars/{avatarId}/safety/filters
{
"name": "profanity-filter",
"type": "block",
"patterns": ["badword1", "badword2"],
"action": "replace",
"replacement": "[filtered]"
}Response Validation
Ensure avatar responses meet your standards:
Confidence Scoring
Configure minimum confidence threshold:
<Avatar
minConfidence={0.7}
lowConfidenceResponse="I'm not entirely sure about that. Let me connect you with someone who can help."
/>Response Length Limits
Prevent overly long responses:
<Avatar
maxResponseLength={500} // characters
truncationMessage="..."
/>Monitoring & Alerts
Stay informed about safety events:
Dashboard Alerts
View safety events in Dashboard > Analytics > Safety Events:
- Escalation triggers
- Blocked topics
- Filter activations
- Low confidence responses
Webhooks
Receive real-time safety notifications:
POST /webhooks
{
"url": "https://your-site.com/avatarium-safety",
"events": ["safety.escalation", "safety.topic_blocked", "safety.filter_triggered"]
}Best Practices
- Start restrictive, loosen gradually – Better to over-block initially
- Review escalations weekly – Learn what the avatar struggles with
- Test adversarially – Try to break your own avatar
- Update based on real conversations – Add filters for unexpected issues