Safety & Content Controls

Configure guardrails to ensure your avatar stays on-topic, provides appropriate responses, and knows when to escalate to humans.

Why Safety Controls Matter

An AI avatar can occasionally:

  • Go off-topic or make things up
  • Provide advice outside its expertise
  • Miss signals that a human is needed
  • Discuss topics you'd rather avoid

Safety controls let you prevent these issues proactively.

Topic Boundaries

Define topics your avatar should avoid:

Dashboard Configuration

  1. Go to Dashboard > Avatars > [Your Avatar] > Safety
  2. Under Topics to Avoid, add entries like:
    • "competitor pricing"
    • "legal advice"
    • "medical recommendations"
    • "political opinions"
  3. Save changes

When users ask about avoided topics, the avatar responds with a polite redirect:

"I'm not able to help with that topic, but I can assist you with [relevant alternative]."

Via Personality Prompt

Include boundaries in your personality:

const PERSONALITY = `
You are a customer support agent for TechCorp.
 
## Topics to Avoid
Never discuss:
- Competitor products or pricing
- Legal advice or liability questions
- Medical or health recommendations
- Political or religious topics
 
If asked about these, say: "I'm not the best resource for that topic.
For [topic], I'd recommend consulting a [appropriate professional]."
`;

Human Escalation

Configure when to transfer to a human agent:

Automatic Triggers

Enable in Dashboard > Safety > Human Escalation:

Trigger              | Description
User requests human  | "Can I speak to a real person?"
Negative sentiment   | Frustration or anger detected
Repeated failures    | Avatar can't answer after 3 attempts
Sensitive topics     | Certain keywords detected

Escalation Response

When triggered, the avatar says:

"I think you'd be better served by one of my human colleagues. Let me connect you with our team."

Then, depending on your configuration, the avatar:

  • Shows a contact form
  • Provides support email/phone
  • Integrates with your live chat (Enterprise)

Custom Escalation Logic

<Avatar
  onEscalate={(reason, conversation) => {
    // Your custom escalation handling
    openLiveChat(conversation.id);
  }}
  escalationTriggers={[
    { type: 'keyword', values: ['refund', 'cancel', 'lawyer'] },
    { type: 'sentiment', threshold: -0.5 },
    { type: 'intent', values: ['speak_to_human'] }
  ]}
/>
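The onEscalate callback is ordinary application code, so openLiveChat can do whatever your stack needs. A minimal sketch, assuming a hypothetical /live-chat/sessions endpoint on your own backend (the payload shape here is illustrative, not part of the SDK):

```typescript
// Hypothetical escalation handler: hands the conversation off to your
// own backend, which can open a live-chat session for a human agent.
interface HandoffRequest {
  conversationId: string;
  reason: string;
  requestedAt: string; // ISO 8601 timestamp
}

function buildHandoffRequest(conversationId: string, reason: string): HandoffRequest {
  return {
    conversationId,
    reason,
    requestedAt: new Date().toISOString(),
  };
}

async function openLiveChat(conversationId: string, reason = 'unspecified'): Promise<void> {
  // POST to your own backend; the endpoint path is an assumption.
  await fetch('https://your-site.com/live-chat/sessions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildHandoffRequest(conversationId, reason)),
  });
}
```

Passing the trigger reason along lets your support tooling route keyword escalations (e.g. "lawyer") differently from sentiment escalations.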

Safety Filters

Built-in filters that are always active:

  • PII Protection – The avatar won't repeat back sensitive information such as SSNs or credit card numbers
  • Harmful Content – Blocks generation of harmful or illegal content
  • Jailbreak Prevention – Resists prompt injection attempts

Custom Filters (Pro)

Add your own content filters:

POST /avatars/{avatarId}/safety/filters
{
  "name": "profanity-filter",
  "type": "block",
  "patterns": ["badword1", "badword2"],
  "action": "replace",
  "replacement": "[filtered]"
}
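If you prefer to register filters from code, the request above can be sent with fetch. A sketch, assuming the API base URL shown (api.example.com is a placeholder) and bearer-token auth — check your dashboard for the real values:

```typescript
// Sketch: registering the profanity filter shown above via the HTTP API.
// The base URL and bearer-token auth scheme are assumptions.
interface SafetyFilter {
  name: string;
  type: 'block';
  patterns: string[];
  action: 'replace';
  replacement: string;
}

function buildProfanityFilter(patterns: string[]): SafetyFilter {
  return {
    name: 'profanity-filter',
    type: 'block',
    patterns,
    action: 'replace',
    replacement: '[filtered]',
  };
}

async function createFilter(avatarId: string, apiKey: string, filter: SafetyFilter): Promise<void> {
  const res = await fetch(`https://api.example.com/avatars/${avatarId}/safety/filters`, {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(filter),
  });
  if (!res.ok) throw new Error(`Filter creation failed: ${res.status}`);
}
```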

Response Validation

Ensure avatar responses meet your standards:

Confidence Scoring

Set a minimum confidence threshold; responses scored below it use the fallback message instead:

<Avatar
  minConfidence={0.7}
  lowConfidenceResponse="I'm not entirely sure about that. Let me connect you with someone who can help."
/>

Response Length Limits

Prevent overly long responses:

<Avatar
  maxResponseLength={500}  // characters
  truncationMessage="..."
/>

Monitoring & Alerts

Stay informed about safety events:

Dashboard Alerts

View safety events in Dashboard > Analytics > Safety Events:

  • Escalation triggers
  • Blocked topics
  • Filter activations
  • Low confidence responses

Webhooks

Receive real-time safety notifications:

POST /webhooks
{
  "url": "https://your-site.com/avatarium-safety",
  "events": ["safety.escalation", "safety.topic_blocked", "safety.filter_triggered"]
}
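On the receiving side, your handler just needs to branch on the event name. A minimal sketch of the routing logic, assuming the delivery payload carries an event field named after the subscriptions above (the exact payload shape isn't documented here — verify it against a real delivery):

```typescript
// Sketch of a dispatcher for the three safety events subscribed above.
// The { event, data } payload shape is an assumption.
type SafetyEvent =
  | 'safety.escalation'
  | 'safety.topic_blocked'
  | 'safety.filter_triggered';

interface SafetyWebhookPayload {
  event: SafetyEvent;
  data: Record<string, unknown>;
}

// Return values are placeholders for real side effects (paging,
// logging, metrics); they make the routing easy to test.
function handleSafetyEvent(payload: SafetyWebhookPayload): string {
  switch (payload.event) {
    case 'safety.escalation':
      // e.g. page the on-call support team
      return 'notified-support';
    case 'safety.topic_blocked':
      // e.g. log for the weekly review of topic boundaries
      return 'logged-topic';
    case 'safety.filter_triggered':
      // e.g. increment a metrics counter
      return 'recorded-metric';
  }
}
```

Wire this into whatever HTTP framework serves the endpoint you registered (https://your-site.com/avatarium-safety in the example above).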

Best Practices

  1. Start restrictive, loosen gradually – Better to over-block initially
  2. Review escalations weekly – Learn what the avatar struggles with
  3. Test adversarially – Try to break your own avatar
  4. Update based on real conversations – Add filters for unexpected issues

Configure in Dashboard →