Anomaly Detection
Humane watches four behavioral and operational signals per tenant and raises an alert when any of them drifts more than 2σ from its rolling baseline: 30 days for the per-user signals, 24 hours for the tenant-wide ones. It's the engine that turns “nobody noticed until the churn report” into “engineering got a Slack ping within 15 minutes.”
What it watches
The first release ships four metrics, tuned to be sensitive enough to catch real problems but quiet enough to avoid alert fatigue:
- Sentiment drop (per end-user). Recent 24h average sentiment versus their own 30-day baseline. Catches users who are quietly souring on your product before they churn.
- Engagement drop (per end-user). Interactions in the last 24h versus their daily average over 30 days. Flags users who were regulars and suddenly aren't.
- Latency spike (tenant-wide). Last hour's P50 latency versus the last 24 hourly snapshots. Catches upstream LLM slowness, DB contention, and infrastructure hiccups.
- Error-rate spike (tenant-wide). Webhook 5xx rate in the last hour versus the last 24 hours. Catches a failing subscriber endpoint before it tanks your alerting pipeline.
Severity — and why we use z-scores
Every detection carries a z_score measuring how many standard deviations the observed value sits from its baseline. Severity maps directly from the absolute z:
|z| >= 4.0 → critical (+ email to tenant owner)
|z| >= 3.0 → warning (webhook only)
|z| >= 2.0 → info (dashboard only)
|z| < 2.0 → not persisted (normal variation)
Standard-deviation thresholds give you a principled alert cadence, with no per-customer hand-tuning. A latency spike on a tenant that normally runs 100 ms ± 50 ms triggers at 200 ms. A tenant that normally runs 100 ms ± 5 ms triggers at 110 ms. Same code, right threshold for each tenant.
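The tier mapping can be sketched in a few lines of Python. This is an illustration of the thresholds listed above, not the engine's actual code; the constant and function names (CRITICAL_Z, severity_for, trigger_point) are hypothetical.

```python
from typing import Optional

# Tier thresholds copied from the table above; the names are illustrative.
CRITICAL_Z = 4.0
WARNING_Z = 3.0
INFO_Z = 2.0


def severity_for(z_score: float) -> Optional[str]:
    """Map a z-score to a severity tier, or None for normal variation."""
    magnitude = abs(z_score)  # drops and spikes are treated symmetrically
    if magnitude >= CRITICAL_Z:
        return "critical"
    if magnitude >= WARNING_Z:
        return "warning"
    if magnitude >= INFO_Z:
        return "info"
    return None  # below 2 sigma: not persisted


def trigger_point(baseline_mean: float, baseline_std: float, z: float = INFO_Z) -> float:
    """Smallest upward value that reaches the given z threshold."""
    return baseline_mean + z * baseline_std
```

With these definitions, trigger_point(100, 50) and trigger_point(100, 5) reproduce the 200 ms and 110 ms figures from the latency example.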
Baselines + sample-size guards
The engine refuses to fire when the baseline is too thin to be trustworthy:
- Minimum 20 baseline samples — below this, the stats are too noisy and we skip rather than alarm.
- Baseline standard deviation must be > 0 — a flat metric can't have anomalies; dividing by zero σ would produce infinite z-scores.
- Webhook error rate needs ≥ 5 deliveries in the window — one failure on a quiet tenant doesn't deserve a 100%-error-rate alarm.
These guards mean brand-new tenants see "baseline too small" in their audit log until real traffic accumulates. Intentional: we'd rather be silent than cry wolf.
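As a rough sketch, the three guards could look like the following. The function and constant names are hypothetical, and the use of the population standard deviation is an assumption; the engine may compute σ differently.

```python
import statistics
from typing import Optional, Sequence

MIN_BASELINE_SAMPLES = 20  # minimum baseline size from the list above
MIN_WEBHOOK_DELIVERIES = 5  # minimum deliveries for an error-rate reading


def z_score_or_skip(observed: float, baseline: Sequence[float]) -> Optional[float]:
    """Return the z-score, or None when the baseline is too thin to trust."""
    if len(baseline) < MIN_BASELINE_SAMPLES:
        return None  # "baseline too small"
    std = statistics.pstdev(baseline)  # assumption: population std
    if std == 0:
        return None  # flat metric: avoid dividing by zero
    return (observed - statistics.fmean(baseline)) / std


def error_rate_or_skip(failures: int, deliveries: int) -> Optional[float]:
    """Return the 5xx rate, or None when too few deliveries were attempted."""
    if deliveries < MIN_WEBHOOK_DELIVERIES:
        return None  # one failure on a quiet tenant is not a 100% error rate
    return failures / deliveries
```

Returning None rather than raising keeps a skipped check distinct from a normal reading, so the caller can log the skip without creating an anomaly.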
Deduplication
Without dedup, the 15-minute sweep would create a new row every tick for as long as a problem persists, flooding the dashboard and the webhook. A 2-hour cooldown per (tenant, metric_name, end_user_id) stops the flood:
sweep t=0 sentiment_drop on user_42 z=-2.8 → CREATE anomaly
sweep t=15 sentiment_drop on user_42 z=-3.1 → skip (cooldown)
sweep t=30 sentiment_drop on user_42 z=-2.9 → skip (cooldown)
...
sweep t=120 sentiment_drop on user_42 z=-2.7 → CREATE anomaly #2
If a condition remains unresolved after 2 hours, the engine fires again. You may see the severity shift between firings (warning → critical), which is itself a signal.
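The cooldown logic can be sketched as a small gate keyed on (tenant, metric_name, end_user_id). This version keeps state in an in-memory dict purely for illustration; the CooldownGate class is hypothetical, and a real engine would more likely look up the most recent anomaly row in its database.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple

COOLDOWN = timedelta(hours=2)
Key = Tuple[str, str, Optional[str]]  # (tenant, metric_name, end_user_id)


class CooldownGate:
    """Allows at most one anomaly per key within each cooldown window."""

    def __init__(self) -> None:
        self._last_created: Dict[Key, datetime] = {}

    def should_create(self, key: Key, now: datetime) -> bool:
        last = self._last_created.get(key)
        if last is not None and now - last < COOLDOWN:
            return False  # still cooling down: skip this sweep
        self._last_created[key] = now
        return True


# Replaying the timeline above: sweeps at t=0, 15, 30 and 120 minutes.
start = datetime(2026, 4, 25, 10, 0, tzinfo=timezone.utc)
gate = CooldownGate()
key: Key = ("tenant-uuid", "sentiment_drop", "user_42")
decisions = [
    gate.should_create(key, start + timedelta(minutes=m)) for m in (0, 15, 30, 120)
]
```

Tenant-wide metrics fit the same key with end_user_id set to None.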
How alerts reach you
- Dashboard: Engine Monitor renders the detector's last 7 days with severity badges, baseline stats, and a one-click “Acknowledge” button.
- Webhook event anomaly.detected: fires on every severity level. Payload includes metric name, z-score, observed + baseline, and the scope (system-wide or specific end-user id). Wire to Slack, PagerDuty, Opsgenie, whatever.
- Email to tenant owner: only for critical severity. Delivered via your notify_audit_warnings preference (unsubscribable per RFC 8058).
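If you forward the webhook to a chat tool, a receiver might reduce each event to a single line of text. The sketch below assumes the field names from the payload shape documented in the next section; slack_text is a hypothetical helper, not part of any Humane SDK.

```python
from typing import Any, Dict


def slack_text(payload: Dict[str, Any]) -> str:
    """Build a one-line message from an anomaly.detected payload."""
    # end_user_id is null for tenant-wide anomalies.
    scope = payload.get("end_user_id") or "system-wide"
    return (
        f"[{payload['severity'].upper()}] {payload['metric']} ({scope}): "
        f"z={payload['z_score']:+.2f}, observed {payload['observed_value']} "
        f"vs baseline {payload['baseline_mean']} ± {payload['baseline_std']}"
    )
```

Verify the signature before formatting or forwarding anything, as described under the payload shape.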
Webhook payload shape
{
"id": "5b8d3c... (anomaly UUID)",
"metric": "sentiment_drop",
"severity": "warning",
"z_score": -3.44,
"observed_value": 0.31,
"baseline_mean": 0.62,
"baseline_std": 0.09,
"sample_size": 420,
"description": "Sentiment drop: observed 0.310 vs baseline 0.620 ± 0.090 (z=-3.44, n=420)",
"end_user_id": "user_42", // null when system-wide
"user_id": "tenant-uuid",
"detected_at": "2026-04-25T10:45:12Z"
}
The anomaly.detected event is signed with your webhook's HMAC-SHA256 secret. Verify it with the same X-Humane-Signature parser as your other handlers; see the Webhooks guide.
REST API
Dashboards, Slack bots, or on-call triage flows can query anomalies directly. All endpoints are tenant-scoped via the API key.
GET /api/anomalies?severity=warning&since_hours=24&resolved=false
GET /api/anomalies/stats?since_hours=168
GET /api/anomalies/{id}
POST /api/anomalies/{id}/resolve { "note": "upstream LLM recovered" }
Retention
Anomaly rows live for 90 days, resolved or not, and are then pruned by a nightly sweep (04:19 UTC). That is plenty for postmortems and audit review. If you need longer retention (regulated industries sometimes do), export via the API to your own data warehouse.
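A periodic export could be as simple as the sketch below, which pulls recent anomalies from GET /api/anomalies and appends them to a JSON Lines file. Several details are assumptions to adjust for your deployment: the base URL is a placeholder, the Bearer authorization header is a guess at the auth scheme, and the response is assumed to be a JSON array of anomaly objects.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Optional

BASE_URL = "https://humane.example.com"  # placeholder host, replace with yours


def anomalies_url(since_hours: int, severity: Optional[str] = None,
                  resolved: Optional[bool] = None) -> str:
    """Build the list URL using the query parameters shown in the REST API section."""
    params: Dict[str, str] = {"since_hours": str(since_hours)}
    if severity is not None:
        params["severity"] = severity
    if resolved is not None:
        params["resolved"] = "true" if resolved else "false"
    return f"{BASE_URL}/api/anomalies?{urllib.parse.urlencode(params)}"


def export_to_jsonl(api_key: str, path: str, since_hours: int = 24) -> int:
    """Append recent anomalies to a JSON Lines file; returns the row count."""
    request = urllib.request.Request(
        anomalies_url(since_hours),
        headers={"Authorization": f"Bearer {api_key}"},  # assumed auth scheme
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        rows = json.load(response)  # assumed to be a JSON array
    with open(path, "a", encoding="utf-8") as out:
        for row in rows:
            out.write(json.dumps(row) + "\n")
    return len(rows)
```

Running this daily with a since_hours value slightly above 24 leaves some overlap between runs; deduplicating on the anomaly id in the warehouse keeps the export idempotent.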
Tuning thresholds
The defaults — 20 samples, z≥2/3/4 tiers, 2-hour cooldown, 15-minute sweep — are intentionally conservative. If you find the engine too quiet, contact support and we'll adjust per-tenant. In the open-source community tier, edit app/anomaly_detector.py directly — the tunables are constants at the top of the file.