ML risk scoring that reflects real stakes: embeddings and audit history
Why "patient" is not enough
A keyword hit on “patient” can mean hospital discharge notes or a completely different domain with different harm profiles. Embeddings place the utterance in a semantic neighborhood before rules and models fire—context first, regex second.
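As a minimal sketch of "context first, regex second": compare the utterance's embedding against per-domain centroids before any keyword rule runs. The 3-d vectors, domain names, and threshold below are all hypothetical stand-ins; a real system would use a sentence-embedding model and learned centroids.

```python
import math

# Hypothetical domain centroids in a toy 3-d embedding space; real centroids
# would be learned from labeled, org-specific tasks.
DOMAIN_CENTROIDS = {
    "clinical": [0.9, 0.1, 0.0],
    "gaming":   [0.1, 0.9, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify_domain(embedding, threshold=0.5):
    """Pick the nearest domain centroid; below threshold, return 'unknown'."""
    best_domain, best_sim = max(
        ((d, cosine(embedding, c)) for d, c in DOMAIN_CENTROIDS.items()),
        key=lambda t: t[1],
    )
    return best_domain if best_sim >= threshold else "unknown"

# The keyword "patient" is identical in both worlds; the embedding is not.
print(classify_domain([0.85, 0.2, 0.05]))  # lands near the clinical centroid
```

Rules and models then fire with the domain already resolved, so the same keyword carries different harm profiles in different neighborhoods.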
That matters for HumanOS risk because the failure mode is rarely a wrong string; it is a wrong world model.
Multi-dimensional risk
A single float from a toy classifier is not an operational risk system. Dimensions like stake, harm_potential, and velocity belong in the same decision record as rules outputs. The HumanOS layer combines:
- Rules — explicit, auditable, versioned.
- ML scaffold — org-flagged, calibration-aware, explainable components.
Roll out gradually: flags, backfill jobs, admin endpoints—no silent flips on Friday.
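The combination above can be sketched as a single decision record that holds rule codes and ML dimensions side by side, with the ML path behind an org flag. The record shape, rule format, and tier logic here are illustrative assumptions, not the actual HumanOS schema.

```python
from dataclasses import dataclass

@dataclass
class RiskDecision:
    # Hypothetical decision record: rules outputs and ML dimensions live in
    # the same object, so audit sees exactly what fired and what scored.
    fired_rules: list
    dimensions: dict          # e.g. stake, harm_potential, velocity
    ml_enabled: bool
    final_tier: str

def score(task, rules, ml_scorer=None, ml_flag=False):
    fired = [r["code"] for r in rules if r["predicate"](task)]
    dims = {"stake": 0.0, "harm_potential": 0.0, "velocity": 0.0}
    ml_enabled = ml_flag and ml_scorer is not None
    if ml_enabled:
        dims.update(ml_scorer(task))  # org-flagged: no silent flips
    tier = "high" if fired or dims["harm_potential"] > 0.7 else "low"
    return RiskDecision(fired, dims, ml_enabled, tier)

rules = [{"code": "R-PHI", "predicate": lambda t: "discharge" in t}]
decision = score("hospital discharge notes", rules)
print(decision.final_tier, decision.fired_rules)
```

With the flag off, the function degrades to rules-only scoring, which is exactly the gradual-rollout posture: the ML dimensions appear in the record only once the org opts in.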
KNN from audit history
Past governed decisions are not just logs; they are training signal for similarity search: tasks like this one escalated before, and here is their neighborhood in embedding space. That is how risk learns without inventing scores from marketing copy.
Footgun: legacy rows lack embeddings until backfill completes. Gate the KNN path so partial data cannot silently change production risk.
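A minimal sketch of that gated KNN, assuming a toy audit table where legacy rows carry `embedding: None` until backfill fills them. The coverage threshold and distance metric are illustrative choices, not the production values.

```python
# Hypothetical audit rows: some embedded, one legacy row awaiting backfill.
AUDIT = [
    {"embedding": [1.0, 0.0], "escalated": True},
    {"embedding": [0.9, 0.1], "escalated": True},
    {"embedding": [0.0, 1.0], "escalated": False},
    {"embedding": None,       "escalated": True},  # legacy, not backfilled
]

def knn_escalation_rate(query, rows, k=3, min_coverage=0.5):
    """Escalation rate among the k nearest audited neighbors, or None when
    embedding coverage is too low to trust (fall back to rules-only)."""
    usable = [r for r in rows if r["embedding"] is not None]
    if len(usable) / len(rows) < min_coverage:
        return None  # gate: partial backfill must not move production risk

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(usable, key=lambda r: sq_dist(query, r["embedding"]))[:k]
    return sum(r["escalated"] for r in nearest) / len(nearest)

print(knn_escalation_rate([0.95, 0.05], AUDIT))
```

Returning `None` instead of a degraded score makes the fallback explicit: callers must choose rules-only behavior, rather than quietly consuming a rate computed over a biased slice of history.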
Explainability
Compliance needs why, not only what. Surface:
- fired_rules — deterministic triggers with codes.
- Model components — bounded narrative suitable for audit (“nearest neighbors were escalation-heavy”), not a black box “trust us.”
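One way to render that surface for compliance, assuming a hypothetical decision dict: rule codes first, then a short bounded narrative per ML component. The field names and narratives below are made up for illustration.

```python
def explain(decision):
    """Render an audit-friendly explanation: deterministic rule codes,
    then a bounded narrative for each ML component (no raw weights)."""
    lines = [f"fired_rules: {', '.join(decision['fired_rules']) or 'none'}"]
    for name, detail in decision["ml_components"].items():
        lines.append(f"{name}: {detail}")
    return "\n".join(lines)

decision = {
    "fired_rules": ["R-PHI"],
    "ml_components": {
        # Hypothetical component narratives; short text, not model internals.
        "knn": "4 of 5 nearest audited neighbors were escalation-heavy",
        "domain": "classified as clinical with high centroid similarity",
    },
}
print(explain(decision))
```

The point of the bounded narrative is that an auditor can read it without access to the model: every line answers "why" in terms of decisions already in the audit trail.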
Checklist: ML risk rollout
- Domain classifier evaluated on held-out org tasks—not only demo prompts.
- Embeddings backfill job idempotent; progress visible to admins.
- Fallback to rules-only when ML disabled or cold-start.
- Fourth Law path preserved—ML does not override human escalation when uncertain.
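The backfill item on the checklist can be sketched as follows, assuming rows keyed by id with `embedding: None` until filled. Re-running skips completed rows (idempotent), and the returned counts are what an admin endpoint would expose as progress.

```python
def backfill(rows, embed_fn):
    """Fill missing embeddings in place; safe to re-run. Returns progress
    counts suitable for an admin progress endpoint."""
    for row in rows:
        if row["embedding"] is None:  # idempotent: skip already-filled rows
            row["embedding"] = embed_fn(row["text"])
    embedded = sum(1 for r in rows if r["embedding"] is not None)
    return {"total": len(rows), "embedded": embedded}

rows = [
    {"id": 1, "text": "discharge note", "embedding": [0.9, 0.1]},
    {"id": 2, "text": "quest log",      "embedding": None},
]
progress = backfill(rows, embed_fn=lambda t: [0.0, 1.0])
print(progress)  # {'total': 2, 'embedded': 2}
```

Because the job only touches `None` rows, a crash-and-restart costs nothing, and the `embedded/total` ratio is the same coverage number the KNN gate checks before trusting similarity search.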
Policy narrative: Safety that moves at business speed. JWT wrapper: governance wrapper.
— Part of