ML risk scoring that reflects real stakes: embeddings and audit history
Why "patient" is not enough
A keyword hit on “patient” can mean hospital discharge notes or a completely different domain with different harm profiles. Embeddings place the utterance in a semantic neighborhood before rules and models fire—context first, regex second.
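As a minimal sketch of "context first, regex second": compare the utterance's embedding against per-domain centroids before any keyword rule runs. The 3-d vectors, domain names, and threshold below are all hypothetical stand-ins; a real system would use a sentence-embedding model and learned centroids.

```python
import math

# Hypothetical domain centroids in a toy 3-d embedding space; real centroids
# would be learned from labeled, org-specific tasks.
DOMAIN_CENTROIDS = {
    "clinical": [0.9, 0.1, 0.0],
    "gaming":   [0.1, 0.9, 0.1],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify_domain(embedding, threshold=0.5):
    """Pick the nearest domain centroid; below threshold, return 'unknown'."""
    best_domain, best_sim = max(
        ((d, cosine(embedding, c)) for d, c in DOMAIN_CENTROIDS.items()),
        key=lambda t: t[1],
    )
    return best_domain if best_sim >= threshold else "unknown"

# The keyword "patient" is identical in both worlds; the embedding is not.
print(classify_domain([0.85, 0.2, 0.05]))  # lands near the clinical centroid
```

Rules and models then fire with the domain already resolved, so the same keyword carries different harm profiles in different neighborhoods.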
That matters for HumanOS risk because the failure mode is rarely a wrong string; it is a wrong world model.
Multi-dimensional risk
A single float from a toy classifier is not an operational risk system. Dimensions like stake, harm_potential, and velocity belong in the same decision record as rules outputs. The HumanOS layer combines:
- Rules — explicit, auditable, versioned.
- ML scaffold — org-flagged, calibration-aware, explainable components.
Roll out gradually: flags, backfill jobs, admin endpoints—no silent flips on Friday.
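The combination above can be sketched as a single decision record that holds rule codes and ML dimensions side by side, with the ML path behind an org flag. The record shape, rule format, and tier logic here are illustrative assumptions, not the actual HumanOS schema.

```python
from dataclasses import dataclass

@dataclass
class RiskDecision:
    # Hypothetical decision record: rules outputs and ML dimensions live in
    # the same object, so audit sees exactly what fired and what scored.
    fired_rules: list
    dimensions: dict          # e.g. stake, harm_potential, velocity
    ml_enabled: bool
    final_tier: str

def score(task, rules, ml_scorer=None, ml_flag=False):
    fired = [r["code"] for r in rules if r["predicate"](task)]
    dims = {"stake": 0.0, "harm_potential": 0.0, "velocity": 0.0}
    ml_enabled = ml_flag and ml_scorer is not None
    if ml_enabled:
        dims.update(ml_scorer(task))  # org-flagged: no silent flips
    tier = "high" if fired or dims["harm_potential"] > 0.7 else "low"
    return RiskDecision(fired, dims, ml_enabled, tier)

rules = [{"code": "R-PHI", "predicate": lambda t: "discharge" in t}]
decision = score("hospital discharge notes", rules)
print(decision.final_tier, decision.fired_rules)
```

With the flag off, the function degrades to rules-only scoring, which is exactly the gradual-rollout posture: the ML dimensions appear in the record only once the org opts in.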
KNN from audit history
Past governed decisions are not just logs; they are training signal for similarity search: tasks like this one escalated before, and here is their neighborhood in embedding space. That is how risk learns without inventing scores from marketing copy.
Footgun: legacy rows lack embeddings until backfill completes. Gate the KNN path so partial data cannot silently change production risk.
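A minimal sketch of that gated KNN, assuming a toy audit table where legacy rows carry `embedding: None` until backfill fills them. The coverage threshold and distance metric are illustrative choices, not the production values.

```python
# Hypothetical audit rows: some embedded, one legacy row awaiting backfill.
AUDIT = [
    {"embedding": [1.0, 0.0], "escalated": True},
    {"embedding": [0.9, 0.1], "escalated": True},
    {"embedding": [0.0, 1.0], "escalated": False},
    {"embedding": None,       "escalated": True},  # legacy, not backfilled
]

def knn_escalation_rate(query, rows, k=3, min_coverage=0.5):
    """Escalation rate among the k nearest audited neighbors, or None when
    embedding coverage is too low to trust (fall back to rules-only)."""
    usable = [r for r in rows if r["embedding"] is not None]
    if len(usable) / len(rows) < min_coverage:
        return None  # gate: partial backfill must not move production risk

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(usable, key=lambda r: sq_dist(query, r["embedding"]))[:k]
    return sum(r["escalated"] for r in nearest) / len(nearest)

print(knn_escalation_rate([0.95, 0.05], AUDIT))
```

Returning `None` instead of a degraded score makes the fallback explicit: callers must choose rules-only behavior, rather than quietly consuming a rate computed over a biased slice of history.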
Explainability
Compliance needs why, not only what. Surface:
- fired_rules — deterministic triggers with codes.
- Model components — bounded narrative suitable for audit (“nearest neighbors were escalation-heavy”), not a black box “trust us.”
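One way to render that surface for compliance, assuming a hypothetical decision dict: rule codes first, then a short bounded narrative per ML component. The field names and narratives below are made up for illustration.

```python
def explain(decision):
    """Render an audit-friendly explanation: deterministic rule codes,
    then a bounded narrative for each ML component (no raw weights)."""
    lines = [f"fired_rules: {', '.join(decision['fired_rules']) or 'none'}"]
    for name, detail in decision["ml_components"].items():
        lines.append(f"{name}: {detail}")
    return "\n".join(lines)

decision = {
    "fired_rules": ["R-PHI"],
    "ml_components": {
        # Hypothetical component narratives; short text, not model internals.
        "knn": "4 of 5 nearest audited neighbors were escalation-heavy",
        "domain": "classified as clinical with high centroid similarity",
    },
}
print(explain(decision))
```

The point of the bounded narrative is that an auditor can read it without access to the model: every line answers "why" in terms of decisions already in the audit trail.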
Checklist: ML risk rollout
- Domain classifier evaluated on held-out org tasks—not only demo prompts.
- Embeddings backfill job idempotent; progress visible to admins.
- Fallback to rules-only when ML disabled or cold-start.
- Fourth Law path preserved—ML does not override human escalation when uncertain.
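The backfill item on the checklist can be sketched as follows, assuming rows keyed by id with `embedding: None` until filled. Re-running skips completed rows (idempotent), and the returned counts are what an admin endpoint would expose as progress.

```python
def backfill(rows, embed_fn):
    """Fill missing embeddings in place; safe to re-run. Returns progress
    counts suitable for an admin progress endpoint."""
    for row in rows:
        if row["embedding"] is None:  # idempotent: skip already-filled rows
            row["embedding"] = embed_fn(row["text"])
    embedded = sum(1 for r in rows if r["embedding"] is not None)
    return {"total": len(rows), "embedded": embedded}

rows = [
    {"id": 1, "text": "discharge note", "embedding": [0.9, 0.1]},
    {"id": 2, "text": "quest log",      "embedding": None},
]
progress = backfill(rows, embed_fn=lambda t: [0.0, 1.0])
print(progress)  # {'total': 2, 'embedded': 2}
```

Because the job only touches `None` rows, a crash-and-restart costs nothing, and the `embedded/total` ratio is the same coverage number the KNN gate checks before trusting similarity search.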
Policy narrative: Safety that moves at business speed. JWT wrapper: governance wrapper.
— Part of