Two verbs, two trust postures

Capture says: this happened once; govern it. Watch says: keep paying attention until I say stop. If your product uses one word for both, you will ship surveillance UX while believing you shipped productivity UX.

Modality without smuggling

Voice and video belong in the connector layer, then land as Capture rows with source_kind set to audio, video, or transcript. Those rows carry refs to blobs, not megabytes stuffed into human.call. Reasoning consumes text plus governed refs (see kb/52 in the repo). Streams and rooms lean toward Watch, not Capture.
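
A minimal sketch of what "refs, not megabytes" could look like at the row level. Only source_kind and payload_ref come from this document; the CaptureRow class, the build_media_capture helper, the text field, and the blob:// reference format are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# The three media kinds named in the text. Everything else here is assumed.
MEDIA_SOURCE_KINDS = ("audio", "video", "transcript")


@dataclass(frozen=True)
class CaptureRow:
    source_kind: str
    payload_ref: str            # pointer to the stored blob, never the bytes
    text: Optional[str] = None  # the text that reasoning is allowed to read


def build_media_capture(
    source_kind: str, payload_ref: str, text: Optional[str] = None
) -> CaptureRow:
    """Build a Capture row for connector output, rejecting inline payloads."""
    if source_kind not in MEDIA_SOURCE_KINDS:
        raise ValueError(f"unsupported source_kind: {source_kind!r}")
    # Raw bytes (or an empty value) are refused: the row holds a reference only.
    if not isinstance(payload_ref, str) or not payload_ref:
        raise ValueError("payload_ref must be a non-empty reference string")
    return CaptureRow(source_kind=source_kind, payload_ref=payload_ref, text=text)


# Hypothetical reference format; the real blob addressing scheme is not shown here.
row = build_media_capture(
    "transcript", "blob://captures/voice-note-1", text="call summary"
)
```

Rejecting bytes at construction time is one way to keep large payloads out of downstream calls; the actual enforcement point may live elsewhere.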

API shape as pedagogy

POST /v1/humanos/captures and POST /v1/humanos/watches are intentionally separate. Separate tables, separate list endpoints, separate copy in consoles (P12). That is how we train implementers: different nouns, different ethics.
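
The separation can be pictured as a routing table in which the two nouns never share a handler or a store. This is an in-memory sketch, not the gateway code: the two POST paths come from the text, while the GET list paths, the handler names, the id prefixes, and the request body fields are assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Two separate stores stand in for the two separate tables.
captures: List[dict] = []
watches: List[dict] = []


def create_capture(body: dict) -> dict:
    row = {"id": f"cap_{len(captures) + 1}", **body}
    captures.append(row)
    return row


def create_watch(body: dict) -> dict:
    row = {"id": f"watch_{len(watches) + 1}", **body}
    watches.append(row)
    return row


def list_captures(_body: dict) -> List[dict]:
    return list(captures)


def list_watches(_body: dict) -> List[dict]:
    return list(watches)


ROUTES: Dict[Tuple[str, str], Callable[[dict], object]] = {
    ("POST", "/v1/humanos/captures"): create_capture,
    ("POST", "/v1/humanos/watches"): create_watch,
    # Assumed list paths: the text says list endpoints are separate,
    # but does not spell out their URLs.
    ("GET", "/v1/humanos/captures"): list_captures,
    ("GET", "/v1/humanos/watches"): list_watches,
}


def handle(method: str, path: str, body: dict = None):
    """Dispatch a request; an unknown (method, path) pair raises KeyError."""
    return ROUTES[(method, path)](body or {})


cap = handle("POST", "/v1/humanos/captures", {"source_kind": "transcript"})
watch = handle("POST", "/v1/humanos/watches", {"target": "vendor-feed"})
```

Because no handler writes to both stores, a Watch cannot show up in a Capture listing by accident.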

Use cases

Given a user uploading a voice note, when STT completes, then a Capture row exists with payload_ref and effective_config_keys showing which org policy picked the connector.
Given an analyst monitoring a vendor feed, when they create a Watch, then evaluation ticks are scheduled, and the Watch is not treated as “another capture.”
Given a greenfield agent, when developers use Wave 7 scaffolds, then they do not fork modality handling per agent.

Trust expectation: Everything audible or visible is on the record as a Capture, so off-books processing cannot be smuggled in.

Extended narrative — enqueue without a third queue

Wave 3’s engineering trap is inventing yet another worker because Capture “needs processing.” The program forbids it: Capture creation may call the existing gateway async facade (enqueue_invocation / enqueue_human_call) so the same BullMQ + execution_queue story handles handoff work. If Redis is down, the Capture row still exists — the response carries capture_invocation_error instead of pretending success.
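
The "row first, enqueue second" behavior can be sketched as follows. The enqueue callable stands in for the existing async facade (enqueue_invocation / enqueue_human_call), whose real signature is not given here, so it is injected as an assumption; the response fields other than capture_invocation_error are hypothetical, as is the in-memory table.

```python
from typing import Callable, Dict

# In-memory stand-in for the Capture table.
capture_table: Dict[str, dict] = {}


def create_capture(body: dict, enqueue: Callable[[dict], str]) -> dict:
    """Persist the Capture, then try to hand work to the existing queue."""
    capture_id = f"cap_{len(capture_table) + 1}"
    row = {"id": capture_id, **body}
    capture_table[capture_id] = row  # the durable write happens first

    response = {"capture": row, "capture_invocation_error": None}
    try:
        # Assumed shape: the facade takes a job dict and returns an invocation id.
        response["invocation_id"] = enqueue({"capture_id": capture_id})
    except Exception as exc:  # broad on purpose: any queue failure is reported
        response["capture_invocation_error"] = str(exc)
    return response


def healthy_enqueue(job: dict) -> str:
    return "inv_1"


def redis_down_enqueue(job: dict) -> str:
    raise ConnectionError("redis unavailable")


ok = create_capture({"source_kind": "transcript"}, healthy_enqueue)
degraded = create_capture({"source_kind": "audio"}, redis_down_enqueue)
```

In the degraded case the caller still receives the Capture, plus an explicit error it can retry against, and no separate worker or queue was introduced to get there.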

Watch evaluation is similarly allergic to fairy tales. POST .../watches/:id/evaluate records an audit row in the feedback store, with an optional provenance_pointer. Scheduling can start as “something calls evaluate on cadence” and graduate to cron without renaming the user-facing noun.
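
As a rough illustration, each evaluate call can be modeled as appending one audit row and returning its id. Only provenance_pointer and the idea of a feedback store come from the text; the evaluate_watch function, the row fields, and the id format are assumptions, and the loop merely simulates a scheduler calling evaluate on a cadence.

```python
from datetime import datetime, timezone
from typing import List, Optional

# In-memory stand-in for the feedback store that holds audit rows.
feedback_store: List[dict] = []


def evaluate_watch(watch_id: str, provenance_pointer: Optional[str] = None) -> dict:
    """Record one evaluation tick as an audit row and return it."""
    row = {
        "feedback_id": f"fb_{len(feedback_store) + 1}",
        "watch_id": watch_id,
        "provenance_pointer": provenance_pointer,  # optional, may stay None
        "evaluated_at": datetime.now(timezone.utc).isoformat(),
    }
    feedback_store.append(row)
    return row


# "Something calls evaluate on cadence": three simulated ticks for one Watch.
ticks = [evaluate_watch("watch_1") for _ in range(3)]
tick_ids = [tick["feedback_id"] for tick in ticks]
```

Whether the caller is a loop, a worker, or cron, the audit trail looks the same: one feedback id per tick, which is what an audit dashboard would key on.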

Trust expectation: Operators see one async story and two durable nouns (Capture vs Watch) — never three queues and one overloaded “intake.”

Given / When / Then (use case 4)

Given a multimodal connector finishing STT, when it creates a Capture with source_kind: transcript, then the same REST path as text Capture applies — no shadow ingest API.

Given / When / Then (use case 5)

Given a watch on a vendor endpoint, when a worker calls evaluate hourly, then each tick has a feedback id suitable for audit dashboards.

Given / When / Then (use case 6)

Given an impatient PM asking for “just poll S3,” when engineers answer, then they cite Watch and the evaluation doc, not a second Capture table.