White Paper Section 15 / 17

Appendix B: Threat Model

Trust boundaries and mitigations for agentic mutation risks.

Conventional security models prioritize preventing unauthorized access; governed autonomous execution must address a more insidious threat: authorized-looking execution derived from untrusted, manipulated, or semantically unsafe reasoning. A conventional security system asks whether a principal is authenticated and permitted to invoke an operation. Governed autonomous execution, by contrast, must verify whether a machine-generated intent is legitimate, contextually safe, bounded by policy, authorized by proof, and replayable after the fact.

Threat Model Principle

The control plane does not assume that the reasoning layer is malicious. It assumes something broader and more useful for security design: the reasoning layer is not authoritative.

This appendix establishes a threat model for the Autonomous State Control Plane. It maps the specific threat vectors arising when AI agents, external reasoning models, generated code, and dynamic policy engines interface with real-world infrastructure. Rather than cataloging generic AI risks, this model covers the transition path from initial reasoning to intent, policy, contract, identity, execution, evidence, replay, and protocol admission.

Scope and Assumptions

This threat model covers threats to AI-generated intent, the reasoning-to-execution boundary, context acquisition, policy evaluation, execution contracts, proof-derived execution identity, runtime execution adapters, evidence chains, replay and simulation, and generated-code admission. It supports the architecture described by Sovereign Agentic Loops, OpenKedge intent governance, Verifiable Agentic Infrastructure, Intent-to-Execution Evidence Chains, Protocol-Driven Development, replay, simulation, and audit.

Autonomy represents an authority-amplification surface: minor reasoning discrepancies, prompt injections, stale context inputs, or ambiguous contracts can propagate into material operational changes if model outputs translate directly into system execution. The control plane reduces this risk by ensuring that the reasoning layer generates proposals rather than authoritative actions.

This security framework operates under the following axioms: AI reasoning is inherently untrusted, external models reside outside the sovereign execution boundary, agents may be compromised or confused, and human operators are fallible. The architecture does not attempt to enforce perfect reasoning; instead, it confines the execution blast radius by enforcing deterministic boundaries external to the model.

This model does not guarantee model correctness, promise perfect prompt injection detection, or replace foundational cloud security. It defines the threats emerging at the intersection of autonomous reasoning and system mutation, and identifies architectural controls intended to prevent reasoning failures from becoming unbounded execution authority.

The primary assets requiring protection include sovereign execution authority, system state, policy definitions, execution contracts, short-lived credentials, and tamper-evident evidence logs. The core of this defense lies in protecting the authority boundary itself; flawed reasoning remains contained when the transition interface prevents unauthorized execution, maintains context freshness, and retains replayable evidence.

Table 23. Protected assets in governed autonomous execution.
Asset	Protection Goal
Execution authority	Prevent unauthorized or unjustified real-world mutation
Context data	Limit disclosure and prevent stale or manipulated decisions
Policy definitions	Preserve institutional rules and approval boundaries
Execution contracts	Ensure approved bounds cannot be widened or forged
Execution identity	Prevent misuse, reuse, or privilege expansion
Evidence chain	Preserve auditability, replayability, and accountability
Generated artifacts	Prevent unsafe code from entering operational workflows
Human approval authority	Prevent social, procedural, or system-level bypass

The protected assets are interdependent. If context can be manipulated, policy decisions may be wrong. If contracts can be widened, identity may become overbroad. If identity is reusable, runtime execution may escape its task. If evidence can be omitted, replay cannot establish accountability. The control plane therefore treats the full governance path as the security surface.

Actors and Trust Boundaries

We model the system across multiple actors and distinct trust boundaries. The relevant actors include the external model provider, reasoning agents, user requestors, agent runtimes, intent gateways, context providers, policy engines, governance brokers, execution adapters, and evidence stores.

The model defines seven trust boundaries:

Reasoning boundary: external model output enters as a non-authoritative proposal.
Intent boundary: model output becomes structured intent.
Policy boundary: intent is evaluated under local policy and context.
Identity boundary: execution authority is created.
Execution boundary: actions affect real systems.
Evidence boundary: events are recorded for replay and audit.
Protocol admission boundary: generated artifacts become eligible for operational use.

Reasoning Boundary → Intent Boundary → Policy Boundary → Identity Boundary → Execution Boundary → Evidence Boundary

The security of this architecture does not depend on agent compliance or model-level alignment. Instead, the control-plane components (the runtime adapter, identity broker, policy engine, and evidence store) must independently enforce boundaries, even when presented with a well-formed but unsafe intent proposal.

Figure 9. Trust boundaries in the Autonomous State Control Plane threat model.

Threat Category 1: Reasoning-Layer Threats

Reasoning-layer threats emerge from prompt injections, tool-use manipulation, instruction hierarchy confusion, and hallucinated justifications. Sovereign Agentic Loops mitigate these vectors by treating all model outputs as non-authoritative proposals. Obfuscation membranes limit context exposure, structured intent validation sanitizes outputs, and policy evaluation is decoupled from the inference runtime. Model-neutral governance prevents single-provider dependencies from compromising the authorization path.

Threat Category 2: Intent-Layer Threats

Intent-layer threats arise when model outputs cross the governance boundary as candidate intents, manifesting as malformed intents, smuggled actions, scope inflation, or replayed requests. To mitigate these threats, the control plane enforces strict schema validation, risk categorization, single-use nonces with expiration, and context freshness verification before generating any execution contract.

The intent layer must reject ambiguity: if an objective is unclear, the scope is overbroad, or the requester is unauthenticated, the control plane escalates, constrains, or denies the intent rather than converting it into execution authority.
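The checks described above can be sketched as a small admission gate. This is a minimal illustration rather than the control plane's actual implementation: the `Intent` fields, the `ALLOWED_ACTIONS` catalog, the `MAX_TARGETS` ceiling, and the three-way `accept` / `escalate` / `deny` result are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical action catalog and scope ceiling, chosen only for illustration.
ALLOWED_ACTIONS = {"restart_service", "scale_replicas"}
MAX_TARGETS = 5


@dataclass(frozen=True)
class Intent:
    requester: Optional[str]   # authenticated principal, or None
    action: str
    targets: Tuple[str, ...]
    nonce: str                 # single-use value to detect replays
    issued_at: float           # seconds, same clock as `now`
    ttl_seconds: float


class IntentGateway:
    def __init__(self) -> None:
        self._seen_nonces = set()

    def admit(self, intent: Intent, now: float) -> str:
        """Return 'accept', 'escalate', or 'deny' for a candidate intent."""
        if intent.requester is None:
            return "deny"                      # unauthenticated requester
        if intent.action not in ALLOWED_ACTIONS:
            return "deny"                      # smuggled or unknown action
        age = now - intent.issued_at
        if age < 0 or age > intent.ttl_seconds:
            return "deny"                      # expired or future-dated
        if intent.nonce in self._seen_nonces:
            return "deny"                      # replayed request
        if not intent.targets:
            return "escalate"                  # objective is unclear
        if len(intent.targets) > MAX_TARGETS:
            return "escalate"                  # scope is overbroad
        self._seen_nonces.add(intent.nonce)    # consume nonce only on accept
        return "accept"
```

In this sketch an accepted intent is still only a candidate for policy evaluation; nothing here creates execution authority.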

Threat Category 3: Context and Policy Threats

Context and policy threats occur when governance decisions rely on stale, manipulated, or inconsistent inputs. Mitigating these vectors requires context provenance tracking, narrow freshness windows, policy versioning, and deny-by-default behavior on missing context. High-risk intents require real-time context revalidation immediately preceding execution.
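A minimal sketch of deny-by-default context evaluation follows. The `ContextRecord` shape, the `TRUSTED_SOURCES` set, and the `evaluate_context` function are assumptions made for illustration; the source does not specify how provenance or freshness is represented.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Tuple

# Hypothetical set of context providers whose provenance is accepted.
TRUSTED_SOURCES = frozenset({"cmdb", "metrics"})


@dataclass(frozen=True)
class ContextRecord:
    value: object
    source: str          # provenance: which provider produced the value
    observed_at: float   # seconds, same clock as `now`


def evaluate_context(
    required_keys: Iterable[str],
    context: Mapping[str, ContextRecord],
    now: float,
    max_age_seconds: float,
) -> Tuple[bool, str]:
    """Allow only if every required key is present, trusted, and fresh."""
    for key in required_keys:
        record = context.get(key)
        if record is None:
            return False, f"missing context: {key}"       # deny by default
        if record.source not in TRUSTED_SOURCES:
            return False, f"untrusted provenance: {key}"
        age = now - record.observed_at
        if age < 0 or age > max_age_seconds:
            return False, f"stale context: {key}"
    return True, "context ok"
```

For high-risk intents, the same function could be called a second time immediately before execution with a smaller `max_age_seconds`, which is one way to express the revalidation step.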

Threat Category 4: Execution Contract Threats

Execution contract threats involve forgery, contract widening, or ambiguity. Because the contract constitutes the primary enforcement boundary, it must be cryptographically signed, immutable, and parameterized with explicit resource bounds and expiration times. If an action cannot be deterministically validated against the contract, the runtime adapter must block execution.
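The following sketch shows how a signed, expiring, resource-bounded contract might be checked. It uses an HMAC from the Python standard library as a stand-in for whatever signature scheme a deployment actually uses (an asymmetric signature is likely more appropriate in practice). The contract fields `action`, `resources`, and `expires_at` are hypothetical.

```python
import hashlib
import hmac
import json


def canonical_bytes(contract: dict) -> bytes:
    # Stable serialization so that the same contract always signs identically.
    return json.dumps(contract, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_contract(contract: dict, key: bytes) -> str:
    return hmac.new(key, canonical_bytes(contract), hashlib.sha256).hexdigest()


def action_permitted(
    contract: dict,
    signature: str,
    key: bytes,
    action: str,
    resource: str,
    now: float,
) -> bool:
    """Return True only if the action is provably inside the signed contract."""
    expected = sign_contract(contract, key)
    if not hmac.compare_digest(expected, signature):
        return False                       # forged or widened contract
    if now >= contract.get("expires_at", float("-inf")):
        return False                       # expired, or no expiry declared
    if action != contract.get("action"):
        return False
    return resource in contract.get("resources", ())
```

Any edit to the contract after signing, such as adding a resource, changes the canonical bytes and invalidates the signature, so widening is rejected the same way as forgery.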

Threat Category 5: Execution Identity Threats

Execution identity threats attempt to exploit standing privileges, leaked credentials, or long-lived tokens. The identity broker must enforce proof-derived workload identity, issuing short-lived, task-scoped, and non-reusable tokens that are strictly bounded by the execution contract (EID ≼ K, meaning the execution identity EID carries no authority broader than the contract K).

Standing Privilege

Standing privilege turns an agent error into reusable authority. Proof-derived execution identity limits authority to the validated intent, contract, and time window.

The identity broker must fail closed: any validation failure regarding the contract, decision, context, or evidence requirements must prevent authority issuance.
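One way to read the EID ≼ K relation is as a subset check over actions, resources, and time window. The sketch below takes that reading as an assumption; the `Bounds` type, the `REQUIRED_CHECKS` names, and the `IdentityBroker.issue` signature are hypothetical and exist only to illustrate fail-closed, single-use issuance.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional

# Hypothetical names for the validations that must all pass before issuance.
REQUIRED_CHECKS = ("contract", "decision", "context", "evidence")


@dataclass(frozen=True)
class Bounds:
    actions: FrozenSet[str]
    resources: FrozenSet[str]
    not_after: float


def within(eid: Bounds, contract: Bounds) -> bool:
    """EID ≼ K: the identity is no broader than the contract in any dimension."""
    return (
        eid.actions <= contract.actions
        and eid.resources <= contract.resources
        and eid.not_after <= contract.not_after
    )


class IdentityBroker:
    def __init__(self) -> None:
        self._used_contracts = set()

    def issue(
        self,
        contract_id: str,
        contract: Bounds,
        requested: Bounds,
        checks: Mapping[str, bool],
        now: float,
    ) -> Optional[Bounds]:
        """Return the granted bounds, or None. Any doubt results in None."""
        if any(checks.get(name) is not True for name in REQUIRED_CHECKS):
            return None                    # fail closed on missing or failed check
        if contract_id in self._used_contracts:
            return None                    # non-reusable: one identity per contract
        if now >= contract.not_after or not within(requested, contract):
            return None                    # expired contract or overbroad request
        self._used_contracts.add(contract_id)
        return requested
```

Because a missing check is treated the same as a failed one, an upstream component that crashes or omits a result cannot cause authority to be issued.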

Threat Category 6: Runtime Execution Threats

Runtime execution threats target the physical mutation interface via adapter bypass, parameter substitution, or race conditions. The execution adapter constitutes a critical element of the trusted computing base (TCB); it must verify the contract at the point of execution, fail closed, enforce atomic rollback, and emit detailed runtime evidence.
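The adapter behavior described above can be illustrated with an in-memory sketch. It is deliberately simplified: state is a plain dictionary, rollback is a snapshot restore, and concurrency (and therefore race conditions) is not modeled. The `ExecutionAdapter` class and the contract fields `action`, `params`, and `expires_at` are hypothetical.

```python
import copy


class ExecutionAdapter:
    """Toy adapter: verifies the contract at the point of execution."""

    def __init__(self, state: dict) -> None:
        self.state = state
        self.evidence = []   # runtime evidence, one record per attempt

    def execute(self, contract: dict, action: str, params: dict, now: float) -> bool:
        allowed = (
            now < contract.get("expires_at", float("-inf"))
            and action == contract.get("action")
            and params == contract.get("params")   # blocks parameter substitution
        )
        if not allowed:
            self.evidence.append({"event": "blocked", "action": action, "at": now})
            return False                            # fail closed

        snapshot = copy.deepcopy(self.state)
        try:
            for key, value in params.items():
                if key not in self.state:
                    raise KeyError(key)             # unknown field: abort the change
                self.state[key] = value
        except Exception as exc:
            # Restore the pre-execution state so partial changes do not persist.
            self.state.clear()
            self.state.update(snapshot)
            self.evidence.append(
                {"event": "rolled_back", "action": action, "at": now, "error": repr(exc)}
            )
            return False

        self.evidence.append({"event": "executed", "action": action, "at": now})
        return True
```

Note that blocked and rolled-back attempts emit evidence just as successful executions do, which keeps the runtime record complete for later replay.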

Threat Category 7: Evidence and Replay Threats

Evidence and replay threats aim to obscure accountability through omission, tampering, or selective logging. To support compliance in high-consequence environments, the system should employ append-only, hash-chained logs that record rejections and escalations alongside successful executions, so that every decision path, not only the successful ones, can be replayed.
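A hash-chained log can be sketched in a few lines with the Python standard library. The `EvidenceLog` class, the record fields, and the all-zero `GENESIS` value are assumptions for illustration. The sketch detects modified, reordered, or removed interior entries; detecting truncation of the tail would additionally require anchoring the latest hash somewhere outside the log, which is not shown.

```python
import hashlib
import json

GENESIS = "0" * 64   # hypothetical starting value for the chain


def _digest(prev_hash: str, record: dict) -> str:
    payload = json.dumps({"prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class EvidenceLog:
    def __init__(self) -> None:
        self._entries = []

    def append(self, record: dict) -> str:
        """Append a record linked to the previous entry and return its hash."""
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        entry_hash = _digest(prev, record)
        self._entries.append({"prev": prev, "record": record, "hash": entry_hash})
        return entry_hash

    def entries(self) -> list:
        return list(self._entries)


def verify_chain(entries: list) -> bool:
    """Recompute every link; any edit, reorder, or interior gap breaks the chain."""
    prev = GENESIS
    for entry in entries:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

Rejections and escalations are appended through the same `append` call as executions, so selective logging would show up as a missing link rather than as a silent omission.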

Threat Category 8: Generated Software Threats

Generated software threats occur when AI-synthesized adapters, policy modules, or tools introduce malicious behavior or subtle bugs. Protocol-Driven Development (PDD) mitigates these threats by treating generated code as a candidate artifact rather than trusted software. Artifacts are admitted to operational environments only after their structural, behavioral, and operational invariants have been verified within sandbox and property-testing pipelines.

Generated adapters and policy modules deserve special scrutiny because they participate directly in the active governance path. An unsafe adapter or compromised policy module can completely bypass runtime contract enforcement.
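As a small illustration of invariant-based admission, the sketch below property-tests a candidate function against randomly generated inputs before admitting it. The scenario (a generated function that clamps a requested replica count into a permitted range) and the `admit_artifact` name are hypothetical, and sandboxing is not shown; a real pipeline would run the candidate in an isolated environment.

```python
import random
from typing import Callable


def admit_artifact(
    candidate: Callable[[int, int, int], int],
    trials: int = 500,
    seed: int = 0,
) -> bool:
    """Admit the candidate only if every invariant holds on every trial."""
    rng = random.Random(seed)              # fixed seed keeps admission replayable
    for _ in range(trials):
        lo = rng.randint(0, 50)
        hi = rng.randint(lo, 100)
        requested = rng.randint(-1000, 1000)
        try:
            result = candidate(requested, lo, hi)
        except Exception:
            return False                   # operational invariant: must not raise
        if not isinstance(result, int):
            return False                   # structural invariant: integer output
        if not lo <= result <= hi:
            return False                   # behavioral invariant: stays in bounds
        if lo <= requested <= hi and result != requested:
            return False                   # behavioral invariant: no needless change
    return True
```

Passing such a gate is evidence for admission, not proof of safety, which is consistent with the requirement that runtime contract enforcement still applies after admission.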

Mitigation Matrix

Table 24 summarizes the primary architectural mitigations for each threat category.

Table 24. Threat categories and primary architectural mitigations.
Threat Category	Primary Mitigations
Reasoning-layer threats	SAL, obfuscation membrane, intent isolation, no direct execution
Intent-layer threats	Intent schema validation, scope checks, risk classification, expiration
Context and policy threats	Context provenance, freshness checks, policy versioning, replay
Execution contract threats	Signed contracts, narrow bounds, expiration, revocation conditions
Execution identity threats	Proof-derived execution identity, short-lived credentials, authority no broader than contract
Runtime execution threats	Adapter enforcement, contract verification, fail-closed behavior
Evidence and replay threats	Append-only evidence, correlation ids, completeness checks, replay tests
Generated software threats	PDD, invariant checks, sandboxing, admission evidence, CI/CD gates

The mitigations must be implemented as layered controls. A single mechanism is insufficient: intent validation without policy evaluation is incomplete, and contracts without proof-derived execution identity are vulnerable to standing privilege bypass. Evidence without replay remains mere logging, and PDD without runtime enforcement cannot govern post-admission execution.

Residual Risks

Residual risks, including misconfigured policies, human operator errors, insider threats, and physical infrastructure compromise, must be managed through defense-in-depth, recurring red-team exercises, and periodic replay drills. Mature deployments should treat threat modeling as a continuous feedback loop, using incident evidence and validation failures to refine policy and adapter designs.

High-consequence domains must implement domain-specific safety cases, continuous system simulations, and clear manual escalation pathways. The ultimate objective of threat modeling is not to claim absolute security, but to delineate where authority must be bounded, where evidence must be produced, and where human sovereignty must be explicitly retained.