OpenClaw Has a Provenance Problem – And So Does Every Agent Platform
Scanning for malware isn’t security. Proving who authorised the action is.
Scott Farrell · leverageai.com.au · February 2026
- Agent platforms inherited chat-era trust models: they can act on your behalf but can’t prove you asked them to
- The real security gap isn’t jailbreaks or malware scanning – it’s provenance: who authorised this action, with what authority, and can you prove it?
- Software supply chains already solved this with SLSA and Sigstore. Agent platforms need the same four-layer model: identity → intent → artifact → execution
The Incident That Shouldn’t Have Been Possible
Last month, I watched an AI agent stop a running campaign. Not because it was hacked. Not because of a prompt injection attack from a malicious website. Because it couldn’t tell the difference between my instruction and an error message.
The agent was running on OpenClaw – one of the more thoughtful agentic platforms in terms of security. It had a published threat model, a trust page, DM pairing for unknown senders, and a recent VirusTotal partnership for scanning skills.[1] By the standards of the industry, it was doing more than most.
Here’s what happened: the agent was executing a campaign I’d set up and explicitly authorised. A safeguard LLM – running on the cron system’s inputs as a content filter – returned a response that said, in effect, “I can’t execute this directive.” The agent read that response, decided it sounded like something I would say, and shut down the entire automation.
When I confronted it, the agent investigated its own logs and admitted: “A system message came back from ‘System: Cron:’ – NOT from your WhatsApp. It was never authenticated. I read it, thought it sounded like your reasoning, and shut down the entire automation based on an unauthenticated system message.”
The agent got socially engineered. Not by a hacker. By a safety system. Because no mechanism existed to distinguish an authenticated owner instruction from any other text that arrived through any channel.
This is the provenance gap. And every agent platform has it.
The Provenance Gap: What Agent Security Actually Misses
The industry is thinking about agent security – but it’s thinking about the wrong part of the problem.
OpenClaw’s own trust page acknowledges six risk categories including input manipulation, auth and access issues, and supply chain risks. They’ve partnered with VirusTotal for skill scanning. Their threat model is more honest than most platforms bother to publish.[1]
But scanning answers one question: “Does this look malicious?”
It doesn’t answer the question that actually matters: “Is this authentic and authorised?”
That distinction is the provenance gap. And the numbers show it’s everywhere:
- Only 28% of organisations can reliably trace agent actions back to a human sponsor across all environments[2]
- 80% of organisations deploying autonomous AI cannot tell, in real time, what those systems are doing or who’s responsible[2]
- Only 22% of teams treat agents as independent identities requiring their own security posture[3]
- 81% of teams are past the planning phase for agentic AI, yet only 14.4% have full security approval[3]
NIST recognised this gap formally on February 5, 2026, releasing a concept paper titled “Accelerating the Adoption of Software and Artificial Intelligence Agent Identity and Authorization” – proposing demonstrations of how identity and authorisation practices can be applied to AI agents in enterprise settings.[4]
When the standards body that wrote the cybersecurity framework publishes a paper saying “we don’t know how to identify and authorise AI agents yet,” the gap is structural, not theoretical.
Three Broken Layers
The provenance gap isn’t one problem – it’s three layers of the same problem, each compounding the others.
Layer 1: Identity – Who Is Making the Request?
Agent platforms receive messages from terminals, web GUIs, WhatsApp, Discord, Slack, email, cron jobs, API callbacks, and fetched web pages. All arrive as text. None carry cryptographic proof of who sent them.
OpenClaw implements “DM Policy: Pairing” for unknown senders and “AllowFrom: Self-only” by default.[1] That handles the obvious case – random strangers can’t message your agent. But it doesn’t handle the subtle case: a document your agent fetches, a URL it visits, an error response from an API call, or a safeguard LLM’s output all arrive through “trusted” channels. The agent can’t distinguish instruction from injection.
This is the classic confused deputy problem – one of the oldest vulnerabilities in computer science, now amplified by AI agents that have broad permissions and no ability to verify the requester’s identity.[5]
“An agentic AI tool may be granted least privilege access to read a user’s email, access a CI/CD pipeline, or query a production database. If that AI is ‘confused’ by a cleverly crafted prompt, it can be manipulated into exfiltrating sensitive data, deploying malicious code, or escalating privileges on the user’s behalf.”[6]
– Saurav Kumar, MCP Authorization for Agentic AI
Layer 2: Authorisation – What Are They Allowed to Ask For?
“Check the weather” and “delete all my files” arrive with identical authority. There’s no capability model. No escalation for destructive actions. No expiring, plan-bound permission tokens.
Traditional IAM systems enforce permissions based on who the user is. But when actions are executed by an AI agent, authorisation is evaluated against the agent’s identity, not the requester’s. User-level restrictions no longer apply.[7]
The result: agents become authorisation bypass paths. Security teams lose the ability to enforce least privilege, detect misuse, or reliably attribute intent.[7]
Layer 3: Integrity – Can You Trust What’s Running?
Skills on agent platforms are typically unsigned markdown files. When your agent “self-improves” by fetching new prompting guidelines or updating its own skills, there’s no chain of trust back to anything you’ve verified.
OpenClaw’s ClawHub marketplace represents this risk explicitly: their own trust page acknowledges that “malicious or impersonated skills” could be “uploaded and executed by users within hours.”[1] Their roadmap includes “Skills verification” and “Signed releases” – but these are not yet implemented.[1]
SecurityScorecard’s STRIKE team discovered more than 135,000 internet-exposed OpenClaw instances.[8] That’s 135,000 agent deployments running unsigned skills with no artifact provenance.
The Confused Deputy at Scale
This isn’t a theoretical concern. The incidents are accumulating:
- 520 reported tool misuse and privilege escalation incidents involving AI agents, with memory poisoning and supply chain attacks carrying disproportionate severity[9]
- A 16 billion credential exposure in June 2026 that attackers weaponised to access corporate data lakes and AI agent systems “as legitimate users,” affecting over 12,000 organisations[10]
- In 2025, Operant AI discovered “Shadow Escape,” a zero-click exploit targeting agents built on the Model Context Protocol (MCP) that enabled silent workflow hijacking and data exfiltration[11]
- In late 2025, Anthropic disclosed that a state-backed threat actor manipulated Claude Code to conduct an AI-orchestrated espionage campaign across more than 30 organisations[12]
- A 2024 financial services incident: an attacker tricked a reconciliation agent into exporting all customer records matching a regex pattern that matched every record in the database – 45,000 customer records stolen via confused deputy[12]
OWASP ranks prompt injection as the #1 security risk for LLM applications – and explicitly notes it exploits the design of LLMs rather than a flaw that can be patched.[13] When embedded in documents, emails, or skills, prompt injection becomes a supply chain vulnerability: untrusted content poisoning trusted execution.[14]
Enterprises are already experiencing it: 76% report prompt injection attacks, 66% report vulnerable LLM-generated code, and 65% report jailbreaking attempts.[14]
Containment ≠ Security
The industry has made real progress on containment – what an agent is physically allowed to do. Sandboxing, micro-segmentation, kill-switches, scoped permissions, tokenisation of sensitive data. I’ve written extensively about this: SiloOS addresses containment through base keys (what the agent CAN do), task keys (what data it can access), tokenisation (agent never sees real PII), and stateless execution (no memory accumulation between runs).
That’s the containment axis. It’s well understood and increasingly well implemented.
But containment alone is insufficient. An agent with perfect containment – physically unable to exceed its capability envelope – can still execute the wrong action within that envelope if it can’t verify who’s asking.
| Axis | Question | State |
|---|---|---|
| Containment | What CAN the agent do? | Advancing (SiloOS, sandboxing, zero-trust) |
| Provenance | WHO authorised this action? | Gap (no platform provides cryptographic proof) |
The Architecture, Not Vibes principle – “can’t beats shouldn’t” – gets containment right. But even “can’t exceed permissions” doesn’t address “shouldn’t have been asked in the first place.” That’s the provenance axis.
The Four-Layer Provenance Model
Provenance isn’t one thing – it’s four layers, each building on the last. Miss any one and the chain breaks.
| Layer | Question | What It Proves | Agent Platform Today |
|---|---|---|---|
| 1. Identity | Who is making the request? | Authenticated human vs. injected instruction vs. system message | No channel authentication |
| 2. Intent | What are they allowed to ask for? | Scoped, time-limited, plan-bound capability token | No capability model |
| 3. Artifact | Can you trust what’s running? | Signed skill, signed config, verified publisher | Unsigned markdown files |
| 4. Execution | Can you prove what happened? | Signed receipt: what ran, with what inputs, linked to approval | Logs without signatures |
Layer 1: Identity Provenance
Whether a message comes from WhatsApp, a terminal, a web GUI, or a cron callback, the agent should be able to classify it: authenticated human (the owner is present), delegated authority (the owner granted approval for this specific action), or non-authoritative input (content is information, not permission).
The practical pattern is step-up authentication for anything with side-effects. The agent presents a plan, the owner approves via a cryptographic action (passkey, TOTP, or signed approval link), and only then does the agent execute. Read-only operations need zero friction. “Update all my skills from this URL” needs something close to a code review.
Industry proposals are converging: Decentralised Identifiers (DIDs) for agent identity, Verifiable Credentials (VCs) for capability attestation, and Agent Name Service (ANS) for discovery and verification.[15]
Layer 2: Intent Provenance
Approvals shouldn’t be blank cheques. Each approval should mint a narrow capability token: scoped to a specific action, time-limited, bound to the exact plan hash, and non-replayable.
Think sudo on Linux – elevated permission for a specific command, not permanent root access. The agent presents what it wants to do, you approve that specific plan, and the approval expires.
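A minimal sketch of such a capability token, assuming a shared signing key for brevity (the token shape, `mint_token`, and `spend_token` are illustrative, not any platform’s spec):

```python
import hashlib
import hmac
import json
import secrets
import time

SIGNING_KEY = b"approval-service-key"   # illustrative secret
_used_nonces: set[str] = set()          # non-replayable: each token spends once

def mint_token(action: str, plan_hash: str, ttl_s: int = 300) -> dict:
    """An approval mints a narrow capability: one action, one plan, short lifetime."""
    tok = {"action": action, "plan_hash": plan_hash,
           "expires": time.time() + ttl_s, "nonce": secrets.token_hex(8)}
    body = json.dumps(tok, sort_keys=True).encode()
    tok["sig"] = hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest()
    return tok

def spend_token(tok: dict, action: str, plan_hash: str) -> bool:
    """Verify signature, expiry, plan binding, and single use, then consume."""
    body = json.dumps({k: v for k, v in tok.items() if k != "sig"},
                      sort_keys=True).encode()
    sig_ok = hmac.compare_digest(
        tok["sig"], hmac.new(SIGNING_KEY, body, hashlib.sha256).hexdigest())
    fresh = time.time() < tok["expires"] and tok["nonce"] not in _used_nonces
    bound = tok["action"] == action and tok["plan_hash"] == plan_hash
    if sig_ok and fresh and bound:
        _used_nonces.add(tok["nonce"])  # spend it
        return True
    return False
```

Replaying a spent token fails, as does presenting it for a different action or a modified plan – the sudo-like properties described above.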
Emerging approaches include Relationship-Based Access Control (ReBAC) using OpenFGA, runtime Policy Decision Points via AuthZen, and fine-grained consent mechanisms – ensuring every tool invocation is explicitly authorised at the moment it occurs.[16]
Layer 3: Artifact Provenance
Every skill, script, prompt template, and configuration file should carry a signature from its publisher. The agent should refuse to execute unsigned or tampered artifacts by default.
This is where software supply chain security provides the existence proof. The industry already solved artifact provenance for code:
- SLSA (Supply-chain Levels for Software Artifacts) defines provenance as “the verifiable information about software artifacts describing where, when and how something was produced”[17]
- Sigstore provides keyless signing – ephemeral keys, identity-bound signatures, and a tamper-resistant public transparency log[18]
- Sigstore is already integrated or planned for NPM, PyPI, Maven, GitHub, Kubernetes, and more[19]
- The Model Transparency project extends Sigstore to ML models – applying the same signing and verification to AI artifacts[20]
The patterns exist. The infrastructure exists. Agent platforms just haven’t applied them yet.
Layer 4: Execution Provenance
Signing artifacts is pointless if the runtime doesn’t enforce it. The execution engine should verify the skill signature before loading, verify the request has a valid capability token, and emit a signed execution attestation β what ran, with what inputs, linked back to the approval event.
This turns “good logs” into non-repudiable governance. Not just “what happened” but “what happened, who approved it, and here’s the cryptographic proof.”
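A signed, hash-chained receipt might look like the following sketch. `RUNTIME_KEY` and `emit_receipt` are illustrative names; a real runtime would sign with a hardware-backed key and append to an external transparency log rather than an in-process list.

```python
import hashlib
import hmac
import json
import time

RUNTIME_KEY = b"runtime-attestation-key"  # illustrative
receipt_log: list[dict] = []              # stand-in for a transparency log

def emit_receipt(requester: str, approval_id: str,
                 artifact_hash: str, result: str) -> dict:
    """Signed attestation linking what ran back to who approved it."""
    receipt = {
        "requester": requester,          # identity provenance
        "approval_id": approval_id,      # links to the capability token
        "artifact_hash": artifact_hash,  # the exact signed skill that ran
        "result": result,
        "ts": time.time(),
        # Hash-chain to the previous receipt so history cannot be rewritten.
        "prev": receipt_log[-1]["sig"] if receipt_log else None,
    }
    body = json.dumps(receipt, sort_keys=True).encode()
    receipt["sig"] = hmac.new(RUNTIME_KEY, body, hashlib.sha256).hexdigest()
    receipt_log.append(receipt)
    return receipt

def verify_receipt(receipt: dict) -> bool:
    """Any tampering with a stored receipt invalidates its signature."""
    body = json.dumps({k: v for k, v in receipt.items() if k != "sig"},
                      sort_keys=True).encode()
    return hmac.compare_digest(
        receipt["sig"], hmac.new(RUNTIME_KEY, body, hashlib.sha256).hexdigest())
```

Each receipt carries the approval ID and artifact hash, so the answer to “who authorised this?” is a signature check, not an argument over logs.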
The Agent-as-OS Analogy
If this sounds like building an operating system, that’s because it is.
| OS Concept | Agent Equivalent |
|---|---|
| Login + multi-factor auth | Identity provenance (who is the requester?) |
| sudo + capability tokens | Intent provenance (scoped, time-limited approval) |
| Package signing (apt, npm) | Artifact provenance (signed skills + publisher verification) |
| Audit log + syslog | Execution provenance (signed receipts + transparency log) |
Operating systems learned these lessons over decades. Agents need to learn them now – before the SolarWinds-scale incident that forces the lesson.
Why Now: The Regulatory and Market Clock
Three forces are converging that make the provenance gap urgent rather than academic:
Standards bodies are moving. NIST’s February 2026 concept paper proposes demonstrations using OAuth 2.0/2.1, OpenID Connect, SPIFFE/SPIRE, and Model Context Protocol for agent identity and authorisation.[4] This signals that agent provenance will become a compliance requirement, not a nice-to-have.
The regulatory deadline is real. EU AI Act Article 50 transparency obligations become legally binding on August 2, 2026.[21] The European Commission published its first draft Code of Practice on AI content labelling in December 2025, with the final version expected June 2026.[22] Agent outputs fall within scope – you need provenance to comply.
The analysts are raising the stakes. Gartner names “Digital Provenance” a 2026 strategic technology trend and predicts that organisations that fail to invest adequately in digital provenance capabilities will face sanction risks potentially running into billions of dollars by 2029.[23]
The clock is ticking. Organisations deploying agent platforms today are making architectural decisions that will determine whether they can meet provenance requirements in 18 months β or face a costly retrofit.
What to Do About It
You don’t need to build a full provenance stack tomorrow. But you need to start evaluating agent platforms against this model – and making architectural choices that don’t foreclose provenance later.
Immediate (this quarter)
- Audit your agent identity model. Can you distinguish authenticated owner instructions from all other inputs? If not, you have a confused deputy waiting to happen.
- Classify your agent’s actions by risk tier. Read-only queries need minimal friction. Anything that mutates state, sends communications, or modifies the agent itself should require escalating proof of intent.
- Inventory your unsigned artifacts. How many skills, prompts, and configurations run on your agent platform without any publisher verification?
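The risk-tier classification above can start as something as simple as a lookup table. The tier names and action names here are hypothetical – the point is the shape, and especially the fail-closed default:

```python
from enum import Enum

class Proof(Enum):
    """Escalating proof-of-intent tiers (illustrative names)."""
    NONE = 0      # read-only queries: zero friction
    CONFIRM = 1   # low-risk mutation: in-channel confirmation
    STEP_UP = 2   # outbound side-effects: passkey/TOTP approval
    REVIEW = 3    # self-modification: human review, like a code review

# Hypothetical tiering; adapt the action names to your own platform.
ACTION_TIERS = {
    "search_docs": Proof.NONE,
    "update_crm_record": Proof.CONFIRM,
    "send_email": Proof.STEP_UP,
    "install_skill": Proof.REVIEW,
}

def required_proof(action: str) -> Proof:
    # Fail closed: unknown actions get the strictest tier, not the loosest.
    return ACTION_TIERS.get(action, Proof.REVIEW)
```

The default matters more than the table: a new tool your agent discovers tomorrow should land in the strictest tier until someone deliberately classifies it.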
Near-term (next two quarters)
- Implement step-up authentication for destructive operations. The agent presents a plan; you approve via a channel stronger than the one that delivered the request.
- Sign your skills and configs. Even basic git-commit signing creates an artifact provenance chain that’s better than nothing.
- Emit structured execution receipts with at minimum: who requested, what was approved, what ran, and what resulted. Not signed yet, but the structure enables signing later.
Strategic (2026-2027)
- Adopt SLSA-style provenance levels for your agent platform. Level 0 (no provenance) → Level 1 (provenance exists) → Level 2 (signed provenance) → Level 3 (isolated builds + signed provenance + transparency log).
- Evaluate Sigstore or equivalent for keyless signing of agent artifacts. The friction is lower than you think – ephemeral keys bound to identity, no key management required.
- Watch the NIST demonstration outputs. Their concept paper will produce practical guides with commercially available technologies. These will likely become the de facto standard.
The Punchline
OpenClaw has a provenance problem. So does every agent platform shipping today.
The gap isn’t in their intentions – OpenClaw’s threat model is more honest than most. The gap is structural: the entire industry inherited a chat-era trust model where text is text, channels are channels, and nobody needs to prove anything because the worst outcome was a bad response.
That was fine when agents could only talk. It’s not fine now that they can act.
The fix exists. Software supply chains solved provenance with SLSA and Sigstore. Operating systems solved identity and authorisation decades ago. The patterns are proven and the infrastructure is available. Agent platforms just need to apply them.
Next time you evaluate an agent platform, don’t just ask “what can it do?” and “how is it contained?”
Ask: Can it prove who authorised this action?
If the answer is no, you’re deploying a confused deputy. And the safeguard LLM won’t always be the one exploiting it.
Scott Farrell helps Australian mid-market leadership teams turn scattered AI experiments into governed portfolios. His frameworks on agent containment (SiloOS), architectural trust, and production-ready AI systems are available at leverageai.com.au.
References
- [1] OpenClaw. “Trust – Security and Threat Model.” trust.openclaw.ai – “Skills verification (Integrity checks for ClawHub skills) and Signed releases (Cryptographic verification of updates)” listed as defensive engineering goals.
- [2] Strata Identity. “The AI Agent Identity Crisis: A 2026 Guide.” strata.io/blog/agentic-identity/the-ai-agent-identity-crisis-new-research-reveals-a-governance-gap/ – “Only 28% can reliably trace agent actions back to a human sponsor across all environments.”
- [3] Gravitee. “State of AI Agent Security 2026 Report: When Adoption Outpaces Control.” gravitee.io/blog/state-of-ai-agent-security-2026-report-when-adoption-outpaces-control – “81% of teams are past the planning phase yet only 14.4% have full security approval.”
- [4] NIST NCCoE. “New Concept Paper on Identity and Authority of Software Agents.” nccoe.nist.gov/news-insights/new-concept-paper-identity-and-authority-software-agents – Feb 5, 2026 concept paper on AI agent identity and authorisation.
- [5] AuthFyre. “How the Confused Deputy Problem is Resurfacing in Cybersecurity.” authfyre.com/blog/how-the-confused-deputy-problem-is-resurfacing-in-cybersecurity – “The Confused Deputy Problem… is one of the oldest vulnerabilities in computer science, but modern AI and machine identity sprawl have amplified it dramatically.”
- [6] Saurav Kumar. “MCP Authorization for Agentic AI – The ‘Confused Deputy.’” medium.com/@sauravkumarsct/mcp-authorization-for-agentic-ai-the-confused-deputy-5af8bb835261 – “If that AI is ‘confused’ by a cleverly crafted prompt, it can be manipulated into exfiltrating sensitive data.”
- [7] The Hacker News. “AI Agents Are Becoming Authorization Bypass Paths.” thehackernews.com/2026/01/ai-agents-are-becoming-privilege.html – “When actions are executed by an AI agent, authorization is evaluated against the agent’s identity, not the requester’s.”
- [8] AccuKnox. “OpenClaw Security: Sandboxing Viral AI Agents.” accuknox.com/blog/openclaw-security-ai-agent-sandboxing-aispm – “SecurityScorecard’s STRIKE threat intelligence team discovered more than 135,000 internet-exposed OpenClaw instances.”
- [9] Stellar Cyber. “Top Agentic AI Security Threats in 2026.” stellarcyber.ai/learn/agentic-ai-securiry-threats/ – “Tool Misuse and Privilege Escalation remain the most common incidents (520 reported).”
- [10] SC Media. “2026 AI Reckoning: Agent Breaches, NHI Sprawl, Deepfakes.” scworld.com/feature/2026-ai-reckoning-agent-breaches-nhi-sprawl-deepfakes – “A 16 billion credential exposure in June 2026… affecting over 12,000 organizations.”
- [11] Prompt Security. “AI & Security Predictions for 2026.” prompt.security/blog/prompt-securitys-ai-security-predictions-for-2026 – “Operant AI discovered ‘Shadow Escape,’ a zero-click exploit targeting agents built on MCP.”
- [12] AI Multiple. “15 Threats to the Security of AI Agents in 2026.” aimultiple.com/security-of-ai-agents – State-sponsored Claude Code espionage across 30+ orgs; reconciliation agent exfiltrated 45,000 customer records.
- [13] OWASP. “LLM01:2025 Prompt Injection.” genai.owasp.org/llmrisk/llm01-prompt-injection/ – “Prompt Injection remains the #1 critical vulnerability.”
- [14] PointGuard AI. “From SBOM to AI-BOM: Rethinking Supply Chain Security in the AI Era.” pointguardai.com/blog/from-sbom-to-ai-bom-rethinking-supply-chain-security-in-the-ai-era – “76% experiencing prompt injection, 66% vulnerable LLM code, 65% jailbreaking.”
- [15] HID Global. “Trust Standards Evolve: AI Agents, the Next Chapter for PKI.” blog.hidglobal.com/trust-standards-evolve-ai-agents-next-chapter-pki – DID/VC/ANS proposals for agent identity and discovery.
- [16] MintMCP. “AI Agent Security: The Complete Enterprise Guide for 2026.” mintmcp.com/blog/ai-agent-security – “ReBAC using OpenFGA, Runtime Policy Decision Point via AuthZen, Fine-grained consents.”
- [17] SLSA. “Provenance.” slsa.dev/spec/v0.1/provenance – “Provenance is the verifiable information about software artifacts describing where, when and how something was produced.”
- [18] Sigstore. “Overview.” docs.sigstore.dev/about/overview/ – “Open source project for improving software supply chain security… signing events recorded in a tamper-resistant public log.”
- [19] Chainguard Academy. “An Introduction to Cosign.” edu.chainguard.dev/open-source/sigstore/cosign/an-introduction-to-cosign/ – “Ecosystems include NPM, PyPI, Maven, GitHub, brew, Kubernetes.”
- [20] Red Hat Emerging Technologies. “Model Authenticity and Transparency with Sigstore.” next.redhat.com/2025/04/10/model-authenticity-and-transparency-with-sigstore/ – “Sigstore’s Model Transparency project… aimed at applying signing to ML models.”
- [21] EU AI Act. “Article 50: Transparency Obligations.” artificialintelligenceact.eu/article/50/ – “Transparency obligations becoming legally binding on August 2, 2026.”
- [22] Ashurst. “Transparency of AI-Generated Content: The EU’s First Draft Code of Practice.” ashurst.com/en/insights/transparency-of-ai-generated-content-the-eu-first-draft-code-of-practice/ – “December 17, 2025, European Commission published first draft Code of Practice.”
- [23] Gartner / Help Net Security. “Gartner Predicts the Technologies Set to Transform 2026.” helpnetsecurity.com/2025/10/23/gartner-2026-technology-trends/ – “By 2029, those who failed to invest in digital provenance will face sanction risks potentially running into billions.”