SkillScan: Security Scanning for Agent Skills
OpenClaw blew up. Personal agents went from a niche experiment to something people actually rely on, and with that came an explosion of community-built skills on ClawHub. The problem is that none of these skills are verified. Anyone can publish one, and any agent can install it without a second thought.
So I built SkillScan, a security scanner designed for agents to call before installing untrusted skills. It's an agent-first tool. Your agent hits the endpoint, pays $0.05 USDC via the x402 protocol on Base Mainnet, and gets back a risk score, a verdict, and a breakdown of everything it found wrong. Humans can use it too (the x402-cli makes that easy), but the whole point is that agents can do this autonomously. No human in the loop required.
It's live at skillscan.port402.com and currently supports ClawHub skills, with plans to generalize to the broader agent skill ecosystem.
The Problem
The ClawHub ecosystem lets agents install community-built skills. Think of them as npm packages for AI agents: small modules that extend what an agent can do. The ecosystem grew fast once personal agents took off, and it has the same problem npm had for years: there's no verification that any of this code is safe.
A malicious skill can do a lot of damage. It can read environment variables and exfiltrate your API keys to a webhook. It can run arbitrary code during npm install via postinstall scripts. It can inject instructions into your agent's configuration files like AGENTS.md, CLAUDE.md, and .cursorrules, so that even after you uninstall the skill, the attacker's instructions persist. And it can do all of this while looking completely normal on the surface.
The nastiest attack I've seen in the wild combines several vectors at once: a postinstall script downloads a payload, the payload reads your environment variables and sends them to an anonymous webhook, and then it appends instructions to your agent config file telling the agent to never mention the skill or its behavior. Your agent is compromised and actively hiding the fact that it's compromised.
I wanted something that could catch this before it happens. Not after installation, not after your keys have been exfiltrated. Before. And I wanted agents themselves to be able to use it, programmatically, without a human in the loop.
How It Works
The flow is straightforward. You send a POST request with a skill name. SkillScan fetches the skill's source code from ClawHub, runs static analysis on every file, checks permissions declared in SKILL.md against what the code actually does, and returns a structured report.
# Using x402-cli (handles payment automatically):
x402 test https://skillscan.port402.com/entrypoints/scan/invoke \
--wallet $PRIVATE_KEY \
--method POST \
--body '{"skill": "claw-club"}'
# Or with curl (returns 402 with payment instructions):
curl -X POST https://skillscan.port402.com/entrypoints/scan/invoke \
-H "Content-Type: application/json" \
-d '{"skill": "claw-club"}'
The endpoint validates the skill exists before charging. If you pass in a bad slug or a version that doesn't exist, you get a 400 back and your wallet isn't touched. Only valid scans cost money.
Under the hood, the scanner does two things: AST-based static analysis for JavaScript and TypeScript using Acorn, and regex pattern matching for Python and shell scripts. The AST approach is more accurate since it actually understands code structure rather than just matching strings, but regex gets the job done for languages where I don't have a full parser yet.
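To make the regex half concrete, here's a minimal sketch of that pattern-matching pass in TypeScript. The pattern list and severity labels are illustrative, not SkillScan's actual rule set:

```typescript
// NOTE: illustrative patterns only; the real scanner uses a larger rule set.
interface Finding {
  pattern: string;
  lineNumber: number;
  severity: "high" | "critical";
}

const PATTERNS: Array<{ re: RegExp; name: string; severity: Finding["severity"] }> = [
  { re: /curl\s+[^|]*\|\s*(sh|bash)/, name: "curl-pipe-shell", severity: "critical" },
  { re: /os\.system\s*\(/, name: "os.system", severity: "high" },
  { re: /pickle\.loads?\s*\(/, name: "pickle-load", severity: "high" },
];

function scanSource(source: string): Finding[] {
  const findings: Finding[] = [];
  source.split("\n").forEach((line, i) => {
    for (const { re, name, severity } of PATTERNS) {
      // Record the 1-based line number, matching the report format.
      if (re.test(line)) findings.push({ pattern: name, lineNumber: i + 1, severity });
    }
  });
  return findings;
}
```

Line-based matching like this is blind to code structure, which is exactly why the JS/TS side uses a real parser instead.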
SkillScan Analysis Pipeline
┌─────────────────┐
│ Skill Name / │
│ URL / Slug │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Fetch from │───▶ 400 (skill not found)
│ ClawHub API │
└────────┬────────┘
│ ZIP archive
▼
┌─────────────────┐
│ Extract & │
│ Filter Files │
└────────┬────────┘
│
┌────┴────┐
│ │
▼ ▼
┌────────┐ ┌───────────┐
│ AST │ │ Pattern │
│ Static │ │ Matching │
│(.js/ts)│ │ (.py/.sh) │
└───┬────┘ └─────┬─────┘
│ │
└──────┬──────┘
▼
┌─────────────────┐
│ Permission │
│ Analysis │
│ (SKILL.md diff) │
└────────┬────────┘
│
▼
┌─────────────────┐
│ Risk Score + │
│ Verdict │
└─────────────────┘
What It Detects
The scanner checks for six categories of threats. I landed on these after studying real-world agent skill malware and the SafeDep Agent Skills Threat Model.
Prompt Injection
This one is unique to the agent world. A skill's SKILL.md file gets read by the LLM, which means an attacker can embed instructions that override the agent's safety guidelines. Things like "ignore previous instructions," "enable DAN mode," or fake system tags. The scanner looks for these patterns and distinguishes between offensive and defensive usage. A skill that says "we detect jailbreak attempts" doesn't get flagged, but one that says "enable jailbreak mode" does.
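A rough sketch of how that offensive-versus-defensive distinction can work. The phrase lists here are illustrative, not the production rules:

```typescript
// Illustrative phrase lists; the real scanner's rules are broader.
const INJECTION_PHRASES = [
  /ignore (all )?previous instructions/i,
  /enable (DAN|jailbreak) mode/i,
  /<\/?system>/i, // fake system tags
];

// Words suggesting the phrase appears in a defensive context.
const DEFENSIVE_CONTEXT = /\b(detect|block|prevent|flag|reject)s?\b/i;

function isOffensiveInjection(line: string): boolean {
  const hit = INJECTION_PHRASES.some((re) => re.test(line));
  // "We detect prompts like 'ignore previous instructions'" is not flagged;
  // the same phrase without defensive framing is.
  return hit && !DEFENSIVE_CONTEXT.test(line);
}
```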
Supply Chain Attacks
Unpinned dependencies are a classic vector. A caret range like ^4.17.0 in your package.json means you'll automatically pull in whatever the latest minor version is. If a maintainer gets compromised or hands off the package to someone malicious (like the event-stream incident), your agent installs the bad version silently. The scanner also catches lifecycle script attacks: curl | sh in a postinstall hook, PowerShell commands, remote code loading via dynamic imports.
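Here's roughly what that check looks like against a package.json. The version and script patterns below are a simplified sketch, not the scanner's exact rules:

```typescript
// Simplified sketch of a package.json audit; patterns are illustrative.
interface PackageJson {
  dependencies?: Record<string, string>;
  scripts?: Record<string, string>;
}

const EXACT_VERSION = /^\d+\.\d+\.\d+$/; // e.g. "4.17.21": no ^, ~, or ranges
const RISKY_SCRIPT = /curl[^|]*\|\s*(sh|bash)|powershell|wget[^|]*\|/i;

function auditPackage(pkg: PackageJson): string[] {
  const issues: string[] = [];
  for (const [name, range] of Object.entries(pkg.dependencies ?? {})) {
    if (!EXACT_VERSION.test(range)) issues.push(`unpinned dependency: ${name} (${range})`);
  }
  // npm runs these hooks automatically on install.
  for (const hook of ["preinstall", "install", "postinstall"]) {
    const cmd = pkg.scripts?.[hook];
    if (cmd && RISKY_SCRIPT.test(cmd)) issues.push(`risky ${hook} script: ${cmd}`);
  }
  return issues;
}
```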
Persistence Mechanisms
This is the one that keeps me up at night. A skill that writes to AGENTS.md or CLAUDE.md can persist its instructions even after you uninstall it. The scanner catches file write operations targeting agent configuration files across JavaScript, Python, and shell scripts. If a skill is writing to your agent's instruction files, that's almost certainly malicious.
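A minimal sketch of that detection, assuming a line-based pass. The file list and write patterns are illustrative:

```typescript
// Illustrative file list and write patterns; not the scanner's full set.
const AGENT_CONFIG_FILES = ["AGENTS.md", "CLAUDE.md", ".cursorrules"];

const WRITE_CALLS = [
  /fs\.(appendFileSync|writeFileSync|appendFile|writeFile)\s*\(/, // JavaScript
  /open\s*\([^)]*['"](a|w)['"]/,                                  // Python write/append mode
  /(>>?|tee(\s+-a)?)\s/,                                          // shell redirection
];

function writesToAgentConfig(line: string): boolean {
  const touchesConfig = AGENT_CONFIG_FILES.some((f) => line.includes(f));
  const isWrite = WRITE_CALLS.some((re) => re.test(line));
  // Reading a config file is fine; writing to one is the red flag.
  return touchesConfig && isWrite;
}
```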
Privilege Escalation
Skills can declare pre-authorized access to tools in their SKILL.md frontmatter. A skill requesting pre-authorized Bash access gets flagged as critical because that's shell access without user approval. Same for Computer (full desktop control) and Write/Edit (arbitrary file modifications).
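In sketch form, assuming the frontmatter has already been parsed into a permissions array (the tool list mirrors the ones named above; the parsing step is elided):

```typescript
// Tools that mean unattended access if pre-authorized; flagged as critical.
const PREAUTH_CRITICAL = new Set(["Bash", "Computer", "Write", "Edit"]);

function flagPreauthorized(permissions: string[]): string[] {
  return permissions.filter((p) => PREAUTH_CRITICAL.has(p));
}
```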
Dangerous APIs
The usual suspects: dynamic code evaluation functions, the Function constructor, pickle.load(), os.system(). These aren't always malicious. A code executor has legitimate reasons to use them, but they're flagged because they're common attack vectors. The severity goes up significantly when combined with base64 decoding, since that's a classic obfuscation pattern.
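The severity bump can be sketched like this; both pattern lists are illustrative:

```typescript
// Illustrative patterns; the real rule set covers more APIs.
type Severity = "high" | "critical";

const DYNAMIC_EXEC = /\beval\s*\(|new\s+Function\s*\(|pickle\.loads?\s*\(|os\.system\s*\(/;
const BASE64_DECODE = /atob\s*\(|Buffer\.from\s*\([^)]*['"]base64['"]|base64\.b64decode/;

function classify(source: string): Severity | null {
  if (!DYNAMIC_EXEC.test(source)) return null;
  // Dynamic execution plus base64 decoding is the classic obfuscation combo.
  return BASE64_DECODE.test(source) ? "critical" : "high";
}
```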
Data Exfiltration
The scanner looks for known command-and-control (C2) endpoints: services commonly used to collect stolen data, such as webhook.site, Discord webhooks, RequestBin, Pipedream, and ngrok. These are anonymous, disposable endpoints that require no setup and are effectively untraceable. If a skill is sending HTTP requests to webhook.site, it's almost certainly exfiltrating something.
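A sketch of that deny-list check; the domain list here is a sample, not the full set:

```typescript
// Sample deny-list; the real scanner's list is longer.
const C2_DOMAINS = [
  "webhook.site",
  "discord.com/api/webhooks",
  "requestbin",
  "pipedream.net",
  "ngrok.io",
];

function findC2Urls(source: string): string[] {
  // Pull out every URL literal, then keep the ones hitting known C2 hosts.
  const urls = source.match(/https?:\/\/[^\s"'`)]+/g) ?? [];
  return urls.filter((u) => C2_DOMAINS.some((d) => u.toLowerCase().includes(d)));
}
```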
Risk Scoring
Every scan produces a score from 0 to 100. The formula is simple:
Risk Score = min(100, Findings Score + Permission Penalty)
Findings Score = SUM(Severity Weight × Category Multiplier)
Permission Penalty = Undeclared Permissions × 15
Severity weights range from 0 (info) to 40 (critical). Category multipliers amplify the score based on how dangerous the attack type is. Malware gets a 2.0x multiplier, persistence gets 1.9x, credential exfiltration gets 1.8x. Undeclared permissions add 15 points each, because a skill that hides its capabilities is inherently suspicious.
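Putting the formula together in code. The info and critical weights and the three multipliers above come from this post; the intermediate values are placeholders I've filled in to round out the tables:

```typescript
// info=0 and critical=40 are stated in the post; low/medium/high are placeholders.
const SEVERITY_WEIGHT = { info: 0, low: 5, medium: 15, high: 25, critical: 40 } as const;

// malware, persistence, and credential_exfiltration match the post; the rest are placeholders.
const CATEGORY_MULTIPLIER = {
  malware: 2.0,
  persistence: 1.9,
  credential_exfiltration: 1.8,
  prompt_injection: 1.5,
  dangerous_api: 1.2,
  supply_chain: 1.0,
} as const;

interface ScoredFinding {
  severity: keyof typeof SEVERITY_WEIGHT;
  category: keyof typeof CATEGORY_MULTIPLIER;
}

function riskScore(findings: ScoredFinding[], undeclaredPermissions: number): number {
  const findingsScore = findings.reduce(
    (sum, f) => sum + SEVERITY_WEIGHT[f.severity] * CATEGORY_MULTIPLIER[f.category],
    0,
  );
  // Each undeclared permission adds 15 points; the total caps at 100.
  return Math.min(100, Math.round(findingsScore + undeclaredPermissions * 15));
}
```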
The score maps to a verdict:
- safe (0-10): No significant issues
- low_risk (11-30): Minor issues, probably fine
- medium_risk (31-50): Worth reviewing before installing
- high_risk (51-75): Careful manual review needed
- critical (76-90): Serious issues, don't install
- malicious (91-100): Known malware patterns
The important thing is that agents can act on this programmatically. An agent can set a threshold (say, refuse anything above 30) and make the install/don't-install decision without human intervention.
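The verdict mapping and the threshold gate are easy to express directly from the table above:

```typescript
// Direct translation of the verdict table; thresholds come from the post.
function verdictFor(score: number): string {
  if (score <= 10) return "safe";
  if (score <= 30) return "low_risk";
  if (score <= 50) return "medium_risk";
  if (score <= 75) return "high_risk";
  if (score <= 90) return "critical";
  return "malicious";
}

// Agent-side install gate: refuse anything above the configured tolerance.
function shouldInstall(score: number, maxRisk = 30): boolean {
  return score <= maxRisk;
}
```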
What a Real Scan Looks Like
Here's a truncated response for a skill that has some legitimate but risky patterns: a dynamic code execution call, an unpinned dependency, and an undeclared shell permission:
{
"risk_score": 47,
"verdict": "medium_risk",
"summary": "Found 3 security issues requiring review",
"findings": [
{
"severity": "high",
"category": "dangerous_api",
"title": "Dynamic code execution detected",
"filePath": "src/executor.ts",
"lineNumber": 23,
"cweId": "CWE-95",
"recommendation": "Consider using a sandboxed execution environment."
},
{
"severity": "medium",
"category": "supply_chain",
"title": "Unpinned dependency: lodash (^4.17.0)",
"filePath": "package.json",
"lineNumber": 8,
"cweId": "CWE-1104",
"recommendation": "Pin to exact version: \"lodash\": \"4.17.21\""
},
{
"severity": "high",
"category": "permission_mismatch",
"title": "Undeclared permission: shell",
"filePath": "src/executor.ts",
"lineNumber": 45,
"recommendation": "Add to SKILL.md frontmatter: permissions: [shell]"
}
],
"permissions": {
"declared": ["filesystem"],
"detected": ["filesystem", "shell"],
"undeclared": ["shell"]
}
}
Every finding includes a file path, line number, CWE reference, and a concrete recommendation. The idea is that whether you're a human reviewing the output or an agent parsing JSON, you have enough context to make a decision.
The Payment Model
SkillScan uses the same x402 payment model I've been using across port402. The endpoint returns a 402 Payment Required with payment details in the response body. Any x402-compatible client (including other agents) can read the requirements, sign a gasless USDC transfer, and retry the request with the X-PAYMENT header.
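The request cycle can be sketched like this. Here signPayment stands in for whatever x402 client library does the signing, and fetchFn is injectable for testing; this is an illustration of the flow, not the official x402 SDK:

```typescript
// Illustrative x402 flow; signPayment is a stand-in for a real x402 client.
type FetchLike = (url: string, init: any) => Promise<{ status: number; json(): Promise<any> }>;

async function payAndScan(
  skill: string,
  signPayment: (requirements: unknown) => Promise<string>,
  fetchFn: FetchLike = fetch as unknown as FetchLike,
): Promise<unknown> {
  const url = "https://skillscan.port402.com/entrypoints/scan/invoke";
  const init = {
    method: "POST",
    headers: { "Content-Type": "application/json" } as Record<string, string>,
    body: JSON.stringify({ skill }),
  };

  const first = await fetchFn(url, init);
  if (first.status !== 402) return first.json(); // e.g. 400 for a bad slug: no charge

  // The 402 body carries the payment requirements; sign and retry.
  const payment = await signPayment(await first.json());
  const paid = await fetchFn(url, {
    ...init,
    headers: { ...init.headers, "X-PAYMENT": payment },
  });
  return paid.json();
}
```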
I priced it at $0.05 per scan. Low enough that running it on every skill install is negligible, high enough to cover compute costs and prevent abuse. The pre-validation is important here: if you pass in a skill that doesn't exist, you get a 400 error and aren't charged. Only successful scans cost money.
For humans, the easiest way to interact with it is through the x402-cli, which handles payment signing automatically. For agents, any x402-compatible wallet works.
Limitations
I want to be upfront about what this can't do. Static analysis has inherent blind spots:
- Minified code: Pattern matching breaks down when variable names are single characters and everything is on one line.
- Encrypted payloads: If the malicious code is encrypted at rest and only decrypted at runtime, static analysis won't see it.
- Dynamic URLs: URLs constructed from variables at runtime can't be detected statically.
- Runtime behavior: The scanner never runs code. It can't detect malicious behavior that only manifests during execution.
There are also hard limits: 200 KB max file size, 100 files max per skill, and only a fixed set of file types are analyzed. These are practical constraints to keep scan times fast and prevent abuse.
That said, most real-world agent skill malware I've studied uses obvious patterns: base64-encoded dynamic execution, webhook.site exfiltration, and postinstall curl | sh. The attacks don't need to be sophisticated when most agents install skills without any security check at all. SkillScan raises the bar from "zero verification" to "catches the common stuff," which is a meaningful improvement even if it's not perfect.
What's Next
Right now SkillScan only supports ClawHub skills. The next step is generalizing the scanner to work with any skill ecosystem, or any GitHub repository, really. The analysis pipeline doesn't care where the code comes from; it's the fetching and extraction layer that's ClawHub-specific.
I also want to add more language support. The AST-based approach works well for JavaScript and TypeScript, but Python and shell scripts are still on regex. A proper Python AST parser would reduce false positives significantly.
If you're building agents that install community skills, give it a try. And if you find a pattern it should be catching but isn't, I want to hear about it.
Resources
- Live: skillscan.port402.com
- GitHub: github.com/wgopar/skillscan
- x402 Protocol: x402.org
- OpenClaw: openclaw.ai
- SafeDep Threat Model: safedep.io/agent-skills-threat-model
- x402-cli: github.com/port402/x402-cli