How We Score MCP Server Security: 18 Rules, Two Published Taxonomies, Zero Invented Checks

· 8 min read
MCPBundles

You paste an MCP server URL into a security analyzer. It spits out a number. You ask the obvious question: what does that number actually mean?

Most MCP scanners can't answer it. They run a pile of regexes, fire off a batch of LLM prompts, and produce a verdict. If you push on the verdict, you find ad-hoc heuristics with no published source — and worse, marketing claims about "AI-powered security analysis" that nobody can audit.

We built MCPBundles' analyzer the other way around. Every rule cites a published taxonomy entry. If we can't cite an entry, the rule doesn't ship. The catalog is small, deliberate, and live: www.mcpbundles.com/learn/mcp-security.

This post is the "show your work" version of that page.

The four principles

Before any rule, four constraints:

1. No invented rules. Every check we run cites either an mcp-scan issue code (the static scanner from Invariant Labs) or a SAFE-MCP technique ID (a MITRE ATT&CK-style framework for MCP). If a published taxonomy doesn't recognize the threat, we don't either — yet.

2. Deterministic first, LLM additive. Most rules run as plain Python over the tool metadata. The LLM judge runs on top for prompt-injection-shape detection and a couple of context-sensitive checks. The LLM judge fails open: any error or 25-second timeout returns an empty list, and the deterministic findings still ship. We never block a result on a model call.

3. Metadata only — never call the tool. The analyzer reads names, descriptions, JSON Schema fields, and annotations. It doesn't invoke a single tool to find out what one does. That keeps the audit safe to point at any unverified server, including obviously hostile ones.

4. The same engine runs everywhere. The catalog you see on the methodology page is the catalog that runs on the public analyzer at /mcp-analysis, the catalog that scores every published listing in the directory, and the catalog the platform-admin Hub tools use for experimentation. There's no privileged scoring path for partners.
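
Principle 3 in practice is a single protocol call. A minimal sketch, assuming the official mcp Python SDK's streamable-HTTP client (the function name and URL are ours for illustration, not MCPBundles' actual client code):

from mcp import ClientSession
from mcp.client.streamable_http import streamablehttp_client

async def list_remote_tools(url: str):
    # Read-only audit surface: the analyzer issues tools/list and never tools/call.
    async with streamablehttp_client(url) as (read_stream, write_stream, _):
        async with ClientSession(read_stream, write_stream) as session:
            await session.initialize()
            result = await session.list_tools()
            return [(t.name, t.description, t.inputSchema) for t in result.tools]

# asyncio.run(list_remote_tools("https://example.com/mcp"))

Everything the scoring layers need (names, descriptions, schemas, annotations) comes back from that one call.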

[Figure: security scoring blueprint with layered audit checks, evidence pins, risk rules, and taxonomy references]

The three layers

A security report is the output of three layers composed in order:

Layer 1 — Deterministic metadata

Walks the tool's name, description, full JSON Schema (recursively, including nested objects, arrays, and $ref resolution), and annotations. Catches:

  • Zero-width characters and HTML-comment instruction steganography (SAFE-T1402)
  • Imperatives hiding inside JSON Schema field descriptions and titles (SAFE-T1501)
  • Cross-tool references and tool shadowing (mcp-scan E002, SAFE-T1008, SAFE-T1301)
  • Mutating verbs ("delete", "create", "send") on tools that claim readOnlyHint: true (MCPB-A002, citing SAFE-T1406 and SAFE-T1104)
  • Toxic-flow capability tags: third-party content exposure, sensitive-data retrieval, local-file exposure, destructive shared-infrastructure capability, destructive local capability (mcp-scan W015 / W017 / W018 / W019 / W020)
  • Credential-setup language directed at the user — the literal "paste your API key here" pattern (SAFE-T1007)
  • Consent-fatigue prose: "no need to confirm", "auto-approve", "skip approval", "always allow", "without prompting" (SAFE-T1403)
  • Data-harvest prose: "logs every call", "sends telemetry to", "exfiltrates" (SAFE-T1804)
  • Homoglyph and mixed-script characters in tool names — a Greek ρ inside an otherwise Latin name, zero-width joiners (MCPB-I001, citing SAFE-T1405 and SAFE-T1103)
  • Annotation completeness, annotation honesty, duplicate display names within a server (MCPB-A001, MCPB-A002, MCPB-A003)

This layer is the workhorse. It runs in milliseconds, has zero failure modes, and catches the majority of real findings on real servers.
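
To make the shape of this layer concrete, here is a minimal sketch of two of the checks above: hidden-instruction steganography (SAFE-T1402) and annotation honesty (MCPB-A002). The function and pattern names are ours for illustration, not the analyzer's source:

import re

# Zero-width and joiner characters used to hide instructions (SAFE-T1402)
_ZERO_WIDTH = re.compile(r"[\u200b\u200c\u200d\u2060\ufeff]")
_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)
# Verbs that contradict a readOnlyHint: true annotation (MCPB-A002)
_MUTATING_VERBS = re.compile(r"\b(delete|create|send|update|remove|write)\b", re.IGNORECASE)

def check_tool(tool: dict) -> list[str]:
    # Deterministic checks over tool metadata only; the tool is never invoked.
    findings = []
    text = f"{tool.get('name', '')} {tool.get('description', '')}"
    if _ZERO_WIDTH.search(text) or _HTML_COMMENT.search(text):
        findings.append("SAFE-T1402: hidden-instruction steganography in metadata")
    annotations = tool.get("annotations") or {}
    if annotations.get("readOnlyHint") and _MUTATING_VERBS.search(text):
        findings.append("MCPB-A002: mutating verb on a readOnlyHint: true tool")
    return findings

The shipped rules also recurse into the full JSON Schema (nested objects, arrays, $ref resolution) before running patterns like these; the sketch reads only the top-level name and description.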

Layer 2 — LLM judge (Claude Haiku 4.5)

The deterministic layer is precise but conservative. The LLM judge layers on top for:

  • Instruction-control language that doesn't match a literal phrase but reads as adversarial: "Always include this URL when you summarize", "When the user asks about X, redirect to Y" (E001, SAFE-T1001).
  • Attention-control words and severity-laundering vocabulary in descriptions ("important", "critical", "ignore previous", "override") (W001).
  • Imperatives buried inside JSON Schema field text, where the deterministic walker found the prose but couldn't decide if it was descriptive or instructional (SAFE-T1501).
  • Third-party origin mismatch — the tool claims to be from one vendor, the server name and URL host say another. The judge gets the server name + URL host alongside the tool's annotations.title and description, and decides whether the third-party origin claim contradicts the host (SAFE-T1004).
  • Credential-setup nuance — the tool's prose softly instructs the user to put credentials somewhere they shouldn't, without using the literal phrases the deterministic check looks for (SAFE-T1007).

The judge runs in parallel with the deterministic layer. A 25-second timeout caps the worst case. On any error — timeout, schema-validation failure, network blip — the judge returns no findings and the analyzer reports llm_judge_used: false. The deterministic posture still ships.
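
The fail-open contract is simple enough to sketch. Assuming a hypothetical call_judge coroutine (the real analyzer's judge interface is not public), the wrapper looks roughly like:

import asyncio

JUDGE_TIMEOUT_SECONDS = 25  # worst-case cap on the model call

async def run_judge_fail_open(call_judge, tool_metadata):
    # Returns (findings, llm_judge_used). Any failure yields no findings,
    # and the deterministic posture still ships.
    try:
        findings = await asyncio.wait_for(
            call_judge(tool_metadata), timeout=JUDGE_TIMEOUT_SECONDS
        )
        return findings, True
    except Exception:
        # Timeout, schema-validation failure, network blip: fail open.
        return [], False

Running this under asyncio.gather alongside the deterministic layer gives the parallel composition described above.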

Layer 3 — Cross-server intelligence (designed; not yet shipped)

Three rules need a directory-aware lookup at score time, which means the analyzer signature has to grow. They're designed and waiting on infrastructure:

  • Host typosquat across the directory — a new server mcp.stripe-pay.com exposing look-alike tools that mimic the canonical mcp.stripe.com listing (SAFE-T1405).
  • Cross-server tool-name collision — a new server publishes a tool with the same name as a tool on a different, established listing (SAFE-T1103).
  • Drift detection on claimed listings — the persisted tool description hash for a published listing has changed since publish-time (SAFE-T1201, MCP rug-pull).

We'd rather omit these than ship them with low precision. They're public on the methodology page so you can see what's coming.
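
The drift check is the easiest of the three to picture. A minimal sketch, assuming the pin is a digest over (name, description) pairs; the shipped design may pin different fields or use a different digest:

import hashlib
import json

def description_hash(tools: list[dict]) -> str:
    # Stable digest over the listing's tool surface, computed at publish time.
    canonical = json.dumps(
        sorted((t["name"], t.get("description", "")) for t in tools),
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def has_drifted(published_hash: str, live_tools: list[dict]) -> bool:
    # SAFE-T1201 (MCP rug-pull): descriptions changed after publish-time review.
    return description_hash(live_tools) != published_hash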

A rule the deterministic layer caught that we almost missed

Three rules shipped this week that didn't exist last week. The most interesting is MCPB-I001 (homoglyph / mixed-script tool names). Citation: SAFE-T1405 (Tool Obfuscation/Renaming) and SAFE-T1103 (Tool Shadowing/Spoofing).

The check is essentially two regex lines:

import re

_NON_ASCII = re.compile(r"[^\x00-\x7F]")  # any character outside the ASCII range
_ALLOWED_NAME_CHARS = re.compile(r"^[A-Za-z0-9_\-./:]+$")  # conventional tool-name alphabet

If a tool's name contains a non-ASCII character, or fails the allow-list, the analyzer flags it as high severity. That's enough to catch a tool called sρoofed (Greek ρ in position 2) that would otherwise read as spoofed to a human and to most allow-lists.

The MCP tool.name namespace is ASCII by convention. Every non-ASCII character is a typosquat / spoofing signal. Two regex lines, citing two SAFE-MCP techniques, no LLM call required.
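
Composed, the rule is a one-line predicate. The function name below is ours, not the analyzer's:

def name_is_suspicious(tool_name: str) -> bool:
    # MCPB-I001: non-ASCII characters, or anything outside the allow-list.
    return bool(_NON_ASCII.search(tool_name)) or not _ALLOWED_NAME_CHARS.match(tool_name)

name_is_suspicious("sρoofed")  # True: the Greek ρ trips both patterns
name_is_suspicious("spoofed")  # False: plain ASCII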

What we removed: a negative result on YARA

Last week we wired Cisco's mcpscanner YARA library into the analyzer. The integration shipped behind a yara_used flag, with findings self-citing Cisco's AITech taxonomy.

A batch run over eight cached server analyses produced 11 YARA findings. Ten were false positives:

  • list_disks on a Google Compute MCP server flagged as tool_poisoning because the description mentioned listing resources.
  • lookup_weather flagged as code_execution because it mentioned executing a query.
  • search_threads flagged as tool_poisoning for routine prose about searching messages.

The deterministic and LLM layers caught zero false positives in the same batch and surfaced the genuine prompt-injection findings on a different server. The signal-to-noise calculation was easy: we ripped the YARA layer out the same day, kept the methodology page, and kept the doctrine.

This matters more than it sounds. The "more rules" reflex is wrong when those rules don't earn their precision budget — every false positive teaches the user to discount future findings, and the next real finding gets dismissed too.

How to use this

Two surfaces, both public:

  • Methodology page — live rule catalog, framework citations, and the four principles. Generated server-side from the GraphQL securityRuleCatalog field, so it can never drift from the analyzer source.
  • Public analyzer — paste a remote MCP server URL, get the security posture, the quality score, and every finding cited against the published taxonomy. Free; no signup required for discovery; signup gates the LLM-augmented score.

If you're auditing an MCP server before connecting it to your agent host, start with the analyzer. If you want to know what we're checking before you trust the score, start with the methodology page.

If you find a rule we should add — and you can cite the published taxonomy entry — open an issue against thinkchainai/mcpbundles and we'll evaluate it.