
Tool Design Matters More Than Execution Mode

MCPBundles · 8 min read

Anthropic's recent blog post about code execution with MCP got everyone excited about converting tool calls into code. But I think we're optimizing the wrong thing.

Good tool design—clear names, predictable parameters, obvious purpose—makes tools easy for AI to understand and use correctly.

The Real Problem Isn't Tool Calls vs. Code

Yeah, tool overload is real. When you connect to hundreds of MCP servers, loading every tool definition upfront gets expensive. Anthropic's right about that part.

But their solution—converting tools to code APIs—misses the bigger picture.

Whether you're loading tool definitions or reading TypeScript files, you're still paying the discovery cost. You still need to understand what each tool does, what parameters it takes, and what it returns. The format changed. The work didn't.

The 98.7% token reduction they claim? That only works if you've got 1,000 tools and use 2 of them. Start loading more tools on demand and the savings shrink. You just moved the cost from upfront to progressive. The total? Pretty much the same.

What Actually Works: Good Tool Design

Here's something interesting. AI works really well with the GitHub CLI and AWS CLI. Why?

Not because they're exposed as code APIs. It's because they're in the training data.

The model has seen thousands of examples of gh pr create --title "Fix bug" and aws s3 cp file.txt s3://bucket/. It knows the patterns. It understands the parameters. It can compose commands without looking up docs.

That's why it's often better to use the CLI tool directly rather than wrapping it in an MCP server. If the AI already knows how to use gh or aws, why add another layer? The CLI is already optimized for AI use—consistent naming, predictable parameters, clear documentation.

But for tools that aren't in the training data, you need to design them well. That's the insight everyone's missing. Tool design matters way more than execution mode.

What Makes a Tool AI-Friendly?

Look at CLIs that work well with AI:

Consistent naming - GitHub uses pr, issue, repo. AWS uses s3, ec2, lambda. You can guess what they do.

Predictable parameters - Most tools follow conventions. --title is obvious. --output json works everywhere. --region us-east-1 is consistent across AWS.

Clear scope - Each tool does one thing. gh pr create creates a PR. aws s3 cp copies files. They don't try to do everything.

Composable - You can chain them. gh pr list --json number | jq '.[0].number' | xargs gh pr view works because each tool has a clear input/output contract.

Now look at typical MCP tools:

{
  "name": "stripe_update_record_with_options_and_metadata",
  "description": "Updates a Stripe record with optional field validation and metadata refresh",
  "parameters": {
    "objectType": "string (Customer|Charge|PaymentIntent|Subscription)",
    "recordId": "string",
    "data": "object (any fields)",
    "validateFields": "boolean (optional)",
    "refreshMetadata": "boolean (optional)",
    "returnFullRecord": "boolean (optional)"
  }
}

That's not AI-friendly.

The name is verbose. The parameters have conditional logic baked in. The data field is a blob of whatever. An AI seeing this for the first time has to read the full documentation to understand it.

Well-designed tools are simple and clear. Poorly-designed tools are verbose and confusing, forcing AI to parse complex schemas.

Better version:

{
  "name": "stripe_list_balance_transactions",
  "description": "Retrieve a list of balance transactions related to your Stripe account, such as charges, refunds, and payouts. Use this when you need an overview of funds moving in and out of your account. Returns detailed information for each transaction, including amount, type, and status.",
  "parameters": {}
}

Or with parameters:

{
  "name": "stripe_list_setup_attempts",
  "description": "Retrieve a log of setup attempts for payment methods within your account. This is useful for reviewing and troubleshooting the creation of payment setups. It returns detailed records of each attempt. Filter by 'setup_intent' and paginate with 'starting_after' and 'limit'.",
  "parameters": {
    "setup_intent": {
      "type": "string",
      "description": "Filter setup attempts by setup intent ID. Required by the Stripe API."
    },
    "starting_after": {
      "type": "string",
      "description": "Pagination cursor: the ID of the last item from the previous page."
    },
    "limit": {
      "type": "integer",
      "description": "Number of items per page (default: 100)",
      "default": 100
    }
  }
}

Clear, descriptive name that indicates the operation (list) and the object type (balance_transactions or setup_attempts). Rich descriptions on both the tool and each parameter teach the AI how to use it. The tool description explains what it does, when to use it, and what it returns. Parameter descriptions explain their purpose, format, and usage. The response is filtered to return only essential data, reducing token costs.
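Here's a minimal sketch of how a tool in that shape might be registered with the TypeScript MCP SDK. Treat the specifics as assumptions for illustration: the server.tool() helper, the zod .describe() calls, and the stripe-node usage reflect common patterns but may differ from your SDK version. The point is that the name, the tool description, and every parameter description do the teaching, and the handler trims the response before returning it.

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import Stripe from "stripe";
import { z } from "zod";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const server = new McpServer({ name: "stripe-tools", version: "1.0.0" });

// Descriptive name (operation + object), a rich tool description, and a
// .describe() on every parameter so the model can learn the tool from the
// schema alone. Sketch only: exact SDK signatures vary by version.
server.tool(
  "stripe_list_setup_attempts",
  "Retrieve a log of setup attempts for payment methods within your account. " +
    "Useful for reviewing and troubleshooting the creation of payment setups. " +
    "Returns one record per attempt.",
  {
    setup_intent: z
      .string()
      .describe("Filter setup attempts by setup intent ID. Required by the Stripe API."),
    starting_after: z
      .string()
      .optional()
      .describe("Pagination cursor: the ID of the last item from the previous page."),
    limit: z
      .number()
      .int()
      .max(100)
      .default(100)
      .describe("Number of items per page (default: 100)."),
  },
  async ({ setup_intent, starting_after, limit }) => {
    const page = await stripe.setupAttempts.list({ setup_intent, starting_after, limit });
    // Return only the essential fields to keep token costs down.
    const trimmed = page.data.map((a) => ({
      id: a.id,
      status: a.status,
      payment_method: a.payment_method,
      created: a.created,
    }));
    return { content: [{ type: "text", text: JSON.stringify(trimmed) }] };
  }
);
// ...then connect the server to a transport (stdio, HTTP) as usual.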

The Future: AI Working With Thousands of Tools

Anthropic's code execution approach assumes you need to hide most tools from the model. Load only what you need, filter aggressively, minimize context.

I think that's backwards.

AI is getting better at handling large contexts. We're at 200k now, heading to 1M+. The models are getting smarter at selective attention and retrieval. In a year, loading 1,000 tool definitions won't be a problem.

What will matter is whether the AI can understand those tools quickly.

As AI handles larger contexts, well-designed tools will scale. The model can scan and understand thousands of tools quickly if they're designed well.

If your tool is well-designed—clear name, predictable parameters, obvious purpose—the model can scan it and move on. If it's poorly designed, the model has to stop, parse complex schemas, reason about edge cases, and maybe still get it wrong.

But good design isn't just about names. It's about teaching the AI through descriptions. Every tool description should explain what it does, when to use it, and what it returns. Every parameter description should explain its purpose, format, and impact. This is how you make tools discoverable without overwhelming the model.

And don't forget filtering. Well-designed tools return only what's needed by default, reducing token costs. If someone needs more data, they can request it explicitly.
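As a concrete sketch of that default-filtering idea (the field names follow Stripe's balance transaction object; the trimmed shape and the includeFullRecord flag are hypothetical choices, not part of any existing API):

// A full balance transaction carries many fields (fee breakdowns, source
// references, reporting categories). Most calls need only a handful, so
// return the trimmed shape by default and the full record only on request.
interface EssentialBalanceTransaction {
  id: string;
  amount: number;
  currency: string;
  type: string;
  status: string;
  created: number;
}

function filterBalanceTransaction(
  full: EssentialBalanceTransaction & Record<string, unknown>,
  includeFullRecord = false
) {
  if (includeFullRecord) return full; // explicit opt-in to the verbose payload
  const { id, amount, currency, type, status, created } = full;
  return { id, amount, currency, type, status, created };
}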

What This Means Right Now

Instead of converting your MCP server to a code API, make your tools better.

Descriptive, consistent naming - Use patterns like stripe_list_balance_transactions, stripe_list_setup_attempts, stripe_upsert_customer. The name should indicate the operation (list, get, upsert, create) and the object type. Not get_document_with_metadata, but get_document with a description explaining metadata handling.

Rich descriptions on tools AND parameters - The tool description should explain what it does, when to use it, and what it returns. Each parameter description should explain its purpose, format, and how it affects behavior. This is how you teach the AI without bloating the schema.

One tool, one job - If your tool has 15 optional parameters that change its behavior, split it into multiple tools. stripe_list_balance_transactions and stripe_get_balance_transaction are separate tools, not one tool with a detailed flag.

Predictable schemas - Use common parameter names. id not recordId or objectIdentifier. limit not maxResults or numberOfItems. But more importantly, describe what each parameter does.

Filter responses to reduce tokens - Return only essential fields by default. If someone needs more data, they can request it explicitly through parameters. This dramatically reduces token costs without sacrificing functionality.

Add examples in descriptions - The model learns fast from examples. Show it what a typical call looks like, what the response format is, and common use cases.
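To illustrate that last point, the description itself can carry a worked example without bloating the schema. A hypothetical sketch (the call and response values are invented):

// Embedding a typical call and the response shape in the description lets the
// model pattern-match instead of guessing.
const description = [
  "Retrieve balance transactions (charges, refunds, payouts) for your Stripe account.",
  "Use this for an overview of funds moving in and out.",
  "Example call: stripe_list_balance_transactions({ limit: 3 })",
  'Example response: [{ "id": "txn_1abc", "amount": 1099, "currency": "usd", "type": "charge", "status": "available" }, ...]',
].join("\n");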

Code Execution Isn't Bad, It's Just Not The Solution

Code execution has real benefits. Composition without context bloat. Loops and conditionals. State persistence. Those are useful.

But it doesn't solve the discovery problem.

Whether you're reading tool definitions or TypeScript files, you still need to understand what's available and how it works. Well-designed tools work great with direct calls. Poorly-designed tools will be annoying in any execution mode.

Start With Design

The MCP ecosystem is still early. We have thousands of servers but many are one-off experiments with inconsistent design.

Now's the time to establish patterns.

Look at what works—CLIs like GitHub and AWS that AI handles well, APIs that models understand easily. Extract the principles. Apply them to your MCP tools.

But remember: if a CLI tool like gh or aws is already in the training data and works well with AI, use it directly. Don't wrap it in an MCP server just because you can. The CLI is already optimized for AI use.

Make your tools so clear that a model can scan the definition once and use it correctly. Use descriptive names like stripe_list_balance_transactions that indicate operation and object. Write rich descriptions on tools and parameters that teach the AI how to use them. Filter responses to reduce token costs. That's the real optimization.

Everything else is just shuffling tokens around.