
Weaviate MCP Tool Design: Comprehensive Tool & Parameter Descriptions

MCPBundles

Design Philosophy

We designed the Weaviate MCP tools following a modern, AI-first approach that prioritizes:

  1. Rich, Actionable Descriptions - Every tool and parameter has extensive descriptions that tell AI agents not just what something is, but when to use it, why it matters, and how it impacts the operation
  2. OpenAI MCP Connector Compatibility - search and fetch tools follow OpenAI's ChatGPT Connectors and Deep Research specification
  3. Smart ID Routing - Universal fetch tool uses colon-separated IDs (weaviate:object:uuid, weaviate:schema:CollectionName) to encode multiple resource types in a single endpoint
  4. Consolidated CRUD Operations - Single weaviate_upsert tool handles insert/update and single/batch operations intelligently
  5. Granular List Tools - Object-specific parameters for fine-grained control over discovery and browsing
  6. Maximum Tool Count = 10 - Design constraint to ensure focus and avoid overwhelming AI agents with too many options

Final 6-Tool Design

1. search - Hybrid Semantic + Keyword Search

Tool Description:

"Search the Weaviate vector store using hybrid semantic + keyword search. Returns ranked results with IDs, titles, and relevance scores. Automatically falls back to BM25 keyword search if no vectorizer is configured."

Parameters:

  • query (string, required)

    "Natural language search query OR structured query string. Performs semantic vector search combined with keyword matching (hybrid search). Examples: 'product documentation about APIs', 'customer feedback from Q4 2024', 'technical specifications for authentication'. You can also use structured syntax like 'type:feedback after:2025-01 sentiment:negative' for more precise queries."

  • collection (string, optional)

    "Weaviate collection name to search within. If not provided, uses the default collection configured in your provider settings (typically 'DefaultCollection'). Use weaviate_list_collections to discover available collections and their schemas."

  • limit (integer, optional, default: 10)

    "Maximum number of search results to return. Range: 1-100. Higher limits may impact performance but provide more comprehensive results. Default is 10, which is optimal for most use cases."

Design Rationale:

  • OpenAI MCP compatible output format with id, title, url, score
  • Automatic fallback from hybrid → BM25 if no vectorizer configured
  • Structured query syntax hints for advanced use cases
  • Returns smart IDs compatible with the fetch tool
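The fallback and output shaping described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: `search_with_fallback`, the two injected search callables, the `has_vectorizer` flag, and the raw hit shape (`uuid`, `title`, `url`, `score`) are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

# Hypothetical backend signature: (query, collection, limit) -> raw hits.
SearchFn = Callable[[str, str, int], List[Dict]]


def search_with_fallback(
    query: str,
    collection: str,
    limit: int,
    hybrid_search: SearchFn,  # assumed wrapper around a hybrid query
    bm25_search: SearchFn,    # assumed wrapper around a BM25 query
    has_vectorizer: bool,     # assumed to be read from the collection config
) -> List[Dict]:
    """Run hybrid search when a vectorizer exists, otherwise BM25."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")

    run = hybrid_search if has_vectorizer else bm25_search
    hits = run(query, collection, limit)

    # Shape each hit as id/title/url/score and emit a smart ID
    # that can be passed straight to the fetch tool.
    return [
        {
            "id": f"weaviate:object:{collection}:{hit['uuid']}",
            "title": hit.get("title", ""),
            "url": hit.get("url", ""),
            "score": hit.get("score", 0.0),
        }
        for hit in hits
    ]
```

Passing the two search functions in as arguments keeps the sketch independent of any particular client library, and makes the fallback branch easy to exercise in tests.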

2. fetch - Universal Getter with Smart ID Routing

Tool Description:

"Universal fetch tool that retrieves any object, schema, or metadata using smart ID routing with colon-separated format. Supports direct object retrieval, schema inspection, and collection metadata."

Parameters:

  • id (string, required)

    "Smart identifier with colon-separated routing. Supported formats: 'weaviate:object:uuid' (fetch from default collection), 'weaviate:object:CollectionName:uuid' (fetch from specific collection), 'weaviate:schema:CollectionName' (get collection schema), 'weaviate:collections:list' (list all collections). Example: 'weaviate:object:Products:b05fedb5-df83-46c0-bf7e-2b8f34abeab5' fetches a product document. Plain UUIDs are supported for backward compatibility (uses default collection)."

Design Rationale:

  • Smart ID Routing Pattern - Single id parameter encodes multiple resource types
  • Eliminates Need for Multiple Get Tools - One tool replaces get_object, get_schema, list_collections
  • Format Examples in Description - Shows concrete usage patterns like weaviate:object:Products:uuid
  • Backward Compatibility - Plain UUIDs still work for simplicity
  • OpenAI MCP Compatible Output - Returns content, title, text, url structure
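To make the routing concrete, here is one way the smart ID formats listed in the description could be parsed. It is a sketch under assumptions: the `Route` tuple and `parse_smart_id` function are hypothetical names, and the real tool may validate or dispatch differently.

```python
import uuid
from typing import NamedTuple, Optional


class Route(NamedTuple):
    kind: str                  # "object", "schema", or "collections"
    collection: Optional[str]  # None means "use the default collection"
    target: Optional[str]      # UUID for objects, "list" for collections


def parse_smart_id(raw: str) -> Route:
    """Split a colon-separated smart ID into a routing decision."""
    parts = raw.split(":")

    if parts[0] != "weaviate":
        # Backward compatibility: a plain UUID targets the default collection.
        uuid.UUID(raw)  # raises ValueError if raw is not a UUID
        return Route("object", None, raw)

    if len(parts) == 3 and parts[1] == "object":
        return Route("object", None, parts[2])
    if len(parts) == 4 and parts[1] == "object":
        return Route("object", parts[2], parts[3])
    if len(parts) == 3 and parts[1] == "schema":
        return Route("schema", parts[2], None)
    if parts[1:] == ["collections", "list"]:
        return Route("collections", None, "list")

    raise ValueError(f"Unsupported smart ID: {raw!r}")
```

Rejecting unknown shapes with an explicit error gives the agent a clear signal to correct the ID rather than silently fetching the wrong resource.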

3. weaviate_list_collections - Collection Discovery

Tool Description:

"List all Weaviate collections with rich filtering options. Supports pattern matching, schema inclusion, object counts, and pagination. Use this for collection discovery and understanding your data structure."

Parameters:

  • pattern (string, optional)

    "Name pattern for filtering collections. Supports wildcards: '*' matches any characters. Examples: 'Product*' (all starting with Product), '*Test*' (containing Test), 'Users' (exact match). If not provided, returns all collections."

  • with_schema (boolean, optional, default: false)

    "Include full schema information for each collection (property definitions, data types, vectorizer configuration). Default: false. Enable this to get detailed metadata about collection structure, useful when planning inserts or understanding data models."

  • with_object_count (boolean, optional, default: false)

    "Include the total count of objects in each collection. Default: false. Enabling this adds a query per collection, so use judiciously for large installations. Useful for understanding data distribution and collection sizes."

  • limit (integer, optional)

    "Maximum number of collections to return. Useful when you have many collections and want to paginate results. Default: no limit (returns all matching collections)."

  • offset (integer, optional, default: 0)

    "Number of collections to skip before starting to return results. Used for pagination in combination with limit. Default: 0 (start from beginning)."

Design Rationale:

  • Rich, Optional Metadata - with_schema and with_object_count provide detailed info on-demand
  • Wildcard Pattern Matching - Concrete examples (Product*, *Test*) show usage
  • Performance Guidance - Warns about with_object_count adding queries per collection
  • Pagination Support - limit + offset for large installations
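As an illustration of the pattern and pagination semantics, the filtering step could look like the sketch below. It assumes the collection names have already been retrieved, and it uses Python's `fnmatchcase` as a stand-in matcher. Note that `fnmatchcase` also understands `?` and `[...]`, which the tool description does not promise. The function name `filter_collections` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Optional


def filter_collections(
    names: List[str],
    pattern: Optional[str] = None,
    limit: Optional[int] = None,
    offset: int = 0,
) -> List[str]:
    """Apply an optional wildcard pattern, then offset/limit pagination."""
    matched = [
        name for name in names
        if pattern is None or fnmatchcase(name, pattern)
    ]
    # limit=None means "no limit": slice through to the end of the list.
    end = None if limit is None else offset + limit
    return matched[offset:end]
```

Filtering before pagination means `offset` and `limit` always count matching collections, which is what an agent paging through results would expect.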

4. weaviate_list_objects - Object Browsing with Granular Control

Tool Description:

"Browse objects in a collection with rich filtering, sorting, pagination, and property selection. The workhorse tool for exploring data with granular control over what's returned. Supports search filtering, structured WHERE conditions, and field selection."

Parameters:

  • collection (string, required)

    "Weaviate collection name to browse. Required. Use weaviate_list_collections to discover available collections. Each collection has its own schema defining available properties and data types."

  • search (string, optional)

    "Optional search string to filter results within the collection. Performs a mini-search to narrow down the list. Example: 'Europe' to find objects with European references. Leave empty to list all objects."

  • limit (integer, optional, default: 10)

    "Number of objects to return per page. Range: 1-100. Default: 10. Higher limits return more data but may impact performance. Use in combination with offset for pagination through large result sets."

  • offset (integer, optional, default: 0)

    "Number of objects to skip before starting results. Used for pagination. Default: 0 (start from beginning). Example: offset=10 with limit=10 returns the second page of results."

  • properties (array of strings, optional)

    "Array of property names to include in results. If not provided, returns all properties. Use this to reduce response size when you only need specific fields. Example: ['city', 'country', 'population'] returns only those three properties. Use fetch('weaviate:schema:CollectionName') to see available properties."

  • include_vector (boolean, optional, default: false)

    "Whether to include vector embeddings in the response. Default: false. Vector embeddings are large arrays of floats used for semantic search. Only enable if you specifically need the raw vectors (e.g., for external processing or analysis)."

  • sort_by (string, optional)

    "Property name to sort results by. Must be a valid property in the collection schema. Example: 'created_at', 'name', 'score'. Sorting is applied before pagination, ensuring consistent ordering across pages."

  • sort_order (string, optional, enum: ["asc", "desc"], default: "asc")

    "Sort direction. 'asc' = ascending (A-Z, 0-9, oldest first), 'desc' = descending (Z-A, 9-0, newest first). Default: 'asc'. Only used when sort_by is specified."

  • where (object, optional)

    "Filter condition to match specific property values. Format: {'property': 'field_name', 'operator': 'Equal'|'NotEqual'|'GreaterThan'|'LessThan', 'value': comparison_value}. Example: {'property': 'continent', 'operator': 'Equal', 'value': 'Europe'} returns only European objects. For complex filters, combine with search parameter."

Design Rationale:

  • Most Parameters of Any Tool - 9 parameters for maximum granular control
  • Every Parameter Answers: Why, When, How - Not just "what is sort_order" but "asc = A-Z, 0-9, oldest first"
  • Concrete Examples Throughout - ['city', 'country', 'population'], 'Europe', offset=10 with limit=10
  • Performance Guidance - "large arrays of floats", "may impact performance"
  • Cross-Tool References - "Use weaviate_list_collections to discover", "Use fetch('weaviate:schema:CollectionName') to see"
  • Mini-Search Within List - search parameter for filtering within browse operation
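The interplay of where, sort_by, sort_order, offset, limit, and properties is easier to see in a small in-memory model. The sketch below only mirrors the semantics stated in the parameter descriptions (filter, then sort, then paginate, then select fields). It is not how the server executes the query, and `browse` and `OPERATORS` are hypothetical names.

```python
import operator
from typing import Any, Dict, List, Optional

# The four operators named in the where description.
OPERATORS = {
    "Equal": operator.eq,
    "NotEqual": operator.ne,
    "GreaterThan": operator.gt,
    "LessThan": operator.lt,
}


def browse(
    objects: List[Dict[str, Any]],
    where: Optional[Dict[str, Any]] = None,
    sort_by: Optional[str] = None,
    sort_order: str = "asc",
    limit: int = 10,
    offset: int = 0,
    properties: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    rows = objects
    if where is not None:
        compare = OPERATORS[where["operator"]]
        field, value = where["property"], where["value"]
        rows = [o for o in rows if field in o and compare(o[field], value)]
    if sort_by is not None:
        # Sorting happens before pagination so pages stay consistent.
        rows = sorted(rows, key=lambda o: o[sort_by],
                      reverse=(sort_order == "desc"))
    page = rows[offset:offset + limit]
    if properties is not None:
        page = [{k: o[k] for k in properties if k in o} for o in page]
    return page
```

For example, calling `browse(cities, where={'property': 'continent', 'operator': 'Equal', 'value': 'Europe'}, properties=['city'])` on a list of city dicts would keep only the European entries and return just their `city` field.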

5. weaviate_upsert - Unified Insert/Update

Tool Description:

"Insert or update objects in Weaviate. Data is ALWAYS an array - use [obj] for single items, [obj1, obj2, ...] for batch. Provide ids array for updates, omit for inserts. Automatically optimizes batch operations."

Parameters:

  • data (array of objects, required)

    "Array of objects to insert or update. ALWAYS provide as an array, even for a single item: [{'city': 'Paris'}] for one object, or [{'city': 'Tokyo'}, {'city': 'Rome'}] for multiple. Properties must match the collection schema. Use fetch('weaviate:schema:CollectionName') to see required properties and data types."

  • collection (string, optional)

    "Target Weaviate collection for the operation. If not provided, uses default collection from provider settings. The collection must already exist with a defined schema. Use weaviate_list_collections to see available collections."

  • ids (array of strings, optional)

    "Array of UUIDs for update mode. If provided, updates existing objects instead of inserting new ones. Must match the length of the data array - each UUID corresponds to an object in data. For inserts, omit this parameter to auto-generate UUIDs. Example: ['uuid-1', 'uuid-2'] for updating two objects."

Design Rationale:

  • Consolidated CRUD - Single tool replaces insert_one, batch_insert, update_object
  • ALWAYS Array - Emphasized in BOTH tool description and parameter description
  • Mode Detection - Presence of ids → update mode; absence → insert mode
  • Length Validation - "Must match the length of the data array"
  • Concrete Format Examples - [{'city': 'Paris'}] vs [{'city': 'Tokyo'}, {'city': 'Rome'}]
  • Smart ID Tool Reference - "Use fetch('weaviate:schema:CollectionName')" shows how to get schema
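Mode detection and length validation can be expressed in a few lines. The following is a hedged sketch of that logic only. `plan_upsert` is a hypothetical helper, and the actual write to Weaviate (including any batch optimization) is deliberately left out.

```python
from typing import Dict, List, Optional, Tuple

Plan = Tuple[str, List[Tuple[Optional[str], Dict]]]


def plan_upsert(data: List[Dict], ids: Optional[List[str]] = None) -> Plan:
    """Decide insert vs. update and pair each object with its UUID."""
    if not isinstance(data, list):
        raise TypeError("data must ALWAYS be an array, e.g. [{'city': 'Paris'}]")

    if ids is None:
        # Insert mode: UUIDs are left as None so they can be auto-generated.
        return "insert", [(None, obj) for obj in data]

    if len(ids) != len(data):
        raise ValueError(
            f"ids has {len(ids)} entries but data has {len(data)}; "
            "each UUID must correspond to one object"
        )
    # Update mode: ids[i] identifies the object to update with data[i].
    return "update", list(zip(ids, data))
```

Failing fast on a length mismatch avoids the worse outcome of updating the wrong objects when the two arrays drift out of alignment.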

6. weaviate_delete - Delete Operations

Tool Description:

"Delete object(s) or entire collection from Weaviate. Supports three modes: single object deletion, bulk deletion with failure tracking, and collection removal. All deletions are permanent and cannot be undone."

Parameters:

  • type (string, required, enum: ["object", "objects", "collection"])

    "Deletion mode. 'object' = delete one UUID (requires id parameter), 'objects' = delete multiple UUIDs in bulk (requires ids parameter), 'collection' = remove entire collection and all its data (PERMANENT, requires collection parameter). CAUTION: Collection deletion cannot be undone and removes all objects, schema, and configuration."

  • id (string, optional)

    "UUID of single object to delete. Required when type='object'. The object will be permanently removed from the collection. Format: standard UUID like 'b05fedb5-df83-46c0-bf7e-2b8f34abeab5'. Get UUIDs from search or list results."

  • ids (array of strings, optional)

    "Array of UUIDs for bulk deletion. Required when type='objects'. Example: ['uuid-1', 'uuid-2', 'uuid-3']. All specified objects will be deleted. Failed deletions are reported separately in the response with error details for debugging."

  • collection (string, optional)

    "Collection to delete from (for object/objects mode) or collection to completely remove (for collection mode). If not provided for object mode, uses default collection from provider settings. REQUIRED for collection deletion mode. Use with extreme caution for collection deletion."

Design Rationale:

  • Three Modes in One Tool - Single object, bulk, or collection deletion
  • LOUD WARNINGS for Destructive Operations - "PERMANENT", "CAUTION", "cannot be undone"
  • Conditional Requirements - "Required when type='object'", "REQUIRED for collection deletion mode"
  • Failure Tracking - "Failed deletions are reported separately"
  • UUID Format Example - Shows exact format b05fedb5-df83-46c0-bf7e-2b8f34abeab5
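The conditional requirements are the part of this tool that a flat JSON schema expresses least naturally, so a server would likely check them in code. A minimal sketch, assuming a hypothetical `validate_delete` helper that receives the raw tool arguments as a dict:

```python
from typing import Any, Dict


def validate_delete(args: Dict[str, Any]) -> Dict[str, Any]:
    """Enforce the per-mode requirements stated in the parameter descriptions."""
    mode = args.get("type")

    if mode == "object":
        if not args.get("id"):
            raise ValueError("id is required when type='object'")
    elif mode == "objects":
        if not args.get("ids"):
            raise ValueError("ids is required when type='objects'")
    elif mode == "collection":
        # No default collection here: removing a collection must be explicit.
        if not args.get("collection"):
            raise ValueError("collection is REQUIRED for collection deletion")
    else:
        raise ValueError("type must be 'object', 'objects', or 'collection'")

    return args
```

Mirroring the wording of the descriptions in the error messages means an agent that gets a rejection sees the same vocabulary it was given up front.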

Key Design Patterns in Parameter Descriptions

1. Always Explain the Impact

❌ Bad: "Maximum number of results" ✅ Good: "Maximum number of results to return. Range: 1-100. Higher limits may impact performance but provide more comprehensive results."

2. Provide Concrete Examples

❌ Bad: "Filter pattern" ✅ Good: "Name pattern for filtering collections. Examples: 'Product*' (all starting with Product), '*Test*' (containing Test)"

3. Explain Trade-offs

❌ Bad: "Include vector embeddings" ✅ Good: "Vector embeddings are large arrays of floats used for semantic search. Only enable if you specifically need the raw vectors"

4. Show Format with Examples

❌ Bad: "WHERE filter object" ✅ Good: "Format: {'property': 'field_name', 'operator': 'Equal', 'value': comparison_value}. Example: {'property': 'continent', 'operator': 'Equal', 'value': 'Europe'}"

5. Reference Related Tools

❌ Bad: "Collection name" ✅ Good: "Collection name to browse. Use weaviate_list_collections to discover available collections."

6. Warn About Performance

❌ Bad: "Include object counts" ✅ Good: "Include object count. Enabling this adds a query per collection, so use judiciously for large installations."

7. Always vs Sometimes

❌ Bad: "Data to insert" ✅ Good: "Data to insert or update. ALWAYS provide as an array, even for a single item: [{'city': 'Paris'}]"

8. Explain Enum Values in Plain Language

❌ Bad: "Sort order: asc or desc" ✅ Good: "Sort direction. 'asc' = ascending (A-Z, 0-9, oldest first), 'desc' = descending (Z-A, 9-0, newest first)"


Why This Matters

For AI Agents

  • Reduces Guessing - AI knows exactly what each parameter does and when to use it
  • Provides Context - Not just "what" but "why" and "when"
  • Shows Examples - Concrete usage patterns are easier to follow than abstract descriptions
  • Prevents Mistakes - Performance warnings and format examples reduce errors

For Humans Reading Logs

  • Self-Documenting - Tool calls are understandable without referring to separate docs
  • Debugging is Easier - Clear parameter meanings make it obvious what went wrong
  • API Discovery - Users can explore the API through parameter descriptions alone

For Multi-Provider MCP Systems

  • Consistent Patterns - Smart ID routing (provider:type:identifier) works across providers
  • Universal Tools - search and fetch are standard across all providers
  • Provider-Specific Tools - weaviate_* tools handle Weaviate-specific operations

Implementation Notes

JSON Schema Location

All parameter descriptions live in input_schema dictionaries within each tool class:

input_schema = {
    "type": "object",
    "properties": {
        "query": {
            "type": "string",
            "description": "Natural language search query OR structured query string..."
        }
    }
}

Auto-Generation from Schemas

The catalog system automatically:

  1. Extracts input_schema from tool classes
  2. Includes parameter descriptions in catalog JSON
  3. Exposes descriptions via GraphQL API
  4. Displays them in the frontend UI
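The first two steps above (extracting input_schema and emitting catalog JSON) might look something like this. The sketch assumes each tool class exposes `name` and `input_schema` as class attributes. `SearchTool`, `build_catalog`, and the catalog entry shape are hypothetical, and the GraphQL and frontend steps are not shown.

```python
import json
from typing import Any, Dict, List


class SearchTool:  # hypothetical stand-in for a real tool class
    name = "search"
    input_schema = {
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Natural language search query OR structured query string...",
            },
            "limit": {
                "type": "integer",
                "default": 10,
                "description": "Maximum number of search results to return. Range: 1-100.",
            },
        },
        "required": ["query"],
    }


def build_catalog(tool_classes: List[type]) -> List[Dict[str, Any]]:
    """Flatten each tool's input_schema into a list of parameter entries."""
    catalog = []
    for cls in tool_classes:
        schema = cls.input_schema
        required = set(schema.get("required", []))
        catalog.append({
            "tool": cls.name,
            "parameters": [
                {
                    "name": param,
                    "type": spec.get("type"),
                    "required": param in required,
                    "description": spec.get("description", ""),
                }
                for param, spec in schema["properties"].items()
            ],
        })
    return catalog


catalog_json = json.dumps(build_catalog([SearchTool]), indent=2)
```

Because the descriptions are read directly from the schema, there is a single source of truth: whatever the agent sees is also what the catalog and UI display.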

Testing

The test script validates:

  • All required parameters are provided
  • Optional parameters work with defaults
  • Examples from descriptions actually work
  • Error cases match description warnings
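Alongside those runtime checks, a static lint over the schemas can catch thin descriptions before they ship. The sketch below is one possible complement, not the test script described above. `lint_schema`, the 40-character threshold, and the specific rules are assumptions chosen for illustration.

```python
from typing import Any, Dict, List


def lint_schema(schema: Dict[str, Any], min_length: int = 40) -> List[str]:
    """Return a list of human-readable problems found in an input_schema."""
    problems = []
    props = schema.get("properties", {})

    for name in schema.get("required", []):
        if name not in props:
            problems.append(f"required parameter {name!r} is not defined")

    for name, spec in props.items():
        desc = spec.get("description", "")
        if len(desc) < min_length:
            problems.append(
                f"{name}: description shorter than {min_length} characters"
            )
        # Every enum value should be explained in plain language.
        for value in spec.get("enum", []):
            if str(value) not in desc:
                problems.append(f"{name}: enum value {value!r} is not explained")
        if "default" in spec and "enum" in spec and spec["default"] not in spec["enum"]:
            problems.append(f"{name}: default is not one of the enum values")

    return problems
```

An empty result means the schema passes these particular rules. It says nothing about whether the examples in the descriptions actually work, which still needs the runtime tests listed above.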

Evolution from 10 Tools → 6 Tools

Initial Design (10 tools)

  • search, fetch, weaviate_insert_one, weaviate_batch_insert, weaviate_update_object, weaviate_list_collections, weaviate_list_objects, weaviate_get_schema, weaviate_delete, weaviate_filter

Problems

  • Too many similar tools (insert_one vs batch_insert vs update)
  • filter was redundant with list_objects WHERE parameter
  • get_schema should be part of universal fetch
  • No clear pattern for when to use which tool

Final Design (6 tools)

  • Consolidated: weaviate_upsert handles all insert/update operations
  • Universal: fetch with smart ID routing handles object, schema, collections
  • Rich Parameters: weaviate_list_objects has 9 parameters for granular control
  • Removed: filter (merged into list_objects), get_schema (merged into fetch)

Result

  • 40% fewer tools
  • Zero functionality loss
  • Clearer mental model: 2 universal tools (search, fetch) + 4 domain tools
  • Richer parameter descriptions compensate for tool consolidation

Conclusion

The Weaviate MCP tool design demonstrates that rich, actionable descriptions are as important as the tools themselves. By investing heavily in parameter descriptions that explain not just what but why, when, how, and what if, we created a tool set that:

  1. Guides AI agents toward correct usage patterns
  2. Reduces errors through concrete examples and warnings
  3. Self-documents through comprehensive inline descriptions
  4. Scales to complex use cases while remaining approachable
  5. Follows standards (OpenAI MCP) while adding provider-specific power

This pattern is now our blueprint for designing all MCP tools across all providers.