
OpenAI's MCP Integration Requirements: Why Search and Fetch Matter

MCPBundles · 18 min read

When OpenAI integrated support for Anthropic's Model Context Protocol (MCP) into ChatGPT's deep research feature, they documented something elegant: a two-tool pattern that gives AI agents a consistent way to engage with any data source. If your MCP server implements search and fetch with their specific signatures, ChatGPT knows exactly how to explore your data without custom integration code.

Both tools accept only a single string parameter. That constraint isn't a limitation—it's what makes the pattern universal.

The Pattern: Two Tools, Universal Interface

For MCP servers to work with ChatGPT's deep research feature, OpenAI requires a specific contract: implement two tools—search and fetch—with exact signatures. Any MCP server that follows this pattern can integrate with ChatGPT without custom code:

search - Discovery with a Single String

Takes: One query string

Returns: A JSON-encoded array of result objects, each with:

  • id - unique identifier you'll use later in fetch
  • title - human-readable name
  • url - canonical URL for citations

The tool must return exactly one MCP content item with type: "text" and a JSON-stringified results array:

{
  "content": [
    {
      "type": "text",
      "text": "{\"results\":[{\"id\":\"weaviate:doc:uuid-123\",\"title\":\"Q4 Product Strategy\",\"url\":\"https://internal.acme.com/docs/q4-strategy\"}]}"
    }
  ]
}

Think of search as a constrained list endpoint. Since it only takes a single string, the server has freedom to interpret that string however it wants: semantic search, LLM-powered query parsing, structured filter extraction, full-text search, or a hybrid approach. The AI submits natural language; your server decides how to process it.
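As a concrete sketch of that response shape (the helper name here is ours, not part of the spec), a server might wrap its results like this:

```python
import json

def make_search_response(results: list[dict]) -> dict:
    """Wrap search results in the single text content item the spec requires.

    Each result dict should carry 'id', 'title', and 'url' keys.
    """
    payload = json.dumps({"results": results})
    return {"content": [{"type": "text", "text": payload}]}

response = make_search_response([
    {"id": "weaviate:doc:uuid-123",
     "title": "Q4 Product Strategy",
     "url": "https://internal.acme.com/docs/q4-strategy"}
])
```

Centralizing this in one helper keeps every tool's output consistent with the single-content-item requirement.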

fetch - Retrieval with a Single String

Takes: One id string (from a previous search result)

Returns: A JSON-encoded document object with:

  • id - the identifier that was requested
  • title - document title
  • text - the full content (this is what the AI needs)
  • url - canonical URL for citation
  • metadata - optional key-value pairs about the document

Like search, this must be a single MCP content item with JSON-stringified payload:

{
  "content": [
    {
      "type": "text",
      "text": "{\"id\":\"weaviate:doc:uuid-123\",\"title\":\"Q4 Product Strategy\",\"text\":\"Our Q4 strategy focuses on three pillars...\",\"url\":\"https://internal.acme.com/docs/q4-strategy\",\"metadata\":{\"author\":\"Jane Doe\",\"updated\":\"2025-10-01\"}}"
    }
  ]
}

The single-string constraint means you'll often need to encode multiple pieces of information into the id. For example:

  • github:repo:acme/backend - repository identifier
  • confluence:page:123456 - page ID
  • salesforce:opportunity:006abc - CRM record
  • weaviate:collection:products:uuid-789 - vector store document with collection context

The fetch tool becomes your standard GET request, but instead of REST path parameters and query strings, you're packing everything into one identifier string that your server knows how to parse and route.
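One lightweight way to manage composite IDs (the helper names are illustrative) is a matched pair of encode/parse functions, so every tool builds and splits IDs the same way:

```python
def encode_id(source: str, doc_type: str, identifier: str) -> str:
    """Pack routing information into a single fetch ID."""
    return f"{source}:{doc_type}:{identifier}"

def parse_id(doc_id: str) -> tuple[str, str, str]:
    """Split a composite ID back into its routing parts.

    maxsplit=2 keeps colons or slashes inside the identifier intact,
    e.g. 'github:issue:acme/backend/456'.
    """
    source, doc_type, identifier = doc_id.split(":", 2)
    return source, doc_type, identifier
```

Because `split(":", 2)` stops after two splits, identifiers can safely contain the delimiter themselves.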

Writing Tool Descriptions That Guide AI

The AI reads two pieces of documentation for each tool: the tool-level description and the parameter-level description. Both matter. Good descriptions reduce errors and help the AI construct valid calls.

Tool-Level Description

Tell the AI what the tool does and when to use it:

@mcp.tool(description="Search documentation, API specs, and internal guides by query - follows OpenAI MCP standard")
async def search(
    query: Annotated[str, Field(description="Natural language search query or structured filters like 'tag:api status:published'")]
) -> dict:
    ...  # Implementation

The tool description explains purpose and scope. The parameter description explains what formats are valid.

Parameter-Level Description

Guide the AI on how to construct the parameter:

@mcp.tool(description="Fetch full document content by ID - follows OpenAI MCP standard")
async def fetch(
    id: Annotated[str, Field(description="Document ID in 'source:type:identifier' format (e.g., 'confluence:page:12345', 'github:issue:repo/123')")]
) -> dict:
    ...  # Implementation

The parameter description shows the ID format with examples. The AI learns the pattern and applies it.

Providing Structure in Descriptions

If your search tool accepts structured queries, document the schema in the description:

@mcp.tool(description="""Search customer records with optional structured filters.
Accepts natural language OR structured syntax:
- Natural: "enterprise customers in California"
- Structured: "tier:enterprise region:us-west status:active"
Supported filters: tier, region, status, created_after, value_gt
""")
async def search(
    query: Annotated[str, Field(description="""Search query as natural language OR 'field:value' pairs separated by spaces.
Supported fields: tier (enterprise|pro|starter), region (us-west|us-east|eu), status (active|archived), created_after (ISO date), value_gt (number)
Example: "tier:enterprise status:active" or "show me enterprise customers in California" """)]
) -> dict:
    # Parse the query string
    if ":" in query and not query.startswith("http"):
        # Structured query
        filters = parse_structured_query(query)
    else:
        # Natural language - use LLM to extract intent
        filters = await parse_with_llm(query)

    # Execute search with filters
    return await search_crm(filters)

The AI sees the supported filters and learns to construct structured queries when precision matters.
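The parse_structured_query helper referenced above isn't shown in the example; a minimal sketch (our own, under the assumption that bare words should survive as keywords) might look like:

```python
def parse_structured_query(query: str) -> dict:
    """Turn 'field:value' tokens into a filter dict; bare words become keywords."""
    filters: dict = {}
    keywords: list[str] = []
    for token in query.split():
        if ":" in token:
            field, _, value = token.partition(":")
            filters[field] = value
        else:
            keywords.append(token)
    if keywords:
        filters["keywords"] = " ".join(keywords)
    return filters
```

A production version would also validate field names against the supported list and reject malformed values.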

Alternative: JSON-Encoded Parameters

You can also accept JSON strings in the parameter, which works well when you need complex nested structures:

@mcp.tool(description="Search products with complex filters - follows OpenAI MCP standard")
async def search(
    query: Annotated[str, Field(description="""Search query as natural language OR JSON object.
JSON format: {"category": "electronics", "price": {"min": 100, "max": 500}, "in_stock": true, "tags": ["featured", "sale"]}
Supported JSON fields:
- category (string): product category
- price (object): {min: number, max: number}
- in_stock (boolean): availability filter
- tags (array): list of tag strings
- created_after (string): ISO date
Natural language: "featured electronics under $500 in stock"
""")]
) -> dict:
    # Try to parse as JSON first
    try:
        filters = json.loads(query)
    except json.JSONDecodeError:
        # Fall back to natural language parsing
        filters = await parse_with_llm(query)

    # Execute search with parsed filters
    results = await database.search(filters)
    return format_search_results(results)

The AI learns to send either JSON like {"category": "electronics", "price": {"min": 100}} or natural language like "show me electronics under $500".

Error Responses That Teach

When the AI sends an invalid request, return errors that help it fix the problem:

@mcp.tool(description="Fetch document by ID - follows OpenAI MCP standard")
async def fetch(id: str) -> dict:
    # Validate ID format
    if ":" not in id:
        return {
            "content": [{
                "type": "text",
                "text": json.dumps({
                    "error": "INVALID_ID_FORMAT",
                    "message": "ID must be in 'source:type:identifier' format",
                    "examples": ["confluence:page:12345", "github:issue:repo/123"],
                    "received": id
                })
            }]
        }

    # Parse ID
    parts = id.split(":", 2)
    if len(parts) != 3:
        return {
            "content": [{
                "type": "text",
                "text": json.dumps({
                    "error": "INVALID_ID_STRUCTURE",
                    "message": f"Expected 3 parts (source:type:identifier), got {len(parts)}",
                    "received": id
                })
            }]
        }

    source, doc_type, identifier = parts

    # Validate source
    if source not in ["confluence", "github", "notion"]:
        return {
            "content": [{
                "type": "text",
                "text": json.dumps({
                    "error": "UNKNOWN_SOURCE",
                    "message": f"Unknown source '{source}'",
                    "supported_sources": ["confluence", "github", "notion"],
                    "hint": f"Try '{id.replace(source, 'confluence')}' if this is a Confluence document"
                })
            }]
        }

    # Document not found
    doc = await get_document(source, doc_type, identifier)
    if not doc:
        return {
            "content": [{
                "type": "text",
                "text": json.dumps({
                    "error": "DOCUMENT_NOT_FOUND",
                    "message": f"No document found with ID '{id}'",
                    "hint": "Use the search tool to find valid document IDs"
                })
            }]
        }

    # Success
    return {
        "content": [{
            "type": "text",
            "text": json.dumps({
                "id": id,
                "title": doc.title,
                "text": doc.content,
                "url": doc.url,
                "metadata": doc.metadata
            })
        }]
    }

When the AI receives an error with a clear code, message, and hint, it can adjust its next attempt. This creates a learning loop.

Using LLMs Server-Side

For search, you can use an LLM on your server to parse natural language into structured filters:

async def parse_search_with_llm(query: str) -> dict:
    """Use LLM to extract structured filters from natural language."""

    prompt = f"""Extract search filters from this query: "{query}"

Available filters:
- entity_type: project, customer, document, ticket
- status: active, archived, closed, open
- created_after: ISO date
- priority: low, medium, high, critical
- tag: any string

Return JSON with extracted filters. If uncertain, return empty dict.

Examples:
"critical bugs from last week" -> {{"entity_type": "ticket", "priority": "critical", "created_after": "2025-10-02"}}
"active enterprise customers" -> {{"entity_type": "customer", "status": "active", "tag": "enterprise"}}
"""

    response = await llm.complete(prompt)
    return json.loads(response)


@mcp.tool(description="Search across projects, customers, documents, and tickets - follows OpenAI MCP standard")
async def search(
    query: Annotated[str, Field(description="""Natural language search query. Server will parse into structured filters.
Server extracts: entity_type (project|customer|document|ticket), status (active|archived|closed|open), priority (low|medium|high|critical), created_after (ISO date), tag (string)
Examples: "critical bugs from last week" or "active enterprise customers" """)]
) -> dict:
    # Parse with LLM
    filters = await parse_search_with_llm(query)

    # Execute structured search
    results = await database.search(filters)

    return {
        "content": [{
            "type": "text",
            "text": json.dumps({
                "results": [
                    {"id": f"{r.type}:{r.id}", "title": r.title, "url": r.url}
                    for r in results
                ],
                "parsed_filters": filters  # Show the AI what filters were used
            })
        }]
    }

Including parsed_filters in the response helps the AI learn what worked. Next time it can construct better queries.

Why Single-String Parameters Are Smart

Accepting only one string parameter might seem limiting at first. Why not structured parameters with separate fields for filters, pagination, or resource types?

The single-string constraint is what makes the pattern work. Here's why:

1. Simplicity for AI Tool Selection

When an AI agent chooses which tool to call, simpler signatures reduce the cognitive load. A tool with one parameter is easier to reason about than a tool with five parameters across multiple types. The agent doesn't have to construct complex argument objects—it just passes a string.

For search, the AI naturally thinks: "I need to find documents about X" → call search with "X"

For fetch, the AI naturally thinks: "I need the full content of document Y" → call fetch with "Y"

No decisions about which fields to include, how to structure filters, or what pagination strategy to use. Just one string.

2. Server-Side Intelligence

The single-string constraint pushes complexity to the server, where it belongs. Your server can:

For search:

  • Parse natural language queries with an LLM
  • Extract structured filters from unstructured text
  • Perform semantic search across embeddings
  • Apply user-specific access controls
  • Rank and filter results based on relevance
  • Handle pagination internally and return top-N results

For fetch:

  • Parse composite IDs and route to the correct data source
  • Apply field selection based on ID patterns
  • Enforce permissions based on user context
  • Transform data into AI-readable formats
  • Handle cross-resource relationships

The AI doesn't need to know your internal routing, authentication, or data model. It sends a string; you handle the rest.

3. Future-Proof Extensibility

Because the parameters are strings, you can evolve your parsing logic without breaking the tool interface. Add new query syntax, support new ID formats, introduce new metadata—all without changing the tool signature that AI agents depend on.

For example, your search string parsing might evolve from:

Version 1: Simple keyword search

"customer feedback"

Version 2: Add date filters

"customer feedback after:2025-01-01"

Version 3: Add semantic operators

"customer feedback after:2025-01-01 sentiment:negative"

Version 4: LLM-powered intent extraction

"show me recent negative feedback from enterprise customers"

The AI still sends one string. Your server gets smarter about parsing it. No schema changes required.
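A sketch of how that parsing might evolve without touching the signature (the operator names are illustrative): the server layers new token handlers onto the same string, and unrecognized tokens fall through as plain keywords so older queries keep working.

```python
from datetime import date

def parse_query(query: str) -> dict:
    """Parse keywords plus optional 'after:' and 'sentiment:' operators."""
    parsed: dict = {"keywords": []}
    for token in query.split():
        if token.startswith("after:"):
            # Version 2 addition: date filter
            parsed["after"] = date.fromisoformat(token[len("after:"):])
        elif token.startswith("sentiment:"):
            # Version 3 addition: semantic operator
            parsed["sentiment"] = token[len("sentiment:"):]
        else:
            # Version 1 behavior: plain keyword search
            parsed["keywords"].append(token)
    return parsed
```

Version 4 would route the whole string through an LLM first and fall back to this parser, again with no signature change.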

What Search and Fetch Provide (And What They Don't)

The search and fetch pattern is deliberately focused. These two tools provide a foundation, not a complete solution. Understanding what they're designed for—and what they're not—helps you design better MCP servers.

Beyond the Standard

Write operations: Search and fetch are read-only. You can't create, update, or delete records.

Complex workflows: Multi-step operations like "create a ticket and assign it to the on-call engineer" require additional tools.

Rich filtering: A single search string can't express complex AND/OR filter combinations as clearly as structured parameters.

Batch operations: Fetching 50 documents one at a time is inefficient compared to a batch GET tool (see our batch consolidation post).

Typed discovery: If your domain has distinct entity types (projects, tickets, customers), you might want type-specific list endpoints with model-specific filters.

The Foundation They Create

search and fetch solve the cold-start problem. When an AI agent encounters your MCP server for the first time with zero domain knowledge:

  1. Discovery: Call search with a broad query to find what exists
  2. Retrieval: Call fetch on interesting results to get full content
  3. Understanding: Read the content and decide next steps

This two-tool pattern is enough to make your data source usable by AI, even if it's not complete. And for OpenAI's deep research mode, it's exactly what's needed: iterative search to discover documents, fetch to read them, synthesize insights.

Think of search and fetch as the minimum viable interface for AI-readable data sources. They're the HTTP GET of the MCP world—basic, universal, and good enough to get started.
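From the agent's side, the cold-start loop looks roughly like this (a sketch: `call_tool` stands in for whatever MCP client invocation the agent actually uses):

```python
import json

async def research(topic: str, call_tool) -> list[str]:
    """Cold-start loop: search broadly, then fetch the top hits.

    call_tool(name, arg) is a hypothetical stand-in for an MCP
    client invocation; real agents drive this through their own
    MCP client library.
    """
    raw = await call_tool("search", topic)
    results = json.loads(raw["content"][0]["text"])["results"]
    documents = []
    for result in results[:3]:  # read only the most promising hits
        doc = await call_tool("fetch", result["id"])
        documents.append(json.loads(doc["content"][0]["text"])["text"])
    return documents
```

Deep research mode effectively runs this loop iteratively, refining the search string between rounds.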

Adapting Your MCP Server to the Standard

Different data sources require different approaches to implement search and fetch. Here's how to adapt the pattern to common backend systems:

Vector Store (Semantic Search)

@mcp.tool(description="Search documentation by semantic similarity - follows OpenAI MCP standard")
async def search(
    query: Annotated[str, Field(description="Search query (natural language)")]
) -> dict:
    # Embed the query
    embedding = await embed_text(query)

    # Semantic search in Weaviate/Pinecone/Qdrant
    results = await vector_db.search(
        collection="documents",
        vector=embedding,
        limit=10
    )

    # Return with standard format
    return format_search_results([
        {"id": f"vector:{r.id}", "title": r.metadata["title"], "url": r.metadata["url"]}
        for r in results
    ])


@mcp.tool(description="Fetch full document content - follows OpenAI MCP standard")
async def fetch(
    id: Annotated[str, Field(description="Document ID from search results (format: 'vector:uuid')")]
) -> dict:
    # Parse and validate
    if not id.startswith("vector:"):
        return format_error("INVALID_ID", "ID must start with 'vector:'", id)

    _, doc_id = id.split(":", 1)
    doc = await vector_db.get(doc_id)

    if not doc:
        return format_error("NOT_FOUND", f"Document {doc_id} not found", id)

    # Return full content
    return format_document(
        id=id,
        title=doc.metadata["title"],
        text=doc.content,
        url=doc.metadata["url"],
        metadata={"created": doc.metadata["created"], "author": doc.metadata["author"]}
    )

Use helper functions to keep the tool code clean and ensure consistent response formats.

CRM System (Structured Data with LLM Parsing)

@mcp.tool(description="""Search CRM opportunities, contacts, and accounts - follows OpenAI MCP standard
Accepts natural language or structured syntax:
- Natural: "high-value opportunities closing this quarter"
- Structured: "type:opportunity stage:closing value:>100000"
""")
async def search(
    query: Annotated[str, Field(description="""Search query as natural language OR 'field:value' pairs.
Structured fields: type (opportunity|contact|account), stage (prospecting|qualification|closing|closed), value (>N, <N, N), owner (name or id), created_after/before (ISO date)
Examples: "type:opportunity stage:closing value:>100000" or "high-value opportunities closing this quarter" """)]
) -> dict:
    # Use LLM to parse natural language into structured filters
    filters = await parse_query_with_llm(query)
    # Example: "high-value opportunities closing this quarter"
    # → {object_type: "opportunity", stage: "closing", value_gt: 100000, close_date_q: "Q4"}

    # Query Salesforce/HubSpot API with parsed filters
    records = await crm_api.query(
        object_type=filters.get("object_type", "opportunity"),
        filters=filters,
        limit=20
    )

    return format_search_results([
        {"id": f"crm:{r.type}:{r.id}", "title": r.name, "url": r.web_url}
        for r in records
    ])


@mcp.tool(description="Fetch full CRM record details - follows OpenAI MCP standard")
async def fetch(
    id: Annotated[str, Field(description="Record ID from search (format: 'crm:type:id' like 'crm:opportunity:006abc')")]
) -> dict:
    # Parse: "crm:opportunity:006abc123"
    parts = id.split(":", 2)
    if len(parts) != 3 or parts[0] != "crm":
        return format_error("INVALID_ID", "Expected format: 'crm:type:id'", id)

    _, object_type, record_id = parts
    record = await crm_api.get(object_type, record_id)

    if not record:
        return format_error("NOT_FOUND", f"No {object_type} found with ID {record_id}", id)

    # Format for AI - include relevant fields as readable text
    text = format_crm_record_as_text(record)

    return format_document(
        id=id,
        title=record.name,
        text=text,
        url=record.web_url,
        metadata={"stage": record.stage, "value": record.amount, "owner": record.owner_name}
    )

The LLM parsing step converts natural language into API filters, making search intuitive for users.

Knowledge Base (Hybrid Search with Reranking)

@mcp.tool(description="Search internal documentation, guides, and wiki pages - follows OpenAI MCP standard")
async def search(
    query: Annotated[str, Field(description="Search query (keywords or natural language)")]
) -> dict:
    # Combine full-text and semantic search
    fts_results = await elasticsearch.search(query, size=50)

    embedding = await embed_text(query)
    vector_results = await vector_db.search(embedding, limit=50)

    # Merge and rerank with cross-encoder for best relevance
    combined = merge_results(fts_results, vector_results)
    reranked = await rerank_with_cross_encoder(query, combined, top_k=10)

    return format_search_results([
        {"id": f"docs:{r.doc_id}", "title": r.title, "url": r.public_url}
        for r in reranked
    ])

Hybrid search combines keyword matching with semantic understanding, then reranks for optimal relevance.
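The merge_results step above is left abstract; a simple sketch (assuming results expose a `doc_id` attribute, as in the snippet) interleaves the two ranked lists and deduplicates by ID. Reciprocal rank fusion is a common refinement.

```python
def merge_results(fts_results: list, vector_results: list) -> list:
    """Interleave two ranked lists, keeping the first occurrence of each doc_id."""
    merged, seen = [], set()
    for pair in zip(fts_results, vector_results):
        for r in pair:
            if r.doc_id not in seen:
                seen.add(r.doc_id)
                merged.append(r)
    # Append leftovers when the lists have unequal length
    for r in fts_results + vector_results:
        if r.doc_id not in seen:
            seen.add(r.doc_id)
            merged.append(r)
    return merged
```

Since the cross-encoder reranks everything afterward, the merge only needs to produce a reasonable candidate pool, not a perfect ordering.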

ID Design for Multi-Source Systems

Your fetch IDs need to encode routing information. Design a consistent format that's easy to parse and extend:

Hierarchical with delimiters (most common):

# Format: "source:type:identifier"
"weaviate:document:uuid-123"
"github:issue:acme/backend/456"
"salesforce:contact:003abc789"

# Parse in fetch:
source, resource_type, identifier = id.split(":", 2)

With collection or namespace:

# Format: "source:collection:type:id"
"weaviate:products:document:uuid-123"
"confluence:engineering:page:98765"

# Parse:
source, collection, resource_type, identifier = id.split(":", 3)

Choose a format that's easy to parse, human-readable for debugging, and flexible enough to add new sources later.

Adding Value Beyond Search and Fetch

Once you've implemented the standard search and fetch tools, you can layer on additional capabilities:

Write Operations

@mcp.tool(description="Create a new ticket")
async def create_ticket(
    title: str,
    description: str,
    priority: str = "medium"
) -> dict:
    ...

Batch Operations

@mcp.tool(description="Fetch multiple documents by IDs")
async def batch_fetch(ids: List[str]) -> dict:
    # See our batch consolidation post
    ...

Domain Actions

@mcp.tool(description="Assign opportunity to sales rep")
async def assign_opportunity(
    opportunity_id: str,
    assignee: str
) -> dict:
    # Business logic here
    ...

But start with search and fetch. They're the foundation that makes your server discoverable and usable by any AI that follows the pattern.

Why This Creates an Ecosystem

Anthropic created MCP as an open protocol in November 2024, and OpenAI adopted it in March 2025. By documenting specific search and fetch requirements for ChatGPT integration, OpenAI is establishing a pattern that other AI systems can follow. When multiple AI platforms adopt compatible interfaces:

For server builders:

  • Implement once, work with multiple AI platforms
  • Clear contract reduces guesswork
  • Standard patterns emerge for common problems

For AI systems:

  • No custom integration per data source
  • Can reason about "search" and "fetch" generically
  • Consistent UX across different backend systems

For end users:

  • Connect new data sources with predictable behavior
  • AI agents work across platforms
  • Less configuration, more value

This is similar to how HTTP standardized web interactions. You don't need custom protocols for every website because they all speak HTTP. With search and fetch, you don't need custom integrations for every data source because they all speak the same MCP interface.

Implementation Checklist

Building an MCP server that meets OpenAI's requirements for ChatGPT integration:

1. Implement search tool:

  • ✅ Accepts single query string parameter
  • ✅ Returns JSON-stringified results array with id, title, url
  • ✅ Wraps in MCP content item with type: "text"
  • ✅ Handles natural language queries
  • ✅ Returns relevant, ranked results
  • ✅ Applies user-specific access controls

2. Implement fetch tool:

  • ✅ Accepts single id string parameter
  • ✅ Returns JSON-stringified document with id, title, text, url, metadata
  • ✅ Wraps in MCP content item with type: "text"
  • ✅ Parses composite IDs and routes correctly
  • ✅ Returns full document content in text field
  • ✅ Handles missing/invalid IDs gracefully

3. Design your ID format:

  • ✅ Encodes enough information for routing
  • ✅ Easy to parse and extend
  • ✅ Consistent across resource types
  • ✅ Human-readable for debugging

4. Test with ChatGPT:

  • ✅ Configure MCP URL in ChatGPT connector settings
  • ✅ Enable deep research mode
  • ✅ Ask questions that require iterative search
  • ✅ Verify citations link to correct URLs
  • ✅ Check that fetched content is complete

Where This Pattern Goes Next

Anthropic created MCP and uses it with Claude. OpenAI documented specific implementation requirements for ChatGPT integration. As MCP adoption grows, other platforms are likely to adopt similar patterns:

  • Anthropic (Claude) - Already supports MCP; may document similar search/fetch requirements
  • Google - Could adopt search/fetch patterns for Gemini integrations
  • Microsoft - May adopt the pattern for Copilot connectors
  • Open-source agents - Likely to expect search/fetch as baseline MCP interface

The two-tool standard creates a lowest common denominator that works across platforms. Your MCP server can expose dozens of specialized tools, but if it implements search and fetch with these signatures, it works with any AI that follows the pattern.

OpenAI documented what works for their deep research integration. Other platforms are likely to adopt compatible patterns. Design your MCP server with search and fetch, and you're building for an ecosystem, not just one AI platform.

Key Takeaways

  • MCP is Anthropic's protocol: Anthropic created MCP in November 2024; OpenAI adopted it in March 2025 and documented specific integration requirements
  • OpenAI's pattern is simple: Two tools (search and fetch) with single-string parameters create a universal interface for ChatGPT integration
  • Single strings push complexity server-side: Where it belongs—your server parses queries, routes IDs, and handles the details
  • Foundation, not complete solution: These tools make your data source discoverable; add domain-specific tools for workflows
  • Implementation varies by backend: Vector stores use semantic search, CRMs parse with LLMs, knowledge bases blend approaches
  • ID design enables routing: Pack provider, resource type, and identifier into one parseable string format
  • Design for the ecosystem: Implement the pattern, and your server works with ChatGPT and potentially other platforms as they adopt similar requirements

OpenAI documented their integration requirements for Anthropic's MCP protocol. Other platforms are likely to follow with compatible patterns. Build search and fetch into your MCP server now, and you're ready for the ecosystem that's forming.


Want to see this in action? Check out our ChatGPT integration guide, look at the field descriptions, or read OpenAI's MCP integration guide and the MCP tool guide.