Internet Archive Wayback Machine MCP Server & Skill

AI Skill
SKILL.md

Domain knowledge for Internet Archive Wayback Machine — workflow patterns, data models, and gotchas for your AI agent.

Internet Archive Wayback Machine

The Wayback Machine archives the web. Capabilities span availability probes (single and batch), CDX index exploration, timelines, raw HTML retrieval, and on-demand captures. No auth required.

Workflows

Snapshot presence — Ask whether a URL has any capture, returning the nearest timestamp plus the canonical archive link. Treat empty availability answers skeptically for large sites; CDX queries are the authoritative existence signal.
Batch presence — Send many {url, tag?, timestamp?, closest?} tuples in one round trip and read per-URL outcomes grouped by optional tags.
CDX discovery — Mine the index for metadata rows (urlkey, timestamp, mime type, HTTP status, digest, byte length). Tune match scope (exact URL, prefix, host, domain), YYYYMMDD windows, field projections, filters like statuscode:200, and JSON/CSV/text encodings.
Timelines — Collapse up to ~100 chronological captures for a URL, each with status, mime, digest, and replay link; backed by the same CDX machinery for stability.
Content replay — Pull stored markup for a capture. Supply YYYYMMDDHHMMSS for a precise revision, or omit it to get the closest available capture. raw strips the Wayback toolbar and wrapper; view keeps them.
Capture requests — Queue new crawls asynchronously (the request may resolve to an existing capture instead of a fresh one). After submitting, re-run CDX/timeline reads to confirm that a capture with a new digest landed.
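
To make the CDX discovery workflow concrete, here is a minimal sketch that builds a query URL and decodes a JSON response. It assumes the public CDX endpoint at https://web.archive.org/cdx/search/cdx and its parameter names (url, matchType, from, to, filter, fl, output, limit); the MCP tool's own parameter names may differ. build_cdx_url and parse_cdx_json are hypothetical helper names, and nothing here touches the network.

```python
import json
from urllib.parse import urlencode

# Assumption: the public CDX endpoint, not necessarily what the MCP tool calls.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, match_type="exact", from_date=None, to_date=None,
                  status=None, fields=None, limit=100):
    """Build a CDX query URL (hypothetical helper; performs no request)."""
    params = [("url", url), ("matchType", match_type),
              ("output", "json"), ("limit", str(limit))]
    if from_date:
        params.append(("from", from_date))  # YYYYMMDD, passed as a string
    if to_date:
        params.append(("to", to_date))      # YYYYMMDD, passed as a string
    if status:
        params.append(("filter", f"statuscode:{status}"))
    if fields:
        params.append(("fl", ",".join(fields)))  # field projection
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(text):
    """Turn CDX JSON (a header row followed by data rows) into dicts."""
    rows = json.loads(text)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]
```

A narrow query such as build_cdx_url("example.com", match_type="prefix", from_date="20200101", status="200", limit=50) keeps the index scan small, which matters because broad prefix or domain searches are slow.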

Timestamp Formats

Availability API: ISO 8601 (YYYY, YYYY-MM, YYYY-MM-DD, YYYY-MM-DDTHH:mm:SSz)
CDX search: YYYYMMDD for date range filters (from_date, to_date)
Content retrieval: YYYYMMDDHHMMSS for specific snapshots
CDX results: Timestamps returned as YYYYMMDDHHMMSS strings
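
The formats above are easy to mix up, so a small conversion sketch may help. The helper names (to_snapshot_ts, to_cdx_date, parse_snapshot_ts) are hypothetical, and the code assumes Wayback timestamps are expressed in UTC.

```python
from datetime import datetime, timezone


def _as_utc(dt):
    """Convert aware datetimes to UTC; naive ones are assumed to be UTC already."""
    return dt.astimezone(timezone.utc) if dt.tzinfo is not None else dt


def to_snapshot_ts(dt):
    """Format a datetime as a 14-digit YYYYMMDDHHMMSS string (content retrieval)."""
    return _as_utc(dt).strftime("%Y%m%d%H%M%S")


def to_cdx_date(dt):
    """Format a datetime as an 8-digit YYYYMMDD string (CDX date range filters)."""
    return _as_utc(dt).strftime("%Y%m%d")


def parse_snapshot_ts(ts):
    """Parse a YYYYMMDDHHMMSS string from CDX results into an aware datetime."""
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
```

All three helpers produce or consume strings, which matches the requirement that timestamps and dates be passed as strings rather than integers.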

Gotchas

Rate limits: Be respectful; avoid bulk scraping. Use batch availability for multiple URLs. Keep CDX search limits reasonable (default 100).
Snapshot URL format: https://web.archive.org/web/YYYYMMDDHHMMSS/URL. For raw content (no Wayback toolbar), append id_ to the timestamp: https://web.archive.org/web/YYYYMMDDHHMMSSid_/URL.
CDX API is slow: Broad searches (prefix/domain match, high limits) can take 30-45 seconds. Narrow with date ranges or lower limits.
String parameters: Timestamps and dates must be strings, not integers. Quote them when calling from CLI.
Flaky availability: The availability endpoint returns empty for some well-known URLs. CDX search is more reliable for confirming archive existence.
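
The snapshot URL and string-parameter gotchas can be sketched together as follows. snapshot_url is a hypothetical helper, and the code assumes the id_ flag attaches directly to the 14-digit timestamp, as in public replay URLs.

```python
WAYBACK_PREFIX = "https://web.archive.org/web"


def snapshot_url(url, timestamp, raw=False):
    """Build a replay URL for a capture (hypothetical helper).

    timestamp is coerced to a string and must be exactly 14 digits
    (YYYYMMDDHHMMSS). raw=True adds the id_ flag, which requests the
    stored content without the Wayback toolbar.
    """
    timestamp = str(timestamp)
    if not (timestamp.isdigit() and len(timestamp) == 14):
        raise ValueError("timestamp must be a 14-digit YYYYMMDDHHMMSS string")
    flag = "id_" if raw else ""
    return f"{WAYBACK_PREFIX}/{timestamp}{flag}/{url}"
```

Validating the timestamp length up front catches the common mistake of passing an 8-digit CDX date where a full snapshot timestamp is expected.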

Tools in this Server (6)

Get Wayback Available

Check if a URL has been archived by the Wayback Machine. Returns the closest available snapshot with its timestamp and archive URL. Pass a timestamp t...

Post Wayback Available

Check availability of multiple URLs in a single batch request. Pass an array of {url, tag?, timestamp?, closest?} objects. Each result is tagged for e...

Wayback Cdx Search

Search the Internet Archive's Wayback Machine CDX index for detailed archive metadata. The CDX index contains detailed information about every archiv...

Wayback Get Content

Retrieve the actual archived HTML content of a web page from the Wayback Machine. Returns the archived version of a web page's content, either as pro...

Wayback Save Url

Request the Internet Archive to archive a URL for future preservation and access. Submits a URL to the Wayback Machine for archiving, ensuring import...

Wayback Timemap

Retrieve a complete timeline (timemap) of all archived snapshots for a URL. Returns all available archived versions of a web page with their timestam...

Frequently Asked Questions

What is the Internet Archive Wayback Machine MCP server?

The Internet Archive's Wayback Machine is a digital archive of the World Wide Web. This MCP server lets AI agents access historical snapshots of websites, retrieve archived pages, check URL availability across time, and explore billions of saved web pages. It provides 6 tools that agents call through the Model Context Protocol (MCP).

How do I connect Internet Archive Wayback Machine to my AI agent?

Add the MCPBundles server URL to your MCP client configuration (Claude Desktop, Cursor, VS Code, etc.). The URL format is: https://mcp.mcpbundles.com/bundle/archive-org-wayback. Authentication is handled automatically.

How many tools does Internet Archive Wayback Machine provide?

Internet Archive Wayback Machine provides 6 tools that can be called by AI agents, along with a SKILL.md that gives your AI agent domain knowledge about when and how to use them.

What authentication does Internet Archive Wayback Machine require?

Internet Archive Wayback Machine uses open data APIs — no authentication required.
