What you can do with Internet Archive Wayback Machine

Built for

Researchers, journalists, legal teams, SEO analysts, trust-and-safety teams, security researchers, archivists, and operators investigating changed or deleted web pages

Example workflows

Find archived snapshots

Turns a URL into a concrete historical record with replay links.

Try this

Check whether this URL has Wayback Machine captures, then return the closest snapshot date, archive link, HTTP status, and content type.

Compare page history

Uses CDX metadata to understand how a page changed over time.

Try this

Search the Wayback CDX index for this page across 2024 and summarize the major capture dates, status changes, content types, and duplicate digests.

Retrieve archived HTML

Moves from discovery to content replay for historical analysis.

Try this

Find a reliable archived snapshot of this page from early 2023, retrieve the raw HTML, and summarize the visible page title and key text.

Preserve a current URL

Queues time-sensitive pages for preservation and closes the loop with verification.

Try this

Request a new Wayback Machine capture for this URL, then explain how to confirm whether the new snapshot appears in the archive.

Context to know first

What can agents do with the Wayback Machine?

Agents can check whether pages are archived, batch-check URLs, search CDX metadata, build capture timelines, retrieve archived HTML, and request new captures.

Does the Wayback Machine connection need an API key?

No. The public Wayback Machine workflows available here do not require a third-party API key, OAuth login, or paid Archive.org account.

When should I use CDX search instead of availability?

Use CDX search when availability returns an empty result, when you need many captures, or when you need metadata such as status codes, MIME types, digests, and timestamp ranges.
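
The fallback rule above can be sketched in Python. This is a minimal illustration, assuming the JSON shape returned by the public availability endpoint (an `archived_snapshots` object with an optional `closest` entry); the tool in this server may wrap the response differently. The helper names `closest_snapshot` and `needs_cdx_fallback` are hypothetical, and the sample payloads are made up for the example.

```python
from typing import Optional


def closest_snapshot(availability: dict) -> Optional[dict]:
    """Return the closest snapshot record, or None when none is reported.

    Assumed shape (public availability API):
    {"archived_snapshots": {"closest": {"available": true, "url": ..., ...}}}
    """
    closest = availability.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest


def needs_cdx_fallback(availability: dict) -> bool:
    """An empty availability answer is not proof of absence, so check CDX next."""
    return closest_snapshot(availability) is None


# Illustrative payloads only; the values are invented for this sketch.
hit = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20240101000000/http://example.com/",
            "timestamp": "20240101000000",
            "status": "200",
        }
    }
}
miss = {"archived_snapshots": {}}
```

With these samples, `needs_cdx_fallback(miss)` is true, which is the signal to run a CDX search before concluding that the page was never archived.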

AI Skill
SKILL.md

Domain knowledge for Internet Archive Wayback Machine — workflow patterns, data models, and gotchas for your AI agent.

Internet Archive Wayback Machine

The Wayback Machine archives the web. Capabilities span availability probes (single and batch), CDX index exploration, timelines, raw HTML retrieval, and on-demand captures. No auth required.

Workflows

  • Snapshot presence — Ask whether a URL has any capture, returning the nearest timestamp plus the canonical archive link. Treat empty availability answers skeptically for large sites; CDX queries are the authoritative existence signal.
  • Batch presence — Send many {url, tag?, timestamp?, closest?} tuples in one round trip and read per-URL outcomes grouped by optional tags.
  • CDX discovery — Mine the index for metadata rows (urlkey, timestamp, mime type, HTTP status, digest, byte length). Tune match scope (exact URL, prefix, host, domain), YYYYMMDD windows, field projections, filters like statuscode:200, and JSON/CSV/text encodings.
  • Timelines — Collapse up to ~100 chronological captures for a URL, each with status, mime, digest, and replay link; backed by the same CDX machinery for stability.
  • Content replay — Pull stored markup for a capture. Supply YYYYMMDDHHMMSS for a precise revision or omit to follow the closest mirror. raw strips the Wayback chrome; view keeps the standard wrapper.
  • Capture requests — Queue new crawls asynchronously (may bounce to an existing mirror). After submitting, re-run CDX/timeline reads to confirm the new digest landed.
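
As one way to work with CDX discovery and timeline results, the sketch below turns CDX JSON output into dictionaries and keeps only captures whose digest changed. It assumes the public CDX server's JSON encoding, where the first row is a header of field names; the tool in this server may return a different structure. The function names are hypothetical and the sample rows (including the digests) are invented.

```python
from typing import Dict, List


def cdx_rows_to_dicts(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Map each data row onto the field names in the header row."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def distinct_versions(captures: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first capture of each run of identical digests.

    Consecutive captures with the same digest stored the same content,
    so only digest changes mark a new version of the page.
    """
    versions = []
    previous_digest = None
    for capture in sorted(captures, key=lambda c: c["timestamp"]):
        if capture["digest"] != previous_digest:
            versions.append(capture)
            previous_digest = capture["digest"]
    return versions


# Invented sample data in the assumed CDX JSON layout.
sample = [
    ["urlkey", "timestamp", "original", "mimetype", "statuscode", "digest", "length"],
    ["com,example)/", "20240105120000", "http://example.com/", "text/html", "200", "AAAA", "1200"],
    ["com,example)/", "20240210120000", "http://example.com/", "text/html", "200", "AAAA", "1200"],
    ["com,example)/", "20240301120000", "http://example.com/", "text/html", "200", "BBBB", "1350"],
]

changed = distinct_versions(cdx_rows_to_dicts(sample))
```

Here `changed` holds two captures, the January one and the March one, because the February capture repeats the January digest.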

Timestamp Formats

  • Availability API: ISO 8601 (YYYY, YYYY-MM, YYYY-MM-DD, YYYY-MM-DDTHH:MM:SSZ)
  • CDX search: YYYYMMDD for date range filters (from_date, to_date)
  • Content retrieval: YYYYMMDDHHMMSS for specific snapshots
  • CDX results: Timestamps returned as YYYYMMDDHHMMSS strings
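
Converting between these formats is mostly string formatting. The following Python sketch shows one way to do it with the standard library; the helper names are hypothetical, and treating Wayback timestamps as UTC is an assumption of this example.

```python
from datetime import datetime, timezone


def parse_wayback_timestamp(ts: str) -> datetime:
    """Parse a 14-digit YYYYMMDDHHMMSS string (assumed to be UTC)."""
    if not isinstance(ts, str):
        raise TypeError("timestamps must be passed as strings, not integers")
    return datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)


def to_wayback_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYYMMDDHHMMSS for content retrieval."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


def to_cdx_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYYMMDD for CDX date range filters."""
    return moment.astimezone(timezone.utc).strftime("%Y%m%d")


captured = parse_wayback_timestamp("20230115083000")
iso_form = captured.isoformat()          # ISO 8601, for availability checks
cdx_form = to_cdx_date(captured)         # YYYYMMDD, for from_date / to_date
replay_form = to_wayback_timestamp(captured)
```

Keeping every value as a string end to end avoids the integer pitfall: a leading-zero-free number such as 20230115 looks valid but is not what these parameters expect.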

Gotchas

  • Rate limits: Be respectful; avoid bulk scraping. Use batch availability for multiple URLs. Keep CDX search limits reasonable (default 100).
  • Snapshot URL format: https://web.archive.org/web/YYYYMMDDHHMMSS/URL. For raw content without the Wayback toolbar, append id_ to the timestamp: https://web.archive.org/web/YYYYMMDDHHMMSSid_/URL.
  • CDX API is slow: Broad searches (prefix/domain match, high limits) can take 30-45 seconds. Narrow with date ranges or lower limits.
  • String parameters: Timestamps and dates must be strings, not integers. Quote them when calling from CLI.
  • Flaky availability: The availability endpoint can return an empty result even for well-known URLs that do have captures. CDX search is more reliable for confirming that an archive exists.
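
Several of these gotchas come down to how URLs are assembled. The Python sketch below builds a replay URL and a narrowed CDX query URL. It assumes the public CDX server endpoint and its `url`, `matchType`, `from`, `to`, `limit`, and `output` query parameters; mapping this server's `from_date` and `to_date` onto `from` and `to` is an assumption, and the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def snapshot_url(timestamp: str, url: str, raw: bool = False) -> str:
    """Build a replay URL; raw=True adds the id_ modifier after the timestamp."""
    if not isinstance(timestamp, str):
        raise TypeError("timestamp must be a string such as '20230115083000'")
    modifier = "id_" if raw else ""
    return f"https://web.archive.org/web/{timestamp}{modifier}/{url}"


def cdx_query_url(
    url: str,
    from_date: Optional[str] = None,
    to_date: Optional[str] = None,
    limit: int = 100,
    match_type: str = "exact",
) -> str:
    """Build a CDX query URL with a date window and a modest row limit."""
    params = {
        "url": url,
        "output": "json",
        "limit": str(limit),
        "matchType": match_type,
    }
    if from_date:
        params["from"] = from_date  # YYYYMMDD string
    if to_date:
        params["to"] = to_date      # YYYYMMDD string
    return "https://web.archive.org/cdx/search/cdx?" + urlencode(params)


raw_link = snapshot_url("20230115083000", "https://example.com/", raw=True)
query = cdx_query_url("example.com/page", "20240101", "20241231", limit=50, match_type="prefix")
```

Passing a date window and a limit up front is the cheap way to keep a prefix or domain search from running into the slow path described above.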

Tools in this Server (6)

Get Wayback Available

Check if a URL has been archived by the Wayback Machine. Returns the closest available snapshot with its timestamp and archive URL. Pass a timestamp t...

Post Wayback Available

Check availability of multiple URLs in a single batch request. Pass an array of {url, tag?, timestamp?, closest?} objects. Each result is tagged for e...

Wayback Cdx Search

Search the Internet Archive's Wayback Machine CDX index for detailed archive metadata. The CDX index contains detailed information about every archiv...

Wayback Get Content

Retrieve the actual archived HTML content of a web page from the Wayback Machine. Returns the archived version of a web page's content, either as pro...

Wayback Save Url

Request the Internet Archive to archive a URL for future preservation and access. Submits a URL to the Wayback Machine for archiving, ensuring import...

Wayback Timemap

Retrieve a complete timeline (timemap) of all archived snapshots for a URL. Returns all available archived versions of a web page with their timestam...

Frequently Asked Questions

What is the Internet Archive Wayback Machine MCP server?

The Internet Archive's Wayback Machine is a digital archive of the World Wide Web. This MCP server lets AI agents access historical snapshots of websites, retrieve archived pages, check URL availability across time, and explore billions of saved web pages. It exposes 6 tools through the Model Context Protocol (MCP).

How do I connect Internet Archive Wayback Machine to my AI agent?

Add the MCPBundles server URL to your MCP client configuration (Claude Desktop, Cursor, VS Code, etc.). The server URL is https://mcp.mcpbundles.com/bundle/archive-org-wayback. Authentication is handled automatically.

How many tools does Internet Archive Wayback Machine provide?

Internet Archive Wayback Machine provides 6 tools that can be called by AI agents, along with a SKILL.md that gives your AI agent domain knowledge about when and how to use them.

What authentication does Internet Archive Wayback Machine require?

Internet Archive Wayback Machine uses open data APIs — no authentication required.

Can agents compare archived page versions?

Agents can retrieve metadata and raw HTML for selected captures, then summarize visible changes. They should cite timestamps and archive URLs so the comparison stays traceable.

Are new Wayback captures immediate?

No. Save requests are asynchronous and can return an existing mirror or take time to appear. A follow-up CDX or timeline search is the best way to confirm the capture landed.
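
The confirmation step can be sketched as a small polling loop in Python. This is an illustration under stated assumptions: `fetch_timestamps` is a hypothetical callable standing in for a CDX or timeline lookup that returns capture timestamps as YYYYMMDDHHMMSS strings, and the attempt count and delay are arbitrary defaults, not recommended values.

```python
import time
from typing import Callable, Iterable, Optional


def first_capture_since(timestamps: Iterable[str], submitted_at: str) -> Optional[str]:
    """Return the earliest capture at or after the submission time, if any.

    Fixed-width YYYYMMDDHHMMSS strings sort chronologically, so plain
    string comparison is enough here.
    """
    newer = [ts for ts in timestamps if ts >= submitted_at]
    return min(newer) if newer else None


def wait_for_capture(
    fetch_timestamps: Callable[[], Iterable[str]],
    submitted_at: str,
    attempts: int = 5,
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Re-check the index a few times; give up quietly if nothing appears."""
    for attempt in range(attempts):
        found = first_capture_since(fetch_timestamps(), submitted_at)
        if found:
            return found
        if attempt < attempts - 1:
            sleep(delay_seconds)  # pause between checks, not after the last one
    return None
```

A capture older than the submission time means the save request returned an existing snapshot, so comparing against `submitted_at` is what distinguishes a new capture from a reused one.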

Setup Instructions

Connect Internet Archive Wayback Machine to any MCP client in minutes

https://mcp.mcpbundles.com/bundle/archive-org-wayback

One-click install:

The link prefills the Add custom connector dialog — you still review the values and click Add, then Connect to complete OAuth.

Or add manually

  1. Open claude.ai → Settings → Connectors.
  2. Click the + button and choose Add custom connector.
  3. Set Name to Internet Archive Wayback Machine and paste the MCP URL into Remote MCP server URL.
  4. Click Add. Internet Archive Wayback Machine will appear under Not connected — select it and click Connect to complete OAuth.
Name: Internet Archive Wayback Machine
Remote MCP server URL: https://mcp.mcpbundles.com/bundle/archive-org-wayback
Authentication: OAuth

Custom connectors at claude.ai require a paid Claude plan (Pro, Max, Team, or Enterprise).

Try Internet Archive Wayback Machine now

No API key or third-party login required. Chat with AI and run tools instantly.
