mcp-server-webcrawl

@pragmar

Bridge the gap between your web crawler and AI language models using the Model Context Protocol (MCP). With mcp-server-webcrawl, your AI client filters and analyzes crawled web content, either under your direction or autonomously, and extracts insights from it. WARC, wget, InterroBot, Katana, and SiteOne crawlers are supported out of the box. The server includes a full-text search interface with boolean support, plus resource filtering by type, HTTP status, and more.
Overview

What is mcp-server-webcrawl?

mcp-server-webcrawl is an open-source server that bridges the gap between web crawlers and AI language models using the Model Context Protocol (MCP). It allows AI clients to filter and analyze web content, extracting insights either under user direction or autonomously.

How to use mcp-server-webcrawl

To use mcp-server-webcrawl, install it with pip:

pip install mcp-server-webcrawl

Then run the server, pointing it at your crawl data:

mcp-server-webcrawl --crawler wget --datasrc /path/to/wget/archives/
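As a rough illustration of how the run command above is put together, the following Python sketch assembles the same argument list and prints it as a shell line. The helper name `build_server_command` is hypothetical and not part of mcp-server-webcrawl; only the `--crawler` and `--datasrc` flags and the `wget` value are taken from the text above. The sketch builds the command without launching the server.

```python
import shlex


def build_server_command(crawler: str, datasrc: str) -> list[str]:
    """Return the argument list for starting the server.

    Hypothetical helper for illustration only; it mirrors the command
    shown in the text and does not start any process.
    """
    return ["mcp-server-webcrawl", "--crawler", crawler, "--datasrc", datasrc]


# Placeholder path copied from the text; replace it with a real archive directory.
argv = build_server_command("wget", "/path/to/wget/archives/")

# shlex.join quotes arguments where needed, so the result is safe to paste into a shell.
print(shlex.join(argv))
```

Keeping the command as a list rather than a single string avoids quoting problems if the archive path contains spaces.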

Key features of mcp-server-webcrawl

  • Compatibility with Claude Desktop
  • Full-text search interface with boolean support
  • Resource filtering by type and HTTP status
  • Support for various crawlers including wget, WARC, and more
  • Ability to augment your LLM knowledge base
  • ChatGPT support is coming soon

Use cases of mcp-server-webcrawl

  1. Analyzing web content for research purposes
  2. Extracting insights from large datasets collected by web crawlers
  3. Enhancing AI language models with data from your own web crawls

FAQ for mcp-server-webcrawl

  • Is mcp-server-webcrawl free to use?

Yes! mcp-server-webcrawl is free and open-source.

  • What are the system requirements?

It requires Claude Desktop and Python version 3.10 or higher.

  • Which crawlers are supported?

It supports wget, WARC, InterroBot, Katana, and SiteOne crawlers.

Server Config

{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": [
        "--crawler",
        "wget",
        "--datasrc",
        "/path/to/wget/archives/"
      ]
    }
  }
}
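
If the MCP client already has other servers configured, the entry above has to be merged into the existing `mcpServers` object rather than replacing it. The Python sketch below shows one way to do that on an in-memory dictionary. The function name `add_webcrawl_server` and the `other` server entry are hypothetical and exist only for this example; reading and writing the client's actual configuration file is left out, since its location depends on the client and platform.

```python
import json


def add_webcrawl_server(config: dict, crawler: str, datasrc: str) -> dict:
    """Return a copy of config with a "webcrawl" entry added under "mcpServers".

    Hypothetical helper for illustration; existing server entries are kept,
    and the input dictionary is not modified.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["webcrawl"] = {
        "command": "mcp-server-webcrawl",
        "args": ["--crawler", crawler, "--datasrc", datasrc],
    }
    merged["mcpServers"] = servers
    return merged


# A made-up existing configuration with one unrelated server.
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}

# Placeholder path copied from the config above; replace it with a real directory.
updated = add_webcrawl_server(existing, "wget", "/path/to/wget/archives/")
print(json.dumps(updated, indent=2))
```

Returning a new dictionary instead of editing the input in place makes it easy to compare the result with the original before writing anything back to disk.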
