Bridge the gap between your web crawler and AI language models using the Model Context Protocol (MCP). With mcp-server-webcrawl, your AI client can filter and analyze web content under your direction or autonomously, extracting insights from your crawled data. Support for WARC, wget, InterroBot, Katana, and SiteOne crawlers is available out of the gate. The server includes a full-text search interface with boolean support, plus resource filtering by type, HTTP status, and more.
Overview
what is mcp-server-webcrawl?
mcp-server-webcrawl is an open-source server that bridges the gap between web crawlers and AI language models using the Model Context Protocol (MCP). It allows AI clients to filter and analyze web content, extracting insights either under user direction or autonomously.
how to use mcp-server-webcrawl?
To use mcp-server-webcrawl, install it via pip with the command: pip install mcp-server-webcrawl. You can then run the server using the command: mcp-server-webcrawl --crawler wget --datasrc /path/to/wget/archives/.
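The two commands above, as a copy-pasteable snippet:

```shell
# Install the package (requires Python 3.10 or higher)
pip install mcp-server-webcrawl

# Run the server against an existing wget archive directory
mcp-server-webcrawl --crawler wget --datasrc /path/to/wget/archives/
```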
key features of mcp-server-webcrawl?
- Compatibility with Claude Desktop
- Full-text search interface with boolean support
- Resource filtering by type and HTTP status
- Support for various crawlers including wget, WARC, and more
- Ability to augment your LLM knowledge base
- ChatGPT support is coming soon
use cases of mcp-server-webcrawl?
- Analyzing web content for research purposes
- Extracting insights from large datasets collected by web crawlers
- Enhancing AI language models with crawled web data
FAQ about mcp-server-webcrawl?
- Is mcp-server-webcrawl free to use?
Yes! mcp-server-webcrawl is free and open-source.
- What are the system requirements?
It requires Claude Desktop and Python version 3.10 or higher.
- Which crawlers are supported?
It supports wget, WARC, InterroBot, Katana, and SiteOne crawlers.
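As an illustration, a wget crawl that the server could later index might be captured like this. The flags shown are standard wget options; the target URL and output paths are placeholders, and the exact archive layout the server expects should be confirmed in its documentation:

```shell
# Mirror a site into a local archive directory for the wget crawler mode
wget --mirror --convert-links --adjust-extension \
     --directory-prefix=/path/to/wget/archives/ https://example.com

# Alternatively, capture a WARC file for the server's WARC support
wget --recursive --warc-file=example https://example.com
```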
Server Config
{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": [
        "--crawler",
        "wget",
        "--datasrc",
        "/path/to/wget/archives/"
      ]
    }
  }
}