Bridge the gap between your web crawler and AI language models using the Model Context Protocol (MCP). With mcp-server-webcrawl, your AI client can filter and analyze web content under your direction or autonomously, extracting insights from your crawled data. Support for WARC, wget, InterroBot, Katana, and SiteOne crawlers is available out of the gate. The server includes a full-text search interface with boolean support, plus resource filtering by type, HTTP status, and more.
Overview
what is mcp-server-webcrawl?
mcp-server-webcrawl is an open-source server that bridges the gap between web crawlers and AI language models using the Model Context Protocol (MCP). It allows AI clients to filter and analyze web content, extracting insights either under user direction or autonomously.
how to use mcp-server-webcrawl?
To use mcp-server-webcrawl, install it via pip with the command: pip install mcp-server-webcrawl. You can then run the server using the command: mcp-server-webcrawl --crawler wget --datasrc /path/to/wget/archives/.
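The two commands above, as a copy-pasteable snippet:

```shell
# Install the package (requires Python 3.10 or higher)
pip install mcp-server-webcrawl

# Run the server against an existing wget archive directory
mcp-server-webcrawl --crawler wget --datasrc /path/to/wget/archives/
```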
key features of mcp-server-webcrawl?
- Compatibility with Claude Desktop
- Full-text search interface with boolean support
- Resource filtering by type and HTTP status
- Support for various crawlers including wget, WARC, and more
- Ability to augment your LLM knowledge base
- ChatGPT support is coming soon
use cases of mcp-server-webcrawl?
- Analyzing web content for research purposes
- Extracting insights from large datasets collected by web crawlers
- Enhancing AI language models with crawled web data
FAQ about mcp-server-webcrawl?
- Is mcp-server-webcrawl free to use?
Yes! mcp-server-webcrawl is free and open-source.
- What are the system requirements?
It requires Claude Desktop and Python version 3.10 or higher.
- Which crawlers are supported?
It supports wget, WARC, InterroBot, Katana, and SiteOne crawlers.
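As an illustration, a wget crawl that the server could later index might be captured like this. The flags shown are standard wget options; the target URL and output paths are placeholders, and the exact archive layout the server expects should be confirmed in its documentation:

```shell
# Mirror a site into a local archive directory for the wget crawler mode
wget --mirror --convert-links --adjust-extension \
     --directory-prefix=/path/to/wget/archives/ https://example.com

# Alternatively, capture a WARC file for the server's WARC support
wget --recursive --warc-file=example https://example.com
```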
Server Config
{
  "mcpServers": {
    "webcrawl": {
      "command": "mcp-server-webcrawl",
      "args": [
        "--crawler",
        "wget",
        "--datasrc",
        "/path/to/wget/archives/"
      ]
    }
  }
}