# MCP Read Website
Fetches web pages, strips noise, and converts content to clean Markdown while preserving links. Designed for LLM pipelines with a minimal token footprint so entire pages can be read at once. Crawl and scrape single pages or whole sites locally with minimal dependencies.
## Features
- Content extraction using Mozilla Readability (same as Firefox Reader View)
- HTML to Markdown conversion with Turndown + GFM support
- Smart caching with SHA-256 hashed URLs
- Polite crawling with robots.txt support and rate limiting
- Concurrent fetching with configurable depth crawling
- Stream-first design for low memory usage
- Link preservation for knowledge graphs
- Optional chunking for downstream processing
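The first two features describe the core pipeline: Readability isolates the article content, then Turndown (with the GFM plugin) converts it to Markdown. Below is a minimal sketch of that flow, assuming the `jsdom`, `@mozilla/readability`, `turndown`, and `turndown-plugin-gfm` packages; it illustrates the technique, not this package's actual implementation:

```typescript
import { JSDOM } from 'jsdom';
import { Readability } from '@mozilla/readability';
import TurndownService from 'turndown';
// turndown-plugin-gfm ships without type declarations
import { gfm } from 'turndown-plugin-gfm';

async function pageToMarkdown(url: string): Promise<string> {
  const res = await fetch(url);
  const html = await res.text();

  // Parse into a DOM; the url option keeps relative links resolvable.
  const dom = new JSDOM(html, { url });

  // Readability strips navigation, ads, and other noise
  // (same engine as Firefox Reader View).
  const article = new Readability(dom.window.document).parse();
  if (!article) throw new Error(`No readable content extracted from ${url}`);

  // Turndown converts the cleaned HTML to Markdown; the GFM plugin
  // adds table and strikethrough support.
  const turndown = new TurndownService({ headingStyle: 'atx' });
  turndown.use(gfm);
  return turndown.turndown(article.content);
}
```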
## Available Tools
- `read_website_fast` - Fetches a webpage and converts it to clean markdown
  - Parameters:
    - `url` (required): The HTTP/HTTPS URL to fetch
    - `depth` (optional): Crawl depth (0 = single page)
    - `respectRobots` (optional): Whether to respect robots.txt
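The tool can be invoked from any MCP client. A sketch using the official TypeScript SDK (`@modelcontextprotocol/sdk`) follows; the client setup is an assumption about your environment, not part of this package:

```typescript
import { Client } from '@modelcontextprotocol/sdk/client/index.js';
import { StdioClientTransport } from '@modelcontextprotocol/sdk/client/stdio.js';

// Spawn the server over stdio, the same way the config below does.
const transport = new StdioClientTransport({
  command: 'npx',
  args: ['-y', 'github:just-every/mcp-read-website-fast', 'serve'],
});

const client = new Client({ name: 'example-client', version: '1.0.0' });
await client.connect(transport);

// Fetch a single page (depth 0) as clean Markdown.
const result = await client.callTool({
  name: 'read_website_fast',
  arguments: { url: 'https://example.com', depth: 0 },
});
console.log(result.content);
```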
## Available Resources
- `read-website-fast://status` - Get cache statistics
- `read-website-fast://clear-cache` - Clear the cache directory
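Resources are read by URI. Reusing the client from the sketch above (again an assumption, not this package's code):

```typescript
// Check cache statistics via the status resource.
const status = await client.readResource({ uri: 'read-website-fast://status' });
console.log(status.contents);

// Clear the cache directory.
await client.readResource({ uri: 'read-website-fast://clear-cache' });
```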
## Server Config
```json
{
  "mcpServers": {
    "read-website-fast": {
      "command": "npx",
      "args": [
        "-y",
        "github:just-every/mcp-read-website-fast",
        "serve"
      ]
    }
  }
}
```
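This block goes in your MCP client's configuration (for example, `claude_desktop_config.json` for Claude Desktop). The `npx -y` invocation downloads and runs the server on demand, so no global install is required.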