#vision

32 results found

🚀 OpenCV MCP Server

OpenCV MCP Server provides OpenCV's image and video processing capabilities through the Model Context Protocol (MCP). Access powerful computer vision tools for tasks ranging from basic image manipulation to advanced object detection and tracking.

MCP OpenVision

MCP Server using OpenRouter models to get descriptions for images

groundlight-mcp-server

MCP Server for Groundlight

MCP Server for CVDLT (Computer Vision & Deep Learning Tools)

This repo is built on the Model Context Protocol Python SDK. It bundles deep learning models for computer vision and exposes their capabilities to an LLM or vLLM model.

🚀 Wayland MCP Server

MCP Server for Wayland

MCPControl

MCP server for Windows OS automation

Snaprender Url To Screenshot

Screenshot API for AI agents. Capture any website as PNG, JPEG, WebP, or PDF with a single tool call. Supports full-page capture, device emulation (iPhone, iPad, Pixel, MacBook), dark mode, ad blocking, cookie banner removal, custom viewports, and CSS selector hiding. Includes cache checking (free, doesn't count against quota) and real-time usage monitoring. Stealth mode defeats most bot detection. Free tier: 50 screenshots/month, no credit card required.

Apple RAG MCP

Transform your AI agents into Apple development experts! Apple RAG MCP gives you instant access to official Swift docs, design guidelines, and comprehensive Apple platform knowledge through cutting-edge RAG technology. With professional AI reranking and hybrid search across iOS, macOS, watchOS, tvOS, and visionOS documentation plus Apple Developer YouTube content, you'll get precise, contextual answers every time. Compatible with Cursor, Claude Desktop, and all MCP tools - start building smarter Apple apps today!

LibreChat

Enhanced ChatGPT Clone: Features Agents, DeepSeek, Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, Code Interpreter, langchain, DALL-E-3, OpenAPI Actions, Functions, Secure Multi-User Auth, Presets, open-source for self-hosting. Active project.

AutoProvisioner MCP Server (open beta)

Mirror of

UI-TARS Desktop

A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

Trend Vision One MCP Server

The Trend Vision One Model Context Protocol (MCP) Server enables natural language interaction between your favourite AI tooling and the Trend Vision One web APIs. This allows users to harness the power of Large Language Models (LLMs) to interpret and respond to security events.

Vision Mcp Server | 图片分析 Mcp

This MCP server addresses the visual recognition limitations of text-only models by providing accurate image description and identification, which makes it well suited to showing an AI reference interface designs. It currently supports pasting image links into the chat box or placing images in the project folder for recognition. It can be added to MCP tools such as Claude Code, Cline, and Trae. Beyond programming, it also gives visual recognition to models that lack native image support. You can pick a preferred vision model from the ModelScope community and substitute it when filling in the MCP configuration. 📱 Daily use cases: send a screenshot to find out what went wrong; share an image link or place a screenshot in the project folder to have the AI help optimize a layout; send a product image link to have the AI write promotional copy.

Frametrace | Reverse Video Search

FrameTrace is an AI-powered reverse video search engine that helps you find any video's original source, detect duplicates, and verify authenticity across platforms like YouTube, TikTok, Instagram, and Reddit. Using advanced computer vision and machine learning, it analyzes videos frame-by-frame to trace content origins even after editing or re-encoding.

Roboflow

Create, train, and deploy computer vision models.

Kelnix Receipt Mcp Api

Turn any receipt into structured, accounting-ready JSON or clean Markdown with one API call. AI-powered vision extracts merchant, date, line items, tax breakdown, totals, currency, and confidence scores, then suggests the right GL account for instant bookkeeping. 7 tools for the full receipt-to-journal-entry pipeline. Built for expense automation agents. 50 free credits on signup, no credit card required.

Superdocs

A structured-document editor for AI agents. SuperDocs gives your AI 21 MCP tools and 4 workflow prompts to make section-precise edits — bold a specific paragraph, replace a single table cell, restructure a heading — without disturbing surrounding content. Tables, borders, alternating row shading, fonts, and inline styling all survive AI edits AND round-trip exports across .docx, PDF, HTML, Markdown, and RTF. Other capabilities: pre-signed URL upload/download (no context bloat for files >100KB), compact response mode for editing 100-page documents efficiently (~140× token reduction), multimodal vision on attachments, human-in-the-loop approval for sensitive edits, and multi-language editing across 16+ languages. Free plan: 500 ops/month, no credit card required.

Asterwise — Astrology, Numerology & Tarot MCP Server

MCP server for astrology and divination. Connect Claude, ChatGPT, or any MCP-compatible AI to real ephemeris calculations across Vedic and Western astrology, numerology, and tarot. Covers natal charts, 5-level Vimshottari Dasha, yoga and dosha detection, matchmaking with Rajju/Vedha vetoes, panchanga, KP system, Lal Kitab, and tarot spreads. 103 tools, OAuth 2.1. Free sandbox tier — 500 calls/month, no credit card.

Luxxon

On-demand live vision for AI agents. Open a session at a lat/lng, receive a JPEG snapshot or WebRTC stream, settle per-second in USDC on Base. The agent-first way to give an LLM eyes on the physical world. Four tools today: get_session, get_frame, get_stream_url, cancel_session. Drop the MCP server into Claude Desktop, Cursor, or Claude Code with one npx command + a Luxxon API key from console.luxxon.dev. Currently on Base Sepolia testnet. Docs: https://docs.luxxon.dev.
