Local Speech-to-Text MCP Server

@SmartLittleApps

A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.
Overview

What is Local Speech-to-Text MCP Server?

Local Speech-to-Text MCP Server is a high-performance Model Context Protocol (MCP) server that provides local speech-to-text transcription using whisper.cpp, specifically optimized for Apple Silicon devices.

How to use Local Speech-to-Text MCP Server?

To use the server, clone the repository from GitHub, install the required dependencies, and configure your MCP client to connect to it. Once connected, you can transcribe audio files in a wide range of formats.
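As an illustration, an MCP client such as Claude Desktop is typically wired up with a JSON entry like the one below. The server name, command, and path here are placeholders, not taken from the repository; consult the project's README for the exact values.

```json
{
  "mcpServers": {
    "speech-to-text": {
      "command": "node",
      "args": ["/path/to/local-stt-mcp/dist/index.js"]
    }
  }
}
```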

What are the key features of Local Speech-to-Text MCP Server?

  • 100% Local Processing for complete privacy
  • Optimized for Apple Silicon with 15x+ real-time transcription speed
  • Speaker Diarization to identify and separate multiple speakers
  • Universal Audio Support with automatic conversion from various formats
  • Multiple Output Formats including txt, json, vtt, srt, csv
  • Low Memory Footprint of less than 2GB
  • Full TypeScript support for modern development
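To illustrate the subtitle output with diarization enabled, a transcript in srt format might look like the fragment below. The SPEAKER_NN label style is an assumption mirroring common diarization tooling, not confirmed output of this server.

```
1
00:00:00,000 --> 00:00:03,500
[SPEAKER_00] Welcome, everyone, to today's meeting.

2
00:00:03,500 --> 00:00:06,200
[SPEAKER_01] Thanks, glad to be here.
```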

What are the use cases of Local Speech-to-Text MCP Server?

  1. Transcribing meetings or lectures for documentation.
  2. Creating subtitles for videos from audio content.
  3. Assisting in accessibility by providing text for spoken content.

FAQ about Local Speech-to-Text MCP Server

  • Is the transcription process cloud-based?

No, all processing is done locally, ensuring privacy.

  • What audio formats are supported?

The server supports WAV, FLAC, MP3, M4A, and more, with automatic conversion capabilities.
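In practice, automatic conversion for whisper.cpp means resampling the input to 16 kHz mono PCM WAV, which is the format whisper.cpp expects. A minimal sketch of such a conversion step, assuming ffmpeg is used as the conversion backend (the function name and filenames are illustrative):

```shell
# Build the ffmpeg command that resamples any input file ($1)
# to the 16 kHz mono PCM WAV that whisper.cpp expects.
convert_cmd() {
  echo ffmpeg -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "${1%.*}.wav"
}

# Example: print the command for an M4A recording.
convert_cmd talk.m4a
```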

  • Do I need a HuggingFace account for speaker diarization?

Yes, a HuggingFace token is required for speaker diarization functionality.
