Local Speech-to-Text MCP Server

@SmartLittleApps

A high-performance Model Context Protocol (MCP) server providing local speech-to-text transcription using whisper.cpp, optimized for Apple Silicon.
Overview

What is Local Speech-to-Text MCP Server?

Local Speech-to-Text MCP Server is a high-performance Model Context Protocol (MCP) server that provides local speech-to-text transcription using whisper.cpp, specifically optimized for Apple Silicon devices.

How to use Local Speech-to-Text MCP Server?

To use the server, clone the repository from GitHub, install the required dependencies, and configure your MCP client to connect to it. Once connected, you can transcribe audio files in a wide range of formats.
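As an illustration, an MCP client such as Claude Desktop is typically wired up with a JSON entry like the one below. The server name, command, and path here are placeholders, not taken from the repository; consult the project's README for the exact values.

```json
{
  "mcpServers": {
    "speech-to-text": {
      "command": "node",
      "args": ["/path/to/local-stt-mcp/dist/index.js"]
    }
  }
}
```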

What are the key features of Local Speech-to-Text MCP Server?

  • 100% Local Processing for complete privacy
  • Optimized for Apple Silicon with 15x+ real-time transcription speed
  • Speaker Diarization to identify and separate multiple speakers
  • Universal Audio Support with automatic conversion from various formats
  • Multiple Output Formats including txt, json, vtt, srt, csv
  • Low Memory Footprint of less than 2GB
  • Full TypeScript support for modern development
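To illustrate the subtitle output with diarization enabled, a transcript in srt format might look like the fragment below. The SPEAKER_NN label style is an assumption mirroring common diarization tooling, not confirmed output of this server.

```
1
00:00:00,000 --> 00:00:03,500
[SPEAKER_00] Welcome, everyone, to today's meeting.

2
00:00:03,500 --> 00:00:06,200
[SPEAKER_01] Thanks, glad to be here.
```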

What are the use cases of Local Speech-to-Text MCP Server?

  1. Transcribing meetings or lectures for documentation.
  2. Creating subtitles for videos from audio content.
  3. Assisting in accessibility by providing text for spoken content.

FAQ about Local Speech-to-Text MCP Server

  • Is the transcription process cloud-based?

No, all processing is done locally, ensuring privacy.

  • What audio formats are supported?

The server supports WAV, FLAC, MP3, M4A, and more, with automatic conversion capabilities.
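In practice, automatic conversion for whisper.cpp means resampling the input to 16 kHz mono PCM WAV, which is the format whisper.cpp expects. A minimal sketch of such a conversion step, assuming ffmpeg is used as the conversion backend (the function name and filenames are illustrative):

```shell
# Build the ffmpeg command that resamples any input file ($1)
# to the 16 kHz mono PCM WAV that whisper.cpp expects.
convert_cmd() {
  echo ffmpeg -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "${1%.*}.wav"
}

# Example: print the command for an M4A recording.
convert_cmd talk.m4a
```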

  • Do I need a HuggingFace account for speaker diarization?

Yes, a HuggingFace token is required for speaker diarization functionality.
