
Portuguese Legal Document PDF Metadata Extractor

@geek2geeks

An MCP server that extracts metadata from Portuguese legal documents, combining PDF processing with a database architecture.
Overview

The Portuguese Legal Document PDF Metadata Extractor is a robust Python tool that extracts structured metadata from Portuguese legal document PDFs, specifically documents identified by a European Case Law Identifier (ECLI).

To use the extractor, clone the project repository, install the required dependencies, and place your PDF files in the designated directory. You can then use the PortugueseLegalPDFExtractor class to extract metadata from a single PDF or to batch-process multiple documents.
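
The class's API is not documented on this page, so the following is a minimal standalone sketch of the same kind of processing rather than the project's code. It reads a PDF's text with pdfplumber and parses the first ECLI it finds. The function names (parse_ecli, extract_text, process_directory) are hypothetical, and the regular expression is an assumption based on the general ECLI layout: the literal "ECLI", a country code, a court code, the year, and an ordinal number.

```python
import re
from pathlib import Path
from typing import Dict, List, Optional

# Assumed ECLI layout: "ECLI" : country code : court code : year : ordinal.
# The ordinal may contain letters, digits and dots; the pattern stops before
# a trailing sentence period.
ECLI_PATTERN = re.compile(
    r"\bECLI:(?P<country>[A-Z]{2}):(?P<court>[A-Z0-9]{1,7}):"
    r"(?P<year>\d{4}):(?P<ordinal>[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*)"
)


def parse_ecli(text: str) -> Optional[Dict[str, str]]:
    """Return the parts of the first ECLI found in `text`, or None."""
    match = ECLI_PATTERN.search(text)
    if match is None:
        return None
    fields = match.groupdict()
    fields["ecli"] = match.group(0)
    return fields


def extract_text(pdf_path: Path) -> str:
    """Join the text of every page of a PDF using pdfplumber."""
    import pdfplumber  # local import: parse_ecli() stays usable without it

    with pdfplumber.open(str(pdf_path)) as pdf:
        # extract_text() returns None for pages without a text layer.
        return "\n".join(page.extract_text() or "" for page in pdf.pages)


def process_directory(directory: Path) -> List[Dict[str, str]]:
    """Batch-process every *.pdf in `directory`, printing simple progress."""
    pdf_paths = sorted(directory.glob("*.pdf"))
    results: List[Dict[str, str]] = []
    for index, pdf_path in enumerate(pdf_paths, start=1):
        print(f"[{index}/{len(pdf_paths)}] {pdf_path.name}")
        metadata = parse_ecli(extract_text(pdf_path)) or {}
        metadata["file"] = pdf_path.name
        results.append(metadata)
    return results
```

For an illustrative, made-up identifier such as `ECLI:PT:STJ:2019:1234.17.0T8LSB.L1.S1`, parse_ecli returns the country `PT`, the court `STJ`, the year `2019` and the ordinal `1234.17.0T8LSB.L1.S1`. A production extractor would also need to pull out fields that are not part of the ECLI itself.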

Key features

  • High reported accuracy: a 100% confidence score and a 96.84% exact-match rate.
  • Production-ready, with two extractor variants for different use cases.
  • Robust error handling and comprehensive validation.
  • Flexible confidence-scoring options.
  • User-friendly interface with clear progress reporting.
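
The page does not say how the confidence score is computed. One common approach, shown here purely as an assumption, scores a result by the weighted share of expected fields that were actually extracted. The weights and the strict all-or-nothing mode are what make such a score configurable. The confidence_score function and the field names are hypothetical.

```python
from typing import Mapping, Optional


def confidence_score(
    metadata: Mapping[str, Optional[str]],
    weights: Mapping[str, float],
    strict: bool = False,
) -> float:
    """Score an extraction result between 0.0 and 1.0.

    Each expected field contributes its weight when a non-empty value was
    extracted. With strict=True the score is 1.0 only if every field is present.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    present = [field for field in weights if metadata.get(field)]
    if strict:
        return 1.0 if len(present) == len(weights) else 0.0
    return sum(weights[field] for field in present) / total
```

With weights `{"ecli": 2.0, "court": 1.0, "date": 1.0}` and a result that is missing only the date, the weighted score is 0.75 and the strict score is 0.0.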

Use cases

  1. Extracting metadata from legal documents for research purposes.
  2. Automating the processing of large volumes of legal PDFs.
  3. Validating the accuracy of extracted data against ground truth.
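
For the third use case, an exact-match rate can be computed by comparing each extracted field with a hand-checked value. The sketch below uses one plausible definition: matching (document, field) pairs divided by all pairs, after whitespace normalization. The project's 96.84% figure may be defined differently, and exact_match_rate and normalize are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence


def normalize(value: Optional[str]) -> str:
    """Collapse runs of whitespace so 'STJ ' and 'STJ' compare equal."""
    return " ".join((value or "").split())


def exact_match_rate(
    predictions: List[Dict[str, str]],
    ground_truth: List[Dict[str, str]],
    fields: Sequence[str],
) -> float:
    """Share of (document, field) pairs whose normalized values match exactly."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must be aligned")
    comparisons = 0
    matches = 0
    for predicted, expected in zip(predictions, ground_truth):
        for field in fields:
            comparisons += 1
            if normalize(predicted.get(field)) == normalize(expected.get(field)):
                matches += 1
    return matches / comparisons if comparisons else 0.0
```

Under this definition, a field that is missing on both sides counts as a match. A stricter evaluation might instead score such pairs separately.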

FAQ

  • What types of documents can be processed?

    The extractor is designed for Portuguese legal documents identified by a European Case Law Identifier (ECLI).

  • Is there a command-line interface available?

    Yes. The production extractor includes a full command-line interface (CLI).

  • What are the prerequisites for installation?

    You need Python 3.8+ and the pdfplumber package installed.
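
You can confirm both prerequisites before running the extractor with a small check like the following. It uses only the standard library (`sys.version_info` and `importlib.util.find_spec`). The check_prerequisites helper is hypothetical and not part of the project.

```python
import importlib.util
import sys
from typing import List, Tuple


def check_prerequisites(min_version: Tuple[int, int] = (3, 8)) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    if sys.version_info[:2] < min_version:
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(
            f"Python {min_version[0]}.{min_version[1]}+ required, found {found}"
        )
    # find_spec() locates the package without importing it.
    if importlib.util.find_spec("pdfplumber") is None:
        problems.append("pdfplumber is not installed (pip install pdfplumber)")
    return problems
```

Printing each message returned by `check_prerequisites()` gives a short report. An empty list means both requirements are met.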

© 2025 MCP.so. All rights reserved.
