Party Extraction from Legal Contracts Dataset


Description:

The Party Annotated Legal Contracts Dataset is designed to support the extraction of contracting parties from contract documents. It consists of 1,000 legal contracts annotated with precise party information. Of these:

  • 510 contracts are sourced from the CUAD dataset and have been re-annotated to improve precision.
  • 490 contracts are newly annotated from the EDGAR database.

The dataset is specifically tailored for machine learning and NLP-based legal research, addressing challenges posed by false positives, complex contract structures, and inconsistencies in legal entity recognition.

Features:

  • Language: English
  • Format: JSON, with structured annotations suitable for extractive QA models.
  • Annotations: Exact spans of contracting parties for improved legal entity extraction.
  • Contract Types: Covers 23 types of legal contracts.
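
As a rough illustration of how span annotations in an extractive QA layout might be read, the sketch below walks a small in-memory record and recovers each party span with its character offsets. The field names (`data`, `paragraphs`, `context`, `qas`, `answers`, `text`, `answer_start`) follow the SQuAD-style convention used by CUAD and are an assumption here; the sample record is invented and the actual schema of this dataset may differ.

```python
import json

# Hypothetical record in a SQuAD/CUAD-style layout (assumed, not taken from the dataset).
SAMPLE = json.loads("""
{
  "data": [
    {
      "title": "EXAMPLE_AGREEMENT",
      "paragraphs": [
        {
          "context": "This Agreement is made between Acme Corp. and Beta LLC.",
          "qas": [
            {
              "id": "EXAMPLE_AGREEMENT__Parties",
              "question": "Who are the contracting parties?",
              "answers": [
                {"text": "Acme Corp.", "answer_start": 31},
                {"text": "Beta LLC", "answer_start": 46}
              ]
            }
          ]
        }
      ]
    }
  ]
}
""")


def iter_party_spans(dataset):
    """Yield (contract title, party text, start, end) for every annotated span."""
    for doc in dataset["data"]:
        for para in doc["paragraphs"]:
            context = para["context"]
            for qa in para["qas"]:
                for ans in qa["answers"]:
                    start = ans["answer_start"]
                    end = start + len(ans["text"])
                    # Check that the stored offset really points at the annotated text.
                    if context[start:end] != ans["text"]:
                        raise ValueError(f"Offset mismatch in {doc['title']!r}")
                    yield doc["title"], ans["text"], start, end


spans = list(iter_party_spans(SAMPLE))
for title, text, start, end in spans:
    print(title, repr(text), start, end)
```

To read a real file, the same function could be applied to the result of `json.load` on the downloaded JSON, after adjusting the keys to whatever the published schema uses.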

Applications:

  • Legal AI & NLP: Benchmarking party extraction models for contract analysis.
  • Legal Assistive Software: Automating contract review for law firms and enterprises.
  • Named Entity Recognition (NER): Enhancing legal NER systems.
  • Question Answering (QA): Improving extractive QA models for legal document understanding.
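
For the benchmarking use case, one simple way to score a party extraction model is set-based precision, recall, and F1 over normalized party strings. The sketch below is a generic illustration only: the source does not state which metric the dataset's authors use, and the `normalize` and `party_prf` helpers and the example strings are hypothetical.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(name.lower().split())


def party_prf(predicted, gold):
    """Return (precision, recall, F1) for exact matches between two party lists."""
    pred = {normalize(p) for p in predicted}
    ref = {normalize(g) for g in gold}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# "the Company" is a false positive: a defined term rather than a named party.
scores = party_prf(["Acme Corp.", "the Company"], ["ACME  Corp.", "Beta LLC"])
print(scores)
```

Exact matching is strict by design: a prediction that overlaps a gold span without matching it counts as both a false positive and a false negative, so a token-overlap metric may be a useful complement.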

Conference Information:

  • Conference:
    Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing (RANLP 2023)


Dataset Access: