The Party Annotated Legal Contracts Dataset is a large-scale dataset designed to facilitate the extraction of legal parties from contract documents. It consists of 1,000 legal contracts annotated with precise party information. Among them:
- 510 contracts are sourced from the CUAD dataset and have been re-annotated to improve precision.
- 490 contracts are newly annotated from the EDGAR database.
The dataset is specifically tailored for machine learning and NLP-based legal research, addressing challenges posed by false positives, complex contract structures, and inconsistencies in legal entity recognition.
Features:
- Language: English
- Format: JSON with structured annotations suited to extractive QA models.
- Annotations: Exact spans of contracting parties for improved legal entity extraction.
- Contract Types: Covers 23 types of legal contracts.
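As a rough illustration of how span annotations in an extractive QA layout can be read, the sketch below walks a single hand-written record and checks that each annotated span matches the contract text at its offset. The SQuAD-style field names (`data`, `paragraphs`, `context`, `qas`, `answers`, `answer_start`) and the sample contract are assumptions made for illustration; the dataset's actual schema should be checked against the repository before relying on them.

```python
import json

# Hypothetical record in a SQuAD-style layout (an assumption, not the
# dataset's confirmed schema). The contract text is invented.
SAMPLE = json.loads("""
{
  "data": [
    {
      "title": "ExampleAgreement",
      "paragraphs": [
        {
          "context": "This Agreement is made between Acme Corp. and Beta LLC.",
          "qas": [
            {
              "id": "ExampleAgreement__Parties",
              "question": "Who are the contracting parties?",
              "answers": [
                {"text": "Acme Corp.", "answer_start": 31},
                {"text": "Beta LLC", "answer_start": 46}
              ]
            }
          ]
        }
      ]
    }
  ]
}
""")


def iter_party_spans(dataset):
    """Yield (title, start, end, text) for every annotated party span.

    Raises ValueError if an annotation does not match the contract text
    at its stated character offset.
    """
    for doc in dataset["data"]:
        for para in doc["paragraphs"]:
            context = para["context"]
            for qa in para["qas"]:
                for ans in qa["answers"]:
                    start = ans["answer_start"]
                    end = start + len(ans["text"])
                    if context[start:end] != ans["text"]:
                        raise ValueError(
                            f"Span mismatch in {doc['title']} at offset {start}"
                        )
                    yield doc["title"], start, end, ans["text"]


if __name__ == "__main__":
    for title, start, end, text in iter_party_spans(SAMPLE):
        print(f"{title}: [{start}, {end}) {text!r}")
```

Validating offsets up front is a cheap way to catch annotation or encoding drift before training an extraction model on the spans.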
Applications:
- Legal AI & NLP: Benchmarking party extraction models for contract analysis.
- Legal Assistive Software: Automating contract review for law firms and enterprises.
- Named Entity Recognition (NER): Enhancing legal NER systems.
- Question Answering (QA): Improving extractive QA models for legal document understanding.
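For the benchmarking use case, one common way to score party extraction is exact-match precision, recall, and F1 over predicted character spans. The sketch below is a minimal version of that idea; it is an assumed evaluation setup, not necessarily the metric used by the dataset's authors, and the function name `span_prf` and the example spans are hypothetical.

```python
def span_prf(predicted, gold):
    """Exact-match precision, recall, and F1 over (start, end) spans.

    A predicted span counts as correct only if both offsets match a gold
    span exactly; every other prediction is a false positive.
    """
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical spans for one contract: two gold parties, one of which
    # the model found, plus one spurious prediction.
    gold = [(31, 41), (46, 54)]
    predicted = [(31, 41), (0, 14)]
    print(span_prf(predicted, gold))  # (0.5, 0.5, 0.5)
```

Exact matching is strict: a prediction that is off by one character scores as both a false positive and a miss, so token-overlap scoring may be preferable when boundary noise is expected.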
Conference Information
- Conference: Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing (ACL 2023)
Dataset Access
- GitHub Repository: Party Extraction GitHub Repository
- Dataset Download: here