r/MachineLearning - [N] Legal NLP Dataset With Over 13,000 Annotations Released

By reddit - 2021-03-12

Description

272 votes, 10 comments. Legal datasets are extremely expensive because lawyers are, and this has bottlenecked legal NLP. To address this, a by the …

Summary

  • Legal datasets are extremely expensive because lawyers are, and this has bottlenecked legal NLP.
  • The beta, posted last year, only had ~3,000 labels.
  • The dataset, called CUAD, is somewhat like the SQuAD 2.0 dataset in that models highlight the relevant portions of the document.
  • It looks like the models were trained by finetuning directly on question answering (span labeling?).
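
To make the SQuAD 2.0 comparison concrete, here is a minimal sketch of how an extractive QA model "highlights" a span. It assumes the common BERT-style convention where the model emits one start score and one end score per token, and position 0 (`[CLS]`) stands for "no answer". The tokens, the scores, and the `best_span` helper are all made up for illustration; this is not CUAD's actual code or data.

```python
from typing import List, Optional, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_len: int = 30,
) -> Optional[Tuple[int, int]]:
    """Pick the highest-scoring (start, end) token span.

    Hypothetical helper: index 0 is assumed to be the no-answer
    position, as in typical SQuAD 2.0-style setups. Returns None
    when no real span beats the no-answer score.
    """
    null_score = start_scores[0] + end_scores[0]
    best, best_score = None, null_score
    for i in range(1, len(start_scores)):
        # Only consider spans that end at or after their start,
        # and that are at most max_len tokens long.
        for j in range(i, min(i + max_len, len(end_scores))):
            score = start_scores[i] + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best


# Toy contract sentence with invented scores for a "governing law" question.
tokens = ["[CLS]", "This", "Agreement", "is", "governed", "by",
          "the", "laws", "of", "Delaware", "."]
start_scores = [0.1, 0.0, 0.2, 0.0, 0.3, 0.1, 0.5, 2.5, 0.2, 0.4, 0.0]
end_scores = [0.1, 0.0, 0.1, 0.0, 0.2, 0.0, 0.1, 0.3, 0.2, 3.0, 0.1]

span = best_span(start_scores, end_scores)
highlight = " ".join(tokens[span[0]:span[1] + 1]) if span else ""
print(highlight)
```

When the no-answer position scores highest, `best_span` returns `None`, which is how a clause that is absent from a contract would show up under this convention.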

 

Topics

  1. NLP (0.19)
  2. UX (0.04)
  3. Management (0.02)
