Description
272 votes, 10 comments. Legal datasets are extremely expensive because lawyers are, and this has bottlenecked legal NLP. To address this, a by the …
Summary
- [N] Legal NLP Dataset With Over 13,000 Annotations Released. Legal datasets are extremely expensive because lawyers are, and this has bottlenecked legal NLP.
- The beta, posted last year, had only ~3,000 labels.
- The dataset, called CUAD, is somewhat like SQuAD 2.0: models highlight the relevant portions of a document.
- It looks like the models were trained by fine-tuning directly on question answering (span labeling?).
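
To make the SQuAD 2.0 comparison concrete, here is a minimal sketch of how a span-labeling model's output is usually turned into a highlighted passage. It is not taken from the CUAD code. It assumes the common setup where the model scores every token as a possible start and end of the answer, and where position 0 stands for "no answer". The function name `best_span`, the example sentence, and all scores are made up for illustration.

```python
from typing import List, Optional, Tuple


def best_span(
    start_scores: List[float],
    end_scores: List[float],
    max_len: int = 30,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) token indices of the best span, or None for "no answer".

    Assumption: index 0 is the null position, as in common SQuAD 2.0-style setups.
    """
    # Score of predicting "this clause is not in the document".
    null_score = start_scores[0] + end_scores[0]

    best_score = float("-inf")
    best = None
    n = len(start_scores)
    # Try every span with start <= end and at most max_len tokens.
    for s in range(1, n):
        for e in range(s, min(s + max_len, n)):
            score = start_scores[s] + end_scores[e]
            if score > best_score:
                best_score = score
                best = (s, e)

    # Abstain when the null position beats every real span.
    if best is None or null_score >= best_score:
        return None
    return best


# Hypothetical tokens and scores, invented for this example.
tokens = ["[CLS]", "This", "Agreement", "is", "governed", "by",
          "the", "laws", "of", "Delaware", "."]
start = [0.1, 0.0, 0.2, 0.0, 0.3, 0.1, 2.5, 0.4, 0.2, 0.9, 0.0]
end = [0.1, 0.0, 0.1, 0.0, 0.2, 0.1, 0.3, 0.6, 0.2, 3.0, 0.1]

span = best_span(start, end)
highlight = " ".join(tokens[span[0]: span[1] + 1]) if span else None
print(highlight)
```

With these made-up scores the highlighted span is "the laws of Delaware". The null check is the part that corresponds to SQuAD 2.0 rather than the original SQuAD: when the clause being asked about is absent, the model should return nothing instead of a forced guess.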