huggingface/datasets

By GitHub - 2021-01-05

Description

🤗 The largest hub of ready-to-use NLP datasets for ML models with fast, easy-to-use and efficient data manipulation tools

Summary

  • README.md 🤗Datasets is a lightweight library providing two main features. First, a simple command like squad_dataset = load_dataset("squad") gets any of the hub's datasets ready to use in a dataloader for training or evaluating an ML model (Numpy/Pandas/PyTorch/TensorFlow/JAX). Second, it offers efficient data pre-processing. It also gives access to paired benchmark datasets and benchmark metrics, for instance for benchmarks like SQuAD or GLUE.
  • The user-facing dataset object is not a tf.data.Dataset but a built-in framework-agnostic dataset class with methods inspired by what we like in tf.data (like a map() method).
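
To make the idea of a framework-agnostic dataset class with a tf.data-style map() method more concrete, here is a minimal pure-Python sketch. It is not the library's actual class: TinyDataset, add_question_length, and the example rows are all hypothetical names and data invented for illustration, and a real implementation would handle storage, batching, and caching very differently.

```python
from typing import Any, Callable, Dict, List

Example = Dict[str, Any]


class TinyDataset:
    """Hypothetical, minimal stand-in for a framework-agnostic dataset class."""

    def __init__(self, rows: List[Example]) -> None:
        # Copy each row so later changes by the caller do not leak in.
        self._rows = [dict(row) for row in rows]

    def __len__(self) -> int:
        return len(self._rows)

    def __getitem__(self, index: int) -> Example:
        # Return a copy so callers cannot mutate the stored example.
        return dict(self._rows[index])

    def map(self, fn: Callable[[Example], Example]) -> "TinyDataset":
        """Apply fn to every example and merge the returned keys into it.

        A new dataset is returned; the original is left unchanged.
        """
        return TinyDataset([{**row, **fn(row)} for row in self._rows])


def add_question_length(example: Example) -> Example:
    # Pre-processing step: derive a new column from an existing one.
    return {"question_length": len(example["question"].split())}


# Made-up rows for illustration, not taken from any real benchmark.
raw = TinyDataset([
    {"question": "What is a dataset?", "answer": "A collection of examples."},
    {"question": "Why use map?", "answer": "To transform every example."},
])
processed = raw.map(add_question_length)
```

With the library itself, the analogous steps would presumably be a load_dataset("squad") call followed by .map(...) on the returned object, with a user-defined function in place of add_question_length.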

 

Topics

  1. Backend (0.31)
  2. Database (0.18)
  3. Frontend (0.1)

Similar Articles

asahi417/tner

By GitHub - 2021-03-03

Language model finetuning on NER with an easy interface, and cross-domain evaluation. We released NER models finetuned on various domains via the Hugging Face model hub.

Datasets should behave like git repositories

By DAGsHub Blog - 2021-01-18

Create, maintain, and contribute to a long-living dataset that will update itself automatically across projects, using git and DVC as versioning systems.