Helsinki-NLP/Tatoeba-Challenge

By GitHub - 2021-03-22


Summary

  • This package provides data sets for machine translation in many languages, with test data taken from Tatoeba.
  • Naturally, the training data do not include Tatoeba sentences. The popular WMT test sets are also left out, to allow a fair comparison with other models that use those data sets.
  • We will also publish (reasonable) models that can be re-used and deployed through OPUS-MT; they are linked from the model subdir in this GitHub repository.
  • However, both sets can contain identical source sentences or identical target sentences that are not linked to the same translations.
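
The last point can be illustrated with a small sketch. It assumes two sets of (source, target) sentence pairs and checks three kinds of overlap separately: identical pairs, identical source sentences, and identical target sentences. The function name `overlap_report`, the placeholder names `set_a` and `set_b`, and the toy sentences are all made up for illustration; none of this is taken from the repository.

```python
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def overlap_report(set_a: List[Pair], set_b: List[Pair]) -> Dict[str, Set]:
    """Report pair-level and sentence-level overlap between two sets of pairs."""
    pairs_a, pairs_b = set(set_a), set(set_b)
    return {
        # Identical translation pairs present in both sets.
        "shared_pairs": pairs_a & pairs_b,
        # Source sentences present in both sets, regardless of translation.
        "shared_sources": {s for s, _ in pairs_a} & {s for s, _ in pairs_b},
        # Target sentences present in both sets, regardless of source.
        "shared_targets": {t for _, t in pairs_a} & {t for _, t in pairs_b},
    }


# Toy data (hypothetical): no pair is shared, but one source sentence
# and one target sentence occur in both sets with different counterparts.
set_a = [("Hello.", "Hallo."), ("Thank you.", "Danke.")]
set_b = [("Hello.", "Guten Tag."), ("Thanks.", "Danke."), ("Good night.", "Gute Nacht.")]

report = overlap_report(set_a, set_b)
print(report)
```

In this toy case the two sets share no translation pair, yet "Hello." appears as a source sentence in both and "Danke." appears as a target sentence in both, each time linked to a different translation.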

 

Topics

  1. Backend (0.32)
  2. Database (0.17)
  3. NLP (0.16)
