Utility is in the Eye of the User: A Critique of NLP Leaderboards

By arXiv.org - 2020-10-01


Benchmarks such as GLUE have helped drive advances in NLP by incentivizing the creation of more accurate models.


  • Abstract: While this leaderboard paradigm has been remarkably successful, a historical focus on performance-based evaluation has been at the expense of other qualities that the NLP community values in models, such as compactness, fairness, and energy efficiency.
  • We frame both the leaderboard and NLP practitioners as consumers and the benefit they get from a model as its utility to them.



