Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision

By arXiv.org - 2020-10-15

Description

Humans learn language by listening, speaking, writing, reading, and also via interaction with the multimodal real world. Existing language pre-training frameworks show the effectiveness of text-only ...

Summary

  • Computer Science > Computation and Language. Abstract: Existing language pre-training frameworks show the effectiveness of text-only self-supervision; in this paper, we explore the idea of a visually-supervised language model.
  • We find that the main obstacle to this exploration is the large divergence in size and distribution between visually-grounded language datasets and pure-language corpora.


Topics

  1. NLP (0.32)
  2. UX (0.08)
  3. Backend (0.08)

Similar Articles

Code and Named Entity Recognition in StackOverflow

By arXiv.org - 2020-10-14

There is an increasing interest in studying natural language and computer code together, as large corpora of programming texts become readily available on the Internet. For example, StackOverflow curr ...