TRIC — Transformer-based Relative Image Captioning

By Medium - 2021-03-17

Description

This blog post describes the TRIC model — an architecture for the Relative Image Captioning task that was created as part of my master's thesis. Below you can find the list of questions that will be…

Summary

  • Give me two dresses and TRIC will tell you the differences between them 👗 👚 This blog post describes the TRIC model — an architecture for the Relative Image Captioning task that was created as part of my master's thesis.
  • Initially, it seemed almost magical that the model was able to generate a caption describing the image’s content.
  • Having n vectors, each of size 768 (where n is the caption length and 768 is BERT’s hidden dimension), one has to add information about the position of each token within the caption — see the sketch after this list.
  • As can be seen in the image above, the model is able to generate meaningful captions, but the direction of the relationship is wrong.
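
The positional-information step mentioned above can be illustrated with a minimal PyTorch sketch: learned positional embeddings are added element-wise to the n × 768 token vectors coming out of BERT. The class name, maximum length, and sizes below are illustrative assumptions, not taken from the TRIC code.

```python
import torch
import torch.nn as nn

class PositionalEmbedding(nn.Module):
    """Adds learned positional information to a sequence of token embeddings.

    Assumes a BERT-style hidden size of 768 and a maximum caption length;
    names and sizes are illustrative, not the actual TRIC implementation.
    """

    def __init__(self, hidden_dim: int = 768, max_len: int = 64):
        super().__init__()
        self.pos_embedding = nn.Embedding(max_len, hidden_dim)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, n, 768) — n token vectors from BERT
        n = token_embeddings.size(1)
        positions = torch.arange(n, device=token_embeddings.device)
        # Broadcast the (n, 768) positional vectors over the batch dimension
        return token_embeddings + self.pos_embedding(positions)


# Example: a caption of 12 tokens in a batch of 2
tokens = torch.randn(2, 12, 768)
with_positions = PositionalEmbedding()(tokens)
print(with_positions.shape)  # torch.Size([2, 12, 768])
```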

 

Topics

  1. NLP (0.18)
  2. Machine_Learning (0.09)
  3. UX (0.08)

Similar Articles

A Beginner’s Guide to the CLIP Model

By KDnuggets - 2021-03-11

CLIP is a bridge between computer vision and natural language processing. I'm here to break CLIP down for you in an accessible and fun read! In this post, I'll cover what CLIP is, how CLIP works, and ...

Semantic hand segmentation using Pytorch

By Medium - 2020-12-02

Semantic segmentation is the task of predicting the class of each pixel in an image. This problem is more difficult than object detection…

pytorch-widedeep: deep learning for tabular data

By Medium - 2021-02-22

This is the third of a series of posts introducing pytorch-widedeep, a flexible package to combine tabular data with text and images (that could also be used for “standard” tabular data alone). The…

trekhleb/links-detector

By GitHub - 2020-12-07

📖 👆🏻 Links Detector makes printed links clickable via your smartphone camera. No need to type a link in, just scan and click on it. - trekhleb/links-detector