Description
This blog post describes the TRIC model — an architecture for Relative Image Captioning task that was created as a part of my Master Thesis. Below you can find the list of questions that will be…
Summary
- Give me two dresses and TRIC will tell you the differences between them 👗 👚 This blog post describes the TRIC model — an architecture for Relative Image Captioning task that was created as a part of my Master Thesis.
- Initially, it seemed almost magical that the model is able to generate a caption describing the image’s content.
- Having n vectors each of size 768 (where n is the length of the caption and 768 is the hidden dim of BERT), one has to add information about the position of tokens within the caption.
- As it can be seen in the image above the model is able to generate meaningful captions but the direction of the relationship is wrong.