Retrieval-Augmented Transformer for Image Captioning

Conference proceedings contribution
Publication date:
2022
Citation:
Retrieval-Augmented Transformer for Image Captioning / Sarto, Sara; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita. - (2022), pp. 1-7. (19th International Conference on Content-based Multimedia Indexing, CBMI 2022, Graz, Austria, September 14-16, 2022) [10.1145/3549555.3549585].
Abstract:
Image captioning models aim at connecting Vision and Language by providing natural language descriptions of input images. In the past few years, the task has been tackled by learning parametric models and proposing visual feature extraction advancements or by modeling better multi-modal connections. In this paper, we investigate the development of an image captioning approach with a kNN memory, with which knowledge can be retrieved from an external corpus to aid the generation process. Our architecture combines a knowledge retriever based on visual similarities, a differentiable encoder, and a kNN-augmented attention layer to predict tokens based on the past context and on text retrieved from the external memory. Experimental results, conducted on the COCO dataset, demonstrate that employing an explicit external memory can aid the generation process and increase caption quality. Our work opens up new avenues for improving image captioning models at larger scale.
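The retrieval and kNN-augmented attention steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cosine-similarity retriever, and the scalar `gate` that blends attention over the past context with attention over retrieved entries are all assumptions made for illustration, and embeddings are plain Python lists rather than learned features.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_knn(query, memory, k):
    """Return the captions of the k memory items most similar to the query.

    `memory` is a hypothetical external corpus: a list of
    (image_embedding, caption) pairs.
    """
    ranked = sorted(memory, key=lambda item: cosine(query, item[0]), reverse=True)
    return [caption for _, caption in ranked[:k]]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def knn_augmented_attention(query, context, retrieved, gate):
    """Blend attention over the past context with attention over retrieved text.

    `gate` in [0, 1] is an assumed scalar mixing weight: 0 uses only the
    local context, 1 uses only the external memory.
    """
    local = attend(query, context, context)
    external = attend(query, retrieved, retrieved)
    return [gate * e + (1.0 - gate) * l for l, e in zip(local, external)]
```

In this sketch, `retrieve_knn` plays the role of the knowledge retriever based on visual similarity, and `knn_augmented_attention` shows one way a token representation could draw on both the past context and the retrieved text. In a trained model the gate and the embeddings would be learned rather than fixed.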
CRIS type:
Conference proceedings paper
Keywords:
image captioning; image retrieval; vision-and-language
Author list:
Sarto, Sara; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita
University authors:
BARALDI LORENZO
CORNIA MARCELLA
CUCCHIARA Rita
SARTO SARA
Link to the full record:
https://iris.unimore.it/handle/11380/1281718
Link to the full text:
https://iris.unimore.it//retrieve/handle/11380/1281718/467384/2022_CBMI_Captioning.pdf
Book title:
Proceedings of the 19th International Conference on Content-based Multimedia Indexing, CBMI 2022