Are Learnable Prompts the Right Way of Prompting? Adapting Vision-and-Language Models with Memory Optimization
Article
Publication date:
2024
Citation:
Are Learnable Prompts the Right Way of Prompting? Adapting Vision-and-Language Models with Memory Optimization / Moratelli, Nicholas; Barraco, Manuele; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita. - In: IEEE INTELLIGENT SYSTEMS. - ISSN 1541-1672. - 39:3(2024), pp. 26-34. [10.1109/MIS.2024.3386099]
Abstract:
Few-shot learning (FSL) requires fine-tuning a pretrained model on a limited set of examples from novel classes. When applied to vision-and-language models, the dominant approach for FSL has been to learn input prompts that are concatenated to the input context of the model. Despite the considerable promise they hold, the effectiveness and expressive power of prompts are limited by the fact that they can only lie at the input of the architecture. In this article, we critically question the usage of learnable prompts and instead leverage the concept of "implicit memory" to directly capture low- and high-level relationships within the attention mechanism at any layer of the architecture, thereby establishing an alternative to prompts in FSL. Our proposed approach, termed MemOp, exhibits superior performance across 11 widely recognized image classification datasets and a benchmark for contextual domain shift evaluation, effectively addressing the challenges associated with learnable prompts.
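The record does not reproduce the paper's implementation, but the general idea of augmenting attention with learnable memory, as opposed to concatenating prompts at the input, can be illustrated with a minimal sketch. This is an assumption for illustration only, not the authors' MemOp method: learnable key/value slots (`Mk`, `Mv` below) are appended to the attention keys and values of a layer, so every layer, not just the input, can attend to stored relationships.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_attention(X, Wq, Wk, Wv, Mk, Mv):
    """Self-attention whose keys/values are extended with learnable
    memory slots (Mk, Mv); hypothetical sketch, not the paper's code."""
    Q = X @ Wq                      # (n, d) queries from input tokens
    K = np.vstack([X @ Wk, Mk])     # (n + m, d) keys: tokens + memory
    V = np.vstack([X @ Wv, Mv])     # (n + m, d) values: tokens + memory
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))  # attend over tokens and memory
    return A @ V                    # (n, d) contextualized outputs

rng = np.random.default_rng(0)
n, m, d = 4, 2, 8                   # tokens, memory slots, model dim
X = rng.standard_normal((n, d))
Wq, Wk, Wv = (0.1 * rng.standard_normal((d, d)) for _ in range(3))
Mk = rng.standard_normal((m, d))    # learnable memory keys
Mv = rng.standard_normal((m, d))    # learnable memory values
out = memory_attention(X, Wq, Wk, Wv, Mk, Mv)
print(out.shape)                    # (4, 8)
```

In a few-shot setting, only the memory slots would be optimized on the novel-class examples while the pretrained projection weights stay frozen, mirroring how prompt tuning freezes the backbone.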
CRIS type:
Journal article
Authors:
Moratelli, Nicholas; Barraco, Manuele; Cornia, Marcella; Baraldi, Lorenzo; Cucchiara, Rita