Investigating Bidimensional Downsampling in Vision Transformer Models

Conference proceedings contribution
Publication date:
2022
Citation:
Investigating Bidimensional Downsampling in Vision Transformer Models / Bruno, Paolo; Amoroso, Roberto; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita. - 13232:(2022), pp. 287-299. (21st International Conference on Image Analysis and Processing, ICIAP 2022, Lecce, Italy, 23-27 May 2022) [10.1007/978-3-031-06430-2_24].
Abstract:
Vision Transformers (ViT) and other Transformer-based architectures for image classification have achieved promising performances in the last two years. However, ViT-based models require large datasets, memory, and computational power to obtain state-of-the-art results compared to more traditional architectures. The generic ViT model, indeed, maintains a full-length patch sequence during inference, which is redundant and lacks hierarchical representation. With the goal of increasing the efficiency of Transformer-based models, we explore the application of a 2D max-pooling operator on the outputs of Transformer encoders. We conduct extensive experiments on the CIFAR-100 dataset and the large ImageNet dataset and consider both accuracy and efficiency metrics, with the final goal of reducing the token sequence length without affecting the classification performance. Experimental results show that bidimensional downsampling can outperform previous classification approaches while requiring relatively limited computation resources.
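The token-reduction idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `pool_tokens_2d`, the 2×2 kernel, the row-major patch order, and the absence of a class token are all assumptions made for illustration, and the abstract does not say where in the encoder stack the pooling is applied. The sketch rearranges a patch-token sequence into its 2D grid, takes a max over non-overlapping windows, and flattens the result back into a shorter sequence.

```python
import numpy as np


def pool_tokens_2d(tokens: np.ndarray, grid_h: int, grid_w: int, k: int = 2) -> np.ndarray:
    """Max-pool a (N, D) patch-token sequence over its 2D grid layout.

    Hypothetical helper. Assumes N == grid_h * grid_w, row-major patch
    order, no class token, and that k divides both grid dimensions.
    """
    n, d = tokens.shape
    if n != grid_h * grid_w:
        raise ValueError("sequence length must equal grid_h * grid_w")
    if grid_h % k or grid_w % k:
        raise ValueError("k must divide both grid dimensions")

    # Split each spatial axis into (block index, offset inside the block).
    grid = tokens.reshape(grid_h // k, k, grid_w // k, k, d)
    # Take the per-feature maximum inside every k x k window.
    pooled = grid.max(axis=(1, 3))
    # Flatten the smaller grid back into a token sequence.
    return pooled.reshape(-1, d)


# A 14 x 14 grid of 8-dimensional tokens shrinks to a 7 x 7 grid.
sequence = np.random.default_rng(0).normal(size=(14 * 14, 8))
shorter = pool_tokens_2d(sequence, grid_h=14, grid_w=14)
```

Under these assumptions each pooling step divides the sequence length by k², so the encoder layers that follow it attend over fewer tokens.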
CRIS type:
Conference proceedings paper
Keywords:
Bidimensional downsampling; Vision Transformer; ViT
Author list:
Bruno, Paolo; Amoroso, Roberto; Cornia, Marcella; Cascianelli, Silvia; Baraldi, Lorenzo; Cucchiara, Rita
University authors:
Baraldi, Lorenzo
Cascianelli, Silvia
Cornia, Marcella
Cucchiara, Rita
Link to the full record:
https://iris.unimore.it/handle/11380/1268738
Book title:
Proceedings of the 21st International Conference on Image Analysis and Processing
Published in:
LECTURE NOTES IN COMPUTER SCIENCE
Series