Attentive Models in Vision: Computing Saliency Maps in the Deep Learning Era

Conference proceedings contribution
Publication date:
2017
Citation:
Attentive Models in Vision: Computing Saliency Maps in the Deep Learning Era / Cornia, Marcella; Abati, Davide; Baraldi, Lorenzo; Palazzi, Andrea; Calderara, Simone; Cucchiara, Rita. - 10640:(2017), pp. 387-399. (16th International Conference of the Italian Association for Artificial Intelligence, Bari, Italy, November 14-17, 2017) [10.1007/978-3-319-70169-1_29].
Abstract:
Estimating the focus of attention of a person looking at an image or a video is a crucial step that can enhance many vision-based inference mechanisms: image segmentation and annotation, video captioning, and autonomous driving are some examples. The early stages of attentive behavior are typically bottom-up; reproducing the same mechanism means finding the saliency embodied in images, i.e. which parts of an image pop out of the visual scene. This process has been studied for decades in neuroscience and through computational models that reproduce the human cortical process. In the last few years, early models have been replaced by deep learning architectures that outperform any earlier approach on public datasets. In this paper, we propose a discussion on why convolutional neural networks (CNNs) are so accurate at saliency prediction. We present our deep learning architectures, which combine bottom-up cues and higher-level semantics, and incorporate the concept of time in the attentional process through LSTM recurrent architectures. Finally, we present a video-specific architecture based on the C3D network, which extracts spatio-temporal features by means of 3D convolutions to model task-driven attentive behaviors. The merit of this work is to show how these deep networks are not mere brute-force methods tuned on massive amounts of data, but represent well-defined architectures that closely recall the early saliency models, improved with the semantics learned from human ground-truth.
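The abstract mentions C3D-style 3D convolutions, which slide a kernel over time as well as space so that motion and appearance are captured jointly. The toy NumPy sketch below illustrates that idea on a single-channel clip; it is an assumption-laden illustration of the general operation, not the paper's actual network, and `conv3d` is a hypothetical helper name.

```python
import numpy as np

def conv3d(volume, kernel):
    """Valid-mode 3D cross-correlation of a single-channel spatio-temporal
    volume of shape (T, H, W) with a kernel of shape (kt, kh, kw).
    Each output value summarizes a small space-time neighborhood, which is
    how 3D convolutions capture motion cues as well as appearance."""
    T, H, W = volume.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(volume[t:t+kt, i:i+kh, j:j+kw] * kernel)
    return out

# A 3-frame clip of 5x5 frames convolved with a 2x3x3 averaging kernel.
clip = np.random.rand(3, 5, 5)
kernel = np.ones((2, 3, 3)) / 18.0  # averages an 18-voxel space-time window
features = conv3d(clip, kernel)
print(features.shape)  # (2, 3, 3): one value per valid space-time position
```

Real C3D networks stack many such convolutions (with learned multi-channel kernels, pooling, and nonlinearities), but the windowed space-time sum above is the core primitive.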
CRIS type:
Conference proceedings paper
Author list:
Cornia, Marcella; Abati, Davide; Baraldi, Lorenzo; Palazzi, Andrea; Calderara, Simone; Cucchiara, Rita
UNIMORE authors:
Baraldi, Lorenzo
Calderara, Simone
Cornia, Marcella
Cucchiara, Rita
Link to the full record:
https://iris.unimore.it/handle/11380/1144997
Link to the full text:
https://iris.unimore.it//retrieve/handle/11380/1144997/165110/paper_84%20(1).pdf
Book title:
AI*IA 2017 Advances in Artificial Intelligence
Published in:
LECTURE NOTES IN COMPUTER SCIENCE (series)