Learning to Read L'Infinito: Handwritten Text Recognition with Synthetic Training Data

Conference proceedings contribution
Publication date:
2021
Citation:
Learning to Read L'Infinito: Handwritten Text Recognition with Synthetic Training Data / Cascianelli, Silvia; Cornia, Marcella; Baraldi, Lorenzo; Piazzi, Maria Ludovica; Schiuma, Rosiana; Cucchiara, Rita. - 13053:(2021), pp. 340-350. (19th International Conference on Computer Analysis of Images and Patterns, CAIP 2021, Virtual, 27 September - 01 October 2021) [10.1007/978-3-030-89131-2_31].
Abstract:
Deep learning-based approaches to Handwritten Text Recognition (HTR) have shown remarkable results on publicly available large datasets, both modern and historical. However, it is often the case that historical manuscripts are preserved in small collections, most of the time with unique characteristics in terms of paper support, author handwriting style, and language. State-of-the-art HTR approaches struggle to obtain good performance on such small manuscript collections, for which few training samples are available. In this paper, we focus on HTR on small historical datasets and propose a new historical dataset, which we call Leopardi, with the typical characteristics of small manuscript collections, consisting of letters by the poet Giacomo Leopardi, and devise strategies to deal with the training data scarcity scenario. In particular, we explore the use of carefully designed but cost-effective synthetic data for pre-training HTR models to be applied to small single-author manuscripts. Extensive experiments validate the suitability of the proposed approach, and both the Leopardi dataset and synthetic data will be available to favor further research in this direction.
CRIS type:
Paper in conference proceedings
Keywords:
Handwritten text recognition; Historical documents; Synthetic data
Author list:
Cascianelli, Silvia; Cornia, Marcella; Baraldi, Lorenzo; Piazzi, Maria Ludovica; Schiuma, Rosiana; Cucchiara, Rita
University authors:
BARALDI LORENZO
CASCIANELLI Silvia
CORNIA MARCELLA
CUCCHIARA Rita
SCHIUMA ROSIANA
Link to full record:
https://iris.unimore.it/handle/11380/1249339
Link to full text:
https://iris.unimore.it//retrieve/handle/11380/1249339/360571/2021_CAIP_HTR.pdf
Book title:
Proceedings of the 19th International Conference on Computer Analysis of Images and Patterns
Published in:
LECTURE NOTES IN COMPUTER SCIENCE
Journal
LECTURE NOTES IN COMPUTER SCIENCE
Series