Publication date:
2017
Citation:
Layout analysis and content classification in digitized books / Corbelli, Andrea; Baraldi, Lorenzo; Balducci, Fabrizio; Grana, Costantino; Cucchiara, Rita. - ELECTRONIC. - 701:(2017), pp. 153-165. (12th Italian Research Conference on Digital Libraries, IRCDL 2016, Firenze, Feb. 4-5) [10.1007/978-3-319-56300-8_14].
Abstract:
Automatic layout analysis has proven to be extremely important in the digitization of large numbers of documents. In this paper we present a mixed approach to layout analysis, introducing an SVM-aided layout segmentation process and a classification process based on local and geometrical features. The final output of the automatic analysis algorithm is a complete, structured annotation in JSON format that contains the digitized text and all references to the illustrations of the input page, and that can be used by both visualization and annotation interfaces. We evaluate our algorithm on a large dataset built upon the first volume of the “Enciclopedia Treccani”.
CRIS type:
Conference proceedings paper
Keywords:
digitization, digital libraries, layout analysis, content classification
Authors:
Corbelli, Andrea; Baraldi, Lorenzo; Balducci, Fabrizio; Grana, Costantino; Cucchiara, Rita
Book title:
Digital Libraries and Multimedia Archives
Published in: