BLAST: a Loosely Schema-aware Meta-blocking Approach for Entity Resolution

Article
Publication date:
2016
Citation:
BLAST: a Loosely Schema-aware Meta-blocking Approach for Entity Resolution / Simonini, G., Bergamaschi, S., Jagadish, H.V. - In: PROCEEDINGS OF THE VLDB ENDOWMENT. - ISSN 2150-8097. - PRINT. - 9:12(2016), pp. 1173-1184. (42nd International Conference on Very Large Data Bases, VLDB 2016) [10.14778/2994509.2994533].
Abstract:
Identifying records that refer to the same entity is a fundamental step for data integration. Since it is prohibitively expensive to compare every pair of records, blocking techniques are typically employed to reduce the complexity of this task. These techniques partition records into blocks and limit the comparison to records co-occurring in a block. Generally, to deal with highly heterogeneous and noisy data (e.g. semi-structured data of the Web), these techniques rely on redundancy to reduce the chance of missing matches.
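As a rough illustration of redundancy-based blocking, the sketch below uses token blocking: every token that appears in any attribute value becomes a block key, so a record lands in several blocks and attribute names are ignored. This is a minimal sketch under assumptions, not the paper's implementation; the function names `token_blocking` and `candidate_pairs` and the sample records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def token_blocking(records):
    """Map each token to the set of record ids whose values contain it."""
    blocks = defaultdict(set)
    for rid, attrs in records.items():
        for value in attrs.values():
            for token in str(value).lower().split():
                blocks[token].add(rid)
    # A block holding a single record yields no comparison, so drop it.
    return {tok: ids for tok, ids in blocks.items() if len(ids) > 1}


def candidate_pairs(blocks):
    """Collect the distinct record pairs that co-occur in at least one block."""
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Hypothetical records with heterogeneous attribute names.
records = {
    1: {"name": "John Smith", "city": "Modena"},
    2: {"fullname": "Smith John", "town": "Modena"},
    3: {"name": "Mary Jones", "city": "Ann Arbor"},
    4: {"name": "Ann Smith", "city": "Reggio"},
}

blocks = token_blocking(records)
pairs = candidate_pairs(blocks)
print(sorted(pairs))  # [(1, 2), (1, 4), (2, 4), (3, 4)]
```

Four of the six possible pairs survive here. Records 1 and 2 share three blocks, which is the redundancy that lowers the chance of missing a match, while pairs such as (3, 4) co-occur only through the ambiguous token "ann".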
Meta-blocking is the task of restructuring blocks generated by redundancy-based blocking techniques, removing superfluous comparisons. Existing meta-blocking approaches rely exclusively on schema-agnostic features.
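To make the restructuring step concrete, the sketch below builds a blocking graph whose edges are candidate pairs weighted by the number of blocks they share, then keeps only the edges at or above the average weight. This is one simple schema-agnostic scheme chosen for illustration, not Blast's own weighting; the function name `meta_block` and the sample blocks are hypothetical.

```python
from collections import Counter
from itertools import combinations


def meta_block(blocks):
    """Keep only the pairs whose shared-block count reaches the average."""
    weights = Counter()
    for ids in blocks.values():
        for pair in combinations(sorted(ids), 2):
            weights[pair] += 1  # one more block shared by this pair
    if not weights:
        return set()
    threshold = sum(weights.values()) / len(weights)
    return {pair for pair, w in weights.items() if w >= threshold}


# Hypothetical blocks: token -> ids of the records containing that token.
blocks = {
    "john": {1, 2},
    "smith": {1, 2, 4},
    "modena": {1, 2},
    "ann": {3, 4},
}

kept = meta_block(blocks)
print(kept)  # {(1, 2)}
```

The pair (1, 2) shares three blocks while the other three pairs share one each, so the average weight is 1.5 and only (1, 2) is retained for comparison.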
In this paper, we demonstrate how “loose” schema information (i.e., statistics collected directly from the data) can be exploited to enhance the quality of the blocks in a holistic loosely schema-aware (meta-)blocking approach that can be used to speed up your favorite Entity Resolution algorithm. We call it Blast (Blocking with Loosely-Aware Schema Techniques). We show how Blast can automatically extract this loose information by adopting an LSH-based step for efficiently scaling to large datasets. We experimentally demonstrate, on real-world datasets, how Blast outperforms the state-of-the-art unsupervised meta-blocking approaches, and, in many cases, also the supervised one.
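The abstract does not spell out the LSH-based step, so the following is only a hedged illustration of the general idea: MinHash signatures, a common building block of LSH, can estimate how similar the value sets of two attributes are without comparing the sets directly. The function names, the attribute token sets, and the choice of 64 hash functions are assumptions made for this sketch, and it stops at signatures without the banding step a full LSH index would add.

```python
import hashlib


def minhash_signature(tokens, num_hashes=64):
    """Return one minimum hash value per seeded hash function.

    Assumes `tokens` is a non-empty collection of strings.
    """
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(),
                "big",
            )
            for t in tokens
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of positions where two signatures agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


# Hypothetical token sets drawn from the values of three attributes.
name_tokens = {"john", "smith", "mary", "jones"}
fullname_tokens = {"smith", "john", "ann"}
city_tokens = {"modena", "reggio", "arbor"}

sig_name = minhash_signature(name_tokens)
sig_fullname = minhash_signature(fullname_tokens)
sig_city = minhash_signature(city_tokens)

print(estimated_jaccard(sig_name, sig_fullname))  # estimate of the true Jaccard 0.4
print(estimated_jaccard(sig_name, sig_city))      # disjoint sets
```

Signatures have a fixed length regardless of how many values an attribute holds, which is what lets this kind of comparison scale to large datasets.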
CRIS type:
Journal article
Keywords:
Entity Resolution; Meta-blocking; Big Data; Data Cleaning
Author list:
Simonini, Giovanni; Bergamaschi, Sonia; Jagadish, H. V.
University authors:
BERGAMASCHI Sonia
SIMONINI GIOVANNI
Link to the full record:
https://iris.unimore.it/handle/11380/1111659
Link to the full text:
https://iris.unimore.it//retrieve/handle/11380/1111659/89895/p1173-simonini.pdf
Published in:
PROCEEDINGS OF THE VLDB ENDOWMENT
Journal
General data

URL

http://www.vldb.org/pvldb/vol9/p1173-simonini.pdf