Publication Date:
2022
Citation:
Progressive Entity Resolution with Node Embeddings / Simonini, Giovanni; Gagliardelli, Luca; Rinaldi, Michele; Zecchini, Luca; De Sabbata, Giulio; Aslam, Adeel; Beneventano, Domenico; Bergamaschi, Sonia. - 3194:(2022), pp. 52-60. (30th Italian Symposium on Advanced Database Systems (SEBD 2022), Tirrenia (Pisa), June 19-22, 2022).
Abstract:
Entity Resolution (ER) is the task of finding records that refer to the same real-world entity, which are called matches. ER is a fundamental pre-processing step when dealing with dirty and/or heterogeneous datasets; however, it can be very time-consuming when employing complex machine learning models to detect matches, as state-of-the-art ER methods do. Thus, when time is a critical component and having a partial ER result is better than having no result at all, progressive ER methods are employed to try to maximize the number of detected matches as a function of time.
In this paper, we study how to perform progressive ER by exploiting graph embeddings. The basic idea is to represent candidate matches in a graph, where each node is a record and each edge is a possible comparison to check. We build this representation on top of a well-known, established graph-based ER framework. We experimentally show that our method performs better than existing state-of-the-art progressive ER methods on real-world benchmark datasets.
CRIS Type:
Conference Proceedings Paper
Keywords:
Entity Resolution, Pay-as-you-go, Data Cleaning, Graph Embedding
Author list:
Simonini, Giovanni; Gagliardelli, Luca; Rinaldi, Michele; Zecchini, Luca; De Sabbata, Giulio; Aslam, Adeel; Beneventano, Domenico; Bergamaschi, Sonia
Book title:
Proceedings of the 30th Italian Symposium on Advanced Database Systems, SEBD 2022, Tirrenia (PI), Italy, June 19-22, 2022
Published in: