Publication Date:
2023
Citation:
Entity Resolution On-Demand for Querying Dirty Datasets / Simonini, Giovanni; Zecchini, Luca; Naumann, Felix; Bergamaschi, Sonia. - 3478:(2023), pp. 410-419. (31st Italian Symposium on Advanced Database Systems (SEBD 2023), Galzignano Terme (Padova), Italy, July 2-5, 2023).
Abstract:
Entity Resolution (ER) is the process of identifying and merging records that refer to the same real-world entity. ER is usually applied as an expensive cleaning step on the entire dataset before it is consumed, yet which cleaned entities are relevant ultimately depends on the user’s specific application, which may require only a small portion of them. We introduce BrewER, a framework that evaluates SQL SP (selection-projection) queries on dirty data while progressively returning results as if they had been obtained from cleaned data. BrewER cleans one entity at a time, in the order given by an ORDER BY predicate, so it inherently supports top-k queries and stop-and-resume execution. This approach can save a significant amount of resources in applications that need only part of the results. BrewER is implemented as an open-source Python library and can be seamlessly employed with existing ER tools and algorithms. We demonstrate its efficiency through an experimental evaluation on four real-world datasets.
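The idea of cleaning one entity at a time in ORDER BY order can be illustrated with a small priority-queue sketch. This is not BrewER's API or the authors' algorithm; it is a minimal illustration under assumptions of our own: records are already split into disjoint blocks (candidate groups), the query is `ORDER BY MAX(key) DESC`, and a user-supplied pairwise `match` function plus union-find clustering stand in for a real ER matcher. The names `resolve_on_demand`, `cluster_records` and `merge` are hypothetical. Each unresolved block is queued under an optimistic bound (the largest `key` among its records); a block is resolved only when it reaches the top of the queue, and a resolved entity is emitted as soon as it is on top, because no remaining block can yield a larger value.

```python
import heapq
from itertools import count, islice
from typing import Any, Callable, Dict, Iterator, List, Tuple

Record = Dict[str, Any]


def cluster_records(block: List[Record], match: Callable[[Record, Record], bool]) -> List[List[Record]]:
    """Hypothetical helper: group matching records with union-find (transitive closure)."""
    parent = list(range(len(block)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(block)):
        for j in range(i + 1, len(block)):
            if match(block[i], block[j]):
                parent[find(i)] = find(j)
    groups: Dict[int, List[Record]] = {}
    for i, rec in enumerate(block):
        groups.setdefault(find(i), []).append(rec)
    return list(groups.values())


def merge(cluster: List[Record], key: str) -> Record:
    """Hypothetical fusion: keep the record ids and MAX(key) of the cluster."""
    return {"ids": sorted(r["id"] for r in cluster), key: max(r[key] for r in cluster)}


def resolve_on_demand(
    blocks: List[List[Record]], match: Callable[[Record, Record], bool], key: str
) -> Iterator[Record]:
    """Yield entities in descending MAX(key) order, resolving blocks only when needed."""
    tie = count()  # unique tie-breaker, so heapq never compares blocks or dicts
    heap: List[Tuple[Any, int, bool, Any]] = []
    for block in blocks:
        bound = max(r[key] for r in block)  # no entity from this block can exceed it
        heapq.heappush(heap, (-bound, next(tie), False, block))
    while heap:
        _, _, resolved, item = heapq.heappop(heap)
        if resolved:
            yield item  # its exact value is >= every remaining bound
            continue
        for cluster in cluster_records(item, match):  # resolve this block now
            entity = merge(cluster, key)
            heapq.heappush(heap, (-entity[key], next(tie), True, entity))


# Usage: a top-1 query touches only the block with the highest bound.
blocks = [
    [{"id": 1, "name": "iphone 12", "price": 700}, {"id": 2, "name": "iPhone 12", "price": 650}],
    [{"id": 3, "name": "pixel 6", "price": 500}, {"id": 4, "name": "galaxy s21", "price": 800}],
    [{"id": 5, "name": "nokia 3310", "price": 50}],
]
calls: List[int] = []


def same_name(a: Record, b: Record) -> bool:
    calls.append(1)  # count comparisons to show how much work is skipped
    return str(a["name"]).lower() == str(b["name"]).lower()


top1 = list(islice(resolve_on_demand(blocks, same_name, "price"), 1))
```

Because `resolve_on_demand` is a generator, stopping after k results (via `islice`) and resuming later both come for free; in the usage above only one pairwise comparison is made for the top-1 answer.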
CRIS Type:
Conference Proceedings Paper
Keywords:
Data Integration; ELT; Entity Resolution
Author List:
Simonini, Giovanni; Zecchini, Luca; Naumann, Felix; Bergamaschi, Sonia
Book Title:
Proceedings of the 31st Symposium of Advanced Database Systems, Galzignano Terme, Italy, July 2nd to 5th, 2023
Published in: