Publication Date:
2021
Short description:
Progressive Query-Driven Entity Resolution / Zecchini, Luca. - 13058:(2021), pp. 395-401. (14th International Conference on Similarity Search and Applications (SISAP 2021), Dortmund, Germany (virtual event), September 29 - October 1, 2021) [10.1007/978-3-030-89657-7_30].
Abstract:
Entity Resolution (ER) aims to detect the records in a dirty dataset that refer to the same real-world entity, and it plays a fundamental role in data cleaning and integration tasks. Often, a data scientist is only interested in a portion of the dataset (e.g., for data exploration); this interest can be expressed through a query. The traditional batch approach is far from optimal, since it requires performing ER on the whole dataset before executing a query on its cleaned version, which entails a huge number of useless comparisons and wastes time, resources, and money. Proposed solutions to this problem follow either a query-driven approach (performing ER only on the useful data) or a progressive one (emitting the entities in the result as soon as they are resolved), but these two aspects have never been reconciled. This paper introduces the BrewER framework, which allows clean queries to be executed on dirty datasets in a query-driven and progressive way, thanks to a preliminary filtering step and an iteratively managed sorted list that defines emission priority. Early results obtained by the first BrewER prototype on real-world datasets from different domains confirm the benefits of this combined solution, paving the way for a new and more comprehensive approach to ER.
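The combination described in the abstract (a preliminary filter plus a sorted list that sets emission priority) can be illustrated with a small Python sketch. This is not the BrewER implementation: the function names (`resolve`, `progressive_query`), the record fields, the pre-computed blocks, the toy matcher, and the choice of the minimum as the merge rule for the ordering attribute are all assumptions made only for illustration.

```python
import heapq
from itertools import count


def resolve(block, matches, order_key):
    """Cluster matching records in one block and merge each cluster (toy ER)."""
    clusters = []
    for rec in block:
        hits = [c for c in clusters if any(matches(rec, other) for other in c)]
        rest = [c for c in clusters if all(c is not h for h in hits)]
        clusters = rest + [[rec] + [r for h in hits for r in h]]
    entities = []
    for cluster in clusters:
        cluster.sort(key=lambda r: r["id"])
        entity = dict(cluster[0])  # assumed rule: keep the earliest record's values
        entity["ids"] = [r["id"] for r in cluster]
        entity[order_key] = min(r[order_key] for r in cluster)  # assumed rule: min
        entities.append(entity)
    return entities


def progressive_query(blocks, matches, where, order_key):
    """Yield resolved entities in ascending order_key, resolving blocks lazily."""
    tie = count()  # unique tie-breaker so the heap never compares dicts or lists
    heap = []
    # Preliminary filtering: drop blocks in which no record satisfies the query.
    for block in blocks:
        if any(where(r) for r in block):
            bound = min(r[order_key] for r in block)  # optimistic priority
            heapq.heappush(heap, (bound, next(tie), False, block))
    while heap:
        _, _, is_resolved, item = heapq.heappop(heap)
        if is_resolved:
            if where(item):
                yield item  # nothing left in the heap can precede this entity
        else:
            for entity in resolve(item, matches, order_key):
                heapq.heappush(heap, (entity[order_key], next(tie), True, entity))


# Hypothetical dirty data, already grouped into candidate blocks.
blocks = [
    [{"id": 1, "brand": "canon", "model": "eos", "price": 300},
     {"id": 2, "brand": "canon", "model": "eos", "price": 280}],
    [{"id": 3, "brand": "nikon", "model": "d5", "price": 100}],
    [{"id": 4, "brand": "canon", "model": "ixus", "price": 150},
     {"id": 5, "brand": "canon", "model": "r5", "price": 400}],
]
same_model = lambda a, b: a["model"] == b["model"]
is_canon = lambda r: r["brand"] == "canon"

for entity in progressive_query(blocks, same_model, is_canon, "price"):
    print(entity["ids"], entity["price"])
```

Because the function is a generator, a consumer that stops after the first result never triggers the comparisons for blocks that are still unresolved, and the block that fails the filter is never compared at all. With the minimum as merge rule, each block's smallest value is a lower bound for its entities, which is what keeps the emission order correct in this sketch.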
Iris type:
Conference proceedings paper
Keywords:
Entity resolution, Data integration, Data cleaning
List of contributors:
Zecchini, Luca
Book title:
Similarity Search and Applications - 14th International Conference, SISAP 2021, Dortmund, Germany, September 29 - October 1, 2021, Proceedings
Published in: