Data civilizer 2.0: A holistic framework for data preparation and analytics
Contribution in conference proceedings
Publication date:
2019
Citation:
Data civilizer 2.0: A holistic framework for data preparation and analytics / Rezig, E.K., Cao, L., Stonebraker, M., Simonini, G., Tao, W., Madden, S., Ouzzani, M., Tang, N., Elmagarmid, A.K. - In: PROCEEDINGS OF THE VLDB ENDOWMENT. - ISSN 2150-8097. - 12:12(2019), pp. 1954-1957. (45th International Conference on Very Large Data Bases, VLDB 2019, USA, 2019) [10.14778/3352063.3352108].
Abstract:
Data scientists spend over 80% of their time (1) parameter-tuning machine learning models and (2) iterating between data cleaning and machine learning model execution. While there are existing efforts to support the first requirement, there is currently no integrated workflow system that couples data cleaning and machine learning development. The previous version of Data Civilizer was geared towards data cleaning and discovery using a set of pre-defined tools. In this paper, we introduce Data Civilizer 2.0, an end-to-end workflow system satisfying both requirements. In addition, this system also supports a sophisticated data debugger and a workflow visualization system. In this demo, we will show how we used Data Civilizer 2.0 to help scientists at the Massachusetts General Hospital build their cleaning and machine learning pipeline on their 30TB brain activity dataset.
CRIS type:
Paper in conference proceedings
Author list:
Rezig, E. K.; Cao, L.; Stonebraker, M.; Simonini, G.; Tao, W.; Madden, S.; Ouzzani, M.; Tang, N.; Elmagarmid, A. K.
Book title:
Proceedings of the VLDB Endowment
Published in: