The creation of image datasets for training deep neural networks mainly consists of data acquisition, data selection, and data labeling. Data acquisition is often limited, and data sharing is hampered by privacy regulations, especially in the medical imaging domain. Another major obstacle is data labeling, which is costly and time-intensive and often requires medical professionals. Synthetic data may offer numerous benefits, including the ability to augment datasets with diverse and realistic images where real data is limited [1,2], thereby reducing the costs and labor associated with annotating real images. Synthetic data also provides an ethical alternative to the use of sensitive patient data, neither compromising patient privacy nor requiring ad hoc ethical committee approval for each specific project.
Our project aims to design, implement, and test artificial intelligence tools for the large-scale generation of realistic synthetic data, with a threefold objective:
1. Enriching existing datasets with the final goal of enhancing the performance of machine learning models in the field of medical imaging;
2. Providing a cost-effective alternative to the labor-intensive task of collecting and annotating real medical data by generating (image, label) pairs based on user-defined classes;
3. Generating synthetic datasets that mimic the characteristics of real-world medical data while fully preserving patient privacy.
In our project, we focus on three data modalities: 3D medical images obtained from Cone Beam Computed Tomography (CBCT), high-resolution pyramidal images obtained with microscopy (confocal images and whole-slide images, WSI), and mammographic (X-ray) images.
In previous scientific collaborations, our groups have developed (i) deep learning algorithms to enhance 2D annotations of the Inferior Alveolar Canal (an osseous canal crossing the mandible) in CBCT scans, making them suitable for training 3D segmentation models [3], and (ii) generative models for the creation of synthetic pairs of dermoscopic images and segmentation masks [1].
In this proposal, the application of generative algorithms will be pushed further by designing algorithms able to generate entire sets of CBCT scans paired with ground-truth labels. In addition to 3D volumes, the algorithms will be tested in high-resolution image scenarios, specifically targeting WSI histological images and confocal data, as well as on X-ray data.
Specifically, we intend to collect a significant amount of real data in the context of maxillofacial surgery, prostate cancer, and breast cancer, and to develop machine learning techniques for the generation of synthetic datasets. The quality of the generated data will be both qualitatively evaluated by clinical experts in the field and quantitatively assessed by measuring the performance of state-of-the-art automatic classification and segmentation algorithms trained on such generated data, with the final goal of employing such models in daily clinical practice.
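To make the quantitative assessment concrete, the sketch below illustrates one common train-on-synthetic, test-on-real protocol for segmentation: a model trained on generated (image, label) pairs is scored on held-out real data with the Dice coefficient, a standard overlap metric. This is a minimal sketch, not the project's actual pipeline; the names `evaluate_on_real`, the `predict` method, and the `ThresholdModel` stand-in are hypothetical placeholders for a trained segmentation network and its data loaders.

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice overlap between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

def evaluate_on_real(model, real_pairs) -> float:
    """Mean Dice of a model (trained on synthetic data) over real (image, mask) pairs.

    `model` is assumed to expose a `predict(image) -> binary mask` method and
    `real_pairs` to be an iterable of (image, ground_truth_mask) arrays; both
    are illustrative conventions, not part of any specific library.
    """
    scores = [dice_coefficient(model.predict(img), gt) for img, gt in real_pairs]
    return float(np.mean(scores))

if __name__ == "__main__":
    class ThresholdModel:  # stand-in for a segmentation network trained on synthetic data
        def predict(self, image: np.ndarray) -> np.ndarray:
            return image > 0.5

    # Dummy "real" evaluation set: random images with random binary masks.
    rng = np.random.default_rng(0)
    pairs = [(rng.random((64, 64)), rng.random((64, 64)) > 0.5) for _ in range(4)]
    print(f"mean Dice on held-out real data: {evaluate_on_real(ThresholdModel(), pairs):.3f}")
```

In such a protocol, the gap between this score and that of the same architecture trained on real data gives one measure of how well the synthetic datasets substitute for real annotations.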