UNI-FIND

Beyond Parliament: AI-Enhanced Multilingual Corpus Using Innovative Methodology for Non-Institutional Political Speeches in German, French, Spanish and Italian

Project
The research project aims to study in depth the scientific, technical, and copyright aspects of creating a multilingual corpus. The planned corpus, which draws on the latest advances in Artificial Intelligence, will provide transcripts of oral political discourse and serve as a basis for international studies in political linguistics. The project focuses on building a corpus of non-institutional political speeches in German, French, Spanish and Italian and on analysing them. It aims to implement a methodology that uses web tools, Automatic Speech Recognition (ASR), and AI transcription systems to orthographically transcribe, segment, annotate, and analyse the collected speeches. A prosodic analysis of the speeches will also be carried out, testing methods for examining the relation between prosody and other linguistic aspects, such as lexical choices and metaphorical instances. The outcomes of the prosodic analysis will be available in the corpus.
By ‘non-institutional’ speeches, we refer to speeches delivered not in parliamentary settings but in venues such as political conventions and public events during election campaigns. Focusing on non-institutional speeches is necessary for several reasons. First, parliamentary speeches are usually transcribed with the help of ASR and AI systems and then corrected by stenographers, which removes the typical idiosyncrasies of orality. Second, these speeches are often read aloud in sessions, making them more akin to written than to spoken language. Third, non-institutional speeches have not yet been studied in depth by scholars, possibly because of transcription challenges. Videos of these speeches are publicly available on platforms like YouTube, but unlike recordings from parliamentary settings, they often suffer from lower audio quality due to background noise, which makes transcription difficult.
Although the corpus consists solely of transcripts, crucial visual context can be recorded in the comment section of the transcription tools. Documenting non-verbal cues alongside the audio data is essential for analysing prosodic features such as intonation, stress, and rhythm. By considering both verbal and non-verbal aspects, researchers can achieve more nuanced interpretations and a richer understanding of political speeches. This project aims to implement a methodology tailored to the analysis of non-institutional speeches. The outcome will be a corpus of speeches, made available for politolinguistic studies and also for more general discourse analysis. The corpus will present a model that combines orthographic transcripts with further analysis, including prosody. Additionally, a collaborative platform will be developed, where the transcripts of German, French, Spanish and Italian speeches delivered during election campaigns will be available alongside references to the corresponding video and audio sources.

Overview

Contributors (4)

GANNUSCIO Vincenzo   Scientific Manager  
KAUNZNER ULRIKE ADELHEID   Participant  
MODENA SILVIA   Participant  
ZARCO Gloria Julieta   Participant  

Leading department

Department of Studies on Language and Culture   Main  

Term type

FAR 2024 Progetti interdisciplinari - Linea UNIMORE

Financier

ATENEO (University)
Funding Organization

Partner

Università degli Studi di MODENA e REGGIO EMILIA

Total Contribution (assigned) University (EUR)

59,556€

Date/time interval

December 2, 2024 - December 1, 2026

Project duration

24 months

Skills

Concepts (6)


SH4_11 - Pragmatics, sociolinguistics, linguistic anthropology, discourse analysis - (2024)

SH4_9 - Theoretical linguistics; computational linguistics - (2024)

Sector FRAN-01/B - French language, translation and linguistics

Sector GERM-01/C - German language, translation and linguistics

Sector LIFI-01/A - Italian linguistics

Sector SPAN-01/C - Spanish language, translation and linguistics