Audiovisual documents are a vital resource through which future generations can preserve and recall past cultures, beliefs, and customs. A famous example is the LUCE Institute Archive, which provides an invaluable historical and cultural cross-section of the first half of the 20th century by preserving more than 77,000 digitized films accessible online. Given the explosion in the production of audiovisual documents witnessed over the last century, cultural heritage preservation faces new challenges in managing ever-larger collections of digitized audiovisual documents and keeping them accessible. The RAI Italian TV digital archive contains more than 1.3 million hours of recorded TV and radio programs and 800,000 movies dating back to 1954, much of which still needs to be cataloged. Overall, the rate of production of audiovisual material far exceeds the resources available to build and maintain accessible archives. In this context, Artificial Intelligence models can help increase the accessibility of audiovisual archives by automatically understanding their content, extracting information, and indexing them so that they are easily searchable. Existing methods can analyze visual content and retrieve knowledge based on user-defined queries, but they are limited to analyzing static images and to recognizing and describing generic content rooted in Anglo-American culture. This limits their applicability to audiovisual and historical archives. The MUCES project will make a radical change by investigating and developing innovative Deep Learning models that make unlabeled audiovisual archives of the Italian cultural heritage searchable through natural language and exemplar queries in a personalized manner.
The project will develop, train, and publicly release models that are: (i) fully multi-modal, natively designed to work on videos and to exploit their inherent multi-modal nature by jointly considering motion, appearance, and audio; (ii) personalizable and adaptable to long-tail concepts with scarce annotations, making them suitable for concepts from Italian culture and specific to the cultural heritage domain; (iii) deployable in large-scale scenarios, designed to work efficiently on huge archives containing millions of videos. At the core of the project lies a new unifying synergy between cutting-edge research in Computer Vision, Machine Learning, and large-scale Content-Based Retrieval. The project brings together the research experience and expertise of two internationally recognized research teams: the AImageLab research group at UNIMORE and the Artificial Intelligence for Media and Humanities laboratory at ISTI-CNR, encompassing years of expertise in Multimedia, Similarity Search, and Computer Vision. The project proposes foundational research with direct practical and industrial exploitation. We foresee a significant benefit for society, as well as the potential to pave the way to new research directions in several areas of AI.