AITopics | Rota, Paolo

Collaborating Authors

Rota, Paolo

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Automatic benchmarking of large multimodal models via iterative experiment programming

Conti, Alessandro, Fini, Enrico, Rota, Paolo, Wang, Yiming, Mancini, Massimiliano, Ricci, Elisa

arXiv.org Artificial IntelligenceJun-18-2024

Assessing the capabilities of large multimodal models (LMMs) often requires the creation of ad-hoc evaluations. Currently, building new benchmarks requires tremendous amounts of manual work for each specific analysis. This makes the evaluation process tedious and costly. In this paper, we present APEx, Automatic Programming of Experiments, the first framework for automatic benchmarking of LMMs. Given a research question expressed in natural language, APEx leverages a large language model (LLM) and a library of pre-specified tools to generate a set of experiments for the model at hand, and progressively compile a scientific report. The report drives the testing procedure: based on the current status of the investigation, APEx chooses which experiments to perform and whether the results are sufficient to draw conclusions. Finally, the LLM refines the report, presenting the results to the user in natural language. Thanks to its modularity, our framework is flexible and extensible as new tools become available. Empirically, APEx reproduces the findings of existing studies while allowing for arbitrary analyses and hypothesis testing.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2406.12321

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Adv-KD: Adversarial Knowledge Distillation for Faster Diffusion Sampling

Mekonnen, Kidist Amde, Dall'Asen, Nicola, Rota, Paolo

arXiv.org Artificial IntelligenceMay-31-2024

Diffusion Probabilistic Models (DPMs) have emerged as a powerful class of deep generative models, achieving remarkable performance in image synthesis tasks. However, these models face challenges in terms of widespread adoption due to their reliance on sequential denoising steps during sample generation. This dependence leads to substantial computational requirements, making them unsuitable for resource-constrained or real-time processing systems. To address these challenges, we propose a novel method that integrates denoising phases directly into the model's architecture, thereby reducing the need for resource-intensive computations. Our approach combines diffusion models with generative adversarial networks (GANs) through knowledge distillation, enabling more efficient training and evaluation. By utilizing a pre-trained diffusion model as a teacher model, we train a student model through adversarial learning, employing layerwise transformations for denoising and submodules for predicting the teacher model's output at various points in time. This integration significantly reduces the number of parameters and denoising steps required, leading to improved sampling speed at test time. We validate our method with extensive experiments, demonstrating comparable performance with reduced computational requirements compared to existing approaches. By enabling the deployment of diffusion models on resource-constrained devices, our research mitigates their computational burden and paves the way for wider accessibility and practical use across the research community and end-users. Our code is publicly available at https://github.com/kidist-amde/Adv-KD

artificial intelligence, machine learning, student model, (18 more...)

arXiv.org Artificial Intelligence

2405.20675

Country: Europe > Italy (0.28)

Genre: Research Report > Promising Solution (0.48)

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Socially Pertinent Robots in Gerontological Healthcare

Alameda-Pineda, Xavier, Addlesee, Angus, García, Daniel Hernández, Reinke, Chris, Arias, Soraya, Arrigoni, Federica, Auternaud, Alex, Blavette, Lauriane, Beyan, Cigdem, Camara, Luis Gomez, Cohen, Ohad, Conti, Alessandro, Dacunha, Sébastien, Dondrup, Christian, Ellinson, Yoav, Ferro, Francesco, Gannot, Sharon, Gras, Florian, Gunson, Nancie, Horaud, Radu, D'Incà, Moreno, Kimouche, Imad, Lemaignan, Séverin, Lemon, Oliver, Liotard, Cyril, Marchionni, Luca, Moradi, Mordehay, Pajdla, Tomas, Pino, Maribel, Polic, Michal, Py, Matthieu, Rado, Ariel, Ren, Bin, Ricci, Elisa, Rigaud, Anne-Sophie, Rota, Paolo, Romeo, Marta, Sebe, Nicu, Sieińska, Weronika, Tandeitnik, Pinchas, Tonini, Francesco, Turro, Nicolas, Wintz, Timothée, Yu, Yanchao

arXiv.org Artificial IntelligenceApr-11-2024

Despite the many recent achievements in developing and deploying social robotics, there are still many underexplored environments and applications for which systematic evaluation of such systems by end-users is necessary. While several robotic platforms have been used in gerontological healthcare, the question of whether or not a social interactive robot with multi-modal conversational capabilities will be useful and accepted in real-life facilities is yet to be answered. This paper is an attempt to partially answer this question, via two waves of experiments with patients and companions in a day-care gerontological facility in Paris with a full-sized humanoid robot endowed with social and conversational interaction capabilities. The software architecture, developed during the H2020 SPRING project, together with the experimental protocol, allowed us to evaluate the acceptability (AES) and usability (SUS) with more than 60 end-users. Overall, the users are receptive to this technology, especially when the robot perception and action skills are robust to environmental clutter and flexible to handle a plethora of different interactions.

large language model, machine learning, natural language, (23 more...)

arXiv.org Artificial Intelligence

2404.0756

Country:

North America > Canada > Ontario (0.14)
Europe > France > Île-de-France > Paris > Paris (0.14)
Asia > Middle East > Israel (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.46)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.93)
(4 more...)

Add feedback

Rotation Synchronization via Deep Matrix Factorization

Tejus, Gk, Zara, Giacomo, Rota, Paolo, Fusiello, Andrea, Ricci, Elisa, Arrigoni, Federica

arXiv.org Artificial IntelligenceMay-9-2023

In this paper we address the rotation synchronization problem, where the objective is to recover absolute rotations starting from pairwise ones, where the unknowns and the measures are represented as nodes and edges of a graph, respectively. This problem is an essential task for structure from motion and simultaneous localization and mapping. We focus on the formulation of synchronization via neural networks, which has only recently begun to be explored in the literature. Inspired by deep matrix completion, we express rotation synchronization in terms of matrix factorization with a deep neural network. Our formulation exhibits implicit regularization properties and, more importantly, is unsupervised, whereas previous deep approaches are supervised. Our experiments show that we achieve comparable accuracy to the closest competitors in most scenes, while working under weaker assumptions.

artificial intelligence, machine learning, rotation, (19 more...)

arXiv.org Artificial Intelligence

2305.05268

Country: Europe > Italy (0.28)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

Deep Unsupervised Key Frame Extraction for Efficient Video Classification

Tang, Hao, Ding, Lei, Wu, Songsong, Ren, Bin, Sebe, Nicu, Rota, Paolo

arXiv.org Artificial IntelligenceNov-12-2022

Video processing and analysis have become an urgent task since a huge amount of videos (e.g., Youtube, Hulu) are uploaded online every day. The extraction of representative key frames from videos is very important in video processing and analysis since it greatly reduces computing resources and time. Although great progress has been made recently, large-scale video classification remains an open problem, as the existing methods have not well balanced the performance and efficiency simultaneously. To tackle this problem, this work presents an unsupervised method to retrieve the key frames, which combines Convolutional Neural Network (CNN) and Temporal Segment Density Peaks Clustering (TSDPC). The proposed TSDPC is a generic and powerful framework and it has two advantages compared with previous works, one is that it can calculate the number of key frames automatically. The other is that it can preserve the temporal information of the video. Thus it improves the efficiency of video classification. Furthermore, a Long Short-Term Memory network (LSTM) is added on the top of the CNN to further elevate the performance of classification. Moreover, a weight fusion strategy of different input networks is presented to boost the performance. By optimizing both video classification and key frame extraction simultaneously, we achieve better classification performance and higher efficiency. We evaluate our method on two popular datasets (i.e., HMDB51 and UCF101) and the experimental results consistently demonstrate that our strategy achieves competitive performance and efficiency compared with the state-of-the-art approaches.

artificial intelligence, key frame, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2211.06742

Country:

North America > United States (0.28)
Europe > Switzerland (0.28)

Genre: Research Report > Promising Solution (0.66)

Industry:

Media > Television (0.34)
Media > Film (0.34)
Leisure & Entertainment (0.34)
Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Cluster-level pseudo-labelling for source-free cross-domain facial expression recognition

Conti, Alessandro, Rota, Paolo, Wang, Yiming, Ricci, Elisa

arXiv.org Artificial IntelligenceOct-11-2022

Automatically understanding emotions from visual data is a fundamental task for human behaviour understanding. While models devised for Facial Expression Recognition (FER) have demonstrated excellent performances on many datasets, they often suffer from severe performance degradation when trained and tested on different datasets due to domain shift. In addition, as face images are considered highly sensitive data, the accessibility to large-scale datasets for model training is often denied. In this work, we tackle the above-mentioned problems by proposing the first Source-Free Unsupervised Domain Adaptation (SFUDA) method for FER. Our method exploits self-supervised pretraining to learn good feature representations from the target data and proposes a novel and robust cluster-level pseudo-labelling strategy that accounts for in-cluster statistics. We validate the effectiveness of our method in four adaptation setups, proving that it consistently outperforms existing SFUDA methods when applied to FER, and is on par with methods addressing FER in the UDA setting.

artificial intelligence, facial expression recognition, machine learning, (13 more...)

arXiv.org Artificial Intelligence

2210.05246

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)

Add feedback