Le Havre
Deep Reinforcement Learning with anticipatory reward in LSTM for Collision Avoidance of Mobile Robots
Poulet, Olivier, Guinand, Frédéric, Guérin, François
This article proposes a collision risk anticipation method based on short-term prediction of the agents position. A Long Short-Term Memory (LSTM) model, trained on past trajectories, is used to estimate the next position of each robot. This prediction allows us to define an anticipated collision risk by dynamically modulating the reward of a Deep Q-Learning Network (DQN) agent. The approach is tested in a constrained environment, where two robots move without communication or identifiers. Despite a limited sampling frequency (1 Hz), the results show a significant decrease of the collisions number and a stability improvement. The proposed method, which is computationally inexpensive, appears particularly attractive for implementation on embedded systems.
From thermodynamics to protein design: Diffusion models for biomolecule generation towards autonomous protein engineering
Li, Wen-ran, Cadet, Xavier F., Medina-Ortiz, David, Davari, Mehdi D., Sowdhamini, Ramanathan, Damour, Cedric, Li, Yu, Miranville, Alain, Cadet, Frederic
Protein design with desirable properties has been a significant challenge for many decades. Generative artificial intelligence is a promising approach and has achieved great success in various protein generation tasks. Notably, diffusion models stand out for their robust mathematical foundations and impressive generative capabilities, offering unique advantages in certain applications such as protein design. In this review, we first give the definition and characteristics of diffusion models and then focus on two strategies: Denoising Diffusion Probabilistic Models and Score-based Generative Models, where DDPM is the discrete form of SGM. Furthermore, we discuss their applications in protein design, peptide generation, drug discovery, and protein-ligand interaction. Finally, we outline the future perspectives of diffusion models to advance autonomous protein design and engineering. The E(3) group consists of all rotations, reflections, and translations in three-dimensions. The equivariance on the E(3) group can keep the physical stability of the frame of each amino acid as much as possible, and we reflect on how to keep the diffusion model E(3) equivariant for protein generation.
EAMA : Entity-Aware Multimodal Alignment Based Approach for News Image Captioning
Zhang, Junzhe, Zhang, Huixuan, Yin, Xunjian, Wan, Xiaojun
News image captioning requires model to generate an informative caption rich in entities, with the news image and the associated news article. Though Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in addressing various vision-language tasks, our research finds that current MLLMs still bear limitations in handling entity information on news image captioning task. Besides, while MLLMs have the ability to process long inputs, generating high-quality news image captions still requires a trade-off between sufficiency and conciseness of textual input information. To explore the potential of MLLMs and address problems we discovered, we propose : an Entity-Aware Multimodal Alignment based approach for news image captioning. Our approach first aligns the MLLM through Balance Training Strategy with two extra alignment tasks: Entity-Aware Sentence Selection task and Entity Selection task, together with News Image Captioning task, to enhance its capability in handling multimodal entity information. The aligned MLLM will utilizes the additional entity-related information it explicitly extracts to supplement its textual input while generating news image captions. Our approach achieves better results than all previous models in CIDEr score on GoodNews dataset (72.33 -> 88.39) and NYTimes800k dataset (70.83 -> 85.61).
Customizable Avatars with Dynamic Facial Action Coded Expressions (CADyFACE) for Improved User Engagement
Witherow, Megan A., Butler, Crystal, Shields, Winston J., Ilgin, Furkan, Diawara, Norou, Keener, Janice, Harrington, John W., Iftekharuddin, Khan M.
Customizable 3D avatar-based facial expression stimuli may improve user engagement in behavioral biomarker discovery and therapeutic intervention for autism, Alzheimer's disease, facial palsy, and more. However, there is a lack of customizable avatar-based stimuli with Facial Action Coding System (FACS) action unit (AU) labels. Therefore, this study focuses on (1) FACS-labeled, customizable avatar-based expression stimuli for maintaining subjects' engagement, (2) learning-based measurements that quantify subjects' facial responses to such stimuli, and (3) validation of constructs represented by stimulus-measurement pairs. We propose Customizable Avatars with Dynamic Facial Action Coded Expressions (CADyFACE) labeled with AUs by a certified FACS expert. To measure subjects' AUs in response to CADyFACE, we propose a novel Beta-guided Correlation and Multi-task Expression learning neural network (BeCoME-Net) for multi-label AU detection. The beta-guided correlation loss encourages feature correlation with AUs while discouraging correlation with subject identities for improved generalization. We train BeCoME-Net for unilateral and bilateral AU detection and compare with state-of-the-art approaches. To assess construct validity of CADyFACE and BeCoME-Net, twenty healthy adult volunteers complete expression recognition and mimicry tasks in an online feasibility study while webcam-based eye-tracking and video are collected. We test validity of multiple constructs, including face preference during recognition and AUs during mimicry.
Deep Adaptation of Adult-Child Facial Expressions by Fusing Landmark Features
Witherow, Megan A., Samad, Manar D., Diawara, Norou, Bar, Haim Y., Iftekharuddin, Khan M.
Imaging of facial affects may be used to measure psychophysiological attributes of children through their adulthood, especially for monitoring lifelong conditions like Autism Spectrum Disorder. Deep convolutional neural networks have shown promising results in classifying facial expressions of adults. However, classifier models trained with adult benchmark data are unsuitable for learning child expressions due to discrepancies in psychophysical development. Similarly, models trained with child data perform poorly in adult expression classification. We propose domain adaptation to concurrently align distributions of adult and child expressions in a shared latent space to ensure robust classification of either domain. Furthermore, age variations in facial images are studied in age-invariant face recognition yet remain unleveraged in adult-child expression classification. We take inspiration from multiple fields and propose deep adaptive FACial Expressions fusing BEtaMix SElected Landmark Features (FACE-BE-SELF) for adult-child facial expression classification. For the first time in the literature, a mixture of Beta distributions is used to decompose and select facial features based on correlations with expression, domain, and identity factors. We evaluate FACE-BE-SELF on two pairs of adult-child data sets. Our proposed FACE-BE-SELF approach outperforms adult-child transfer learning and other baseline domain adaptation methods in aligning latent representations of adult and child expressions.
Perspectives and Ethics of the Autonomous Artificial Thinking Systems
The feasibility of autonomous artificial thinking systems needs to compare the way th e human beings acquire their information and develops the thought with the current capacities of the autonomous information systems. Our model uses four hierarchies: the hierarchy of information systems, the cognitive hierarchy, the linguistic hierarchy and t he digital informative hierarchy that combines artificial intelligence, the power of computers models, methods and tools to develop autonomous information systems. The question of the capability of autonomous system to provide a form of artificial thought arises with the ethical consequences on the social life and the perspec tive of transhumanism.