AITopics

2503.15108

Country: Europe > France (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Feb-14-2025

Demographic User Modeling for Social Robotics with Multimodal Pre-trained Models

Rahimi, Hamed, Abrini, Mouad, Khoramshahi, Mahdi, Chetouani, Mohamed

This paper investigates the performance of multimodal pre-trained models in user profiling tasks based on visual-linguistic demographic data. These models are critical for adapting to the needs and preferences of human users in social robotics, thereby providing personalized responses and enhancing interaction quality. First, we introduce two datasets specifically curated to represent demographic characteristics derived from user facial images. Next, we evaluate the performance of a prominent contrastive multimodal pre-trained model, CLIP, on these datasets, both in its out-of-the-box state and after fine-tuning. Initial results indicate that CLIP performs suboptimally in matching images to demographic descriptions without fine-tuning. Although fine-tuning significantly enhances its predictive capacity, the model continues to exhibit limitations in effectively generalizing subtle demographic nuances. To address this, we propose adopting a masked image modeling strategy to improve generalization and better capture subtle demographic attributes. This approach offers a pathway for enhancing demographic sensitivity in multimodal user modeling tasks.
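The image-to-description matching that the abstract evaluates can be illustrated with a minimal sketch of contrastive zero-shot scoring: cosine similarity between one image embedding and several text embeddings, followed by a softmax. The 3-dimensional embeddings, the description strings, and the temperature value are all hypothetical placeholders; a real setup would obtain the embeddings from the CLIP image and text encoders, and this is not the authors' evaluation code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def match_probabilities(image_emb, text_embs, temperature=0.07):
    """Softmax over temperature-scaled cosine similarities (temperature is an assumption)."""
    logits = [cosine(image_emb, t) / temperature for t in text_embs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy embeddings standing in for encoder outputs.
descriptions = ["a young adult", "a middle-aged adult", "an older adult"]
text_embs = [[1.0, 0.1, 0.0], [0.1, 1.0, 0.1], [0.0, 0.1, 1.0]]
image_emb = [0.2, 0.9, 0.3]

probs = match_probabilities(image_emb, text_embs)
best = descriptions[probs.index(max(probs))]
print(best, [round(p, 3) for p in probs])
```

Fine-tuning, in this picture, changes the encoders so that matching image and description embeddings move closer together; the scoring step itself stays the same.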

large language model, machine learning, natural language, (17 more...)

2502.10642

Country: Europe > France (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

arXiv.org Artificial Intelligence, Feb-14-2025

USER-VLM 360: Personalized Vision Language Models with User-aware Tuning for Social Human-Robot Interactions

Rahimi, Hamed, Bahaj, Adil, Abrini, Mouad, Khoramshahi, Mahdi, Ghogho, Mounir, Chetouani, Mohamed

The integration of vision-language models into robotic systems constitutes a significant advancement in enabling machines to interact with their surroundings in a more intuitive manner. While VLMs offer rich multimodal reasoning, existing approaches lack user-specific adaptability, often relying on generic interaction paradigms that fail to account for individual behavioral, contextual, or socio-emotional nuances. When customization is attempted, ethical concerns arise from unmitigated biases in user data, risking exclusion or unfair treatment. To address these dual challenges, we propose User-VLM 360°, a holistic framework integrating multimodal user modeling with bias-aware optimization. Our approach features: (1) user-aware tuning that adapts interactions in real time using visual-linguistic signals; (2) bias mitigation via preference optimization; and (3) curated 360° socio-emotive interaction datasets annotated with demographic, emotion, and relational metadata. Evaluations across eight benchmarks demonstrate state-of-the-art results: +35.3% F1 in personalized VQA, +47.5% F1 in facial features understanding, 15% bias reduction, and 30x speedup over baselines. Ablation studies confirm component efficacy, and deployment on the Pepper robot validates real-time adaptability across diverse users. We open-source parameter-efficient 3B/10B models and an ethical verification framework for responsible adaptation.

large language model, machine learning, natural language, (17 more...)

2502.10636

Country:

Europe > Spain (0.14)
Europe > France (0.14)
Africa > Middle East > Morocco (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

arXiv.org Artificial Intelligence, Jan-29-2025

Inferring Implicit Goals Across Differing Task Models

Tulli, Silvia, Vasileiou, Stylianos Loukas, Chetouani, Mohamed, Sreedharan, Sarath

... value-aligned behavior is to not only account for the specified user objectives but also any implicit or unspecified user requirements. The existence of such implicit requirements could be particularly common in settings where the user's understanding of the task model may differ from the agent's estimate of the model. Under this scenario, the user may incorrectly expect some agent behavior to be inevitable or guaranteed. This paper addresses such expectation mismatch in the presence of differing models by capturing the possibility of unspecified ...

This should be all well and good, provided human bottleneck states are also bottleneck states for the agent. Otherwise, the agent must make an effort to figure out what the user's underlying subgoals may be. To see how such problems may arise, consider an agent tasked with guiding a tourist to a famous art museum. The tourist simply says, "Get me a plan to get to the art museum," unaware of the city's metro system and expecting an above-ground route passing certain landmarks. The agent, however, might plan a route using the metro system. For the agent's metro route, bottlenecks might include entering the metro, making transfers, and exiting at the correct station.
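The museum example can be made concrete with a small sketch that treats each task model as a deterministic directed graph and calls a state a bottleneck when every start-to-goal path must pass through it. This is a simplifying assumption made for illustration; the excerpt does not give the paper's formal definition, and the state names and both graphs below are hypothetical.

```python
from collections import deque

def reachable(graph, start, goal, removed=None):
    """Breadth-first search from start to goal that ignores the `removed` state."""
    if start == removed:
        return False
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return True
        for nxt in graph.get(state, []):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def bottlenecks(graph, start, goal):
    """States other than start and goal that every start-to-goal path visits."""
    states = set(graph) | {s for succ in graph.values() for s in succ}
    return {s for s in states - {start, goal}
            if not reachable(graph, start, goal, removed=s)}

# Hypothetical agent model: knows both the metro and the above-ground route.
agent_model = {
    "hotel": ["metro_entrance", "main_square"],
    "metro_entrance": ["transfer_station"],
    "transfer_station": ["museum_station"],
    "museum_station": ["museum"],
    "main_square": ["old_bridge"],
    "old_bridge": ["museum"],
}
# Hypothetical tourist model: unaware of the metro.
human_model = {
    "hotel": ["main_square"],
    "main_square": ["old_bridge"],
    "old_bridge": ["museum"],
}

human_bn = bottlenecks(human_model, "hotel", "museum")
agent_bn = bottlenecks(agent_model, "hotel", "museum")
# States the tourist assumes are inevitable but the agent's model does not guarantee.
unguaranteed = human_bn - agent_bn
print(sorted(unguaranteed))
```

In this toy setting the landmarks are bottlenecks only in the tourist's model, so they are candidates for the implicit subgoals the agent would need to ask about.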

artificial intelligence, machine learning, subgoal, (18 more...)

2501.17704

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

arXiv.org Artificial Intelligence, Apr-7-2024

Legibot: Generating Legible Motions for Service Robots Using Cost-Based Local Planners

Amirian, Javad, Abrini, Mouad, Chetouani, Mohamed

With the increasing presence of social robots in various environments and applications, there is an increasing need for these robots to exhibit socially-compliant behaviors. Legible motion, characterized by the ability of a robot to clearly and quickly convey intentions and goals to the individuals in its vicinity, through its motion, holds significant importance in this context. This will improve the overall user experience and acceptance of robots in human environments. In this paper, we introduce a novel approach to incorporate legibility into local motion planning for mobile robots. This can enable robots to generate legible motions in real-time and dynamic environments. To demonstrate the effectiveness of our proposed methodology, we also provide a robotic stack designed for deploying legibility-aware motion planning in a social robot, by integrating perception and localization components.

artificial intelligence, observer, robot, (17 more...)

2404.051

Country: Europe > France (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry: Consumer Products & Services > Restaurants (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots > Robots in the Home (0.92)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.71)

arXiv.org Artificial Intelligence, Sep-29-2023

Utility-based Adaptive Teaching Strategies using Bayesian Theory of Mind

Grislain, Clémence, Caselles-Dupré, Hugo, Sigaud, Olivier, Chetouani, Mohamed

Good teachers always tailor their explanations to the learners. Cognitive scientists model this process under the rationality principle: teachers try to maximise the learner's utility while minimising teaching costs. To this end, human teachers seem to build mental models of the learner's internal state, a capacity known as Theory of Mind (ToM). Inspired by cognitive science, we build on Bayesian ToM mechanisms to design teacher agents that, like humans, tailor their teaching strategies to the learners. Our ToM-equipped teachers construct models of learners' internal states from observations and leverage them to select demonstrations that maximise the learners' rewards while minimising teaching costs. Our experiments in simulated environments demonstrate that learners taught this way are more efficient than those taught in a learner-agnostic way. This effect gets stronger when the teacher's model of the learner better aligns with the actual learner's state, either using a more accurate prior or after accumulating observations of the learner's behaviour. This work is a first step towards social machines that teach us and each other, see https://teacher-with-tom.github.io.
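The teaching loop described above (infer the learner's state from observations, then pick the demonstration with the best reward-minus-cost trade-off) can be sketched in a few lines. The learner types, observation likelihoods, rewards, and costs are hypothetical numbers chosen for illustration, not values from the paper, and the discrete type space is a simplifying assumption.

```python
def posterior(prior, likelihoods, observation):
    """Bayes rule over learner types: P(type | obs) is proportional to P(obs | type) * P(type)."""
    unnorm = {t: prior[t] * likelihoods[t][observation] for t in prior}
    total = sum(unnorm.values())
    return {t: p / total for t, p in unnorm.items()}

def best_demo(belief, reward, cost):
    """Pick the demonstration maximising expected learner reward minus teaching cost."""
    def utility(demo):
        expected = sum(belief[t] * reward[demo][t] for t in belief)
        return expected - cost[demo]
    return max(cost, key=utility)

# Hypothetical learner types and how likely each is to produce an observed behaviour.
prior = {"novice": 0.5, "expert": 0.5}
likelihoods = {
    "novice": {"hesitates": 0.8, "acts_directly": 0.2},
    "expert": {"hesitates": 0.2, "acts_directly": 0.8},
}
# Hypothetical learner reward for each demonstration, and what it costs the teacher.
reward = {
    "short_demo": {"novice": 0.3, "expert": 0.9},
    "long_demo": {"novice": 0.9, "expert": 1.0},
}
cost = {"short_demo": 0.1, "long_demo": 0.4}

belief_after_hesitation = posterior(prior, likelihoods, "hesitates")
demo_for_hesitant = best_demo(belief_after_hesitation, reward, cost)
demo_for_confident = best_demo(posterior(prior, likelihoods, "acts_directly"), reward, cost)
print(demo_for_hesitant, demo_for_confident)
```

With these numbers the teacher gives the longer demonstration once it believes the learner is probably a novice, and the cheaper one otherwise, which mirrors the claim that teaching improves as the teacher's model of the learner becomes more accurate.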

artificial intelligence, learner, machine learning, (17 more...)

2309.17275

Country: Europe > France (0.14)

Genre:

Research Report > Experimental Study (0.70)
Research Report > New Finding (0.47)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.66)

Enhancing Agent Communication and Learning through Action and Language

Caselles-Dupré, Hugo, Sigaud, Olivier, Chetouani, Mohamed

action and language, enhancing agent communication and learning

We introduce a novel category of GC-agents capable of functioning as both teachers and learners. Leveraging action-based demonstrations and language-based instructions, these agents enhance communication efficiency. We investigate the incorporation of pedagogy and pragmatism, essential elements in human communication and goal achievement, enhancing the agents' teaching and learning capabilities. Furthermore, we explore the impact of combining communication modes (action and language) on learning outcomes, highlighting the benefits of a multi-modal approach.

2308.10842

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.40)

Pragmatically Learning from Pedagogical Demonstrations in Multi-Goal Environments

Caselles-Dupré, Hugo, Sigaud, Olivier, Chetouani, Mohamed

Learning from demonstration methods usually leverage close to optimal demonstrations to accelerate training. By contrast, when demonstrating a task, human teachers deviate from optimal demonstrations and pedagogically modify their behavior by giving demonstrations that best disambiguate the goal they want to demonstrate. Analogously, human learners excel at pragmatically inferring the intent of the teacher, facilitating communication between the two agents. These mechanisms are critical in the few demonstrations regime, where inferring the goal is more difficult. In this paper, we implement pedagogy and pragmatism mechanisms by leveraging a Bayesian model of Goal Inference from demonstrations (BGI). We highlight the benefits of this model in multi-goal teacher-learner setups with two artificial agents that learn with goal-conditioned Reinforcement Learning. We show that combining BGI-agents (a pedagogical teacher and a pragmatic learner) results in faster learning and reduced goal ambiguity over standard learning from demonstrations, especially in the few demonstrations regime. We provide the code for our experiments (https://github.com/Caselles/NeurIPS22-demonstrations-pedagogy-pragmatism), as well as an illustrative video explaining our approach (https://youtu.be/V4n16IjkNyw).
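A minimal sketch of the idea, assuming a discrete set of goals and demonstrations: the learner infers the goal with Bayes rule, and a pedagogical teacher picks the demonstration that makes that inference as unambiguous as possible, even if it is not the most likely one under the optimal policy. The goal names, demonstration names, and probabilities are hypothetical, and this is not the BGI implementation from the linked repository.

```python
def infer_goal(demo, likelihood, prior):
    """Posterior over goals given a demonstration: P(g | d) is proportional to P(d | g) * P(g)."""
    unnorm = {g: likelihood[g].get(demo, 0.0) * prior[g] for g in prior}
    total = sum(unnorm.values())
    return {g: p / total for g, p in unnorm.items()}

def pedagogical_demo(goal, likelihood, prior):
    """Among demonstrations consistent with `goal`, pick the one that best disambiguates it."""
    candidates = [d for d, p in likelihood[goal].items() if p > 0]
    return max(candidates, key=lambda d: infer_goal(d, likelihood, prior)[goal])

# Hypothetical P(demo | goal): going straight is likely under both goals, hence ambiguous.
likelihood = {
    "reach_A": {"go_straight": 0.7, "curve_left": 0.3},
    "reach_B": {"go_straight": 0.6, "curve_right": 0.4},
}
prior = {"reach_A": 0.5, "reach_B": 0.5}

# A non-pedagogical teacher shows the most likely demonstration for its goal.
naive = max(likelihood["reach_A"], key=likelihood["reach_A"].get)
# A pedagogical teacher shows the demonstration that best reveals the goal.
teach = pedagogical_demo("reach_A", likelihood, prior)

print(naive, infer_goal(naive, likelihood, prior)["reach_A"])
print(teach, infer_goal(teach, likelihood, prior)["reach_A"])
```

The gap between the two posteriors is largest when only one demonstration is available, which is consistent with the abstract's emphasis on the few-demonstrations regime.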

artificial intelligence, machine learning, pedagogical demonstration, (2 more...)

2206.04546

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Overcoming Referential Ambiguity in Language-Guided Goal-Conditioned Reinforcement Learning

Caselles-Dupré, Hugo, Sigaud, Olivier, Chetouani, Mohamed

artificial intelligence, language-guided goal-conditioned reinforcement learning, overcoming referential ambiguity

Teaching an agent to perform new tasks using natural language can easily be hindered by ambiguities in interpretation. When a teacher provides an instruction to a learner about an object by referring to its features, the learner can misunderstand the teacher's intentions, for instance if the instruction ambiguously refers to features of the object, a phenomenon called referential ambiguity. We study how two concepts derived from cognitive sciences can help resolve those referential ambiguities: pedagogy (selecting the right instructions) and pragmatism (learning the preferences of the other agents using inductive reasoning). We apply those ideas to a teacher/learner setup with two artificial agents on a simulated robotic task (block-stacking). We show that these concepts improve sample efficiency for training the learner.
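Referential ambiguity and the pedagogical remedy can be shown with a toy sketch: an instruction names one feature, and a pedagogical teacher prefers the feature that the fewest objects share. The block names and features are hypothetical, and the selection rule is a simplification for illustration rather than the method used in the paper.

```python
def referents(instruction, objects):
    """Objects whose feature set contains the instructed feature."""
    return [name for name, feats in objects.items() if instruction in feats]

def pedagogical_instruction(target, objects):
    """Pick the target's feature that matches the fewest objects (ties broken alphabetically)."""
    return min(sorted(objects[target]), key=lambda f: len(referents(f, objects)))

# Hypothetical block-stacking scene.
objects = {
    "block_1": {"red", "large"},
    "block_2": {"red", "small"},
    "block_3": {"blue", "small"},
}

ambiguous = referents("red", objects)          # "red" could mean two different blocks
chosen = pedagogical_instruction("block_1", objects)
print(ambiguous, chosen)
```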

2209.12758

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

Pedagogical Demonstrations and Pragmatic Learning in Artificial Tutor-Learner Interactions

Caselles-Dupré, Hugo, Chetouani, Mohamed, Sigaud, Olivier

artificial intelligence, artificial tutor-learner interaction, pedagogical demonstration and pragmatic learning

When demonstrating a task, human tutors pedagogically modify their behavior by either "showing" the task rather than just "doing" it (exaggerating on relevant parts of the demonstration) or by giving demonstrations that best disambiguate the communicated goal. Analogously, human learners pragmatically infer the communicative intent of the tutor: they interpret what the tutor is trying to teach them and deduce relevant information for learning. Without such mechanisms, traditional Learning from Demonstration (LfD) algorithms will consider such demonstrations as sub-optimal. In this paper, we investigate the implementation of such mechanisms in a tutor-learner setup where both participants are artificial agents in an environment with multiple goals. Using pedagogy from the tutor and pragmatism from the learner, we show substantial improvements over standard learning from demonstrations.

2203.00111

Genre: Research Report (0.40)

Industry: Education (0.53)

Technology: Information Technology > Artificial Intelligence (0.73)