AITopics | Mathur, Leena

Collaborating Authors

Mathur, Leena

Social Genome: Grounded Social Reasoning Abilities of Multimodal Models

Mathur, Leena, Qian, Marian, Liang, Paul Pu, Morency, Louis-Philippe

arXiv.org Artificial Intelligence, Mar-6-2025

Social reasoning abilities are crucial for AI systems to effectively interpret and respond to multimodal human communication and interaction within social contexts. We introduce Social Genome, the first benchmark for fine-grained, grounded social reasoning abilities of multimodal models. Social Genome contains 272 videos of interactions and 1,486 human-annotated reasoning traces related to inferences about these interactions. These traces contain 5,777 reasoning steps that reference evidence from visual cues, verbal cues, vocal cues, and external knowledge (contextual knowledge external to videos). Social Genome is also the first modeling challenge to study external knowledge in social reasoning. Social Genome computes metrics to holistically evaluate semantic and structural qualities of model-generated social reasoning traces. We demonstrate the utility of Social Genome through experiments with state-of-the-art models, identifying performance gaps and opportunities for future research to improve the grounded social reasoning abilities of multimodal models.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv ID: 2502.15109

Country:

North America > United States > Massachusetts (0.14)
Europe > Middle East > Malta (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

HEMM: Holistic Evaluation of Multimodal Foundation Models

Liang, Paul Pu, Goindani, Akshay, Chafekar, Talha, Mathur, Leena, Yu, Haofei, Salakhutdinov, Ruslan, Morency, Louis-Philippe

arXiv.org Artificial Intelligence, Jul-3-2024

Multimodal foundation models that can holistically process text alongside images, video, audio, and other sensory modalities are increasingly used in a variety of real-world applications. However, it is challenging to characterize and study progress in multimodal foundation models, given the range of possible modeling decisions, tasks, and domains. In this paper, we introduce Holistic Evaluation of Multimodal Models (HEMM) to systematically evaluate the capabilities of multimodal foundation models across a set of 3 dimensions: basic skills, information flow, and real-world use cases. Basic multimodal skills are internal abilities required to solve problems, such as learning interactions across modalities, fine-grained alignment, multi-step reasoning, and the ability to handle external knowledge. Information flow studies how multimodal content changes during a task through querying, translation, editing, and fusion. Use cases span domain-specific challenges introduced in real-world multimedia, affective computing, natural sciences, healthcare, and human-computer interaction applications. Through comprehensive experiments across the 30 tasks in HEMM, we (1) identify key dataset dimensions (e.g., basic skills, information flows, and use cases) that pose challenges to today's models, and (2) distill performance trends regarding how different modeling dimensions (e.g., scale, pre-training data, multimodal alignment, pre-training, and instruction tuning objectives) influence performance. Our conclusions regarding challenging multimodal interactions, use cases, and tasks requiring reasoning and external knowledge, the benefits of data and model scale, and the impacts of instruction tuning yield actionable insights for future work in multimodal foundation models.

large language model, machine learning, natural language, (20 more...)

arXiv ID: 2407.03418

Country:

North America > United States (0.46)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Media > Film (1.00)
Information Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.93)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications > Social Media (1.00)
(5 more...)

Advancing Social Intelligence in AI Agents: Technical Challenges and Open Questions

Mathur, Leena, Liang, Paul Pu, Morency, Louis-Philippe

arXiv.org Artificial Intelligence, Apr-16-2024

Building socially-intelligent AI agents (Social-AI) is a multidisciplinary, multimodal research goal that involves creating agents that can sense, perceive, reason about, learn from, and respond to affect, behavior, and cognition of other agents (human or artificial). Progress towards Social-AI has accelerated in the past decade across several computing communities, including natural language processing, machine learning, robotics, human-machine interaction, computer vision, and speech. Natural language processing, in particular, has been prominent in Social-AI research, as language plays a key role in constructing the social world. In this position paper, we identify a set of underlying technical challenges and open questions for researchers across computing communities to advance Social-AI. We anchor our discussion in the context of social intelligence concepts and prior progress in Social-AI research.

artificial intelligence, interaction, natural language, (14 more...)

arXiv ID: 2404.11023

Country:

North America > United States (0.67)
Europe > United Kingdom > England (0.14)

Genre:

Instructional Material (0.93)
Research Report (0.81)
Overview (0.67)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

SOTOPIA: Interactive Evaluation for Social Intelligence in Language Agents

Zhou, Xuhui, Zhu, Hao, Mathur, Leena, Zhang, Ruohong, Yu, Haofei, Qi, Zhengyang, Morency, Louis-Philippe, Bisk, Yonatan, Fried, Daniel, Neubig, Graham, Sap, Maarten

arXiv.org Artificial Intelligence, Oct-17-2023

Humans are social beings; we pursue social goals in our daily interactions, which is a crucial aspect of social intelligence. Yet, AI systems' abilities in this realm remain elusive. We present SOTOPIA, an open-ended environment to simulate complex social interactions between artificial agents and evaluate their social intelligence. In our environment, agents role-play and interact under a wide variety of scenarios; they coordinate, collaborate, exchange, and compete with each other to achieve complex social goals. We simulate the role-play interaction between LLM-based agents and humans within this task space and evaluate their performance with a holistic evaluation framework called SOTOPIA-Eval. With SOTOPIA, we find significant differences between these models in terms of their social intelligence, and we identify a subset of SOTOPIA scenarios, SOTOPIA-hard, that is generally challenging for all models. We find that on this subset, GPT-4 achieves a significantly lower goal completion rate than humans and struggles to exhibit social commonsense reasoning and strategic communication skills. These findings demonstrate SOTOPIA's promise as a general platform for research on evaluating and improving social intelligence in artificial agents.

large language model, machine learning, natural language, (7 more...)

arXiv ID: 2310.11667

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.53)

Expanding the Role of Affective Phenomena in Multimodal Interaction Research

Mathur, Leena, Matarić, Maja J, Morency, Louis-Philippe

arXiv.org Artificial Intelligence, May-18-2023

In recent decades, the field of affective computing has made substantial progress in advancing the ability of AI systems to recognize and express affective phenomena, such as affect and emotions, during human-human and human-machine interactions. This paper describes our examination of research at the intersection of multimodal interaction and affective computing, with the objective of observing trends and identifying understudied areas. We examined over 16,000 papers from selected conferences in multimodal interaction, affective computing, and natural language processing: ACM International Conference on Multimodal Interaction, AAAC International ...

In parallel to the aforementioned progress in the affective sciences, recent decades of computer science research have laid foundations in affective computing [19, 49, 52], with substantial progress in advancing the ability of AI systems to estimate affective phenomena in humans. After affective phenomena have been predicted by an AI system, we believe those predictions can be used to enhance the system's understanding of human social behaviors and cognitive states, towards more socially-intelligent AI. We were, therefore, motivated to explore the question: How, and to what extent, have affective phenomena been used by AI systems in multimodal interaction research to enhance machine understanding of human social behaviors and cognitive states?

affective phenomenon, artificial intelligence, natural language, (15 more...)

arXiv ID: 2305.10827

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.47)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (1.00)
