AITopics

Text-to-Image (T2I) models have achieved remarkable success in generating visual content from text inputs. Although multiple safety alignment strategies have been proposed to prevent harmful outputs, they often lead to overly cautious behavior -- rejecting even benign prompts -- a phenomenon known as $\textit{over-refusal}$ that reduces the practical utility of T2I models. Despite over-refusal having been observed in practice, there is no large-scale benchmark that systematically evaluates this phenomenon for T2I models. In this paper, we present an automatic workflow to construct synthetic evaluation data, resulting in OVERT ($\textbf{OVE}$r-$\textbf{R}$efusal evaluation on $\textbf{T}$ext-to-image models), the first large-scale benchmark for assessing over-refusal behaviors in T2I models. OVERT includes 4,600 seemingly harmful but benign prompts across nine safety-related categories, along with 1,785 genuinely harmful prompts (OVERT-unsafe) to evaluate the safety-utility trade-off. Using OVERT, we evaluate several leading T2I models and find that over-refusal is a widespread issue across various categories (Figure 1), underscoring the need for further research to enhance the safety alignment of T2I models without compromising their functionality. As a preliminary attempt to reduce over-refusal, we explore prompt rewriting; however, we find it often compromises faithfulness to the meaning of the original prompts. Finally, we demonstrate the flexibility of our generation framework in accommodating diverse safety requirements by generating customized evaluation data adapting to user-defined policies.

category, large language model, machine learning, (19 more...)

2505.21347

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.69)
Law (0.69)
Leisure & Entertainment > Games > Computer Games (0.46)
Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)

Dinh, Tu Anh, Schumacher, Philipp Nicolas, Niehues, Jan

From Slides to Chatbots: Enhancing Large Language Models with University Course Materials

Large Language Models (LLMs) have advanced rapidly in recent years. One application of LLMs is to support student learning in educational settings. However, prior work has shown that LLMs still struggle to answer questions accurately within university-level computer science courses. In this work, we investigate how incorporating university course materials can enhance LLM performance in this setting. A key challenge lies in leveraging diverse course materials such as lecture slides and transcripts, which differ substantially from typical textual corpora: slides also contain visual elements like images and formulas, while transcripts contain spoken, less structured language. We compare two strategies, Retrieval-Augmented Generation (RAG) and Continual Pre-Training (CPT), to extend LLMs with course-specific knowledge. For lecture slides, we further explore a multi-modal RAG approach, where we present the retrieved content to the generator in image form. Our experiments reveal that, given the relatively small size of university course materials, RAG is more effective and efficient than CPT. Moreover, incorporating slides as images in the multi-modal setting significantly improves performance over text-only retrieval. These findings highlight practical strategies for developing AI assistants that better support learning and teaching, and we hope they inspire similar efforts in other educational contexts.

large language model, machine learning, natural language, (19 more...)

2510.22272

Country:

Europe (0.46)
North America > United States (0.28)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Curriculum (0.74)
Education > Educational Setting (0.74)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)

Leveraging Large Language Models to Identify Conversation Threads in Collaborative Learning

Ravi, Prerna, Lee, Dong Won, Flamia, Beatriz, David, Jasmine, Hanks, Brandon, Breazeal, Cynthia, Anderson, Emma, Lin, Grace

Understanding how ideas develop and flow in small-group conversations is critical for analyzing collaborative learning. A key structural feature of these interactions is threading, the way discourse talk naturally organizes into interwoven topical strands that evolve over time. While threading has been widely studied in asynchronous text settings, detecting threads in synchronous spoken dialogue remains challenging due to overlapping turns and implicit cues. At the same time, large language models (LLMs) show promise for automating discourse analysis but often struggle with long-context tasks that depend on tracing these conversational links. In this paper, we investigate whether explicit thread linkages can improve LLM-based coding of relational moves in group talk. We contribute a systematic guidebook for identifying threads in synchronous multi-party transcripts and benchmark different LLM prompting strategies for automated threading. We then test how threading influences performance on downstream coding of conversational analysis frameworks, that capture core collaborative actions such as agreeing, building, and eliciting. Our results show that providing clear conversational thread information improves LLM coding performance and underscores the heavy reliance of downstream analysis on well-structured dialogue. We also discuss practical trade-offs in time and cost, emphasizing where human-AI hybrid approaches can yield the best value. Together, this work advances methods for combining LLMs and robust conversational thread structures to make sense of complex, real-time group interactions.

large language model, machine learning, utterance, (19 more...)

2510.22844

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.15)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

A Survey of Data Agents: Emerging Paradigm or Overstated Hype?

Zhu, Yizhang, Wang, Liangwei, Yang, Chenyu, Lin, Xiaotian, Li, Boyan, Zhou, Wei, Liu, Xinyu, Peng, Zhangyang, Luo, Tianqi, Li, Yu, Chai, Chengliang, Chen, Chong, Di, Shimin, Fan, Ju, Sun, Ji, Tang, Nan, Tsung, Fugee, Wang, Jiannan, Wu, Chenglin, Xu, Yanwei, Zhang, Shaolei, Zhang, Yong, Zhou, Xuanhe, Li, Guoliang, Luo, Yuyu

The rapid advancement of large language models (LLMs) has spurred the emergence of data agents--autonomous systems designed to orchestrate Data + AI ecosystems for tackling complex data-related tasks. However, the term "data agent" currently suffers from terminological ambiguity and inconsistent adoption, conflating simple query responders with sophisticated autonomous architectures. This terminological ambiguity fosters mismatched user expectations, accountability challenges, and barriers to industry growth. Inspired by the SAE J3016 standard for driving automation, this survey introduces the first systematic hierarchical taxonomy for data agents, comprising six levels that delineate and trace progressive shifts in autonomy, from manual operations (L0) to a vision of generative, fully autonomous data agents (L5), thereby clarifying capability boundaries and responsibility allocation. Through this lens, we offer a structured review of existing research arranged by increasing autonomy, encompassing specialized data agents for data management, preparation, and analysis, alongside emerging efforts toward versatile, comprehensive systems with enhanced autonomy. We further analyze critical evolutionary leaps and technical gaps for advancing data agents, especially the ongoing L2-to-L3 transition, where data agents evolve from procedural execution to autonomous orchestration. Finally, we conclude with a forward-looking roadmap, envisioning the advent of proactive, generative data agents.

large language model, machine learning, natural language, (17 more...)

2510.23587

Country:

Asia > China (0.93)
North America > United States (0.68)
Asia > Middle East > UAE (0.27)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
(3 more...)

Li, Andrew C., Klassen, Toryn Q., Wang, Andrew, Alamdari, Parand A., McIlraith, Sheila A.

Ground-Compose-Reinforce: Grounding Language in Agentic Behaviours using Limited Data

Grounding language in perception and action is a key challenge when building situated agents that can interact with humans, or other agents, via language. In the past, addressing this challenge has required manually designing the language grounding or curating massive datasets that associate language with the environment. We propose Ground-Compose-Reinforce, an end-to-end, neurosymbolic framework for training RL agents directly from high-level task specifications--without manually designed reward functions or other domain-specific oracles, and without massive datasets. These task specifications take the form of Reward Machines, automata-based representations that capture high-level task structure and are in some cases autoformalizable from natural language. Critically, we show that Reward Machines can be grounded using limited data by exploiting compositionality. Experiments in a custom Meta-World domain with only 350 labelled pretraining trajectories show that our framework faithfully elicits complex behaviours from high-level specifications--including behaviours that never appear in pretraining--while non-compositional approaches fail.

large language model, machine learning, reinforcement learning, (18 more...)

2507.10741

Country: North America > Canada > Ontario (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Education (0.46)
Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving

Tang, Xiangru, Qin, Tianrui, Peng, Tianhao, Zhou, Ziyang, Shao, Daniel, Du, Tingting, Wei, Xinming, Xia, Peng, Wu, Fang, Zhu, He, Zhang, Ge, Liu, Jiaheng, Wang, Xingyao, Hong, Sirui, Wu, Chenglin, Cheng, Hao, Wang, Chi, Zhou, Wangchunshu

AI agent frameworks operate in isolation, forcing agents to rediscover solutions and repeat mistakes across different systems. Despite valuable problem-solving experiences accumulated by frameworks like smolagents, OpenHands, and OWL, this knowledge remains trapped within individual systems, preventing the emergence of collective intelligence. Current memory systems focus on individual agents or framework-specific demonstrations, failing to enable cross-architecture knowledge transfer. We introduce AGENT KB, a universal memory infrastructure enabling seamless experience sharing across heterogeneous agent frameworks without retraining. AGENT KB aggregates trajectories into a structured knowledge base and serves lightweight APIs. At inference time, hybrid retrieval operates through two stages: planning seeds agents with cross-domain workflows, while feedback applies targeted diagnostic fixes. A disagreement gate ensures retrieved knowledge enhances rather than disrupts reasoning, addressing knowledge interference in cross-framework transfer. We validate AGENT KB across major frameworks on GAIA, Humanity's Last Exam, GPQA, and SWE-bench. Results show substantial improvements across diverse model families: compared to baseline pass@1, smolagents with AGENT KB achieve up to 18.7pp gains at pass@3 (55.2% -> 73.9%), while OpenHands improves 4.0pp on SWE-bench pass@1 (24.3% -> 28.3%). Similar improvements are observed across all base model families. Ablations confirm that hybrid retrieval and feedback stages are essential, with automatically generated experiences matching manual curation. This establishes the foundation for collective agent intelligence through shared memory infrastructures.

knowledge management, large language model, machine learning, (19 more...)

2507.06229

Genre:

Workflow (0.90)
Research Report > New Finding (0.48)

Industry:

Information Technology (0.67)
Education (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Improving the Distributional Alignment of LLMs using Supervision

Kambhatla, Gauri, Gautam, Sanjana, Zhang, Angela, Liu, Alex, Srinivasan, Ravi, Li, Junyi Jessy, Lease, Matthew

The ability to accurately align LLMs with human population groups on subjective questions would have great value. In this work, we show that use of simple supervision can greatly improve language model alignment with diverse population groups more consistently, as measured over three datasets spanning various topics. Beyond evaluating average alignment, we also report how alignment varies across specific groups. Our broad findings provide insights into the distributional alignment of LLMs with diverse population groups. By conducting evaluation over many LLMs and prompting strategies, along with open-sourcing our work, we provide a benchmark to stimulate future research.

artificial intelligence, large language model, natural language, (16 more...)

2507.00439

Country:

North America > United States (1.00)
Europe (1.00)
Africa (1.00)
Asia > Middle East > UAE (0.28)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.92)

Industry:

Health & Medicine (1.00)
Government (1.00)
Education > Educational Setting (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

MoORE: SVD-based Model MoE-ization for Conflict- and Oblivion-Resistant Multi-Task Adaptation

Yuan, Shen, Zheng, Yin, Wang, Taifeng, Liu, Binbin, Xu, Hongteng

Adapting large-scale foundation models in multi-task scenarios often suffers from task conflict and oblivion. To mitigate such issues, we propose a novel ''model MoE-ization'' strategy that leads to a conflict- and oblivion-resistant multi-task adaptation method. Given a weight matrix of a pre-trained model, our method applies SVD to it and introduces a learnable router to adjust its singular values based on tasks and samples. Accordingly, the weight matrix becomes a Mixture of Orthogonal Rank-one Experts (MoORE), in which each expert corresponds to the outer product of a left singular vector and the corresponding right one. We can improve the model capacity by imposing a learnable orthogonal transform on the right singular vectors. Unlike low-rank adaptation (LoRA) and its MoE-driven variants, MoORE guarantees the experts' orthogonality and maintains the column space of the original weight matrix. These two properties make the adapted model resistant to the conflicts among the new tasks and the oblivion of its original tasks, respectively. Experiments on various datasets demonstrate that MoORE outperforms existing multi-task adaptation methods consistently, showing its superiority in terms of conflict- and oblivion-resistance. The code of the experiments is available at https://github.com/DaShenZi721/MoORE.

large language model, machine learning, moore, (17 more...)

2506.14436

Country:

Asia (0.28)
North America > United States > Minnesota (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

KungfuBot: Physics-Based Humanoid Whole-Body Control for Learning Highly-Dynamic Skills

Xie, Weiji, Han, Jinrui, Zheng, Jiakun, Li, Huanyu, Liu, Xinzhe, Shi, Jiyuan, Zhang, Weinan, Bai, Chenjia, Li, Xuelong

Humanoid robots are promising to acquire various skills by imitating human behaviors. However, existing algorithms are only capable of tracking smooth, low-speed human motions, even with delicate reward and curriculum design. This paper presents a physics-based humanoid control framework, aiming to master highly-dynamic human behaviors such as Kungfu and dancing through multi-steps motion processing and adaptive motion tracking. For motion processing, we design a pipeline to extract, filter out, correct, and retarget motions, while ensuring compliance with physical constraints to the maximum extent. For motion imitation, we formulate a bi-level optimization problem to dynamically adjust the tracking accuracy tolerance based on the current tracking error, creating an adaptive curriculum mechanism. We further construct an asymmetric actor-critic framework for policy training. In experiments, we train whole-body control policies to imitate a set of highly-dynamic motions. Our method achieves significantly lower tracking errors than existing approaches and is successfully deployed on the Unitree G1 robot, demonstrating stable and expressive behaviors. The project page is https://kungfu-bot.github.io.

artificial intelligence, arxiv preprint arxiv, mechanism, (16 more...)

2506.12851

Country: Asia > China (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry:

Information Technology (0.67)
Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (0.61)

Behavioral alignment in social networks

Xia, Yu, McAvoy, Alex, Su, Qi

The orderly behaviors observed in large-scale groups, such as fish schooling and the organized movement of crowds, are both ubiquitous and essential for the survival and stability of these systems. Understanding how such complex collective behaviors emerge from simple local interactions and behavioral adjustments is a significant scientific challenge. Historically, research has predominantly focused on imitation and social learning, where individuals adopt the strategies of more successful peers to refine their behavior. However, in recent years, an alternative learning approach based on self-exploration and introspective learning has garnered increasing attention. In this paradigm, individuals assess their own circumstances and select strategies that best align with their specific conditions. Two examples are coordination and anti-coordination, where individuals align with and diverge from the local majority, respectively. In this study, we analyze networked systems of coordinating and anti-coordinating individuals, exploring the combined effects of system dynamics, network structure, and behavioral patterns. We address several practical questions, including the number of equilibria, their characteristics, the equilibrium time, and the resilience of the system. We find that the number of equilibrium states can be extremely large, even increasing exponentially with minor alterations to the network structure. Moreover, the network structure has a significant impact on the average equilibrium time. Despite the complexity of these findings, we find that variations can be captured by a single, simple network characteristic (the average path length), which we illustrate in both synthetic and empirical networks.

artificial intelligence, evolutionary algorithm, machine learning, (15 more...)

doi: 10.1093/nsr/nwaf403

2506.00046

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.34)

Industry:

Education > Educational Setting (0.46)
Information Technology > Services (0.41)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.34)