AITopics

Embodied intelligence systems, which enhance agent capabilities through continuous environment interactions, have garnered significant attention from both academia and industry. Vision-Language-Action models, inspired by advancements in large foundation models, serve as universal robotic control frameworks that substantially improve agent-environment interaction capabilities in embodied intelligence systems. This expansion has broadened application scenarios for embodied AI robots. This survey comprehensively reviews VLA models for embodied manipulation. Firstly, it chronicles the developmental trajectory of VLA architectures. Subsequently, we conduct a detailed analysis of current research across 5 critical dimensions: VLA model structures, training datasets, pre-training methods, post-training methods, and model evaluation. Finally, we synthesize key challenges in VLA development and real-world deployment, while outlining promising future research directions.

large language model, machine learning, natural language, (18 more...)

2508.15201

Country:

Europe (1.00)
Asia > China (1.00)
North America > United States > California (0.28)

Genre:

Research Report (0.50)
Overview (0.34)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(2 more...)

When Truth Is Overridden: Uncovering the Internal Origins of Sycophancy in Large Language Models

Wang, Keyu, Li, Jin, Yang, Shu, Zhang, Zhuoran, Wang, Di

Large Language Models (LLMs) often exhibit sycophantic behavior, agreeing with user-stated opinions even when those contradict factual knowledge. While prior work has documented this tendency, the internal mechanisms that enable such behavior remain poorly understood. In this paper, we provide a mechanistic account of how sycophancy arises within LLMs. We first systematically study how user opinions induce sycophancy across different model families. We find that simple opinion statements reliably induce sycophancy, whereas user expertise framing has a negligible impact. Through logit-lens analysis and causal activation patching, we identify a two-stage emergence of sycophancy: (1) a late-layer output preference shift and (2) deeper representational divergence. We also verify that user authority fails to influence behavior because models do not encode it internally. In addition, we examine how grammatical perspective affects sycophantic behavior, finding that first-person prompts (``I believe...'') consistently induce higher sycophancy rates than third-person framings (``They believe...'') by creating stronger representational perturbations in deeper layers. These findings highlight that sycophancy is not a surface-level artifact but emerges from a structural override of learned knowledge in deeper layers, with implications for alignment and truthful AI systems.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2508.02087

Genre: Research Report > New Finding (0.93)

Industry: Education > Curriculum > Subject-Specific Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

ReliableMath: Benchmark of Reliable Mathematical Reasoning on Large Language Models

Xue, Boyang, Zhu, Qi, Wang, Rui, Wang, Sheng, Wang, Hongru, Hu, Minda, Mi, Fei, Wang, Yasheng, Shang, Lifeng, Liu, Qun, Wong, Kam-Fai

Although demonstrating remarkable performance on reasoning tasks, Large Language Models (LLMs) still tend to fabricate unreliable responses when confronted with problems that are unsolvable or beyond their capability, severely undermining the reliability. Prior studies of LLM reliability have primarily focused on knowledge tasks to identify unanswerable questions, while mathematical reasoning tasks have remained unexplored due to the dearth of unsolvable math problems. To systematically investigate LLM reliability in mathematical reasoning tasks, we formulate the reliability evaluation for both solvable and unsolvable problems. We then develop a ReliableMath dataset which incorporates open-source solvable problems and high-quality unsolvable problems synthesized by our proposed construction workflow with human evaluations. Experiments are conducted on various LLMs with several key findings uncovered. LLMs fail to directly identify unsolvable problems and always generate fabricated responses. When instructing LLMs to indicate unsolvability using a reliable prompt, the reliability of larger-sized LLMs remains on solvable problems, but notably improves on unsolvable problems yet still falls short of solvable problems. However, small LLMs rarely show any progress despite employing reliable prompts. Therefore, we further propose an alignment strategy to enhance small LLMs' reliability, which can significantly improve LLM reliability performances on both in-domain and out-of-domain tasks.

large language model, machine learning, natural language, (21 more...)

2507.03133

Country: North America > Mexico (0.28)

Genre:

Research Report (0.50)
Workflow (0.49)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.52)

Tanjim, Tauhid, George, Jonathan St., Ching, Kevin, Taylor, Angelique

Help or Hindrance: Understanding the Impact of Robot Communication in Action Teams

The human-robot interaction (HRI) field has recognized the importance of enabling robots to interact with teams. Human teams rely on effective communication for successful collaboration in time-sensitive environments. Robots can play a role in enhancing team coordination through real-time assistance. Despite significant progress in human-robot teaming research, there remains an essential gap in how robots can effectively communicate with action teams using multimodal interaction cues in time-sensitive environments. This study addresses this knowledge gap in an experimental in-lab study to investigate how multimodal robot communication in action teams affects workload and human perception of robots. We explore team collaboration in a medical training scenario where a robotic crash cart (RCC) provides verbal and non-verbal cues to help users remember to perform iterative tasks and search for supplies. Our findings show that verbal cues for object search tasks and visual cues for task reminders reduce team workload and increase perceived ease of use and perceived usefulness more effectively than a robot with no feedback. Our work contributes to multimodal interaction research in the HRI field, highlighting the need for more human-robot teaming research to understand best practices for integrating collaborative robots in time-sensitive environments such as in hospitals, search and rescue, and manufacturing applications.

artificial intelligence, communication, robot, (15 more...)

doi: 10.1109/RO-MAN63969.2025.11217909

2506.08892

Country: North America > United States (0.47)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.94)

Industry:

Health & Medicine > Health Care Providers & Services (0.67)
Education > Educational Setting > Higher Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.82)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.64)

Mao, Mingyang, Perez-Cabarcas, Mariela M., Kallakuri, Utteja, Waytowich, Nicholas R., Lin, Xiaomin, Mohsenin, Tinoosh

Multi-RAG: A Multimodal Retrieval-Augmented Generation System for Adaptive Video Understanding

To effectively engage in human society, the ability to adapt, filter information, and make informed decisions in ever-changing situations is critical. As robots and intelligent agents become more integrated into human life, there is a growing opportunity-and need-to offload the cognitive burden on humans to these systems, particularly in dynamic, information-rich scenarios. To fill this critical need, we present Multi-RAG, a multimodal retrieval-augmented generation system designed to provide adaptive assistance to humans in information-intensive circumstances. Our system aims to improve situational understanding and reduce cognitive load by integrating and reasoning over multi-source information streams, including video, audio, and text. As an enabling step toward long-term human-robot partnerships, Multi-RAG explores how multimodal information understanding can serve as a foundation for adaptive robotic assistance in dynamic, human-centered situations. To evaluate its capability in a realistic human-assistance proxy task, we benchmarked Multi-RAG on the MMBench-Video dataset, a challenging multimodal video understanding benchmark. Our system achieves superior performance compared to existing open-source video large language models (Video-LLMs) and large vision-language models (LVLMs), while utilizing fewer resources and less input data. The results demonstrate Multi- RAG's potential as a practical and efficient foundation for future human-robot adaptive assistance systems in dynamic, real-world contexts.

information, large language model, machine learning, (17 more...)

2505.2399

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.34)

Industry:

Government > Military (0.68)
Health & Medicine (0.47)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The New YorkerNov-12-2025, 11:00:00 GMT

How the Supreme Court Defines Liberty

Recent memoirs by the Justices reveal how a new vision of restraint has led to radical outcomes. To understand how grudging Amy Coney Barrett's new book is when it comes to revealing personal details, consider that one of the family members the Supreme Court Justice most often refers to is a great-grandmother who died five years before she was born. On Barrett's desk at home, she recounts in " Listening to the Law," she keeps a photograph of her great-grandmother's one-story house, where, as a widow during the Great Depression, she raised some of her thirteen children and took in other needy relatives. "Looking at the photo reminds me of a woman who stretched herself beyond all reasonable capacity," Barrett explains. "I'm not sure that I'll be able to manage my life with the same grace that she had. But she motivates me to keep trying." For Barrett, the mother of seven children, that effort entails setting her alarm for 5 "Our kids get up at six thirty during the school year, so I start early if I want to accomplish anything on my own to-do list," she writes. This is what passes for disclosure from Barrett; she measures out the details of her life with coffee spoons, careful not to spill.

artificial intelligence, barrett, book review, (17 more...)

The New Yorker

Country: North America > United States (1.00)

Genre: Summary/Review (1.00)

Industry:

Law > Government & the Courts (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Education (1.00)

Technology: Information Technology > Artificial Intelligence (0.47)

Tertulino, Rodrigo, Almeida, Ricardo

Privacy-Preserving Personalization in Education: A Federated Recommender System for Student Performance Prediction

The increasing digitalization of education presents unprecedented opportunities for data-driven personalization, but it also introduces significant challenges to student data privacy. Conventional recommender systems rely on centralized data, a paradigm often incompatible with modern data protection regulations. A novel privacy-preserving recommender system is proposed and evaluated to address this critical issue using Federated Learning (FL). The approach utilizes a Deep Neural Network (DNN) with rich, engineered features from the large-scale ASSISTments educational dataset. A rigorous comparative analysis of federated aggregation strategies was conducted, identifying FedProx as a significantly more stable and effective method for handling heterogeneous student data than the standard FedAvg baseline. The optimized federated model achieves a high-performance F1-Score of 76.28%, corresponding to 92% of the performance of a powerful, centralized XGBoost model. These findings validate that a federated approach can provide highly effective content recommendations without centralizing sensitive student data. Consequently, our work presents a viable and robust solution to the personalization-privacy dilemma in modern educational platforms.

artificial intelligence, data mining, machine learning, (19 more...)

2509.10516

Country:

North America > United States (0.28)
South America > Brazil (0.28)
Asia (0.28)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Rustam, Furqan, Obaidat, Islam, Jurcut, Anca Delia

MULTI-LF: A Continuous Learning Framework for Real-Time Malicious Traffic Detection in Multi-Environment Networks

Multi-environment (M-En) networks integrate diverse traffic sources, including Internet of Things (IoT) and traditional computing systems, creating complex and evolving conditions for malicious traffic detection. Existing machine learning (ML)-based approaches, typically trained on static single-domain datasets, often fail to generalize across heterogeneous network environments. To address this gap, we develop a realistic Docker-NS3-based testbed that emulates both IoT and traditional traffic conditions, enabling the generation and capture of live, labeled network flows. The resulting M-En Dataset combines this traffic with curated public PCAP traces to provide comprehensive coverage of benign and malicious behaviors. Building on this foundation, we propose Multi-LF, a real-time continuous learning framework that combines a lightweight model (M1) for rapid detection with a deeper model (M2) for high-confidence refinement and adaptation. A confidence-based coordination mechanism enhances efficiency without compromising accuracy, while weight interpolation mitigates catastrophic forgetting during continuous updates. Features extracted at 1-second intervals capture fine-grained temporal patterns, enabling early recognition of evolving attack behaviors. Implemented and evaluated within the Docker-NS3 testbed on live traffic, Multi-LF achieves an accuracy of 0.999 while requiring human intervention for only 0.0026 percent of packets, demonstrating its effectiveness and practicality for real-time malicious traffic detection in heterogeneous network environments.

machine learning, real time system, traffic, (21 more...)

2504.11575

Country: Europe (0.67)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.93)
Education > Educational Setting > Continuing Education (0.62)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(5 more...)

Costa, Davi Bastos, Alves, Felippe, Vicente, Renato

Moral Susceptibility and Robustness under Persona Role-Play in Large Language Models

Large language models (LLMs) increasingly operate in social contexts, motivating analysis of how they express and shift moral judgments. In this work, we investigate the moral response of LLMs to persona role-play, prompting a LLM to assume a specific character. Using the Moral Foundations Questionnaire (MFQ), we introduce a benchmark that quantifies two properties: moral susceptibility and moral robustness, defined from the variability of MFQ scores across and within personas, respectively. We find that, for moral robustness, model family accounts for most of the variance, while model size shows no systematic effect. The Claude family is, by a significant margin, the most robust, followed by Gemini and GPT-4 models, with other families exhibiting lower robustness. In contrast, moral susceptibility exhibits a mild family effect but a clear within-family size effect, with larger variants being more susceptible. Moreover, robustness and susceptibility are positively correlated, an association that is more pronounced at the family level. Additionally, we present moral foundation profiles for models without persona role-play and for personas averaged across models. Together, these analyses provide a systematic view of how persona conditioning shapes moral behavior in large language models.

large language model, machine learning, susceptibility, (20 more...)

2511.08565

Country: North America > United States (0.46)

Genre:

Questionnaire & Opinion Survey (0.89)
Research Report (0.66)

Industry:

Banking & Finance (1.00)
Government (0.68)
Law (0.68)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

A Matter of Interest: Understanding Interestingness of Math Problems in Humans and Language Models

Mishra, Shubhra, Machino, Yuka, Poesia, Gabriel, Jiang, Albert, Hsu, Joy, Weller, Adrian, Mishra, Challenger, Broman, David, Tenenbaum, Joshua B., Jamnik, Mateja, Zhang, Cedegao E., Collins, Katherine M.

The evolution of mathematics has been guided in part by interestingness. From researchers choosing which problems to tackle next, to students deciding which ones to engage with, people's choices are often guided by judgments about how interesting or challenging problems are likely to be. As AI systems, such as LLMs, increasingly participate in mathematics with people -- whether for advanced research or education -- it becomes important to understand how well their judgments align with human ones. Our work examines this alignment through two empirical studies of human and LLM assessment of mathematical interestingness and difficulty, spanning a range of mathematical experience. We study two groups: participants from a crowdsourcing platform and International Math Olympiad competitors. We show that while many LLMs appear to broadly agree with human notions of interestingness, they mostly do not capture the distribution observed in human judgments. Moreover, most LLMs only somewhat align with why humans find certain math problems interesting, showing weak correlation with human-selected interestingness rationales. Together, our findings highlight both the promises and limitations of current LLMs in capturing human interestingness judgments for mathematical AI thought partnerships.

large language model, machine learning, natural language, (20 more...)

2511.08548

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)