AITopics

We study multi-modal summarization for instructional videos, whose goal is to provide users an efficient way to learn skills in the form of text instructions and key video frames. We observe that existing benchmarks focus on generic semantic-level video summarization and are not suitable for providing step-by-step executable instructions and illustrations, both of which are crucial for instructional videos. To fill this gap, we propose a novel benchmark for user interface (UI) instructional video summarization. We collect a dataset of 2,413 UI instructional videos spanning over 167 hours. These videos are manually annotated for video segmentation, text summarization, and video summarization, which enables comprehensive evaluation of concise and executable video summarization. We conduct extensive experiments on our collected MS4UI dataset, which suggest that state-of-the-art multi-modal summarization methods struggle on UI video summarization and highlight the importance of new methods for UI instructional video summarization.

artificial intelligence, machine learning, natural language, (17 more...)

2506.12623

Country: Europe > Switzerland (0.28)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Technology > Media (1.00)
Education > Educational Technology > Audio & Video (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Human Computer Interaction (0.85)

MVP-CBM: Multi-layer Visual Preference-enhanced Concept Bottleneck Model for Explainable Medical Image Classification

Wang, Chunjiang, Zhang, Kun, Liu, Yandong, He, Zhiyang, Tao, Xiaodong, Zhou, S. Kevin

The concept bottleneck model (CBM), a technique that improves interpretability by linking predictions to human-understandable concepts, makes high-risk and life-critical medical image classification credible. Typically, existing CBM methods associate the final layer of visual encoders with concepts to explain the model's predictions. However, we empirically discover the phenomenon of concept preference variation: concepts are preferentially associated with features at different layers rather than only with those at the final layer. A blind last-layer-based association neglects this preference variation and thus weakens the correspondence between features and concepts, impairing model interpretability. To address this issue, we propose a novel Multi-layer Visual Preference-enhanced Concept Bottleneck Model (MVP-CBM), which comprises two key novel modules: (1) intra-layer concept preference modeling, which captures the preferred association of different concepts with features at various visual layers, and (2) multi-layer concept sparse activation fusion, which sparsely aggregates concept activations from multiple layers to enhance performance. Thus, by explicitly modeling concept preferences, MVP-CBM can comprehensively leverage multi-layer visual information to provide a more nuanced and accurate explanation of model decisions. Extensive experiments on several public medical classification benchmarks demonstrate that MVP-CBM achieves state-of-the-art accuracy and interpretability, verifying its superiority. Code is available at https://github.com/wcj6/MVP-CBM.

artificial intelligence, image understanding, machine learning, (18 more...)

2506.12568

Country: Asia > China (0.15)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.91)
Instructional Material > Online (0.81)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Feeling Machines: Ethics, Culture, and the Rise of Emotional AI

Chavan, Vivek, Cenaj, Arsen, Shen, Shuyuan, Bar, Ariane, Binwani, Srishti, Del Becaro, Tommaso, Funk, Marius, Greschner, Lynn, Hung, Roberto, Klein, Stina, Kleiner, Romina, Krause, Stefanie, Olbrych, Sylwia, Parmar, Vishvapalsinhji, Sarafraz, Jaleh, Soroko, Daria, Don, Daksitha Withanage, Zhou, Chang, Vu, Hoang Thuy Duong, Semnani, Parastoo, Weinhardt, Daniel, Andre, Elisabeth, Krüger, Jörg, Fresquet, Xavier

This paper explores the growing presence of emotionally responsive artificial intelligence through a critical and interdisciplinary lens. Bringing together the voices of early-career researchers from multiple fields, it explores how AI systems that simulate or interpret human emotions are reshaping our interactions in areas such as education, healthcare, mental health, caregiving, and digital life. The analysis is structured around four central themes: the ethical implications of emotional AI, the cultural dynamics of human-machine interaction, the risks and opportunities for vulnerable populations, and the emerging regulatory, design, and technical considerations. The authors highlight the potential of affective AI to support mental well-being, enhance learning, and reduce loneliness, as well as the risks of emotional manipulation, over-reliance, misrepresentation, and cultural bias. Key challenges include simulating empathy without genuine understanding, encoding dominant sociocultural norms into AI systems, and insufficient safeguards for individuals in sensitive or high-risk contexts. Special attention is given to children, elderly users, and individuals with mental health challenges, who may interact with AI in emotionally significant ways. However, there remains a lack of cognitive or legal protections which are necessary to navigate such engagements safely. The report concludes with ten recommendations, including the need for transparency, certification frameworks, region-specific fine-tuning, human oversight, and longitudinal research. A curated supplementary section provides practical tools, models, and datasets to support further work in this domain.

ai system, machine learning, natural language, (19 more...)

2506.12437

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report (0.82)
Instructional Material > Course Syllabus & Notes (0.48)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Government > Regional Government > North America Government > United States Government > FDA (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision > Face Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

A Memetic Walrus Algorithm with Expert-guided Strategy for Adaptive Curriculum Sequencing

Huang, Qionghao, Lu, Lingnuo, Wu, Xuemei, Jiang, Fan, Wang, Xizhe, Wang, Xun

Adaptive Curriculum Sequencing (ACS) is essential for personalized online learning, yet current approaches struggle to balance complex educational constraints and maintain optimization stability. This paper proposes a Memetic Walrus Optimizer (MWO) that enhances optimization performance through three key innovations: (1) an expert-guided strategy with an aging mechanism that improves escape from local optima; (2) an adaptive control signal framework that dynamically balances exploration and exploitation; and (3) a three-tier priority mechanism for generating educationally meaningful sequences. We formulate ACS as a multi-objective optimization problem considering concept coverage, time constraints, and learning style compatibility. Experiments on the OULAD dataset demonstrate MWO's superior performance, achieving a 95.3% difficulty progression rate (compared to 87.2% in baseline methods) and significantly better convergence stability (standard deviation of 18.02 versus 28.29-696.97 in competing algorithms). Additional validation on benchmark functions confirms MWO's robust optimization capability across diverse scenarios. The results demonstrate MWO's effectiveness in generating personalized learning sequences while maintaining computational efficiency and solution quality.

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

2506.13092

Country: Asia > China > Zhejiang Province (0.14)

Genre:

Instructional Material (0.96)
Research Report > New Finding (0.66)
Research Report > Experimental Study (0.46)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Flow-Based Policy for Online Reinforcement Learning

Lv, Lei, Li, Yunfei, Luo, Yu, Sun, Fuchun, Kong, Tao, Xu, Jiafeng, Ma, Xiao

We present FlowRL, a novel framework for online reinforcement learning that integrates flow-based policy representation with Wasserstein-2-regularized optimization. We argue that in addition to training signals, enhancing the expressiveness of the policy class is crucial for the performance gains in RL. Flow-based generative models offer such potential, excelling at capturing complex, multimodal action distributions. However, their direct application in online RL is challenging due to a fundamental objective mismatch: standard flow training optimizes for static data imitation, while RL requires value-based policy optimization through a dynamic buffer, leading to difficult optimization landscapes. FlowRL first models policies via a state-dependent velocity field, generating actions through deterministic ODE integration from noise. We derive a constrained policy search objective that jointly maximizes Q through the flow policy while bounding the Wasserstein-2 distance to a behavior-optimal policy implicitly derived from the replay buffer. This formulation effectively aligns the flow optimization with the RL objective, enabling efficient and value-aware policy learning despite the complexity of the policy class. Empirical evaluations on DMControl and HumanoidBench demonstrate that FlowRL achieves competitive performance in online reinforcement learning benchmarks.
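The idea of generating an action by deterministic ODE integration from noise can be illustrated with a minimal sketch. It is not the FlowRL implementation: the velocity field below is a hypothetical hand-written stand-in for a learned network, and the names `velocity`, `sample_action`, and `n_steps` are assumptions made for illustration. The sketch integrates da/dt = v(s, a, t) from t = 0 to t = 1 with explicit Euler steps.

```python
import math
import random


def velocity(state, action, t):
    """Hypothetical state-dependent velocity field (stand-in for a learned network).

    It pulls the action toward tanh(state); t is unused in this toy field.
    """
    target = [math.tanh(s) for s in state]
    return [g - a for g, a in zip(target, action)]


def sample_action(state, noise, n_steps=10):
    """Integrate da/dt = velocity(state, a, t) over t in [0, 1] with explicit Euler."""
    action = list(noise)  # the ODE starts at the noise sample
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = velocity(state, action, k * dt)
        action = [a + dt * vi for a, vi in zip(action, v)]
    return action


rng = random.Random(0)
state = [0.5, -1.0]
noise = [rng.gauss(0.0, 1.0) for _ in state]  # a_0 drawn from a standard normal
action = sample_action(state, noise)
print(action)
```

Randomness enters only through the initial noise, so the same state and noise always give the same action. Different noise samples give different actions, which is how a flow policy can represent a multimodal action distribution.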

arxiv preprint arxiv, machine learning, reinforcement learning, (15 more...)

2506.12811

Genre:

Research Report (1.00)
Instructional Material > Online (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Perdomo, Juan Carlos, Recht, Benjamin

In Defense of Defensive Forecasting

arXiv.org Machine Learning, Jun-16-2025

This tutorial provides a survey of algorithms for Defensive Forecasting, where predictions are derived not by prognostication but by correcting past mistakes. Pioneered by Vovk, Defensive Forecasting frames the goal of prediction as a sequential game, and derives predictions to minimize metrics no matter what outcomes occur. We present an elementary introduction to this general theory and derive simple, near-optimal algorithms for online learning, calibration, prediction with expert advice, and online conformal prediction.
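To make "correcting past mistakes" concrete, here is a minimal sketch in the spirit of the abstract. It is not taken from the tutorial: the function name `defensive_forecast` and the specific rule are illustrative assumptions. For outcomes in [0, 1], the forecaster looks only at the running sum of its past errors and predicts at the extreme that cannot push that sum further from zero, whatever the next outcome is.

```python
import math
import random


def defensive_forecast(outcomes):
    """Forecast outcomes in [0, 1] using only the sign of the past error sum.

    Each prediction p satisfies (y - p) * error_sum <= 0 for every y in [0, 1],
    so the squared error sum grows by at most 1 per round.
    """
    predictions = []
    error_sum = 0.0  # sum of (outcome - prediction) over past rounds
    for y in outcomes:
        if error_sum > 0:    # we have under-predicted so far: predict high
            p = 1.0
        elif error_sum < 0:  # we have over-predicted so far: predict low
            p = 0.0
        else:
            p = 0.5
        predictions.append(p)
        error_sum += y - p
    return predictions, error_sum


rng = random.Random(0)
outcomes = [float(rng.random() < 0.8) for _ in range(1000)]
_, error_sum = defensive_forecast(outcomes)
bound = math.sqrt(len(outcomes))
print(abs(error_sum), "<=", bound)
```

Under this rule the absolute error sum after T rounds is at most sqrt(T) for any outcome sequence, so the average error shrinks at rate 1/sqrt(T) without any assumption about how the outcomes are generated.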

artificial intelligence, machine learning, prediction, (18 more...)

2506.11848

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.88)

Industry:

Leisure & Entertainment (0.45)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.66)

Haynes, Judson Leroy Dean IV

Enter: Graduated Realism: A Pedagogical Framework for AI-Powered Avatars in Virtual Reality Teacher Training

Virtual Reality simulators offer a powerful tool for teacher training, yet the integration of AI-powered student avatars presents a critical challenge: determining the optimal level of avatar realism for effective pedagogy. This literature review examines the evolution of avatar realism in VR teacher training, synthesizes its theoretical implications, and proposes a new pedagogical framework to guide future design. Through a systematic review, this paper traces the progression from human-controlled avatars to generative AI prototypes. Applying learning theories like Cognitive Load Theory, we argue that hyper-realism is not always optimal, as high-fidelity avatars can impose excessive extraneous cognitive load on novices, a stance supported by recent empirical findings. A significant gap exists between the technological drive for photorealism and the pedagogical need for scaffolded learning. To address this gap, we propose Graduated Realism, a framework advocating for starting trainees with lower-fidelity avatars and progressively increasing behavioral complexity as skills develop. To make this computationally feasible, we outline a novel single-call architecture, Crazy Slots, which uses a probabilistic engine and a Retrieval-Augmented Generation database to generate authentic, real-time responses without the latency and cost of multi-step reasoning models. This review provides evidence-based principles for designing the next generation of AI simulators, arguing that a pedagogically grounded approach to realism is essential for creating scalable and effective teacher education tools.

large language model, machine learning, natural language, (22 more...)

2506.11890

Country:

North America > United States (0.28)
Europe (0.28)
Asia (0.28)

Genre:

Research Report (1.00)
Instructional Material (1.00)
Overview (0.86)

Industry:

Information Technology > Security & Privacy (1.00)
Education > Teacher Education (1.00)
Education > Educational Setting > Higher Education (0.46)
Education > Educational Setting > Corporate Training (0.46)

Technology:

Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)

Schönlieb, Carola-Bibiane, Shumaylov, Zakhar

Data-driven approaches to inverse problems

Inverse problems are concerned with the reconstruction of unknown physical quantities using indirect measurements and are fundamental across diverse fields such as medical imaging, remote sensing, and material sciences. These problems serve as critical tools for visualizing internal structures beyond what is visible to the naked eye, enabling quantification, diagnosis, prediction, and discovery. However, most inverse problems are ill-posed, necessitating robust mathematical treatment to yield meaningful solutions. While classical approaches provide mathematically rigorous and computationally stable solutions, they are constrained by the ability to accurately model solution properties and implement them efficiently. A more recent paradigm considers deriving solutions to inverse problems in a data-driven manner. Instead of relying on classical mathematical modeling, this approach utilizes highly over-parameterized models, typically deep neural networks, which are adapted to specific inverse problems using carefully selected training data. Current approaches that follow this new paradigm distinguish themselves through solution accuracy paired with computational efficiency that was previously inconceivable. These notes offer an introduction to this data-driven paradigm for inverse problems. The first part of these notes will provide an introduction to inverse problems, discuss classical solution strategies, and present some applications. The second part will delve into modern data-driven approaches, with a particular focus on adversarial regularization and provably convergent linear plug-and-play denoisers. Throughout the presentation of these methodologies, their theoretical properties will be discussed, and numerical examples will be provided. The lecture series will conclude with a discussion of open problems and future perspectives in the field.
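A small numerical sketch can illustrate why ill-posedness calls for regularization. The example is an illustrative assumption, not material from the notes: it uses a hypothetical nearly singular 2x2 forward operator `A` and Tikhonov regularization as one standard classical strategy, with a hand-picked weight `lam`. A data perturbation of 1e-4 moves the naive solution far from the truth, while the regularized solution stays close.

```python
import math


def solve_2x2(m, b):
    """Solve the 2x2 linear system m x = b by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    x0 = (b[0] * m[1][1] - m[0][1] * b[1]) / det
    x1 = (m[0][0] * b[1] - b[0] * m[1][0]) / det
    return [x0, x1]


def tikhonov_2x2(a, y, lam):
    """Minimise ||a x - y||^2 + lam * ||x||^2 via (a^T a + lam I) x = a^T y."""
    ata = [[sum(a[k][i] * a[k][j] for k in range(2)) for j in range(2)]
           for i in range(2)]
    aty = [sum(a[k][i] * y[k] for k in range(2)) for i in range(2)]
    ata[0][0] += lam
    ata[1][1] += lam
    return solve_2x2(ata, aty)


def error(x, x_true):
    """Euclidean distance between a reconstruction and the true solution."""
    return math.sqrt(sum((xi - ti) ** 2 for xi, ti in zip(x, x_true)))


A = [[1.0, 1.0], [1.0, 1.0001]]  # nearly singular forward operator
x_true = [1.0, 1.0]
y_noisy = [2.0, 2.0002]          # exact data would be [2.0, 2.0001]

x_naive = solve_2x2(A, y_noisy)
x_reg = tikhonov_2x2(A, y_noisy, lam=1e-3)
print("naive error:      ", error(x_naive, x_true))
print("regularized error:", error(x_reg, x_true))
```

The regularization weight trades a small bias for stability. Choosing it by hand, as here, is one of the modeling burdens that data-driven approaches aim to relieve.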

artificial intelligence, inverse problem, machine learning, (18 more...)

2506.11732

Country:

Europe > United Kingdom > England (0.28)
North America > United States > California (0.27)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Energy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

A Step-by-Step Guide to Creating a Robust Autonomous Drone Testing Pipeline

Jiang, Yupeng, Deng, Yao, Schroder, Sebastian, Liang, Linfeng, Gambhir, Suhaas, James, Alice, Seth, Avishkar, Pirrie, James, Zhang, Yihao, Zheng, Xi

Autonomous drones are rapidly reshaping industries ranging from aerial delivery and infrastructure inspection to environmental monitoring and disaster response. Ensuring the safety, reliability, and efficiency of these systems is paramount as they transition from research prototypes to mission-critical platforms. This paper presents a step-by-step guide to establishing a robust autonomous drone testing pipeline, covering each critical stage: Software-in-the-Loop (SIL) Simulation Testing, Hardware-in-the-Loop (HIL) Testing, Controlled Real-World Testing, and In-Field Testing. Using practical examples, including a marker-based autonomous landing system, we demonstrate how to systematically verify drone system behaviors, identify integration issues, and optimize performance. Furthermore, we highlight emerging trends shaping the future of drone testing, including the integration of neurosymbolic AI and LLMs, the creation of co-simulation environments, and Digital Twin-enabled simulation-based testing techniques. By following this pipeline, developers and researchers can achieve comprehensive validation, minimize deployment risks, and prepare autonomous drones for safe and reliable real-world operations.

artificial intelligence, drone, module, (16 more...)

2506.11400

Country: North America > United States (0.67)

Genre:

Workflow (1.00)
Instructional Material > Training Manual (0.61)

Industry:

Transportation > Air (1.00)
Media (1.00)
Aerospace & Defense > Aircraft (1.00)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)

Debiasing Online Preference Learning via Preference Feature Preservation

Kim, Dongyoung, Yoon, Jinsung, Shin, Jinwoo, Kim, Jaehyung

Recent preference learning frameworks for large language models (LLMs) simplify human preferences with binary pairwise comparisons and scalar rewards. This simplification could bias LLMs' responses toward the most commonly preferred features, and the bias would be exacerbated over the iterations of online preference learning. To address these challenges, we propose a novel framework coined PFP (Preference Feature Preservation). The key idea of PFP is to maintain the distribution of human preference features and to utilize such rich signals throughout the online preference learning process. Specifically, PFP first extracts preference features from offline pairwise human preference data and trains a feature classifier. Then, using the trained classifier and distribution-preserving optimization, PFP maps appropriate preference features to each new input instruction during online learning. Lastly, PFP trains the LLM using an existing preference learning method, incorporating the preference feature into system prompts and enabling the LLM to explicitly handle various human preferences. Our experiments demonstrate that PFP successfully mitigates the bias in preference features during online learning, and hence achieves superior performance compared to previous preference learning methods on standard benchmarks for evaluating LLM alignment.

large language model, machine learning, natural language, (17 more...)

2506.11098

Country: North America > United States > Michigan (0.28)

Genre:

Research Report > New Finding (0.93)
Instructional Material > Online (0.82)

Industry:

Information Technology (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)