AITopics

Conventional maneuver prediction methods use some sort of classification model on temporal trajectory data to predict behavior of agents over a set time horizon. Despite of having the best precision and recall, these models cannot predict a lane change accurately unless they incorporate information about the entire scene. Level-k game theory can leverage the human-like hierarchical reasoning to come up with the most rational decisions each agent can make in a group. This can be leveraged to model interactions between different vehicles in presence of each other and hence compute the most rational decisions each agent would make. The result of game theoretic evaluation can be used as a "prior" or combined with a traditional motion-based classification model to achieve more accurate predictions. The proposed approach assumes that the states of the vehicles around the target lead vehicle are known. The module will output the most rational maneuver prediction of the target vehicle based on an online optimization solution. These predictions are instrumental in decision making systems like Adaptive Cruise Control (ACC) or Traxen's iQ-Cruise further improving the resulting fuel savings.

artificial intelligence, machine learning, vehicle, (13 more...)

2509.21873

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.34)

Towards Minimal Causal Representations for Human Multimodal Language Understanding

Jiang, Menghua, Jiang, Yuncheng, Hu, Haifeng, Mai, Sijie

Human Multimodal Language Understanding (MLU) aims to infer human intentions by integrating related cues from heterogeneous modalities. Existing works predominantly follow a ``learning to attend" paradigm, which maximizes mutual information between data and labels to enhance predictive performance. However, such methods are vulnerable to unintended dataset biases, causing models to conflate statistical shortcuts with genuine causal features and resulting in degraded out-of-distribution (OOD) generalization. To alleviate this issue, we introduce a Causal Multimodal Information Bottleneck (CaMIB) model that leverages causal principles rather than traditional likelihood. Concretely, we first applies the information bottleneck to filter unimodal inputs, removing task-irrelevant noise. A parameterized mask generator then disentangles the fused multimodal representation into causal and shortcut subrepresentations. To ensure global consistency of causal features, we incorporate an instrumental variable constraint, and further adopt backdoor adjustment by randomly recombining causal and shortcut features to stabilize causal estimation. Extensive experiments on multimodal sentiment analysis, humor detection, and sarcasm detection, along with OOD test sets, demonstrate the effectiveness of CaMIB. Theoretical and empirical analyses further highlight its interpretability and soundness.

artificial intelligence, machine learning, natural language, (17 more...)

2509.21805

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Kakhi, Kourosh, Khosravi, Abbas, Alizadehsani, Roohallah, Acharyab, U. Rajendra

Exploring the Relationships Between Physiological Signals During Automated Fatigue Detection

Background: Fatigue detection through physiological signals has gained growing relevance across safety-critical domains such as transportation, healthcare, and human performance monitoring. While many studies focus on individual modalities (e.g., EEG or ECG), limited attention has been given to investigating statistical relationships between signal pairs as a means to enhance classification robustness. This study aims to explore how inter-signal statistical features correlation, cross-correlation, and covariance across multiple physiological signals can support fatigue state prediction. Methodology: Using the DROZY dataset, we extracted pairwise statistical features from four physiological signals: ECG, EMG, EOG, and EEG. Fifteen distinct signal combinations were evaluated, covering uni-modal to multi-modal configurations. Feature extraction emphasized statistical relationships between signals rather than raw amplitude characteristics. The extracted features were fed into four supervised machine learning classifiers: Decision Tree (DT), Random Forest (RF), Logistic Regression (LR), and XGBoost (XGB). Performance was assessed using accuracy, precision, recall, and area under the curve (AUC). Additionally, SHAP (SHapley Additive exPlanations) values were computed to evaluate feature importance and interpret model behavior. Results: Among all classifiers and signal combinations, XGBoost applied to the EMG| EEG combination achieved the highest classification performance, with an accuracy of 0.888 and an AUC of 0.975. SHAP-based ranking revealed that the correlation between ECG and EOG-H was the most influential feature across models. Feature interaction plots indicated non-linear relationships between statistical measures and fatigue levels. The multi-signal approach consistently outperformed single-signal models, with combinations involving EEG and EMG contributing most significantly to predictive power.

artificial intelligence, deep learning, machine learning, (18 more...)

2509.21794

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Chochlakis, Georgios, Trager, Jackson, Jhaveri, Vedant, Ravichandran, Nikhil, Potamianos, Alexandros, Narayanan, Shrikanth

Semantic F1 Scores: Fair Evaluation Under Fuzzy Class Boundaries

We propose Semantic F1 Scores, novel evaluation metrics for subjective or fuzzy multi-label classification that quantify semantic relatedness between predicted and gold labels. Unlike the conventional F1 metrics that treat semantically related predictions as complete failures, Semantic F1 incorporates a label similarity matrix to compute soft precision-like and recall-like scores, from which the Semantic F1 scores are derived. Unlike existing similarity-based metrics, our novel two-step precision-recall formulation enables the comparison of label sets of arbitrary sizes without discarding labels or forcing matches between dissimilar labels. By granting partial credit for semantically related but nonidentical labels, Semantic F1 better reflects the realities of domains marked by human disagreement or fuzzy category boundaries. In this way, it provides fairer evaluations: it recognizes that categories overlap, that annotators disagree, and that downstream decisions based on similar predictions lead to similar outcomes. Through theoretical justification and extensive empirical validation on synthetic and real data, we show that Semantic F1 demonstrates greater interpretability and ecological validity. Because it requires only a domain-appropriate similarity matrix, which is robust to misspecification, and not a rigid ontology, it is applicable across tasks and modalities.

information retrieval, large language model, machine learning, (21 more...)

2509.21633

Country: North America > United States > California (0.14)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
(2 more...)

Zhu, Yi, Guimarães, Heitor R., Pimentel, Arthur, Falk, Tiago

AUDDT: Audio Unified Deepfake Detection Benchmark Toolkit

With the prevalence of artificial intelligence (AI)-generated content, such as audio deepfakes, a large body of recent work has focused on developing deepfake detection techniques. However, most models are evaluated on a narrow set of datasets, leaving their generalization to real-world conditions uncertain. In this paper, we systematically review 28 existing audio deepfake datasets and present an open-source benchmarking toolkit called AUDDT (https://github.com/MuSAELab/AUDDT). The goal of this toolkit is to automate the evaluation of pretrained detectors across these 28 datasets, giving users direct feedback on the advantages and shortcomings of their deepfake detectors. We start by showcasing the usage of the developed toolkit, the composition of our benchmark, and the breakdown of different deepfake subgroups. Next, using a widely adopted pretrained deepfake detector, we present in- and out-of-domain detection results, revealing notable differences across conditions and audio manipulation types. Lastly, we also analyze the limitations of these existing datasets and their gap relative to practical deployment scenarios.

artificial intelligence, dataset, machine learning, (17 more...)

2509.21597

Country: North America > Canada (0.28)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)

Rudra, Manjari, Magleby, Daniel, Sikdar, Sujoy

C-QUERI: Congressional Questions, Exchanges, and Responses in Institutions Dataset

Questions in political interviews and hearings serve strategic purposes beyond information gathering including advancing partisan narratives and shaping public perceptions. However, these strategic aspects remain understudied due to the lack of large-scale datasets for studying such discourse. Congressional hearings provide an especially rich and tractable site for studying political questioning: Interactions are structured by formal rules, witnesses are obliged to respond, and members with different political affiliations are guaranteed opportunities to ask questions, enabling comparisons of behaviors across the political spectrum. We develop a pipeline to extract question-answer pairs from unstructured hearing transcripts and construct a novel dataset of committee hearings from the 108th--117th Congress. Our analysis reveals systematic differences in questioning strategies across parties, by showing the party affiliation of questioners can be predicted from their questions alone. Our dataset and methods not only advance the study of congressional politics, but also provide a general framework for analyzing question-answering across interview-like settings.

large language model, machine learning, utterance, (21 more...)

2509.21548

Country: North America > United States (1.00)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Mitton, Joshua, Bhattacharyya, Prarthana, Abboud, Ralph, Woodhead, Simon

Uncertainty-Aware Knowledge Tracing Models

The main focus of research on Knowledge Tracing (KT) models is on model developments with the aim of improving predictive accuracy. Most of these models make the most incorrect predictions when students choose a distractor, leading to student errors going undetected. We present an approach to add new capabilities to KT models by capturing predictive uncertainty and demonstrate that a larger predictive uncertainty aligns with model incorrect predictions. We show that uncertainty in KT models is informative and that this signal would be pedagogically useful for application in an educational learning platform that can be used in a limited resource setting where understanding student ability is necessary.

artificial intelligence, machine learning, student, (18 more...)

2509.21514

Country: North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report (0.82)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

VideoJudge: Bootstrapping Enables Scalable Supervision of MLLM-as-a-Judge for Video Understanding

Waheed, Abdul, Wu, Zhen, Alharthi, Dareen, Kim, Seungone, Raj, Bhiksha

Precisely evaluating video understanding models remains challenging: commonly used metrics such as BLEU, ROUGE, and BERTScore fail to capture the fineness of human judgment, while obtaining such judgments through manual evaluation is costly. Recent work has explored using large language models (LLMs) or multimodal LLMs (MLLMs) as evaluators, but their extension to video understanding remains relatively unexplored. In this work, we introduce VideoJudge, a 3B and 7B-sized MLLM judge specialized to evaluate outputs from video understanding models (\textit{i.e.}, text responses conditioned on videos). To train VideoJudge, our recipe builds on the interplay between a generator and an evaluator: the generator is prompted to produce responses conditioned on a target rating, and responses not matching the evaluator's rating are discarded. Across three out of four meta-evaluation benchmarks, VideoJudge-7B outperforms larger MLLM judge baselines such as Qwen2.5-VL (32B and 72B). Notably, we find that LLM judges (Qwen3) models perform worse than MLLM judges (Qwen2.5-VL) and long chain-of-thought reasoning does not improve performance, indicating that providing video inputs is crucial for evaluation of video understanding tasks.

large language model, machine learning, natural language, (19 more...)