AITopics

The use of representative pre-crash scenarios is critical for assessing the safety impact of driving automation systems through simulation. However, a gap remains in the robust evaluation of the similarity between synthetic and real-world pre-crash scenarios and their crash characteristics. Without proper validation, it cannot be ensured that the synthetic test scenarios adequately represent real-world driving behaviors and crash characteristics. One reason for this validation gap is the lack of focus on methods to confirm that the synthetic test scenarios are practically equivalent to real-world ones, given the assessment scope. Traditional statistical methods, like significance testing, focus on detecting differences rather than establishing equivalence; since failure to detect a difference does not imply equivalence, they are of limited applicability for validating synthetic pre-crash scenarios and crash characteristics. This study addresses this gap by proposing an equivalence testing method based on the Bayesian Region of Practical Equivalence (ROPE) framework. This method is designed to assess the practical equivalence of scenario characteristics that are most relevant for the intended assessment, making it particularly appropriate for the domain of virtual safety assessments. We first review existing equivalence testing methods. Then we propose and demonstrate the Bayesian ROPE-based method by testing the equivalence of two rear-end pre-crash datasets. Our approach focuses on the most relevant scenario characteristics. Our analysis provides insights into the practicalities and effectiveness of equivalence testing in synthetic test scenario validation and demonstrates the importance of testing for improving the credibility of synthetic data for automated vehicle safety assessment, as well as the credibility of subsequent safety impact assessments.

artificial intelligence, assessment, machine learning, (16 more...)

2505.12827

Country:

Europe (0.69)
North America > United States (0.46)

Genre: Research Report > Experimental Study (0.94)

Industry:

Automobiles & Trucks (0.68)
Transportation > Ground > Road (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)

Ishida, Takuro, Furukawa, Tetsuo

Heterogeneous co-occurrence embedding for visual information exploration

This paper proposes an embedding method for co-occurrence data aimed at visual information exploration. We consider cases where co-occurrence probabilities are measured between pairs of elements from heterogeneous domains. The proposed method maps these heterogeneous elements into corresponding two-dimensional latent spaces, enabling visualization of asymmetric relationships between the domains. The key idea is to embed the elements in a way that maximizes their mutual information, thereby preserving the original dependency structure as much as possible. This approach can be naturally extended to cases involving three or more domains, using a generalization of mutual information known as total correlation. For inter-domain analysis, we also propose a visualization method that assigns colors to the latent spaces based on conditional probabilities, allowing users to explore asymmetric relationships interactively. We demonstrate the utility of the method through applications to an adjective-noun dataset, the NeurIPS dataset, and a subject-verb-object dataset, showcasing both intra- and inter-domain analysis.

data mining, machine learning, natural language, (19 more...)

2508.17663

Country: Asia > Japan (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

A Systematic Literature Review on Multi-label Data Stream Classification

Freire-Oliveira, H., Paiva, E. R. F., Gama, J., Khan, L., Cerri, R.

Classification in the context of multi-label data streams represents a challenge that has attracted significant attention due to its high real-world applicability. However, this task faces problems inherent to dynamic environments, such as the continuous arrival of data at high speed and volume, changes in the data distribution (concept drift), the emergence of new labels (concept evolution), and the latency in the arrival of ground truth labels. This systematic literature review presents an in-depth analysis of multi-label data stream classification proposals. We characterize the latest methods in the literature, providing a comprehensive overview, building a thorough hierarchy, and discussing how the proposals approach each problem. Furthermore, we discuss the adopted evaluation strategies and analyze the methods' asymptotic complexity and resource consumption. Finally, we identify the main gaps and offer recommendations for future research directions in the field.

classifier, data mining, machine learning, (18 more...)

2508.17455

Country:

Europe (0.67)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(6 more...)

Muhammad, Aoun E, Yow, Kin-Choong, Bacanin-Dzakula, Nebojsa, Khan, Muhammad Attique

L-XAIDS: A LIME-based eXplainable AI framework for Intrusion Detection Systems

Recent developments in Artificial Intelligence (AI) and their applications in critical industries such as healthcare, fin-tech and cybersecurity have led to a surge in research in explainability in AI. Innovative research methods are being explored to extract meaningful insight from blackbox AI systems to make the decision-making technology transparent and interpretable. Explainability becomes all the more critical when AI is used in decision making in domains like fintech, healthcare and safety critical systems such as cybersecurity and autonomous vehicles. However, there is still ambiguity lingering on the reliable evaluations for the users and nature of transparency in the explanations provided for the decisions made by black-boxed AI. To solve the blackbox nature of Machine Learning based Intrusion Detection Systems, a framework is proposed in this paper to give an explanation for IDSs decision making. This framework uses Local Interpretable Model-Agnostic Explanations (LIME) coupled with Explain Like I'm five (ELI5) and Decision Tree algorithms to provide local and global explanations and improve the interpretation of IDSs. The local explanations provide the justification for the decision made on a specific input. Whereas, the global explanations provides the list of significant features and their relationship with attack traffic. In addition, this framework brings transparency in the field of ML driven IDS that might be highly significant for wide scale adoption of eXplainable AI in cyber-critical systems. Our framework is able to achieve 85 percent accuracy in classifying attack behaviour on UNSW-NB15 dataset, while at the same time displaying the feature significance ranking of the top 10 features used in the classification.

data mining, explanation, machine learning, (19 more...)

doi: 10.1007/s10586-025-05326-9

2508.17244

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Workflow (0.93)
Overview (0.92)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.88)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(8 more...)

Modeling Probabilistic Reduction using Information Theory and Naive Discriminative Learning

Stein, Anna, Tang, Kevin

This study compares probabilistic predictors based on information theory with Naive Discriminative Learning (NDL) predictors in modeling acoustic word duration, focusing on probabilistic reduction. We examine three models using the Buckeye corpus: one with NDL-derived predictors using information-theoretic formulas, one with traditional NDL predictors, and one with N-gram probabilistic predictors. Results show that the N-gram model outperforms both NDL models, challenging the assumption that NDL is more effective due to its cognitive motivation. However, incorporating information-theoretic formulas into NDL improves model performance over the traditional model. This research highlights a) the need to incorporate not only frequency and contextual predictability but also average contextual predictability, and b) the importance of combining information-theoretic metrics of predictability and information derived from discriminative learning in modeling acoustic reduction.

machine learning, natural language, predictor, (17 more...)

doi: 10.21437/Interspeech.2025-2246

2506.09641

Country: Europe > Austria (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Information Management (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Machine LearningAug-25-2025

Flow Matching-Based Generative Modeling for Efficient and Scalable Data Assimilation

Transue, Taos, Chen, Bohan, Takao, So, Wang, Bao

Data assimilation (DA) is the problem of sequentially estimating the state of a dynamical system from noisy observations. Recent advances in generative modeling have inspired new approaches to DA in high-dimensional nonlinear settings, especially the ensemble score filter (EnSF). However, these come at a significant computational burden due to slow sampling. In this paper, we introduce a new filtering framework based on flow matching (FM) -- called the ensemble flow filter (EnFF) -- to accelerate sampling and enable flexible design of probability paths. EnFF -- a training-free DA approach -- integrates MC estimators for the marginal FM vector field (VF) and a localized guidance to assimilate observations. EnFF has faster sampling and more flexibility in VF design compared to existing generative modeling for DA. Theoretically, we show that EnFF encompasses classical filtering methods such as the bootstrap particle filter and the ensemble Kalman filter as special cases. Experiments on high-dimensional filtering benchmarks demonstrate improved cost-accuracy tradeoffs and the ability to leverage larger ensembles than prior methods. Our results highlight the promise of FM as a scalable tool for filtering in high-dimensional applications that enable the use of large ensembles.

artificial intelligence, bayesian inference, machine learning, (12 more...)

arXiv.org Machine Learning

2508.13313

Country:

North America > United States > Utah > Salt Lake County > Salt Lake City (0.04)
North America > United States > North Carolina > Orange County > Chapel Hill (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Balcı, Ali Emre, Keyvan, Erhan Ege, Özkan, Emre

GPL-SLAM: A Laser SLAM Framework with Gaussian Process Based Extended Landmarks

arXiv.org Artificial IntelligenceAug-25-2025

We present a novel Simultaneous Localization and Mapping (SLAM) method that employs Gaussian Process (GP) based landmark (object) representations. Instead of conventional grid maps or point cloud registration, we model the environment on a per object basis using GP based contour representations. These contours are updated online through a recursive scheme, enabling efficient memory usage. The SLAM problem is formulated within a fully Bayesian framework, allowing joint inference over the robot pose and object based map. This representation provides semantic information such as the number of objects and their areas, while also supporting probabilistic measurement to object associations. Furthermore, the GP based contours yield confidence bounds on object shapes, offering valuable information for downstream tasks like safe navigation and exploration. We validate our method on synthetic and real world experiments, and show that it delivers accurate localization and mapping performance across diverse structured environments.

artificial intelligence, machine learning, representation, (15 more...)

2508.16459

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
(2 more...)

Xie, Shuilian, Imani, Mahdi, Dougherty, Edward R., Braga-Neto, Ulisses M.

A State-Space Approach to Nonstationary Discriminant Analysis

arXiv.org Artificial IntelligenceAug-25-2025

Classical discriminant analysis assumes identically distributed training data, yet in many applications observations are collected over time and the class-conditional distributions drift. This population drift renders stationary classifiers unreliable. We propose a principled, model-based framework that embeds discriminant analysis within state-space models to obtain nonstationary linear discriminant analysis (NSLDA) and nonstationary quadratic discriminant analysis (NSQDA). For linear-Gaussian dynamics, we adapt Kalman smoothing to handle multiple samples per time step and develop two practical extensions: (i) an expectation-maximization (EM) approach that jointly estimates unknown system parameters, and (ii) a Gaussian mixture model (GMM)-Kalman method that simultaneously recovers unobserved time labels and parameters, a scenario common in practice. To address nonlinear or non-Gaussian drift, we employ particle smoothing to estimate time-varying class centroids, yielding fully nonstationary discriminant rules. Extensive simulations demonstrate consistent improvements over stationary linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), and support vector machine (SVM) baselines, with robustness to noise, missing data, and class imbalance. This paper establishes a unified and data-efficient foundation for discriminant analysis under temporal distribution shift.

artificial intelligence, bayesian inference, machine learning, (16 more...)

2508.16073

Country: Europe (0.28)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Mohammed, Idrees, Hassani, Hossein

Mining Mental Health Signals: A Comparative Study of Four Machine Learning Methods for Depression Detection from Social Media Posts in Sorani Kurdish

arXiv.org Artificial IntelligenceAug-25-2025

Depression is a common mental health condition that can lead to hopelessness, loss of interest, self-harm, and even suicide. Early detection is challenging due to individuals not self-reporting or seeking timely clinical help. With the rise of social media, users increasingly express emotions online, offering new opportunities for detection through text analysis. While prior research has focused on languages such as English, no studies exist for Sorani Kurdish. This work presents a machine learning and Natural Language Processing (NLP) approach to detect depression in Sorani tweets. A set of depression-related keywords was developed with expert input to collect 960 public tweets from X (Twitter platform). The dataset was annotated into three classes: Shows depression, Not-show depression, and Suspicious by academics and final year medical students at the University of Kurdistan Hewlêr. Four supervised models, including Support Vector Machines, Multinomial Naive Bayes, Logistic Regression, and Random Forest, were trained and evaluated, with Random Forest achieving the highest performance accuracy and F1-score of 80%. This study establishes a baseline for automated depression detection in Kurdish language contexts.

artificial intelligence, deep learning, machine learning, (14 more...)