AITopics

2503.06635

Country:

Asia > Macao (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial IntelligenceMar-9-2025

When Unsupervised Domain Adaptation meets One-class Anomaly Detection: Addressing the Two-fold Unsupervised Curse by Leveraging Anomaly Scarcity

Mejri, Nesryne, Ghorbel, Enjie, Kacem, Anis, Chernakov, Pavel, Foteinopoulou, Niki, Aouada, Djamila

This paper introduces the first fully unsupervised domain adaptation (UDA) framework for unsupervised anomaly detection (UAD). The performance of UAD techniques degrades significantly in the presence of a domain shift, difficult to avoid in a real-world setting. While UDA has contributed to solving this issue in binary and multi-class classification, such a strategy is ill-posed in UAD. This might be explained by the unsupervised nature of the two tasks, namely, domain adaptation and anomaly detection. Herein, we first formulate this problem that we call the two-fold unsupervised curse. Then, we propose a pioneering solution to this curse, considered intractable so far, by assuming that anomalies are rare. Specifically, we leverage clustering techniques to identify a dominant cluster in the target feature space. Posed as the normal cluster, the latter is aligned with the source normal features. Concretely, given a one-class source set and an unlabeled target set composed mostly of normal data and some anomalies, we fit the source features within a hypersphere while jointly aligning them with the features of the dominant cluster from the target set. The paper provides extensive experiments and analysis on common adaptation benchmarks for anomaly detection, demonstrating the relevance of both the newly introduced paradigm and the proposed approach. The code will be made publicly available.

adaptation, anomaly detection, detection, (14 more...)

2502.21022

Country:

North America > United States > Arizona > Pima County > Tucson (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Greece (0.04)
Africa > Middle East > Tunisia > Manouba Governorate > Manouba (0.04)

Genre:

Research Report > New Finding (0.67)
Research Report > Promising Solution (0.48)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial IntelligenceMar-8-2025

Multi-view Spectral Clustering on the Grassmannian Manifold With Hypergraph Representation

Yang, Murong, Ying, Shihui, Xu, Xin-Jian, Gao, Yue

Graph-based multi-view spectral clustering methods have achieved notable progress recently, yet they often fall short in either oversimplifying pairwise relationships or struggling with inefficient spectral decompositions in high-dimensional Euclidean spaces. In this paper, we introduce a novel approach that begins to generate hypergraphs by leveraging sparse representation learning from data points. Based on the generated hypergraph, we propose an optimization function with orthogonality constraints for multi-view hypergraph spectral clustering, which incorporates spectral clustering for each view and ensures consistency across different views. In Euclidean space, solving the orthogonality-constrained optimization problem may yield local maxima and approximation errors. Innovately, we transform this problem into an unconstrained form on the Grassmannian manifold. Finally, we devise an alternating iterative Riemannian optimization algorithm to solve the problem. To validate the effectiveness of the proposed algorithm, we test it on four real-world multi-view datasets and compare its performance with seven state-of-the-art multi-view clustering algorithms. The experimental results demonstrate that our method outperforms the baselines in terms of clustering performance due to its superior low-dimensional and resilient feature representation.

hypergraph, manifold, spectral, (16 more...)

2503.06066

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Beijing > Beijing (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine LearningMar-8-2025

Clustering-based Meta Bayesian Optimization with Theoretical Guarantee

Nguyen, Khoa, Huynh, Viet, Tran, Binh, Pham, Tri, Huynh, Tin, Nguyen, Thin

Bayesian Optimization (BO) is a well-established method for addressing black-box optimization problems. In many real-world scenarios, optimization often involves multiple functions, emphasizing the importance of leveraging data and learned functions from prior tasks to enhance efficiency in the current task. To expedite convergence to the global optimum, recent studies have introduced meta-learning strategies, collectively referred to as meta-BO, to incorporate knowledge from historical tasks. However, in practical settings, the underlying functions are often heterogeneous, which can adversely affect optimization performance for the current task. Additionally, when the number of historical tasks is large, meta-BO methods face significant scalability challenges. In this work, we propose a scalable and robust meta-BO method designed to address key challenges in heterogeneous and large-scale meta-tasks. Our approach (1) effectively partitions transferred meta-functions into highly homogeneous clusters, (2) learns the geometry-based surrogate prototype that capture the structural patterns within each cluster, and (3) adaptively synthesizes meta-priors during the online phase using statistical distance-based weighting policies. Experimental results on real-world hyperparameter optimization (HPO) tasks, combined with theoretical guarantees, demonstrate the robustness and effectiveness of our method in overcoming these challenges.

clustering-based meta bayesian optimization, gaussian process, posterior, (12 more...)

arXiv.org Machine Learning

2503.06093

Country:

Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.14)
Oceania > Australia > Western Australia > Perth (0.04)

Genre: Research Report (1.00)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Artificial IntelligenceMar-7-2025

RURANET++: An Unsupervised Learning Method for Diabetic Macular Edema Based on SCSE Attention Mechanisms and Dynamic Multi-Projection Head Clustering

Yang, Wei, Zhu, Yiran, Shen, Jiayu, Tang, Yuhan, Pan, Chengchang, He, Hui, Su, Yan, Qi, Honggang

Diabetic Macular Edema (DME), a prevalent complication among diabetic patients, constitutes a major cause of visual impairment and blindness. Although deep learning has achieved remarkable progress in medical image analysis, traditional DME diagnosis still relies on extensive annotated data and subjective ophthalmologist assessments, limiting practical applications. To address this, we present RURANET++, an unsupervised learning-based automated DME diagnostic system. This framework incorporates an optimized U-Net architecture with embedded Spatial and Channel Squeeze & Excitation (SCSE) attention mechanisms to enhance lesion feature extraction. During feature processing, a pre-trained GoogLeNet model extracts deep features from retinal images, followed by PCA-based dimensionality reduction to 50 dimensions for computational efficiency. Notably, we introduce a novel clustering algorithm employing multi-projection heads to explicitly control cluster diversity while dynamically adjusting similarity thresholds, thereby optimizing intra-class consistency and inter-class discrimination. Experimental results demonstrate superior performance across multiple metrics, achieving maximum accuracy (0.8411), precision (0.8593), recall (0.8411), and F1-score (0.8390), with exceptional clustering quality. This work provides an efficient unsupervised solution for DME diagnosis with significant clinical implications.

accuracy, computational efficiency, diabetic macular edema, (11 more...)

2502.20224

Country:

Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.49)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Anton, Cristina, Smith, Iain

Cluster weighted models for functional data

arXiv.org Machine LearningMar-7-2025

We propose a method, funWeightClust, based on a family of parsimonious models for clustering heterogeneous functional linear regression data. These models extend cluster weighted models to functional data, and they allow for multivariate functional responses and predictors. The proposed methodology follows the approach used by the the functional high dimensional data clustering (funHDDC) method. We construct an expectation maximization (EM) algorithm for parameter estimation. Using simulated and benchmark data we show that funWeightClust outperforms funHDDC and several two-steps clustering methods. We also use funWeightClust to analyze traffic patterns in Edmonton, Canada.

algorithm, functional data, funweightclust, (16 more...)

arXiv.org Machine Learning

2503.05159

Country:

North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.34)
North America > United States > New York (0.04)
Oceania > Australia > South Australia > Adelaide (0.04)

Genre: Research Report (0.50)

Industry: Transportation (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Zuziak, Maciej Krzysztof, Pellungrini, Roberto, Rinzivillo, Salvatore

One-Shot Clustering for Federated Learning

Federated Learning (FL) is a widespread and well adopted paradigm of decentralized learning that allows training one model from multiple sources without the need to directly transfer data between participating clients. Since its inception in 2015, it has been divided into numerous sub-fields that deal with application-specific issues, be it data heterogeneity or resource allocation. One such sub-field, Clustered Federated Learning (CFL), is dealing with the problem of clustering the population of clients into separate cohorts to deliver personalized models. Although few remarkable works have been published in this domain, the problem is still largely unexplored, as its basic assumption and settings are slightly different from standard FL. In this work, we present One-Shot Clustered Federated Learning (OCFL), a clustering-agnostic algorithm that can automatically detect the earliest suitable moment for clustering. Our algorithm is based on the computation of cosine similarity between gradients of the clients and a temperature measure that detects when the federated model starts to converge. We empirically evaluate our methodology by testing various one-shot clustering algorithms for over thirty different tasks on three benchmark datasets. Our experiments showcase the good performance of our approach when used to perform CFL in an automated manner without the need to adjust hyperparameters.

algorithm, artificial intelligence, machine learning, (16 more...)

doi: 10.1109/BigData62323.2024.10825763

2503.04231

Country:

Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.89)

Zheng, Laura, Araghi, Hamidreza Yaghoubi, Wu, Tony, Thalapanane, Sandeep, Zhou, Tianyi, Lin, Ming C.

Quantifying and Modeling Driving Styles in Trajectory Forecasting

Trajectory forecasting has become a popular deep learning task due to its relevance for scenario simulation for autonomous driving. Specifically, trajectory forecasting predicts the trajectory of a short-horizon future for specific human drivers in a particular traffic scenario. Robust and accurate future predictions can enable autonomous driving planners to optimize for low-risk and predictable outcomes for human drivers around them. Although some work has been done to model driving style in planning and personalized autonomous polices, a gap exists in explicitly modeling human driving styles for trajectory forecasting of human behavior. Human driving style is most certainly a correlating factor to decision making, especially in edge-case scenarios where risk is nontrivial, as justified by the large amount of traffic psychology literature on risky driving. So far, the current real-world datasets for trajectory forecasting lack insight on the variety of represented driving styles. While the datasets may represent real-world distributions of driving styles, we posit that fringe driving style types may also be correlated with edge-case safety scenarios. In this work, we conduct analyses on existing real-world trajectory datasets for driving and dissect these works from the lens of driving styles, which is often intangible and non-standardized.

dataset, scenario, trajectory, (16 more...)

2503.04994

Country: North America > United States > Maryland > Prince George's County > College Park (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.87)

Matrix Factorization for Inferring Associations and Missing Links

Barron, Ryan, Eren, Maksim E., Truong, Duc P., Matuszek, Cynthia, Wendelberger, James, Dorn, Mary F., Alexandrov, Boian

Missing link prediction is a method for network analysis, with applications in recommender systems, biology, social sciences, cybersecurity, information retrieval, and Artificial Intelligence (AI) reasoning in Knowledge Graphs. Missing link prediction identifies unseen but potentially existing connections in a network by analyzing the observed patterns and relationships. In proliferation detection, this supports efforts to identify and characterize attempts by state and non-state actors to acquire nuclear weapons or associated technology - a notoriously challenging but vital mission for global security. Dimensionality reduction techniques like Non-Negative Matrix Factorization (NMF) and Logistic Matrix Factorization (LMF) are effective but require selection of the matrix rank parameter, that is, of the number of hidden features, k, to avoid over/under-fitting. We introduce novel Weighted (WNMFk), Boolean (BNMFk), and Recommender (RNMFk) matrix factorization methods, along with ensemble variants incorporating logistic factorization, for link prediction. Our methods integrate automatic model determination for rank estimation by evaluating stability and accuracy using a modified bootstrap methodology and uncertainty quantification (UQ), assessing prediction reliability under random perturbations. We incorporate Otsu threshold selection and k-means clustering for Boolean matrix factorization, comparing them to coordinate descent-based Boolean thresholding. Our experiments highlight the impact of rank k selection, evaluate model performance under varying test-set sizes, and demonstrate the benefits of UQ for reliable predictions using abstention. We validate our methods on three synthetic datasets (Boolean and uniformly distributed) and benchmark them against LMF and symmetric LMF (symLMF) on five real-world protein-protein interaction networks, showcasing an improved prediction performance.

dataset, matrix factorization, prediction, (13 more...)

2503.0468

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Maryland > Baltimore County (0.14)
North America > United States > Maryland > Baltimore (0.14)
(7 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.92)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Military (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)
(2 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

A Graph-Partitioning Based Continuous Optimization Approach to Semi-supervised Clustering Problems

Liu, Wei, Liu, Xin, Ng, Michael K., Zhang, Zaikun

Semi-supervised clustering is a basic problem in various applications. Most existing methods require knowledge of the ideal cluster number, which is often difficult to obtain in practice. Besides, satisfying the must-link constraints is another major challenge for these methods. In this work, we view the semi-supervised clustering task as a partitioning problem on a graph associated with the given dataset, where the similarity matrix includes a scaling parameter to reflect the must-link constraints. Utilizing a relaxation technique, we formulate the graph partitioning problem into a continuous optimization model that does not require the exact cluster number, but only an overestimate of it. We then propose a block coordinate descent algorithm to efficiently solve this model, and establish its convergence result. Based on the obtained solution, we can construct the clusters that theoretically meet the must-link constraints under mild assumptions. Furthermore, we verify the effectiveness and efficiency of our proposed method through comprehensive numerical experiments.

algorithm, must-link constraint, theorem 2, (13 more...)

2503.04447

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > New York (0.04)
(5 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)