AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.64)

Neural Information Processing Systems, Jan-25-2025, 18:28:49 GMT

Review for NeurIPS paper: Simple and Scalable Sparse k-means Clustering via Feature Ranking

The reviewers appreciate the algorithmic contributions of this paper and believe it will be of interest to the community.

feature ranking, scalable sparse k-means clustering, simple and scalable sparse, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)

Neural Information Processing Systems, Jan-25-2025, 01:13:45 GMT

Review for NeurIPS paper: Sliding Window Algorithms for k-Clustering Problems

Summary and Contributions: The paper presents an algorithm for k-clustering in a sliding window streaming model, where k-clustering means the generalization of k-median and k-means to any fixed l_p-norm. The main theoretical result is an algorithm that achieves an O(1)-approximation for points in an arbitrary metric space and thus covers the prevalent case of the Euclidean metric, which is also used in the experimental evaluation. This algorithm is for sliding window streaming, where the algorithm repeatedly solves the clustering problem on the w most recent points in the stream (for parameter w). While the minimal requirement is to estimate the cost of a k-clustering, this algorithm also reports k center points. The usual motivation for this model is to allow old data to expire and to analyze only recent data.
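To make the sliding window model concrete, here is a minimal Python sketch of a naive baseline. It is not the paper's algorithm: it stores all w recent points explicitly (O(w) memory) and only evaluates the k-clustering cost for centers supplied by the caller. The names `clustering_cost` and `NaiveSlidingWindow` are hypothetical, and the Euclidean metric is assumed.

```python
from collections import deque
import math


def clustering_cost(points, centers, p=2):
    """Sum over points of (distance to the nearest center) ** p.

    p=1 gives the k-median cost and p=2 the k-means cost.
    The Euclidean metric is an assumption made for this sketch.
    """
    return sum(min(math.dist(x, c) for c in centers) ** p for x in points)


class NaiveSlidingWindow:
    """Keeps the w most recent stream points explicitly (hypothetical helper)."""

    def __init__(self, w):
        # A bounded deque drops the oldest point automatically, so old data expires.
        self.window = deque(maxlen=w)

    def add(self, point):
        self.window.append(tuple(point))

    def cost(self, centers, p=2):
        # Cost of the given centers on the current window only.
        return clustering_cost(self.window, centers, p)


if __name__ == "__main__":
    sw = NaiveSlidingWindow(w=3)
    for pt in [(0, 0), (10, 10), (1, 0), (0, 1), (1, 1)]:
        sw.add(pt)
    # Only the last three points remain in the window.
    print(list(sw.window))
    print(sw.cost([(0, 0)], p=2))
```

A sliding window algorithm in the sense of the review would aim to answer the same query with far less memory than the window size; this baseline only fixes the semantics of the query.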

algorithm, k-clustering problem, sliding window algorithm, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.63)

Neural Information Processing Systems, Jan-25-2025, 01:13:38 GMT

Review for NeurIPS paper: Sliding Window Algorithms for k-Clustering Problems

The paper considers the k-clustering problem in the sliding window streaming model. However, the authors are urged to consider the reviews when preparing the final version, including explicitly stating a bound on the constant-factor approximation obtained.

k-clustering problem, neurips paper, sliding window algorithm

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.84)

arXiv.org Artificial Intelligence, Jan-25-2025

Causally-Aware Unsupervised Feature Selection Learning

Shen, Zongxin, Huang, Yanyong, Wang, Dongjie, Ma, Minbo, Lv, Fengmao, Li, Tianrui

Unsupervised feature selection (UFS) has recently gained attention for its effectiveness in processing unlabeled high-dimensional data. However, existing methods overlook the intrinsic causal mechanisms within the data, resulting in the selection of irrelevant features and poor interpretability. Additionally, previous graph-based methods fail to account for the differing impacts of non-causal and causal features in constructing the similarity graph, which leads to false links in the generated graph. To address these issues, a novel UFS method, called Causally-Aware UnSupErvised Feature Selection learning (CAUSE-FS), is proposed. CAUSE-FS introduces a novel causal regularizer that reweights samples to balance the confounding distribution of each treatment feature. This regularizer is subsequently integrated into a generalized unsupervised spectral regression model to mitigate spurious associations between features and clustering labels, thus achieving causal feature selection. Furthermore, CAUSE-FS employs causality-guided hierarchical clustering to partition features with varying causal contributions into multiple granularities. By integrating similarity graphs learned adaptively at different granularities, CAUSE-FS increases the importance of causal features when constructing the fused similarity graph to capture the reliable local structure of data. Extensive experimental results demonstrate the superiority of CAUSE-FS over state-of-the-art methods, with its interpretability further validated through feature visualization.

artificial intelligence, feature selection, machine learning, (19 more...)

2410.12224

Country:

North America > United States > Kansas > Douglas County > Lawrence (0.14)
Asia > Middle East > Jordan (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

LaHaye, Nicholas, Easley, Anistasija, Yun, Kyongsik, Lee, Huikyo, Linstead, Erik, Garay, Michael J., Kalashnikova, Olga V.

Development and Application of Self-Supervised Machine Learning for Smoke Plume and Active Fire Identification from the FIREX-AQ Datasets

arXiv.org Artificial Intelligence, Jan-25-2025

Fire Influence on Regional to Global Environments and Air Quality (FIREX-AQ) was a field campaign aimed at better understanding the impact of wildfires and agricultural fires on air quality and climate. The FIREX-AQ campaign took place in August 2019 and involved two aircraft and multiple coordinated satellite observations. This study applied and evaluated a self-supervised machine learning (ML) method for active fire and smoke plume identification and tracking in the satellite and sub-orbital remote sensing datasets collected during the campaign. Our unique methodology combines remote sensing observations with different spatial and spectral resolutions. The demonstrated approach successfully differentiates fire pixels and smoke plumes from background imagery, enabling the generation of a per-instrument smoke and fire mask product, as well as smoke and fire masks created from the fusion of selected data from independent instruments. This ML approach has the potential to enhance operational wildfire monitoring systems and improve decision-making in air quality management through fast smoke plume identification and tracking, and could improve climate impact studies through the fusion of data from independent instruments.

artificial intelligence, instrument, machine learning, (14 more...)

2501.15343

Country: North America > United States > California (0.68)

Genre: Research Report (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (0.95)
Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (0.56)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Neural Information Processing Systems, Jan-24-2025, 04:36:11 GMT

Reviews: k-Means Clustering of Lines for Big Data

The authors consider the problem of clustering a set of lines in R^d. The goal is to minimize the k-means objective: given a set L of n lines in R^d, find the best set of k points c_1,...,c_k in R^d so as to minimize sum_{l in L} min_{c_i} dist(c_i, l)^2. This is a clean, nicely motivated problem. The authors provide a coreset construction (namely a small-size summary of the input such that any alpha-approximation for the summary yields an alpha(1+epsilon)-approximation for the entire input). This implies the first (1+epsilon)-approximation for the problem with running time nd exp(poly(k)), together with a streaming algorithm with similar running time and memory size 2^{poly(k)} log n. En route to the result, the authors provide a bicriteria approximation algorithm: namely, a solution that contains O(k (log n dk log k)) centers and whose cost is at most 4 times the cost of the optimal solution with k centers.
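The objective described above can be evaluated directly. The following Python sketch is only an illustration of the cost function, not the coreset construction or any algorithm from the paper. It assumes each line is given as a pair (a, u) of a point on the line and a nonzero direction vector; the function names are hypothetical.

```python
def sq_dist_point_line(c, a, u):
    """Squared Euclidean distance from point c to the line {a + t*u : t real}.

    Uses |v|^2 - (v.u)^2 / (u.u) with v = c - a, i.e. the squared length of
    the component of v orthogonal to the direction u (u must be nonzero).
    """
    v = [ci - ai for ci, ai in zip(c, a)]
    vu = sum(vi * ui for vi, ui in zip(v, u))
    uu = sum(ui * ui for ui in u)
    vv = sum(vi * vi for vi in v)
    # Clamp at zero to absorb floating-point rounding error.
    return max(vv - vu * vu / uu, 0.0)


def line_kmeans_cost(lines, centers):
    """sum over lines l of min over centers c of dist(c, l)^2."""
    return sum(
        min(sq_dist_point_line(c, a, u) for c in centers)
        for a, u in lines
    )


if __name__ == "__main__":
    # Three lines in R^2: y = 0, y = 2, and x = 5.
    lines = [((0, 0), (1, 0)), ((0, 2), (1, 0)), ((5, 0), (0, 1))]
    print(line_kmeans_cost(lines, [(0, 1)]))
    print(line_kmeans_cost(lines, [(0, 1), (5, 7)]))
```

With a single center at (0, 1), the two horizontal lines each contribute 1 and the vertical line contributes 25; adding a second center on the vertical line removes that last term.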

algorithm, big data, k-means clustering, (3 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.52)

Neural Information Processing Systems, Jan-24-2025, 04:36:00 GMT

Reviews: k-Means Clustering of Lines for Big Data

This paper proposes a PTAS for k-means clustering of lines. The key contribution is the construction of a small coreset, on which brute-force algorithms are run. The authors also extend this to the streaming setting. An important computer vision application is used as motivation. The authors should revise the final version to address the issues raised by the reviewers and make it more readable to researchers working in related, but not identical, areas.

big data, k-means clustering

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)

Berahmand, Kamal, Saberi-Movahed, Farid, Sheikhpour, Razieh, Li, Yuefeng, Jalili, Mahdi

A Comprehensive Survey on Spectral Clustering with Graph Structure Learning

arXiv.org Artificial Intelligence, Jan-24-2025

Spectral clustering is a powerful technique for clustering high-dimensional data, utilizing graph-based representations to detect complex, non-linear structures and non-convex clusters. The construction of a similarity graph is essential for ensuring accurate and effective clustering, making graph structure learning (GSL) central for enhancing spectral clustering performance in response to the growing demand for scalable solutions. Despite advancements in GSL, there is a lack of comprehensive surveys specifically addressing its role within spectral clustering. To bridge this gap, this survey presents a comprehensive review of spectral clustering methods, emphasizing the critical role of GSL. We explore various graph construction techniques, including pairwise, anchor, and hypergraph-based methods, in both fixed and adaptive settings. Additionally, we categorize spectral clustering approaches into single-view and multi-view frameworks, examining their applications within one-step and two-step clustering processes. We also discuss multi-view information fusion techniques and their impact on clustering data. By addressing current challenges and proposing future research directions, this survey provides valuable insights for advancing spectral clustering methodologies and highlights the pivotal role of GSL in tackling large-scale and high-dimensional data clustering tasks.

artificial intelligence, machine learning, spectral, (18 more...)

2501.13597

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Oceania > Australia > Queensland (0.04)
Asia > Middle East > Iran > Kerman Province > Kerman (0.04)
(6 more...)

Genre: Overview (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Torri, Vittorio, Bottelli, Annamaria, Ercolanoni, Michele, Leoni, Olivia, Ieva, Francesca

NLP-based assessment of prescription appropriateness from Italian referrals

arXiv.org Artificial Intelligence, Jan-24-2025

Objective: This study proposes a Natural Language Processing pipeline to evaluate prescription appropriateness in Italian referrals, where reasons for prescriptions are recorded only as free text, complicating automated comparisons with guidelines. The pipeline aims to derive, for the first time, a comprehensive summary of the reasons behind these referrals and a quantification of their appropriateness. While demonstrated in a specific case study, the approach is designed to generalize to other types of examinations. Methods: Leveraging embeddings from a transformer-based model, the proposed approach clusters referral texts, maps clusters to labels, and aligns these labels with existing guidelines. We present a case study on a dataset of 496,971 referrals, consisting of all referrals for venous echocolordopplers of the lower limbs between 2019 and 2021 in the Lombardy Region. A sample of 1,000 referrals was manually annotated to validate the results. Results: The pipeline exhibited high performance for referrals' reasons (Prec=92.43%, Rec=83.28%) and excellent results for referrals' appropriateness (Prec=93.58%, Rec=91.52%) on the annotated subset. Analysis of the entire dataset identified clusters matching guideline-defined reasons - both appropriate and inappropriate - as well as clusters not addressed in the guidelines. Overall, 34.32% of referrals were marked as appropriate, 34.07% inappropriate, 14.37% likely inappropriate, and 17.24% could not be mapped to guidelines. Conclusions: The proposed pipeline effectively assessed prescription appropriateness across a large dataset, serving as a valuable tool for health authorities. Findings have informed the Lombardy Region's efforts to strengthen recommendations and reduce the burden of inappropriate referrals.
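For readers less familiar with the metrics quoted above, the Python sketch below shows how precision and recall are computed from counts on an annotated sample. The abstract does not report the underlying counts, so the numbers passed to `precision_recall` are purely hypothetical; only the four percentage shares in `shares` are taken from the abstract.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).

    Returns 0.0 for a metric whose denominator is zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts, NOT taken from the paper.
    p, r = precision_recall(tp=90, fp=10, fn=30)
    print(f"Prec={p:.2%}, Rec={r:.2%}")

    # Shares reported in the abstract: appropriate, inappropriate,
    # likely inappropriate, and not mappable to guidelines.
    shares = [34.32, 34.07, 14.37, 17.24]
    # Sanity check: the four categories should cover all referrals.
    print(abs(sum(shares) - 100.0) < 1e-6)
```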

artificial intelligence, machine learning, natural language, (20 more...)

2501.14701

Country:

Europe > Italy > Lombardy (0.45)
South America > Chile (0.04)
North America > United States (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Gastroenterology (1.00)
(9 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)