AITopics

2501.06222

Country: North America > United States (0.29)

Genre:

Overview (1.00)
Research Report > New Finding (0.88)

Industry:

Health & Medicine > Consumer Health (1.00)
Information Technology > Security & Privacy (0.93)
Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Artificial IntelligenceJan-6-2025

RAHN: A Reputation Based Hourglass Network for Web Service QoS Prediction

Chen, Xia, Du, Yugen, Tang, Guoxing, Luo, Yingwei, Ma, Benchi

As the homogenization of Web services becomes more and more common, the difficulty of service recommendation is gradually increasing. How to predict Quality of Service (QoS) more efficiently and accurately becomes an important challenge for service recommendation. Considering the excellent role of reputation and deep learning (DL) techniques in the field of QoS prediction, we propose a reputation and DL based QoS prediction network, RAHN, which contains the Reputation Calculation Module (RCM), the Latent Feature Extraction Module (LFEM), and the QoS Prediction Hourglass Network (QPHN). RCM obtains the user reputation and the service reputation by using a clustering algorithm and a Logit model. LFEM extracts latent features from known information to form an initial latent feature vector. QPHN aggregates latent feature vectors with different scales by using Attention Mechanism, and can be stacked multiple times to obtain the final latent feature vector for prediction. We evaluate RAHN on a real QoS dataset. The experimental results show that the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) of RAHN are smaller than the six baseline methods.

artificial intelligence, machine learning, prediction, (17 more...)

doi: 10.18293/SEKE2024-024

2501.02843

Country: Asia > China (0.15)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

Ghosh, Debashis, Wang, Lei

Coarsened confounding for causal effects: a large-sample framework

arXiv.org Machine LearningJan-6-2025

There has been widespread use of causal inference methods for the rigorous analysis of observational studies and to identify policy evaluations. In this article, we consider coarsened exact matching, developed in Iacus et al. (2011). While they developed some statistical properties, in this article, we study the approach using asymptotics based on a superpopulation inferential framework. This methodology is generalized to what we termed as coarsened confounding, for which we propose two new algorithms. We develop asymptotic results for the average causal effect estimator as well as providing conditions for consistency. In addition, we provide an asymptotic justification for the variance formulae in Iacus et al. (2011). A bias correction technique is proposed, and we apply the proposed methodology to data from two well-known observational studies.

artificial intelligence, coarsened exact matching, machine learning, (14 more...)

arXiv.org Machine Learning

2501.03129

Country: North America > United States (1.00)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Singh, Priyanshu, Ahuja, Kapil

Chameleon2++: An Efficient Chameleon2 Clustering with Approximate Nearest Neighbors

arXiv.org Artificial IntelligenceJan-5-2025

Clustering algorithms are fundamental tools in data analysis, with hierarchical methods being particularly valuable for their flexibility. Chameleon is a widely used hierarchical clustering algorithm that excels at identifying high-quality clusters of arbitrary shapes, sizes, and densities. Chameleon2 is the most recent variant that has demonstrated significant improvements, but suffers from critical failings and there are certain improvements that can be made. The first failure we address is that the complexity of Chameleon2 is claimed to be $O(n^2)$, while we demonstrate that it is actually $O(n^2\log{n})$, with $n$ being the number of data points. Furthermore, we suggest improvements to Chameleon2 that ensure that the complexity remains $O(n^2)$ with minimal to no loss of performance. The second failing of Chameleon2 is that it lacks transparency and it does not provide the fine-tuned algorithm parameters used to obtain the claimed results. We meticulously provide all such parameter values to enhance replicability. The improvement which we make in Chameleon2 is that we replace the exact $k$-NN search with an approximate $k$-NN search. This further reduces the algorithmic complexity down to $O(n\log{n})$ without any performance loss. Here, we primarily configure three approximate nearest neighbor search algorithms (Annoy, FLANN and NMSLIB) to align with the overarching Chameleon2 clustering framework. Experimental evaluations on standard benchmark datasets demonstrate that the proposed Chameleon2++ algorithm is more efficient, robust, and computationally optimal.

artificial intelligence, data mining, machine learning, (20 more...)

2501.02612

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Delacour, Paul-Louis, Wahls, Sander, Spraggins, Jeffrey M., Migas, Lukasz, Van de Plas, Raf

Signal Recovery Using a Spiked Mixture Model

arXiv.org Machine LearningJan-3-2025

We introduce the spiked mixture model (SMM) to address the problem of estimating a set of signals from many randomly scaled and noisy observations. Subsequently, we design a novel expectation-maximization (EM) algorithm to recover all parameters of the SMM. Numerical experiments show that in low signal-to-noise ratio regimes, and for data types where the SMM is relevant, SMM surpasses the more traditional Gaussian mixture model (GMM) in terms of signal recovery performance. The broad relevance of the SMM and its corresponding EM recovery algorithm is demonstrated by applying the technique to different data types. The first case study is a biomedical research application, utilizing an imaging mass spectrometry dataset to explore the molecular content of a rat brain tissue section at micrometer scale. The second case study demonstrates SMM performance in a computer vision application, segmenting a hyperspectral imaging dataset into underlying patterns. While the measurement modalities differ substantially, in both case studies SMM is shown to recover signals that were missed by traditional methods such as k-means clustering and GMM.

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Machine Learning

2501.0184

Country:

Europe > Netherlands > South Holland > Delft (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

MRG: A Multi-Robot Manufacturing Digital Scene Generation Method Using Multi-Instance Point Cloud Registration

Han, Songjie, Liu, Yinhua, Li, Yanzheng, Chen, Hua, Yang, Dongmei

A high-fidelity digital simulation environment is crucial for accurately replicating physical operational processes. However, inconsistencies between simulation and physical environments result in low confidence in simulation outcomes, limiting their effectiveness in guiding real-world production. Unlike the traditional step-by-step point cloud "segmentation-registration" generation method, this paper introduces, for the first time, a novel Multi-Robot Manufacturing Digital Scene Generation (MRG) method that leverages multi-instance point cloud registration, specifically within manufacturing scenes. Tailored to the characteristics of industrial robots and manufacturing settings, an instance-focused transformer module is developed to delineate instance boundaries and capture correlations between local regions. Additionally, a hypothesis generation module is proposed to extract target instances while preserving key features. Finally, an efficient screening and optimization algorithm is designed to refine the final registration results. Experimental evaluations on the Scan2CAD and Welding-Station datasets demonstrate that: (1) the proposed method outperforms existing multi-instance point cloud registration techniques; (2) compared to state-of-the-art methods, the Scan2CAD dataset achieves improvements in MR and MP by 12.15% and 17.79%, respectively; and (3) on the Welding-Station dataset, MR and MP are enhanced by 16.95% and 24.15%, respectively. This work marks the first application of multi-instance point cloud registration in manufacturing scenes, significantly advancing the precision and reliability of digital simulation environments for industrial applications.

correspondence, module, point cloud, (10 more...)

2501.02041

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Leisure & Entertainment (0.54)
Automobiles & Trucks (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Deep Clustering via Community Detection

Cheng, Tianyu, Chen, Qun

Deep clustering is an essential task in modern artificial intelligence, aiming to partition a set of data samples into a given number of homogeneous groups (i.e., clusters). Even though many Deep Neural Network (DNN) backbones and clustering strategies have been proposed for the task, achieving increasingly improved performance, deep clustering remains very challenging due to the lack of accurately labeled samples. In this paper, we propose a novel approach of deep clustering via community detection. It initializes clustering by detecting many communities, and then gradually expands clusters by community merging. Compared with the existing clustering strategies, community detection factors in the new perspective of cluster network analysis. As a result, it has the inherent benefit of high pseudo-label purity, which is critical to the performance of self-supervision. We have validated the efficacy of the proposed approach on benchmark image datasets. Our extensive experiments have shown that it can effectively improve the SOTA performance. Our ablation study also demonstrates that the new network perspective can effectively improve community pseudo-label purity, resulting in improved clustering performance.

artificial intelligence, data mining, machine learning, (18 more...)

2501.02036

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

LCFed: An Efficient Clustered Federated Learning Framework for Heterogeneous Data

Zhang, Yuxin, Chen, Haoyu, Lin, Zheng, Chen, Zhe, Zhao, Jin

Clustered federated learning (CFL) addresses the performance challenges posed by data heterogeneity in federated learning (FL) by organizing edge devices with similar data distributions into clusters, enabling collaborative model training tailored to each group. However, existing CFL approaches strictly limit knowledge sharing to within clusters, lacking the integration of global knowledge with intra-cluster training, which leads to suboptimal performance. Moreover, traditional clustering methods incur significant computational overhead, especially as the number of edge devices increases. In this paper, we propose LCFed, an efficient CFL framework to combat these challenges. By leveraging model partitioning and adopting distinct aggregation strategies for each sub-model, LCFed effectively incorporates global knowledge into intra-cluster co-training, achieving optimal training performance. Additionally, LCFed customizes a computationally efficient model similarity measurement method based on low-rank models, enabling real-time cluster updates with minimal computational overhead. Extensive experiments show that LCFed outperforms state-of-the-art benchmarks in both test accuracy and clustering computational efficiency.

artificial intelligence, learning, machine learning, (15 more...)

2501.0185

Country: Asia > China (0.30)

Genre: Research Report (0.50)

Industry: Information Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.48)

Zhang, Wenhao, Quintana, Matias, Miller, Clayton

Recommender systems and reinforcement learning for human-building interaction and context-aware support: A text mining-driven review of scientific literature

The indoor environment significantly impacts human health and well-being; enhancing health and reducing energy consumption in these settings is a central research focus. With the advancement of Information and Communication Technology (ICT), recommendation systems and reinforcement learning (RL) have emerged as promising approaches to induce behavioral changes to improve the indoor environment and energy efficiency of buildings. This study aims to employ text mining and Natural Language Processing (NLP) techniques to thoroughly examine the connections among these approaches in the context of human-building interaction and occupant context-aware support. The study analyzed 27,595 articles from the ScienceDirect database, revealing extensive use of recommendation systems and RL for space optimization, location recommendations, and personalized control suggestions. Furthermore, this review underscores the vast potential for expanding recommender systems and RL applications in buildings and indoor environments. Fields ripe for innovation include predictive maintenance, building-related product recommendation, and optimization of environments tailored for specific needs, such as sleep and productivity enhancements based on user feedback. The study also notes the limitations of the method in capturing subtle academic nuances. Future improvements could involve integrating and fine-tuning pre-trained language models to better interpret complex texts.

machine learning, recommender system, reinforcement learning, (20 more...)

doi: 10.1016/j.enbuild.2024.115247

2411.08734

Country:

North America > United States (0.93)
Asia (0.68)

Genre:

Overview (1.00)
Research Report > Experimental Study (0.48)
Research Report > New Finding (0.34)

Industry:

Information Technology (1.00)
Health & Medicine > Consumer Health (1.00)
Energy > Power Industry (1.00)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

arXiv.org Artificial IntelligenceJan-2-2025

Text Clustering as Classification with LLMs

Huang, Chen, He, Guoxiu

Text clustering remains valuable in real-world applications where manual labeling is cost-prohibitive. It facilitates efficient organization and analysis of information by grouping similar texts based on their representations. However, implementing this approach necessitates fine-tuned embedders for downstream data and sophisticated similarity metrics. To address this issue, this study presents a novel framework for text clustering that effectively leverages the in-context learning capacity of Large Language Models (LLMs). Instead of fine-tuning embedders, we propose to transform the text clustering into a classification task via LLM. First, we prompt LLM to generate potential labels for a given dataset. Second, after integrating similar labels generated by the LLM, we prompt the LLM to assign the most appropriate label to each sample in the dataset. Our framework has been experimentally proven to achieve comparable or superior performance to state-of-the-art clustering methods that employ embeddings, without requiring complex fine-tuning or clustering algorithms. We make our code available to the public for utilization at https://github.com/ECNU-Text-Computing/Text-Clustering-via-LLM.

dataset, llm, proceedings, (16 more...)

2410.00927

Country:

Asia > Singapore (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)