AITopics

2301.12671

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.63)

arXiv.org Artificial Intelligence, May-16-2023

Clustering-Aware Negative Sampling for Unsupervised Sentence Representation

Deng, Jinghao, Wan, Fanqi, Yang, Tao, Quan, Xiaojun, Wang, Rui

Contrastive learning has been widely studied in sentence representation learning. However, earlier works mainly focus on the construction of positive examples, while in-batch samples are often simply treated as negative examples. This approach overlooks the importance of selecting appropriate negative examples, potentially leading to a scarcity of hard negatives and the inclusion of false negatives. To address these issues, we propose ClusterNS (Clustering-aware Negative Sampling), a novel method that incorporates cluster information into contrastive learning for unsupervised sentence representation learning. We apply a modified K-means clustering algorithm to supply hard negatives and recognize in-batch false negatives during training, aiming to solve the two issues in one unified framework. Experiments on semantic textual similarity (STS) tasks demonstrate that our proposed ClusterNS compares favorably with baselines in unsupervised sentence representation learning. Our code has been made publicly available.
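As a rough illustration of the idea in this abstract (not the authors' released code), the sketch below assigns each sentence embedding in a batch to its most similar centroid, picks the most similar other centroid as a hard negative, and flags in-batch samples that share a cluster as likely false negatives. The function names, the use of cosine similarity, and the fixed centroids are assumptions made for the example; the paper's modified K-means is not reproduced here.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cluster_aware_negatives(embeddings, centroids):
    """For each embedding, report its cluster, a hard-negative centroid,
    and the in-batch indices that share its cluster (likely false negatives).

    Assumption: centroids are given and fixed; a training loop would
    update them as the encoder changes.
    """
    k = len(centroids)
    # Assign every embedding to its most similar centroid.
    labels = [
        max(range(k), key=lambda c: cosine(e, centroids[c]))
        for e in embeddings
    ]
    report = []
    for i, e in enumerate(embeddings):
        # Hard negative: the most similar centroid of a *different* cluster.
        others = [c for c in range(k) if c != labels[i]]
        hard = max(others, key=lambda c: cosine(e, centroids[c]))
        # Likely false negatives: other batch items in the same cluster.
        same = [j for j in range(len(embeddings))
                if j != i and labels[j] == labels[i]]
        report.append({
            "cluster": labels[i],
            "hard_negative_centroid": hard,
            "false_negatives": same,
        })
    return report


# Toy batch of 2-D "sentence embeddings" and three centroids (hypothetical data).
embeddings = [(1.0, 0.1), (0.9, 0.2), (0.1, 1.0), (-1.0, 0.1)]
centroids = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
report = cluster_aware_negatives(embeddings, centroids)
```

In this toy batch the first two embeddings land in the same cluster, so each is flagged as a likely false negative for the other rather than being pushed apart by the contrastive loss.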

artificial intelligence, machine learning, natural language

2305.09892

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Fang, Si-Guo, Huang, Dong, Wang, Chang-Dong, Lai, Jian-Huang

One-step Bipartite Graph Cut: A Normalized Formulation and Its Application to Scalable Subspace Clustering

arXiv.org Artificial Intelligence, May-12-2023

The bipartite graph structure has shown promising ability in facilitating subspace clustering and spectral clustering algorithms for large-scale datasets. To avoid post-processing via k-means during bipartite graph partitioning, the constrained Laplacian rank (CLR) is often utilized to constrain the number of connected components (i.e., clusters) in the bipartite graph. This, however, neglects the distribution (or normalization) of these connected components and may lead to imbalanced or even ill-formed clusters. Despite the significant success of normalized cut (Ncut) in general graphs, it surprisingly remains an open problem how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity. In this paper, we first characterize a novel one-step bipartite graph cut (OBCut) criterion with normalized constraints, and theoretically prove its equivalence to a trace maximization problem. Then we extend this cut criterion to a scalable subspace clustering approach, where adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled in a unified objective function, and an alternating optimization algorithm is further designed to solve it in linear time. Experiments on a variety of general and large-scale datasets demonstrate the effectiveness and scalability of our approach.
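To make the notion of a normalized cut on a bipartite graph concrete, the sketch below evaluates the standard Ncut value (sum over clusters of cut weight divided by cluster volume) for a given co-partition of samples and anchors. This is the textbook Ncut definition, not the OBCut criterion or the optimization from the paper; the function name, the matrix `B`, and the labels are hypothetical.

```python
def bipartite_ncut(B, row_labels, col_labels, k):
    """Normalized cut of a k-way co-partition of a bipartite graph.

    B[i][j] is the edge weight between sample i and anchor j.
    row_labels[i] and col_labels[j] give cluster indices in range(k).
    """
    vol = [0.0] * k  # sum of node degrees inside each cluster
    cut = [0.0] * k  # weight of edges leaving each cluster
    for i, row in enumerate(B):
        for j, w in enumerate(row):
            r, c = row_labels[i], col_labels[j]
            # Each edge adds w to the degree of both of its endpoints.
            vol[r] += w
            vol[c] += w
            if r != c:
                # The edge crosses the partition, so it is cut for both sides.
                cut[r] += w
                cut[c] += w
    return sum(cut[c] / vol[c] for c in range(k) if vol[c] > 0)


# Three samples, two anchors; one weak edge links the two groups.
B = [
    [2.0, 0.0],
    [2.0, 1.0],
    [0.0, 3.0],
]
ncut_value = bipartite_ncut(B, row_labels=[0, 0, 1], col_labels=[0, 1], k=2)
# cut = 1 for both clusters, volumes are 9 and 7, so Ncut = 1/9 + 1/7.
```

Dividing by volume is what penalizes imbalanced partitions: cutting off a tiny cluster gives a small denominator and therefore a large Ncut term.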

bipartite graph, data mining, machine learning

2305.07386

Country:

Asia > China > Guangdong Province > Guangzhou (0.05)
North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Singapore (0.04)

Genre: Research Report (1.00)

Industry: Education (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

Min, Hancheng, Mallada, Enrique

Learning Coherent Clusters in Weakly-Connected Network Systems

arXiv.org Artificial Intelligence, May-12-2023

We propose a structure-preserving model-reduction methodology for large-scale dynamic networks with tightly-connected components. First, the coherent groups are identified by a spectral clustering algorithm on the graph Laplacian matrix that models the network feedback. Then, a reduced network is built, where each node represents the aggregate dynamics of each coherent group, and the reduced network captures the dynamic coupling between the groups. We provide an upper bound on the approximation error when the network graph is randomly generated from a weight stochastic block model. Finally, numerical experiments align with and validate our theoretical findings.

artificial intelligence, diag, machine learning

2211.15301

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Abbasi, Fateme, Banerjee, Sandip, Byrka, Jarosław, Chalermsook, Parinya, Gadekar, Ameet, Khodamoradi, Kamyar, Marx, Dániel, Sharma, Roohani, Spoerhase, Joachim

Parameterized Approximation for Robust Clustering in Discrete Geometric Spaces

arXiv.org Artificial Intelligence, May-12-2023

We consider the well-studied Robust $(k, z)$-Clustering problem, which generalizes the classic $k$-Median, $k$-Means, and $k$-Center problems. Given a constant $z\ge 1$, the input to Robust $(k, z)$-Clustering is a set $P$ of $n$ weighted points in a metric space $(M,\delta)$ and a positive integer $k$. Further, each point belongs to one (or more) of $m$ different groups $S_1,S_2,\ldots,S_m$. Our goal is to find a set $X$ of $k$ centers such that $\max_{i \in [m]} \sum_{p \in S_i} w(p) \delta(p,X)^z$ is minimized. This problem arises in the domains of robust optimization [Anthony, Goyal, Gupta, Nagarajan, Math. Oper. Res. 2010] and algorithmic fairness. For polynomial time computation, an approximation factor of $O(\log m/\log\log m)$ is known [Makarychev, Vakilian, COLT 2021], which is tight under a plausible complexity assumption even in line metrics. For FPT time, there is a $(3^z+\epsilon)$-approximation algorithm, which is tight under GAP-ETH [Goyal, Jaiswal, Inf. Proc. Letters, 2023]. Motivated by the tight lower bounds for general discrete metrics, we focus on geometric spaces such as the (discrete) high-dimensional Euclidean setting and metrics of low doubling dimension, which play an important role in data analysis applications. First, for a universal constant $\eta_0 >0.0006$, we devise a $3^z(1-\eta_{0})$-factor FPT approximation algorithm for discrete high-dimensional Euclidean spaces, thereby bypassing the lower bound for general metrics. We complement this result by showing that even the special case of $k$-Center in dimension $\Theta(\log n)$ is $(\sqrt{3/2}- o(1))$-hard to approximate for FPT algorithms. Finally, we complete the FPT approximation landscape by designing an FPT $(1+\epsilon)$-approximation scheme (EPAS) for metrics of sub-logarithmic doubling dimension.
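The objective in this abstract, $\max_{i \in [m]} \sum_{p \in S_i} w(p)\,\delta(p,X)^z$, can be evaluated directly for a candidate set of centers. The sketch below does only that evaluation in Euclidean space; it is not any of the approximation algorithms the abstract describes, and the function name and the toy data are assumptions made for illustration.

```python
from math import dist


def robust_kz_cost(points, weights, groups, centers, z=1):
    """Robust (k, z)-clustering cost of a fixed center set.

    points  : list of coordinate tuples
    weights : weights[i] is w(p) for points[i]
    groups  : list of groups, each a list of point indices (groups may overlap)
    centers : the candidate center set X
    Returns the maximum over groups of sum_{p in group} w(p) * d(p, X)^z.
    """
    def d_to_centers(p):
        # Distance from p to its nearest center, i.e. delta(p, X).
        return min(dist(p, c) for c in centers)

    return max(
        sum(weights[i] * d_to_centers(points[i]) ** z for i in group)
        for group in groups
    )


# Four unit-weight points on a line, three overlapping groups, two centers.
points = [(0.0,), (1.0,), (4.0,), (6.0,)]
weights = [1.0, 1.0, 1.0, 1.0]
groups = [[0, 1], [2, 3], [1, 2]]
centers = [(0.0,), (4.0,)]

cost_z1 = robust_kz_cost(points, weights, groups, centers, z=1)
cost_z2 = robust_kz_cost(points, weights, groups, centers, z=2)
```

With a single group containing every point, the value reduces to the $k$-Median cost for $z=1$ and the $k$-Means cost for $z=2$; with one singleton group per point and $z=1$, it is the $k$-Center cost.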

algorithm, artificial intelligence, machine learning

2305.07316

Country:

Europe > Germany > Saarland > Saarbrücken (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.50)

Industry: Government > Regional Government (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)

A data-driven rutting depth short-time prediction model with metaheuristic optimization for asphalt pavements based on RIOHTrack

Li, Zhuoxuan, Korovin, Iakov, Shi, Xinli, Gorbachev, Sergey, Gorbacheva, Nadezhda, Huang, Wei, Cao, Jinde

Rutting of asphalt pavements is a crucial design criterion in various pavement design guides. A sound road transportation base helps secure the road transport of oil and gas. This study attempts to develop a robust artificial intelligence model to estimate different asphalt pavements' rutting depth clips, temperature, and load axes as primary characteristics. The experiment data were obtained from 19 asphalt pavements with different crude oil sources on a 2.038 km long full-scale field accelerated pavement test track (RIOHTrack, Road Track Institute) in Tongzhou, Beijing. In addition, this paper proposes to build complex networks with different pavement rutting depths through complex network methods and the Louvain algorithm for community detection. The most critical structural elements can be selected from different asphalt pavement rutting data, and similar structural elements can be found. An extreme learning machine algorithm with residual correction (RELM) is designed and optimized using an independent adaptive particle swarm algorithm. The experimental results of the proposed method are compared with several classical machine learning algorithms, with predictions of Average Root Mean Squared Error, Average Mean Absolute Error, and Average Mean Absolute Percentage Error for 19 asphalt pavements reaching 1.742, 1.363, and 1.94% respectively. The experiments demonstrate that the RELM algorithm has an advantage over classical machine learning methods in dealing with non-linear problems in road engineering. Notably, the method ensures the adaptation of the simulated environment to different levels of abstraction through the cognitive analysis of the production environment parameters.

artificial intelligence, evolutionary algorithm, machine learning

doi: 10.1109/JAS.2023.123192

2305.06707

Country:

Europe (0.93)
Asia > China > Beijing > Beijing (0.24)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.66)

Industry:

Transportation > Ground > Road (1.00)
Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Son, Seok Bin, Park, Soohyun, Kim, Joongheon

Entropy-Aware Similarity for Balanced Clustering: A Case Study with Melanoma Detection

Clustering data is an unsupervised learning approach that aims to divide a set of data points into multiple groups. It is a crucial yet demanding subject in machine learning and data mining. Its successful applications span various fields. However, conventional clustering techniques necessitate the consideration of balance significance in specific applications. Therefore, this paper addresses the challenge of imbalanced clustering problems and presents a new method for balanced clustering by utilizing entropy-aware similarity, which can be defined as the degree of balance. We coin the term entropy-aware similarity for balanced clustering (EASB) for this method, which maximizes balance during clustering by complementary clustering of unbalanced data and by incorporating entropy in a novel similarity formula that accounts for both angular differences and distances. The effectiveness of the proposed approach is evaluated on actual melanoma medical data, specifically the International Skin Imaging Collaboration (ISIC) 2019 and 2020 challenge datasets, to demonstrate how it can successfully cluster the data while preserving balance. Lastly, we confirm that the proposed method exhibited outstanding performance in detecting melanoma compared with classical methods.
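One common way to quantify how balanced a clustering is uses the Shannon entropy of the cluster-size distribution, normalized by $\log k$ so that 1 means perfectly balanced. The sketch below computes that score; it is only an illustration of entropy as a balance measure and is not the EASB similarity formula, which the abstract does not spell out. The function name is hypothetical.

```python
from collections import Counter
from math import log


def normalized_balance_entropy(labels, k):
    """Entropy of cluster proportions divided by log(k).

    labels : cluster index of each data point
    k      : number of clusters the algorithm was asked for
    Returns a value in [0, 1]; 1.0 means all k clusters have equal size.
    """
    if k <= 1:
        return 1.0  # a single cluster is trivially "balanced"
    n = len(labels)
    counts = Counter(labels)
    # Empty clusters contribute nothing to the sum (0 * log 0 is taken as 0).
    entropy = -sum((c / n) * log(c / n) for c in counts.values())
    return entropy / log(k)


balanced = normalized_balance_entropy([0, 0, 1, 1], k=2)
skewed = normalized_balance_entropy([0, 0, 0, 1], k=2)
collapsed = normalized_balance_entropy([0, 0, 0, 0], k=2)
```

A score like this could be tracked alongside a clustering loss to see whether an algorithm drifts toward a few dominant clusters.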

artificial intelligence, hospital, machine learning

2305.15417

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > South Korea > Seoul > Seoul (0.06)
North America > United States > District of Columbia > Washington (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Skin Cancer (1.00)
Health & Medicine > Therapeutic Area > Dermatology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Fukumoto, Katsuki, Yamada, Koki, Tanaka, Yuichi, Wai, Hoi-To

Clustering of Time-Varying Graphs Based on Temporal Label Smoothness

We propose a node clustering method for time-varying graphs based on the assumption that the cluster labels change smoothly over time. Clustering is one of the fundamental tasks in many science and engineering fields including signal processing, machine learning, and data mining. Although most existing studies focus on the clustering of nodes in static graphs, we often encounter time-varying graphs for time-series data, e.g., social networks, brain functional connectivity, and point clouds. In this paper, we formulate node clustering of time-varying graphs as an optimization problem based on spectral clustering, with a smoothness constraint on the node labels. We solve the problem with a primal-dual splitting algorithm. Experiments on synthetic and real-world time-varying graphs are performed to validate the effectiveness of the proposed approach.

artificial intelligence, data mining, machine learning

2305.06576

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.40)

Industry: Information Technology (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Deep Multi-View Subspace Clustering with Anchor Graph

Cui, Chenhang, Ren, Yazhou, Pu, Jingyu, Pu, Xiaorong, He, Lifang

Deep multi-view subspace clustering (DMVSC) has recently attracted increasing attention due to its promising performance. However, existing DMVSC methods still have two issues: (1) they mainly focus on using autoencoders to nonlinearly embed the data, while the embedding may be suboptimal for clustering because the clustering objective is rarely considered in autoencoders, and (2) existing methods typically have a quadratic or even cubic complexity, which makes it challenging to deal with large-scale data. To address these issues, in this paper we propose a novel deep multi-view subspace clustering method with anchor graph (DMCAG). Specifically, DMCAG first learns the embedded features for each view independently, which are used to obtain the subspace representations. To significantly reduce the complexity, we construct a small anchor graph for each view. Then, spectral clustering is performed on an integrated anchor graph to obtain pseudo-labels. To overcome the negative impact caused by suboptimal embedded features, we use pseudo-labels to refine the embedding process to make it more suitable for the clustering task. Pseudo-labels and embedded features are updated alternately. Furthermore, we design a strategy based on contrastive learning to keep the labels consistent and enhance the clustering performance. Empirical studies on real-world datasets show that our method achieves superior clustering performance over other state-of-the-art methods.

artificial intelligence, deep learning, machine learning

2305.06939

Country:

North America > United States (0.29)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Sichuan Province > Chengdu (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.35)

Rethinking k-means from manifold learning perspective

Gao, Quanxue, Wang, Qianqian, Lu, Han, Xia, Wei, Gao, Xinbo

Although numerous clustering algorithms have been developed, many existing methods still leverage the k-means technique to detect clusters of data points. However, the performance of k-means heavily depends on the estimation of cluster centers, for which an optimal solution is very difficult to achieve. Another major drawback is that it is sensitive to noise and outlier data. In this paper, from a manifold learning perspective, we rethink k-means and present a new clustering algorithm which directly detects clusters of data without mean estimation. Specifically, we construct a distance matrix between data points with a Butterworth filter such that the distance between any two data points in the same cluster equals a small constant, while increasing the distance between data pairs from different clusters. To better exploit the complementary information embedded in different views, we leverage the tensor Schatten p-norm regularization on the 3rd-order tensor which consists of indicator matrices of different views. Finally, an efficient alternating algorithm is derived to optimize our model. The constructed sequence is proved to converge to the stationary KKT point. Extensive experimental results indicate the superiority of our proposed method.

artificial intelligence, machine learning, matrix

2305.07213

Country:

Asia > China > Chongqing Province > Chongqing (0.04)
North America > United States > Oregon > Benton County > Corvallis (0.04)
Europe > France (0.04)

Genre: Research Report (0.40)

Industry: Education (0.61)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)