AITopics

2006.08177

Country:

South America > Colombia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Lang, Andreas, Schubert, Erich

BETULA: Numerically Stable CF-Trees for BIRCH Clustering

arXiv.org Machine Learning, Jun-23-2020

BIRCH is a widely known clustering approach that has influenced much subsequent research and many commercial products. The key contribution of BIRCH is the Clustering Feature tree (CF-Tree), which is a compressed representation of the input data. As new data arrives, the tree is eventually rebuilt to increase the compression. Afterward, the leaves of the tree are used for clustering. Because of the data compression, this method is very scalable. The idea has been adopted, for example, for k-means, data stream, and density-based clustering. Clustering features used by BIRCH are simple summary statistics that can easily be updated with new data: the number of points, the linear sums, and the sum of squared values. Unfortunately, how the sum of squares is then used in BIRCH is prone to catastrophic cancellation. We introduce a replacement cluster feature that does not have this numeric problem, that is not much more expensive to maintain, and that makes many computations simpler and hence more efficient. These cluster features can also easily be used in other work derived from BIRCH, such as algorithms for streaming data. In the experiments, we demonstrate the numerical problem and compare the performance of the original algorithm with that of the improved cluster features.
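The cancellation problem mentioned in the abstract can be illustrated with a minimal one-dimensional sketch. The function and class names below are hypothetical, and the stable variant is the well-known Welford-style update (count, mean, sum of squared deviations); it is an assumption that this resembles the paper's replacement feature, since the abstract does not spell it out.

```python
def variance_from_sums(points):
    """BIRCH-style summary: count N, linear sum LS, sum of squares SS.

    The variance is recovered as SS/N - (LS/N)^2, which subtracts two
    nearly equal large numbers when the data lie far from the origin.
    """
    n = len(points)
    ls = 0.0
    ss = 0.0
    for x in points:
        ls += x
        ss += x * x
    return ss / n - (ls / n) ** 2


class StableFeature:
    """Hypothetical alternative summary: count, mean, sum of squared deviations."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.s = 0.0  # sum of squared deviations from the running mean

    def add(self, x):
        # Welford-style incremental update; no large squares are ever formed.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.s += delta * (x - self.mean)

    def variance(self):
        return self.s / self.n


# Four points with a true population variance of 22.5, shifted far from zero.
data = [1e9 + d for d in (4.0, 7.0, 13.0, 16.0)]

naive = variance_from_sums(data)

feature = StableFeature()
for x in data:
    feature.add(x)
stable = feature.variance()

print("from sums:", naive)
print("stable:   ", stable)
```

In IEEE double precision, values near 1e18 are spaced 128 apart, so the sum-based result can only be a multiple of 128 and cannot equal 22.5, while the deviation-based summary recovers 22.5 for this input.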

artificial intelligence, birch, machine learning, (16 more...)

2006.12881

Country:

Europe > United Kingdom (0.28)
North America > United States > Wisconsin > Dane County > Madison (0.04)
Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine Learning, Jun-23-2020

Revealing consensus and dissensus between network partitions

Peixoto, Tiago P.

Community detection methods attempt to divide a network into groups of nodes that share similar properties, thus revealing its large-scale structure. A major challenge when employing such methods is that they are often degenerate, typically yielding a complex landscape of competing answers. As an attempt to extract understanding from a population of alternative solutions, many methods exist to establish a consensus among them in the form of a single partition "point estimate" that summarizes the whole distribution. Here we show that it is in general not possible to obtain a consistent answer from such point estimates when the underlying distribution is too heterogeneous. As an alternative, we provide a comprehensive set of methods designed to characterize and summarize complex populations of partitions in a manner that captures not only the existing consensus, but also the dissensus between elements of the population. Our approach is able to model mixed populations of partitions where multiple consensuses can coexist, representing different competing hypotheses for the network structure. We also show how our methods can be used to compare pairs of partitions, how they can be generalized to hierarchical divisions, and how they can be used to perform statistical model selection between competing hypotheses.

artificial intelligence, bayesian inference, machine learning, (17 more...)

2005.13977

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom (0.14)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
(5 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Communications (0.92)
(2 more...)

Distributional Individual Fairness in Clustering

Anderson, Nihesh, Bera, Suman K., Das, Syamantak, Liu, Yang

In this paper, we initiate the study of fair clustering that ensures distributional similarity among similar individuals. In response to the need for improved fairness in machine learning, recent papers have investigated fairness in clustering algorithms and have focused on the paradigm of statistical parity/group fairness. These efforts attempt to minimize bias against some protected groups in the population. However, to the best of our knowledge, the alternative viewpoint of individual fairness, introduced by Dwork et al. (ITCS 2012) in the context of classification, has not been considered for clustering so far. Similar to Dwork et al., we adopt the individual fairness notion, which mandates that similar individuals should be treated similarly, for clustering problems. We use the notion of $f$-divergence as a measure of statistical similarity that significantly generalizes the ones used by Dwork et al. We introduce a framework for assigning individuals, embedded in a metric space, to probability distributions over a bounded number of cluster centers. The objective is to ensure (a) low cost of clustering in expectation and (b) that individuals who are close to each other in a given fairness space are mapped to statistically similar distributions. We provide an algorithm for clustering with a $p$-norm objective ($k$-center and $k$-means are special cases) and individual fairness constraints with a provable approximation guarantee. We extend this framework to include both group fairness and individual fairness inside the protected groups. Finally, we observe conditions under which individual fairness implies group fairness. We present extensive experimental evidence that justifies the effectiveness of our approach.
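As a rough illustration of the $f$-divergence used above as a similarity measure, the sketch below evaluates the textbook definition $D_f(P \,\|\, Q) = \sum_i q_i \, f(p_i / q_i)$ for two soft cluster assignments. The function names and the example distributions are hypothetical and are not taken from the paper; only the definition itself is standard.

```python
import math


def f_divergence(p, q, f):
    """D_f(P || Q) = sum_i q_i * f(p_i / q_i), assuming every q_i > 0."""
    return sum(qi * f(pi / qi) for pi, qi in zip(p, q))


def tv_generator(t):
    # f(t) = |t - 1| / 2 yields the total variation distance.
    return abs(t - 1.0) / 2.0


def kl_generator(t):
    # f(t) = t * log(t) yields the Kullback-Leibler divergence KL(P || Q).
    return t * math.log(t) if t > 0 else 0.0


# Hypothetical assignment distributions of two individuals over k = 2 centers.
mu_x = [0.5, 0.5]
mu_y = [0.9, 0.1]

tv = f_divergence(mu_x, mu_y, tv_generator)
kl = f_divergence(mu_x, mu_y, kl_generator)

print("total variation:", tv)
print("KL divergence:  ", kl)
```

Under this reading, an individual fairness constraint would bound such a divergence between the distributions of two individuals by their distance in the fairness space; the exact form of the constraint in the paper may differ.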

constraint, fairness, individual fairness, (15 more...)

2006.12589

Country:

Asia > Japan (0.04)
Asia > China > Heilongjiang Province > Daqing (0.04)

Genre: Research Report (0.82)

Industry:

Banking & Finance (0.46)
Law (0.46)
Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

An Efficient Smoothing Proximal Gradient Algorithm for Convex Clustering

Zhou, Xin, Du, Chunlei, Cai, Xiaodong

Cluster analysis organizes data into sensible groupings and is one of the fundamental modes of understanding and learning. The widely used K-means and hierarchical clustering methods can be dramatically suboptimal due to local minima. The recently introduced convex clustering approach formulates clustering as a convex optimization problem and ensures a globally optimal solution. However, the state-of-the-art convex clustering algorithms, based on the alternating direction method of multipliers (ADMM) or the alternating minimization algorithm (AMA), require substantial computation and memory, which limits their applications. In this paper, we develop a very efficient smoothing proximal gradient algorithm (Sproga) for convex clustering. Sproga is faster than ADMM- or AMA-based convex clustering algorithms by one to two orders of magnitude. The memory required by Sproga is less than that required by ADMM and AMA by at least one order of magnitude. Computer simulations and real data analysis show that Sproga outperforms several well-known clustering algorithms, including K-means and hierarchical clustering. The efficiency and superior performance of our algorithm will help convex clustering find wide application.

algorithm, artificial intelligence, machine learning, (15 more...)

2006.12592

Country:

North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > Florida > Miami-Dade County > Coral Gables (0.04)
(4 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

A Multiscale Graph Convolutional Network Using Hierarchical Clustering

Lipov, Alex, Liò, Pietro

The information contained in hierarchical topology, intrinsic to many networks, is currently underutilised. A novel architecture is explored which exploits this information through a multiscale decomposition. A dendrogram is produced by a Girvan-Newman hierarchical clustering algorithm. It is segmented and fed through graph convolutional layers, allowing the architecture to learn multiple scale latent space representations of the network, from fine to coarse grained. The architecture is tested on a benchmark citation network, demonstrating competitive performance. Given the abundance of hierarchical networks, possible applications include quantum molecular property prediction, protein interface prediction and multiscale computational substrates for partial differential equations.

architecture, artificial intelligence, machine learning, (13 more...)

2006.12542

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

Information Mandala: Statistical Distance Matrix with Clustering

Lu, Xin

In machine learning, observation features are measured in a metric space to obtain their distance function for optimization. Given similar features that are statistically sufficient as a population, a statistical distance between two probability distributions can be calculated for more precise learning. Provided the observed features are multi-valued, the statistical distance function is still efficient. However, due to its scalar output, it cannot be applied to represent detailed distances between feature elements. To resolve this problem, this paper extends the traditional statistical distance to a matrix form, called a statistical distance matrix. In experiments, the proposed approach performs well in object recognition tasks and clearly and intuitively represents the dissimilarities between cat and dog images in the CIFAR dataset, even when directly calculated using the image pixels. By using hierarchical clustering of the statistical distance matrix, the image pixels can be separated into several clusters that are geometrically arranged around a center like a Mandala pattern. The statistical distance matrix with clustering, called the Information Mandala, goes beyond ordinary saliency maps and can help to understand the basic principles of the convolutional neural network.

artificial intelligence, distance matrix, machine learning, (13 more...)

2006.04017

Country:

North America > United States > New York (0.04)
Asia > India > West Bengal > Kolkata (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Paoletti, Giancarlo, Cavazza, Jacopo, Beyan, Cigdem, Del Bue, Alessio

Subspace Clustering for Action Recognition with Covariance Representations and Temporal Pruning

arXiv.org Artificial Intelligence, Jun-21-2020

Given a trimmed sequence, in which a single action or activity is assumed to be present, the final goal of HAR is to correctly classify it. Although significant progress has been made in recent years, accurate action recognition in videos is still a challenging task because of the complexity of the visual data, e.g., due to varying camera viewpoints, occlusions and abrupt changes in lighting conditions.

Despite the fact that subspace clustering has become a powerful technique for problems such as face clustering or digit recognition, its applicability to problems like skeleton-based HAR has only been explored by a limited number of works [7], [8], [9]. This is due to many operative limitations, including how to handle the temporal dimensions, the inherent noise present in the skeletal data and the related computational...

artificial intelligence, deep learning, machine learning, (18 more...)

doi: 10.1109/ICPR48806.2021.9412060

2006.11812

Country:

Europe > Italy (0.04)
Africa > Central African Republic > Ombella-M'Poko > Bimbo (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

Abbasi, Mohsen, Bhaskara, Aditya, Venkatasubramanian, Suresh

Fair clustering via equitable group representations

arXiv.org Machine Learning, Jun-19-2020

What does it mean for a clustering to be fair? One popular approach seeks to ensure that each cluster contains groups in (roughly) the same proportion in which they exist in the population. The normative principle at play is balance: any cluster might act as a representative of the data, and thus should reflect its diversity. But clustering also captures a different form of representativeness. A core principle in most clustering problems is that a cluster center should be representative of the cluster it represents, by being "close" to the points associated with it. This is so that we can effectively replace the points by their cluster centers without significant loss in fidelity, and indeed is a common "use case" for clustering. For such a clustering to be fair, the centers should "represent" different groups equally well. We call such a clustering a group-representative clustering. In this paper, we study the structure and computation of group-representative clusterings. We show that this notion naturally parallels the development of fairness notions in classification, with direct analogs of ideas like demographic parity and equal opportunity. We demonstrate how these notions are distinct from and cannot be captured by balance-based notions of fairness. We present approximation algorithms for group representative $k$-median clustering and couple this with an empirical evaluation on various real-world data sets.

algorithm, average cost, representation, (14 more...)

2006.11009

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.05)
North America > United States > Utah (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Jiang, Tao, Vavasis, Stephen

On identifying clusters from sum-of-norms clustering computation

arXiv.org Machine Learning, Jun-19-2020

Sum-of-norms clustering is a clustering formulation based on convex optimization that automatically induces hierarchy. Multiple algorithms have been proposed to solve the optimization problem: subgradient descent by Hocking et al., ADMM and AMA by Chi and Lange, a stochastic incremental algorithm by Panahi et al., and a semismooth Newton-CG augmented Lagrangian method by Yuan et al. All algorithms yield approximate solutions, even though an exact solution is demanded to determine the correct cluster assignment. The purpose of this paper is to close the gap between the output from existing algorithms and the exact solution to the optimization problem. We present a clustering test which identifies and certifies the correct cluster assignment from an approximate solution yielded by any primal-dual algorithm. The test may not succeed if the approximation is inaccurate. However, we show that the correct cluster assignment is guaranteed to be found by a symmetric primal-dual path-following algorithm after sufficiently many iterations, provided that the model parameter $\lambda$ avoids a finite number of bad values. Numerical experiments support our results.
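For orientation, the sum-of-norms objective is commonly written in the form below. The symbols $a_i$ (the $n$ data points in $\mathbb{R}^d$), $x_i$ (the centroid variable attached to point $i$) and the unweighted pairwise penalty are the standard textbook form and are an assumption here; the abstract itself only names the parameter $\lambda$.

$$\min_{x_1,\dots,x_n \in \mathbb{R}^d} \; \frac{1}{2}\sum_{i=1}^{n} \lVert x_i - a_i \rVert_2^2 \;+\; \lambda \sum_{1 \le i < j \le n} \lVert x_i - x_j \rVert_2$$

In this form, points $i$ and $j$ are placed in the same cluster exactly when $x_i^* = x_j^*$ at the optimum. An approximate solution generally satisfies this equality only up to numerical error, which is why it cannot by itself decide cluster membership.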

algorithm, artificial intelligence, machine learning, (17 more...)

2006.11355

Country:

North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
Asia > China > Jiangsu Province > Yancheng (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)