AITopics | k-median algorithm

Collaborating Authors

k-median algorithm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Clustering via Concave Minimization

Neural Information Processing SystemsApr-6-2023, 18:17:09 GMT

The problem of assigning m points in the n-dimensional real space Rn to k clusters is formulated as that of determining k centers in Rn such that the sum of distances of each point to the nearest center is minimized. If a polyhedral distance is used, the problem can be formulated as that of minimizing a piecewise-linear concave function on a polyhedral set which is shown to be equivalent to a bilinear program: minimizing a bilinear function on a polyhe(cid:173) dral set. A fast finite k-Median Algorithm consisting of solving few linear programs in closed form leads to a stationary point of the bilinear program. Computational testing on a number of real(cid:173) world databases was carried out. On the Wisconsin Diagnostic Breast Cancer (WDBC) database, k-Median training set correct(cid:173) ness was comparable to that of the k-Mean Algorithm, however its testing set correctness was better.

algorithm, concave minimization, database, (7 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.43)

Add feedback

Improved Learning-augmented Algorithms for k-means and k-medians Clustering

Nguyen, Thy, Chaturvedi, Anamay, Nguyen, Huy Lê

arXiv.org Artificial IntelligenceMar-1-2023

We consider the problem of clustering in the learning-augmented setting, where we are given a data set in $d$-dimensional Euclidean space, and a label for each data point given by an oracle indicating what subsets of points should be clustered together. This setting captures situations where we have access to some auxiliary information about the data set relevant for our clustering objective, for instance the labels output by a neural network. Following prior work, we assume that there are at most an $\alpha \in (0,c)$ for some $c<1$ fraction of false positives and false negatives in each predicted cluster, in the absence of which the labels would attain the optimal clustering cost $\mathrm{OPT}$. For a dataset of size $m$, we propose a deterministic $k$-means algorithm that produces centers with improved bound on clustering cost compared to the previous randomized algorithm while preserving the $O( d m \log m)$ runtime. Furthermore, our algorithm works even when the predictions are not very accurate, i.e. our bound holds for $\alpha$ up to $1/2$, an improvement over $\alpha$ being at most $1/7$ in the previous work. For the $k$-medians problem we improve upon prior work by achieving a biquadratic improvement in the dependence of the approximation factor on the accuracy parameter $\alpha$ to get a cost of $(1+O(\alpha))\mathrm{OPT}$, while requiring essentially just $O(md \log^3 m/\alpha)$ runtime.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2210.17028

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.89)

Add feedback

Optimal Time Bounds for Approximate Clustering

Mettu, Ramgopal, Plaxton, Greg

arXiv.org Machine LearningDec-12-2012

Clustering is a fundamental problem in unsupervised learning, and has been studied widely both as a problem of learning mixture models and as an optimization problem. In this paper, we study clustering with respect the emph{k-median} objective function, a natural formulation of clustering in which we attempt to minimize the average distance to cluster centers. One of the main contributions of this paper is a simple but powerful sampling technique that we call emph{successive sampling} that could be of independent interest. We show that our sampling procedure can rapidly identify a small set of points (of size just O(klog{n/k})) that summarize the input points for the purpose of clustering. Using successive sampling, we develop an algorithm for the k-median problem that runs in O(nk) time for a wide range of values of k and is guaranteed, with high probability, to return a solution with cost at most a constant factor times optimal. We also establish a lower bound of Omega(nk) on any randomized constant-factor approximation algorithm for the k-median problem that succeeds with even a negligible (say 1/100) probability. Thus we establish a tight time bound of Theta(nk) for the k-median problem for a wide range of values of k. The best previous upper bound for the problem was O(nk), where the O-notation hides polylogarithmic factors in n and k. The best previous lower bound of O(nk) applied only to deterministic k-median algorithms. While we focus our presentation on the k-median objective, all our upper bounds are valid for the k-means objective as well. In this context our algorithm compares favorably to the widely used k-means heuristic, which requires O(nk) time for just one iteration and provides no useful approximation guarantees.

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Machine Learning

1301.0587

Country: North America > United States > Texas (0.28)

Genre:

Research Report (0.50)
Workflow (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

A fast and recursive algorithm for clustering large datasets with $k$-medians

Cardot, Hervé, Cénac, Peggy, Monnez, Jean-Marie

arXiv.org Machine LearningOct-18-2011

Clustering with fast algorithms large samples of high dimensional data is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967) who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. A particular attention is paid to the averaged versions, which are known to have better performances, and a data-driven procedure that allows automatic selection of the value of the descent step is proposed. The performance of the averaged sequential estimator is compared on a simulation study, both in terms of computation speed and accuracy of the estimations, with more classical partitioning techniques such as $k$-means, trimmed $k$-means and PAM (partitioning around medoids). Finally, this new online clustering technique is illustrated on determining television audience profiles with a sample of more than 5000 individual television audiences measured every minute over a period of 24 hours.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

1101.4179

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Clustering via Concave Minimization

Bradley, Paul S., Mangasarian, Olvi L., Street, W. Nick

Neural Information Processing SystemsDec-31-1997

There are many approaches to this problem, including statistical [9], machine learning [7], integer and mathematical programming [18,1]. In this paper we concentrate on a simple concave minimization formulation of the problem that leads to a finite and fast algorithm.

algorithm, correctness, k-median algorithm, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Oklahoma > Payne County > Stillwater (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Industry: Health & Medicine (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Add feedback

Clustering via Concave Minimization

Bradley, Paul S., Mangasarian, Olvi L., Street, W. Nick

Neural Information Processing SystemsDec-31-1997

algorithm, correctness, k-median algorithm, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Oklahoma > Payne County > Stillwater (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > California > Orange County > Irvine (0.04)

Industry: Health & Medicine (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Add feedback

Clustering via Concave Minimization

Bradley, Paul S., Mangasarian, Olvi L., Street, W. Nick

Neural Information Processing SystemsDec-31-1997

If a polyhedral distance is used, the problem can be formulated as that of minimizing a piecewise-linear concave function on a polyhedral set which is shown to be equivalent to a bilinear program: minimizing a bilinear function on a polyhedral set.A fast finite k-Median Algorithm consisting of solving few linear programs in closed form leads to a stationary point of the bilinear program. Computational testing on a number of realworld databaseswas carried out. On the Wisconsin Diagnostic Breast Cancer (WDBC) database, k-Median training set correctness wascomparable to that of the k-Mean Algorithm, however its testing set correctness was better. Additionally, on the Wisconsin Prognostic Breast Cancer (WPBC) database, distinct and clinically importantsurvival curves were extracted by the k-Median Algorithm, whereas the k-Mean Algorithm failed to obtain such distinct survival curves for the same database.

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin (0.55)
North America > United States > Oklahoma > Payne County > Stillwater (0.14)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Add feedback