
Collaborating Authors: Papailiopoulos, Dimitris


Parallel Correlation Clustering on Big Graphs

Neural Information Processing Systems

Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices and obtains a 3-approximation ratio. Unfortunately, in practice KwikCluster requires a large number of clustering rounds, a potential bottleneck for large graphs. We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run in a polylogarithmic number of rounds and provably achieve nearly linear speedups. C4 uses concurrency control to enforce serializability of a parallel clustering process, and guarantees a 3-approximation ratio. ClusterWild! is a coordination-free algorithm that abandons consistency for the benefit of better scaling; this leads to a provably small loss in the 3-approximation ratio. We provide extensive experimental results for both algorithms, outperforming the state of the art in both clustering accuracy and running time. We show that our algorithms can cluster billion-edge graphs in under 5 seconds on 32 cores, while achieving a 15x speedup.
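For concreteness, here is a minimal sketch of the serial KwikCluster procedure referenced above: repeatedly pick a random unclustered pivot and cluster it together with its still-unclustered neighbors. This is illustrative Python (the adjacency-dict graph format and the function name are our assumptions, not the authors' code):

```python
import random

def kwik_cluster(adj):
    """Serial KwikCluster sketch: repeatedly pick a random unclustered
    pivot and cluster it with its remaining (similar) neighbors.
    adj maps each vertex to the set of its similar ("+") neighbors."""
    unclustered = set(adj)
    clusters = []
    while unclustered:
        pivot = random.choice(tuple(unclustered))        # uniform random pivot
        cluster = {pivot} | (adj[pivot] & unclustered)   # pivot's neighborhood
        clusters.append(cluster)
        unclustered -= cluster                           # peel the cluster off
    return clusters

# Tiny example: two triangles joined by the edge (2, 3).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
         3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(kwik_cluster(graph))
```

Each iteration of the while-loop is one "clustering round"; C4 and ClusterWild! parallelize this peeling so that only polylogarithmically many rounds are needed.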


Orthogonal NMF through Subspace Exploration

Neural Information Processing Systems

Orthogonal Nonnegative Matrix Factorization (ONMF) aims to approximate a nonnegative matrix as the product of two $k$-dimensional nonnegative factors, one of which has orthonormal columns. It yields potentially useful data representations as superpositions of disjoint parts, and it has been shown to work well for clustering tasks where traditional methods underperform. Existing algorithms rely mostly on heuristics, which, despite their good empirical performance, lack provable performance guarantees. We present a new ONMF algorithm with provable approximation guarantees. For any constant dimension $k$, we obtain an additive EPTAS without any assumptions on the input. Our algorithm relies on a novel approximation to the related Nonnegative Principal Component Analysis (NNPCA) problem: given an arbitrary data matrix, NNPCA seeks $k$ nonnegative components that jointly capture most of the variance. Our NNPCA algorithm is of independent interest and generalizes previous work that could only obtain guarantees for a single component. We evaluate our algorithms on several real and synthetic datasets and show that their performance matches or outperforms the state of the art.
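To make the setup concrete, the sketch below (illustrative numpy, with names of our choosing) evaluates the ONMF residual and checks the property behind the "disjoint parts" interpretation: nonnegative columns can only be orthogonal if their supports are disjoint.

```python
import numpy as np

def onmf_objective(M, W, H):
    """Frobenius-norm ONMF residual ||M - W H||_F, where W, H >= 0
    and W has orthonormal columns. Illustrative evaluation only."""
    return np.linalg.norm(M - W @ H, "fro")

# Nonnegative orthonormal columns must have disjoint supports: if two
# columns shared a positive entry, their inner product would be positive.
W = np.array([[0.6, 0.0],
              [0.8, 0.0],
              [0.0, 1.0]])
assert np.allclose(W.T @ W, np.eye(2))   # orthonormal columns
assert (W >= 0).all()                    # nonnegative entries
H = np.array([[1.0, 2.0],
              [3.0, 0.5]])
M = W @ H                                # an exactly factorizable input
print(onmf_objective(M, W, H))           # 0.0
```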


Sparse PCA via Bipartite Matchings

arXiv.org Machine Learning

We consider the following multi-component sparse PCA problem: given a set of data points, we seek to extract a small number of sparse components with disjoint supports that jointly capture the maximum possible variance. These components can be computed one by one, by repeatedly solving the single-component problem and deflating the input data matrix, but as we show this greedy procedure is suboptimal. We present a novel algorithm for sparse PCA that jointly optimizes multiple disjoint components. The extracted features capture variance that lies within a multiplicative factor arbitrarily close to 1 of the optimal. Our algorithm is combinatorial and computes the desired components by solving multiple instances of the bipartite maximum-weight matching problem. Its complexity grows as a low-order polynomial in the ambient dimension of the input data matrix, but exponentially in its rank. However, it can be effectively applied to a low-dimensional sketch of the data; this allows us to obtain polynomial-time approximation guarantees via spectral bounds. We evaluate our algorithm on real datasets and empirically demonstrate that in many cases it outperforms existing, deflation-based approaches.
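The greedy baseline that this abstract argues is suboptimal can be sketched as follows; the single-component step (truncated power iteration) and the hard deflation are our illustrative choices, not the paper's matching-based algorithm:

```python
import numpy as np

def single_sparse_pc(A, k, iters=100, seed=0):
    """One k-sparse component of a covariance matrix A via truncated
    power iteration (an illustrative single-component heuristic)."""
    x = np.random.default_rng(seed).standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        y = A @ x
        if not y.any():                          # everything deflated away
            return np.zeros_like(y)
        support = np.argsort(np.abs(y))[-k:]     # keep the k largest entries
        x = np.zeros_like(y)
        x[support] = y[support]
        x /= np.linalg.norm(x)
    return x

def greedy_deflation(A, k, m):
    """Extract m disjoint k-sparse components one at a time, zeroing the
    rows/columns of each extracted support (hard deflation)."""
    A = A.copy()
    comps = []
    for _ in range(m):
        x = single_sparse_pc(A, k)
        comps.append(x)
        s = np.flatnonzero(x)
        A[s, :] = 0.0                            # enforce disjoint supports
        A[:, s] = 0.0
    return np.column_stack(comps)
```

Because each deflation commits irrevocably to a support, early choices can block better joint solutions; that is exactly the suboptimality the joint, matching-based optimization avoids.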


On the Worst-Case Approximability of Sparse PCA

arXiv.org Machine Learning

It is well known that Sparse PCA (Sparse Principal Component Analysis) is NP-hard to solve exactly on worst-case instances. What is the complexity of solving Sparse PCA approximately? Our contributions include: 1) a simple and efficient algorithm that achieves an $n^{-1/3}$-approximation; 2) NP-hardness of approximation to within $(1-\varepsilon)$, for some small constant $\varepsilon > 0$; 3) SSE-hardness of approximation to within any constant factor; and 4) an $\exp\exp\left(\Omega\left(\sqrt{\log \log n}\right)\right)$ ("quasi-quasi-polynomial") gap for the standard semidefinite program.
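For reference, the objective these hardness results concern is $\max \{x^\top A x : \|x\|_2 = 1, \|x\|_0 \le k\}$ for a given (say, positive semidefinite) matrix $A$. A brute-force exact solver over all supports, feasible only for tiny instances and consistent with the NP-hardness above, is sketched below (our illustrative code, not from the paper):

```python
import numpy as np
from itertools import combinations

def sparse_pca_exact(A, k):
    """Exact sparse PCA by exhausting all k-subsets of coordinates:
    on a fixed support S, the problem reduces to the top eigenpair of
    the principal submatrix A[S, S]. Exponential time in n."""
    n = A.shape[0]
    best_val, best_x = -np.inf, None
    for S in combinations(range(n), k):
        vals, vecs = np.linalg.eigh(A[np.ix_(S, S)])
        if vals[-1] > best_val:                  # eigh sorts ascending
            best_val = vals[-1]
            best_x = np.zeros(n)
            best_x[list(S)] = vecs[:, -1]
    return best_val, best_x

rng = np.random.default_rng(1)
B = rng.standard_normal((6, 6))
print(sparse_pca_exact(B @ B.T, k=2)[0])         # small PSD example
```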


Provable Deterministic Leverage Score Sampling

arXiv.org Machine Learning

We explain theoretically a curious empirical phenomenon: "Approximating a matrix by deterministically selecting the subset of its columns with the largest leverage scores results in a good low-rank matrix surrogate." To obtain provable guarantees, previous work requires randomized sampling of the columns with probabilities proportional to their leverage scores. In this work, we provide a novel theoretical analysis of deterministic leverage score sampling. We show that such deterministic sampling can be provably as accurate as its randomized counterparts, if the leverage scores follow a moderately steep power-law decay. We support this power-law assumption by providing empirical evidence that such decay laws are abundant in real-world data sets. We then demonstrate empirically the performance of deterministic leverage score sampling, which often matches or outperforms the state-of-the-art techniques.
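A minimal sketch of the deterministic scheme analyzed here, assuming the rank-$k$ leverage score of column $j$ is the squared norm of the $j$-th row of the top-$k$ right singular vectors (the function and variable names are ours):

```python
import numpy as np

def deterministic_leverage_selection(A, k, c):
    """Select the c columns of A with the largest rank-k leverage scores.
    Score of column j: squared norm of row j of the top-k right
    singular vectors of A."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    scores = np.sum(Vt[:k, :] ** 2, axis=0)      # one score per column
    cols = np.argsort(scores)[::-1][:c]          # deterministic: top c scores
    return A[:, cols], cols

# The guarantees above kick in when `scores` decays like a moderately
# steep power law, which the paper observes in many real data sets.
A = np.random.default_rng(0).standard_normal((100, 50))
C, cols = deterministic_leverage_selection(A, k=5, c=10)
print(cols)
```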