AITopics | Daskalakis, Constantinos

Collaborating Authors

Daskalakis, Constantinos

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Converse to Banach's Fixed Point Theorem and its CLS Completeness

Daskalakis, Constantinos, Tzamos, Christos, Zampetakis, Manolis

arXiv.org Machine LearningFeb-13-2018

Banach's fixed point theorem for contraction maps has been widely used to analyze the convergence of iterative methods in non-convex problems. It is a common experience, however, that iterative maps fail to be globally contracting under the natural metric in their domain, making the applicability of Banach's theorem limited. We explore how generally we can apply Banach's fixed point theorem to establish the convergence of iterative methods when pairing it with carefully designed metrics. Our first result is a strong converse of Banach's theorem, showing that it is a universal analysis tool for establishing global convergence of iterative methods to unique fixed points, and for bounding their convergence rate. In other words, we show that, whenever an iterative map globally converges to a unique fixed point, there exists a metric under which the iterative map is contracting and which can be used to bound the number of iterations until convergence. We illustrate our approach in the widely used power method, providing a new way of bounding its convergence rate through contraction arguments. We next consider the computational complexity of Banach's fixed point theorem. Making the proof of our converse theorem constructive, we show that computing a fixed point whose existence is guaranteed by Banach's fixed point theorem is CLS-complete. We thus provide the first natural complete problem for the class CLS, which was defined in [Daskalakis, Papadimitriou 2011] to capture the complexity of problems such as P-matrix LCP, computing KKT-points, and finding mixed Nash equilibria in congestion and network coordination games.

artificial intelligence, banach, machine learning, (16 more...)

arXiv.org Machine Learning

1702.07339

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

Training GANs with Optimism

Daskalakis, Constantinos, Ilyas, Andrew, Syrgkanis, Vasilis, Zeng, Haoyang

arXiv.org Machine LearningFeb-13-2018

We address the issue of limit cycling behavior in training Generative Adversarial Networks and propose the use of Optimistic Mirror Decent (OMD) for training Wasserstein GANs. Recent theoretical results have shown that optimistic mirror decent (OMD) can enjoy faster regret rates in the context of zero-sum games. WGANs is exactly a context of solving a zero-sum game with simultaneous no-regret dynamics. Moreover, we show that optimistic mirror decent addresses the limit cycling problem in training WGANs. We formally show that in the case of bi-linear zero-sum games the last iterate of OMD dynamics converges to an equilibrium, in contrast to GD dynamics which are bound to cycle. We also portray the huge qualitative difference between GD and OMD dynamics with toy examples, even when GD is modified with many adaptations proposed in the recent literature, such as gradient penalty or momentum. We apply OMD WGAN training to a bioinformatics problem of generating DNA sequences. We observe that models trained with OMD achieve consistently smaller KL divergence with respect to the true underlying distribution, than models trained with GD variants. Finally, we introduce a new algorithm, Optimistic Adam, which is an optimistic variant of Adam. We apply it to WGAN training on CIFAR10 and observe improved performance in terms of inception score as compared to Adam.

health & medicine, neural network, zero-sum game, (18 more...)

arXiv.org Machine Learning

1711.00141

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Concentration of Multilinear Functions of the Ising Model with Applications to Network Data

Daskalakis, Constantinos, Dikkala, Nishanth, Kamath, Gautam

Neural Information Processing SystemsDec-31-2017

We prove near-tight concentration of measure for polynomial functions of the Ising model, under high temperature, improving the radius of concentration guaranteed by known results by polynomial factors in the dimension (i.e.~the number of nodes in the Ising model). We show that our results are optimal up to logarithmic factors in the dimension. We obtain our results by extending and strengthening the exchangeable-pairs approach used to prove concentration of measure in this setting by Chatterjee. We demonstrate the efficacy of such functions as statistics for testing the strength of interactions in social networks in both synthetic and real world data.

artificial intelligence, ising model, us government, (17 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.14)

Genre: Research Report > New Finding (0.86)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.93)
Information Technology (0.71)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Add feedback

The Robust Manifold Defense: Adversarial Training using Generative Models

Ilyas, Andrew, Jalal, Ajil, Asteri, Eirini, Daskalakis, Constantinos, Dimakis, Alexandros G.

arXiv.org Machine LearningDec-26-2017

Deep neural networks are demonstrating excellent performance on several classical vision problems. However, these networks are vulnerable to adversarial examples, minutely modified images that induce arbitrary attacker-chosen output from the network. We propose a mechanism to protect against these adversarial inputs based on a generative model of the data. We introduce a pre-processing step that projects on the range of a generative model using gradient descent before feeding an input into a classifier. We show that this step provides the classifier with robustness against first-order, substitute model, and combined adversarial attacks. Using a min-max formulation, we show that there may exist adversarial examples even in the range of the generator, natural-looking images extremely close to the decision boundary for which the classifier has unjustifiedly high confidence. We show that adversarial training on the generative manifold can be used to make a classifier that is robust to these attacks. Finally, we show how our method can be applied even without a pre-trained generative model using a recent method called the deep image prior. We evaluate our method on MNIST, CelebA and Imagenet and show robustness against the current state of the art attacks.

classifier, deep learning, neural network, (18 more...)

arXiv.org Machine Learning

1712.09196

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Add feedback

Concentration of Multilinear Functions of the Ising Model with Applications to Network Data

Daskalakis, Constantinos, Dikkala, Nishanth, Kamath, Gautam

arXiv.org Machine LearningOct-11-2017

We prove near-tight concentration of measure for polynomial functions of the Ising model under high temperature. For any degree $d$, we show that a degree-$d$ polynomial of a $n$-spin Ising model exhibits exponential tails that scale as $\exp(-r^{2/d})$ at radius $r=\tilde{\Omega}_d(n^{d/2})$. Our concentration radius is optimal up to logarithmic factors for constant $d$, improving known results by polynomial factors in the number of spins. We demonstrate the efficacy of polynomial functions as statistics for testing the strength of interactions in social networks in both synthetic and real world data.

artificial intelligence, ising model, us government, (18 more...)

arXiv.org Machine Learning

1710.0417

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.14)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Media > Music (0.93)
Information Technology (0.70)

Technology:

Information Technology > Communications > Networks (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Add feedback

Ten Steps of EM Suffice for Mixtures of Two Gaussians

Daskalakis, Constantinos, Tzamos, Christos, Zampetakis, Manolis

arXiv.org Machine LearningJun-5-2017

The Expectation-Maximization (EM) algorithm is a widely used method for maximum likelihood estimation in models with latent variables. For estimating mixtures of Gaussians, its iteration can be viewed as a soft version of the k-means clustering algorithm. Despite its wide use and applications, there are essentially no known convergence guarantees for this method. We provide global convergence guarantees for mixtures of two Gaussians with known covariance matrices. We show that the population version of EM, where the algorithm is given access to infinitely many samples from the mixture, converges geometrically to the correct mean vectors, and provide simple, closed-form expressions for the convergence rate. As a simple illustration, we show that, in one dimension, ten steps of the EM algorithm initialized at infinity result in less than 1\% error estimation of the means. In the finite sample regime, we show that, under a random initialization, $\tilde{O}(d/\epsilon^2)$ samples suffice to compute the unknown vectors to within $\epsilon$ in Mahalanobis distance, where $d$ is the dimension. In particular, the error rate of the EM based estimator is $\tilde{O}\left(\sqrt{d \over n}\right)$ where $n$ is the number of samples, which is optimal up to logarithmic factors.

artificial intelligence, iteration, machine learning, (18 more...)

arXiv.org Machine Learning

1609.00368

Country: North America > United States > California (0.14)

Genre:

Research Report (0.50)
Workflow (0.46)

Add feedback

Square Hellinger Subadditivity for Bayesian Networks and its Applications to Identity Testing

Daskalakis, Constantinos, Pan, Qinxuan

arXiv.org Machine LearningDec-9-2016

We show that the square Hellinger distance between two Bayesian networks on the same directed graph, $G$, is subadditive with respect to the neighborhoods of $G$. Namely, if $P$ and $Q$ are the probability distributions defined by two Bayesian networks on the same DAG, our inequality states that the square Hellinger distance, $H^2(P,Q)$, between $P$ and $Q$ is upper bounded by the sum, $\sum_v H^2(P_{\{v\} \cup \Pi_v}, Q_{\{v\} \cup \Pi_v})$, of the square Hellinger distances between the marginals of $P$ and $Q$ on every node $v$ and its parents $\Pi_v$ in the DAG. Importantly, our bound does not involve the conditionals but the marginals of $P$ and $Q$. We derive a similar inequality for more general Markov Random Fields. As an application of our inequality, we show that distinguishing whether two Bayesian networks $P$ and $Q$ on the same (but potentially unknown) DAG satisfy $P=Q$ vs $d_{\rm TV}(P,Q)>\epsilon$ can be performed from $\tilde{O}(|\Sigma|^{3/4(d+1)} \cdot n/\epsilon^2)$ samples, where $d$ is the maximum in-degree of the DAG and $\Sigma$ the domain of each variable of the Bayesian networks. If $P$ and $Q$ are defined on potentially different and potentially unknown trees, the sample complexity becomes $\tilde{O}(|\Sigma|^{4.5} n/\epsilon^2)$, whose dependence on $n, \epsilon$ is optimal up to logarithmic factors. Lastly, if $P$ and $Q$ are product distributions over $\{0,1\}^n$ and $Q$ is known, the sample complexity becomes $O(\sqrt{n}/\epsilon^2)$, which is optimal up to constant factors.

artificial intelligence, bayesian inference, hellinger distance, (18 more...)

arXiv.org Machine Learning

1612.03164

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Optimal Testing for Properties of Distributions

Acharya, Jayadev, Daskalakis, Constantinos, Kamath, Gautam

Neural Information Processing SystemsDec-31-2015

Given samples from an unknown distribution, p, is it possible to distinguish whether p belongs to some class of distributions C versus p being far from every distribution in C? This fundamental question has receivedtremendous attention in Statistics, albeit focusing onasymptotic analysis, as well as in Computer Science, wherethe emphasis has been on small sample size and computationalcomplexity. Nevertheless, even for basic classes ofdistributions such as monotone, log-concave, unimodal, and monotone hazard rate, the optimal sample complexity is unknown.We provide a general approach via which we obtain sample-optimal and computationally efficient testers for all these distribution families. At the core of our approach is an algorithm which solves the following problem:Given samplesfrom an unknown distribution p, and a known distribution q, are p and q close in Chi^2-distance, or far in total variation distance?The optimality of all testers is established by providing matching lower bounds. Finally, a necessary building block for our tester and important byproduct of our work are the first known computationally efficient proper learners for discretelog-concave and monotone hazard rate distributions. We exhibit the efficacy of our testers via experimental analysis.

algorithm, artificial intelligence, survey article, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.14)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)

Add feedback

Optimum Statistical Estimation with Strategic Data Sources

Cai, Yang, Daskalakis, Constantinos, Papadimitriou, Christos H.

arXiv.org Machine LearningApr-24-2015

We propose an optimum mechanism for providing monetary incentives to the data sources of a statistical estimator such as linear regression, so that high quality data is provided at low cost, in the sense that the sum of payments and estimation error is minimized. The mechanism applies to a broad range of estimators, including linear and polynomial regression, kernel regression, and, under some additional assumptions, ridge regression. It also generalizes to several objectives, including minimizing estimation error subject to budget constraints. Besides our concrete results for regression problems, we contribute a mechanism design framework through which to design and analyze statistical estimators whose examples are supplied by workers with cost for labeling said examples.

artificial intelligence, game theory, mechanism, (20 more...)

arXiv.org Machine Learning

1408.2539

Country:

North America > United States (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Data Science > Data Mining (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback