EM Converges for a Mixture of Many Linear Regressions
Kwon, Jeongyeol, Caramanis, Constantine
We study the convergence of the Expectation-Maximization (EM) algorithm for mixtures of linear regressions with an arbitrary number $k$ of components. We show that as long as the signal-to-noise ratio (SNR) is at least $\tilde{O}(k^2)$, well-initialized EM converges to the true regression parameters. Previous results for $k \geq 3$ have only established local convergence in the noiseless setting, i.e., where the SNR is infinite. Our results establish a near-optimal statistical error rate of $\tilde{O}(\sigma \sqrt{k^2 d/n})$ for (sample-splitting) finite-sample EM with $k$ components, where $d$ is the dimension, $n$ is the number of samples, and $\sigma$ is the variance of the noise. In particular, our results imply exact recovery as $\sigma \rightarrow 0$, in contrast to most previous local convergence results for EM, where the statistical error scales with the norm of the parameters. Standard moment-method approaches suffice to guarantee initialization in the region where our local convergence guarantees apply.
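To make the iteration the abstract analyzes concrete, here is a minimal EM sketch for a mixture of $k$ linear regressions, assuming a uniform mixture with known noise level $\sigma$ and an initialization supplied, e.g., by a moment method; function and variable names are ours, not the paper's.

```python
import numpy as np

def em_mixed_linear_regression(X, y, betas, sigma, n_iters=50):
    """Minimal EM sketch for a uniform mixture of k linear regressions.

    X: (n, d) covariates; y: (n,) responses; betas: (k, d) initial
    parameter guesses (e.g., from a moment method); sigma: noise level.
    """
    n, d = X.shape
    k = betas.shape[0]
    for _ in range(n_iters):
        # E-step: posterior responsibility of each component for each sample.
        resid = y[:, None] - X @ betas.T               # (n, k)
        log_w = -resid**2 / (2 * sigma**2)
        log_w -= log_w.max(axis=1, keepdims=True)      # numerical stability
        w = np.exp(log_w)
        w /= w.sum(axis=1, keepdims=True)
        # M-step: one weighted least-squares solve per component.
        for j in range(k):
            Wj = w[:, j]
            A = X.T @ (Wj[:, None] * X) + 1e-8 * np.eye(d)
            betas[j] = np.linalg.solve(A, X.T @ (Wj * y))
    return betas
```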
Learning Graphs from Noisy Epidemic Cascades
Hoffmann, Jessica, Caramanis, Constantine
We consider the problem of learning the weighted edges of a graph by observing the noisy times of infection for multiple epidemic cascades on this graph. Past work has considered this problem when the cascade information, i.e., the infection times, is known exactly. Though the noisy setting is well motivated by many epidemic processes (e.g., most human epidemics), to the best of our knowledge, very little is known about when it is solvable. Previous work on the no-noise setting critically uses the ordering information; if noise can reverse this ordering -- a node's reported (noisy) infection time comes after the reported infection time of some node it infected -- then it is unclear how previous results can be extended. We therefore tackle two versions of the noisy setting: the limited-noise setting, where we know noisy times of infection, and the extreme-noise setting, in which we only know whether or not a node was infected. We provide a polynomial-time algorithm for recovering the structure of bidirectional trees in the extreme-noise setting, and show that our algorithm matches lower bounds established in the no-noise setting, and hence is optimal. We extend our results to general degree-bounded graphs, where again we show that our (polynomial-time) algorithm can recover the structure of the graph with optimal sample complexity. We also provide the first efficient algorithm to learn the weights of bidirectional trees in the limited-noise setting. Finally, we give a polynomial-time algorithm for learning the weights of general bounded-degree graphs in the limited-noise setting. This algorithm extends to general graphs (at the price of exponential running time), proving that the problem is solvable in the general case. All our algorithms work for any noise distribution, without any restriction on the variance.
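The extreme-noise observation model is easy to make concrete: each cascade reveals only a vector of infection indicators. The toy estimator below scores node pairs by co-infection frequency and returns a maximum-weight spanning tree; it illustrates this data access model for bidirectional trees but is not the paper's algorithm.

```python
import numpy as np
from itertools import combinations

def coinfection_tree(infected):
    """Toy tree-structure estimator from infection indicators only.

    infected: (m, n) boolean array; infected[c, v] says whether node v
    was infected in cascade c (the extreme-noise observation model).
    NOTE: a plausible illustration, not the paper's algorithm.
    """
    m, n = infected.shape
    score = (infected.T.astype(float) @ infected.astype(float)) / m
    # Kruskal's algorithm over co-infection scores (max-weight spanning tree).
    edges = sorted(combinations(range(n), 2), key=lambda e: -score[e])
    parent = list(range(n))
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((u, v))
    return tree
```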
Robust estimation of tree structured Gaussian Graphical Model
Katiyar, Ashish, Hoffmann, Jessica, Caramanis, Constantine
Consider jointly Gaussian random variables whose conditional independence structure is specified by a graphical model. If we observe realizations of the variables, we can compute the covariance matrix, and it is well known that the support of the inverse covariance matrix corresponds to the edges of the graphical model. Instead, suppose we only have noisy observations. If the noise at each node is independent, we can compute the sum of the covariance matrix and an unknown diagonal matrix. The inverse of this sum is (in general) dense. We ask: can the original independence structure be recovered? We address this question for tree-structured graphical models. We prove that this problem is unidentifiable, but show that the unidentifiability is limited to a small equivalence class of candidate trees. We further present additional constraints under which the problem is identifiable. Finally, we provide an $O(n^3)$ algorithm to find this equivalence class of trees.
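For intuition, the noiseless baseline is simple: in a tree-structured Gaussian graphical model, absolute correlations are multiplicative along paths, so the true tree is the maximum-weight spanning tree of $|\mathrm{corr}|$ (a Chow-Liu-style step). The sketch below implements only that baseline; independent per-node noise is exactly what can break it, which is the identifiability question studied here.

```python
import numpy as np

def chow_liu_tree(samples):
    """Noiseless baseline: recover a tree GGM as the maximum-weight
    spanning tree of absolute correlations (Chow-Liu-style).
    With independent per-node noise this step can fail, which is the
    identifiability question the abstract studies.

    samples: (m, n) array, one row per observation, one column per node.
    """
    corr = np.abs(np.corrcoef(samples, rowvar=False))
    n = corr.shape[0]
    # Prim's algorithm for the maximum-weight spanning tree.
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        best = max(((u, v) for u in in_tree for v in range(n)
                    if v not in in_tree), key=lambda e: corr[e])
        edges.append(best)
        in_tree.add(best[1])
    return edges
```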
High Dimensional Robust Estimation of Sparse Models via Trimmed Hard Thresholding
Liu, Liu, Li, Tianyang, Caramanis, Constantine
We study the problem of sparsity-constrained $M$-estimation with arbitrary corruptions to both {\em explanatory and response} variables in the high-dimensional regime, where the number of variables $d$ is larger than the sample size $n$. Our main contribution is a highly efficient gradient-based optimization algorithm that we call Trimmed Hard Thresholding -- a robust variant of Iterative Hard Thresholding (IHT) that uses the trimmed mean in its gradient computations. Our algorithm handles a wide class of sparsity-constrained $M$-estimation problems and tolerates a nearly dimension-independent fraction of arbitrarily corrupted samples. More specifically, when the corrupted fraction satisfies $\epsilon \lesssim 1/(\sqrt{k} \log(nd))$, where $k$ is the sparsity of the parameter, we obtain accurate estimation and model selection guarantees with optimal sample complexity. Furthermore, we extend our algorithm to sparse Gaussian graphical model (precision matrix) estimation via a neighborhood selection approach. We demonstrate the effectiveness of robust estimation in sparse linear regression, logistic regression, and sparse precision matrix estimation on synthetic and real-world US equities data.
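A minimal sketch of the trimmed-mean idea, for the special case of sparse linear regression with squared loss; the step size and the use of SciPy's coordinate-wise trimmed mean are our assumptions, not the paper's exact specification.

```python
import numpy as np
from scipy.stats import trim_mean  # coordinate-wise trimmed mean

def trimmed_iht(X, y, k, eps, step=0.1, n_iters=100):
    """Sketch of IHT with a trimmed-mean gradient (squared loss assumed;
    step size and names are ours).

    eps: assumed upper bound on the fraction of corrupted samples.
    """
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        # Per-sample gradients of the squared loss, shape (n, d).
        grads = (X @ theta - y)[:, None] * X
        # Trim the eps-tails of each coordinate before averaging.
        g = trim_mean(grads, proportiontocut=eps, axis=0)
        theta = theta - step * g
        # Hard-thresholding projection onto k-sparse vectors.
        keep = np.argsort(np.abs(theta))[-k:]
        mask = np.zeros(d, dtype=bool)
        mask[keep] = True
        theta[~mask] = 0.0
    return theta
```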
Global Convergence of the EM Algorithm for Mixtures of Two Component Linear Regression
Kwon, Jeongyeol, Caramanis, Constantine
The Expectation-Maximization (EM) algorithm is perhaps the most broadly used algorithm for inference in latent variable problems. A theoretical understanding of its performance, however, remains largely lacking. Recent results established that EM enjoys global convergence for Gaussian Mixture Models. For mixed regression, however, only local convergence results have been established, and those only for the high-SNR regime. We show here that EM converges for mixed linear regression with two components (it is known not to converge for three or more), and moreover that this convergence holds for random initialization.
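For the symmetric two-component model $y = s\langle \beta^*, x \rangle + \text{noise}$ with $s = \pm 1$, the EM iteration has a well-known closed form: a tanh-weighted E-step followed by a least-squares M-step. A minimal sketch (variable names ours), which per the abstract can be run from a random initialization:

```python
import numpy as np

def em_two_mixed_regression(X, y, beta, sigma, n_iters=100):
    """EM for symmetric two-component mixed linear regression,
    y = s * <beta*, x> + noise with s = +/-1. A standard form of the
    update: tanh responsibilities, then weighted least squares.
    """
    d = X.shape[1]
    for _ in range(n_iters):
        w = np.tanh((X @ beta) * y / sigma**2)    # E-step weights in [-1, 1]
        A = X.T @ X + 1e-8 * np.eye(d)
        beta = np.linalg.solve(A, X.T @ (w * y))  # M-step
    return beta

# Random initialization, which the abstract shows suffices:
# beta0 = np.random.randn(d)
```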
Entropic Latent Variable Discovery
Kocaoglu, Murat, Shakkottai, Sanjay, Dimakis, Alexandros G., Caramanis, Constantine, Vishwanath, Sriram
We consider the problem of discovering the simplest latent variable that can make two observed discrete variables conditionally independent. This problem has appeared in the literature as probabilistic latent semantic analysis (pLSA), and has connections to non-negative matrix factorization. When the simplicity of the variable is measured through its cardinality, we show that a solution to this latent variable discovery problem can be used to distinguish direct causal relations from spurious correlations among almost all joint distributions on simple causal graphs with two observed variables. Conjecturing that a similar identifiability result holds with Shannon entropy, we study a loss function that trades off the entropy of the latent variable against the conditional mutual information of the observed variables. We then propose a latent variable discovery algorithm -- LatentSearch -- and show that its stationary points are the stationary points of our loss function. We experimentally show that LatentSearch can indeed be used to distinguish direct causal relations from spurious correlations.
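The traded-off quantities are straightforward to compute for discrete variables. The sketch below evaluates an entropy-plus-conditional-mutual-information objective of this kind for a candidate conditional $q(z \mid x, y)$; the weighting parameter and names are ours, and this is only the objective evaluation, not the LatentSearch update itself.

```python
import numpy as np

def latent_loss(p_xy, q_z_xy, beta=1.0, eps=1e-12):
    """Evaluate an objective of the form H(Z) + beta * I(X;Y|Z)
    (our notation; a sketch of the kind of loss LatentSearch targets).

    p_xy: (nx, ny) observed joint distribution.
    q_z_xy: (nz, nx, ny) candidate conditional q(z | x, y),
            normalized over z for each (x, y).
    """
    p_xyz = q_z_xy * p_xy[None, :, :]          # joint p(z, x, y)
    p_z = p_xyz.sum(axis=(1, 2))               # marginal of Z
    h_z = -np.sum(p_z * np.log(p_z + eps))     # entropy H(Z)

    p_zx = p_xyz.sum(axis=2)                   # p(z, x)
    p_zy = p_xyz.sum(axis=1)                   # p(z, y)
    # I(X;Y|Z) = sum p(z,x,y) log [ p(z,x,y) p(z) / (p(z,x) p(z,y)) ]
    ratio = (p_xyz * p_z[:, None, None]) / (
        p_zx[:, :, None] * p_zy[:, None, :] + eps)
    i_xy_given_z = np.sum(p_xyz * np.log(ratio + eps))
    return h_z + beta * i_xy_given_z
```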
High Dimensional Robust Sparse Regression
Liu, Liu, Shen, Yanyao, Li, Tianyang, Caramanis, Constantine
We provide a novel -- and to the best of our knowledge, the first -- algorithm for high-dimensional sparse regression with corruptions in explanatory and/or response variables. Our algorithm recovers the true sparse parameters in the presence of a constant fraction of arbitrary corruptions. Our main contribution is a robust variant of Iterative Hard Thresholding. Using this, we provide accurate estimators with sub-linear sample complexity. Our algorithm builds on a novel randomized outlier-removal technique for robust sparse mean estimation that may be of interest in its own right: it is orderwise more computationally efficient than existing algorithms, and succeeds with high probability, making it suitable for general use in iterative algorithms. We demonstrate its effectiveness on large-scale sparse regression problems with arbitrary corruptions.
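The algorithmic template here is: robustly estimate the mean of the per-sample gradients, take a gradient step, then project by hard thresholding. The sketch below uses coordinate-wise median-of-means as a simple stand-in for the paper's randomized outlier-removal routine, which we do not reproduce.

```python
import numpy as np

def median_of_means(grads, n_blocks=10):
    """Stand-in robust mean: the paper uses a randomized outlier-removal
    routine for robust *sparse* mean estimation; coordinate-wise
    median-of-means is a simple placeholder with the same interface.
    Assumes the number of samples is much larger than n_blocks."""
    n = grads.shape[0]
    blocks = np.array_split(np.random.permutation(n), n_blocks)
    block_means = np.stack([grads[b].mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)

def robust_iht(X, y, k, step=0.1, n_iters=100):
    """Robust IHT template: robust gradient estimate, gradient step,
    then projection onto k-sparse vectors by hard thresholding."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        grads = (X @ theta - y)[:, None] * X   # per-sample gradients
        theta = theta - step * median_of_means(grads)
        support = np.argsort(np.abs(theta))[-k:]
        pruned = np.zeros(d)
        pruned[support] = theta[support]
        theta = pruned
    return theta
```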
Approximate Newton-based statistical inference using only stochastic gradients
Li, Tianyang, Kyrillidis, Anastasios, Liu, Liu, Caramanis, Constantine
We present a novel inference framework for convex empirical risk minimization, using approximate stochastic Newton steps. The proposed algorithm is based on the notion of finite differences and approximates a Hessian-vector product using only first-order information. In theory, our method efficiently computes the statistical error covariance in $M$-estimation, both for unregularized convex learning problems and for high-dimensional LASSO regression, without using exact second-order information or resampling the entire data set. In practice, we demonstrate the effectiveness of our framework on large-scale machine learning problems that go beyond convexity: as a highlight, our work can be used to detect certain adversarial attacks on neural networks.
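The first-order primitive is a finite-difference approximation of a Hessian-vector product, which needs only two gradient evaluations; a minimal sketch (the perturbation size is our choice):

```python
import numpy as np

def hvp_finite_diff(grad_fn, theta, v, r=1e-5):
    """Approximate the Hessian-vector product H(theta) @ v from
    first-order information only, via a central difference of gradients.
    grad_fn is assumed to return the gradient of the empirical risk."""
    return (grad_fn(theta + r * v) - grad_fn(theta - r * v)) / (2 * r)

# Quick check on a quadratic, where the Hessian is known exactly:
# A = np.array([[2.0, 0.3], [0.3, 1.0]])
# grad = lambda t: A @ t
# hvp_finite_diff(grad, np.zeros(2), np.ones(2))  # approximately A @ [1, 1]
```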
Statistical Inference Using SGD
Li, Tianyang (University of Texas at Austin) | Liu, Liu (University of Texas at Austin) | Kyrillidis, Anastasios (IBM T.J. Watson Research Center, Yorktown Heights) | Caramanis, Constantine (University of Texas at Austin)
We present a novel method for frequentist statistical inference in M-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. To show the merits of our scheme, we apply it to both synthetic and real data sets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.
Statistical inference using SGD
Li, Tianyang, Liu, Liu, Kyrillidis, Anastasios, Caramanis, Constantine
We present a novel method for frequentist statistical inference in $M$-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first order method, and is well-suited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.
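A minimal sketch of the averaging step for least-squares, assuming a fixed step size and a burn-in period (both our choices); repeating it over independent SGD segments yields the approximately normal replicates on which inference is based.

```python
import numpy as np

def sgd_average(X, y, step, burn_in, length, theta0=None):
    """Fixed-step-size SGD on the least-squares loss, returning the
    average of the post-burn-in iterates; the procedure in the abstract
    uses such (properly scaled) averages as the basis for inference.
    """
    n, d = X.shape
    theta = np.zeros(d) if theta0 is None else theta0.copy()
    avg = np.zeros(d)
    for t in range(burn_in + length):
        i = np.random.randint(n)                  # sample one data point
        grad = (X[i] @ theta - y[i]) * X[i]       # stochastic gradient
        theta -= step * grad
        if t >= burn_in:
            avg += theta
    return avg / length

# Running this over several independent segments gives roughly normal
# replicates whose spread estimates the sampling error of the estimator.
```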