AITopics | Mathematical & Statistical Methods

Local identifiability of $l_1$-minimization dictionary learning: a sufficient and almost necessary condition

arXiv.org Machine Learning Jul-12-2016

We study the theoretical properties of learning a dictionary from $N$ signals $\mathbf x_i\in \mathbb R^K$ for $i=1,\dots,N$ via $l_1$-minimization. We assume that $\mathbf x_i$'s are i.i.d. random linear combinations of the $K$ columns from a complete (i.e., square and invertible) reference dictionary $\mathbf D_0 \in \mathbb R^{K\times K}$. Here, the random linear coefficients are generated from either the $s$-sparse Gaussian model or the Bernoulli-Gaussian model. First, for the population case, we establish a sufficient and almost necessary condition for the reference dictionary $\mathbf D_0$ to be locally identifiable, i.e., a local minimum of the expected $l_1$-norm objective function. Our condition covers both sparse and dense cases of the random linear coefficients and significantly improves the sufficient condition by Gribonval and Schnass (2010). In addition, we show that for a complete $\mu$-coherent reference dictionary, i.e., a dictionary with absolute pairwise column inner-product at most $\mu\in[0,1)$, local identifiability holds even when the random linear coefficient vector has up to $O(\mu^{-2})$ nonzeros on average. Moreover, our local identifiability results also translate to the finite sample case with high probability provided that the number of signals $N$ scales as $O(K\log K)$.
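The quantities named in this abstract can be made concrete with a minimal sketch. It assumes, as is common for complete dictionaries, that the empirical objective is the average of $\|\mathbf D^{-1}\mathbf x_i\|_1$ over the signals; the function names and the activation probability `p` are hypothetical and are not taken from the paper.

```python
import numpy as np


def coherence(D):
    """Largest absolute inner product between distinct unit-normalized columns."""
    Dn = D / np.linalg.norm(D, axis=0, keepdims=True)
    G = np.abs(Dn.T @ Dn)
    np.fill_diagonal(G, 0.0)  # ignore each column's product with itself
    return float(G.max())


def bernoulli_gaussian_signals(D0, N, p, rng):
    """Draw N signals x_i = D0 @ a_i; each coefficient is nonzero with probability p.

    Assumption: nonzero coefficients are standard normal.
    """
    K = D0.shape[1]
    mask = rng.random((K, N)) < p
    coeffs = mask * rng.standard_normal((K, N))
    return D0 @ coeffs, coeffs


def l1_objective(D, X):
    """Assumed empirical l1 objective: mean over signals of ||D^{-1} x_i||_1."""
    return float(np.abs(np.linalg.solve(D, X)).sum(axis=0).mean())
```

Evaluating `l1_objective` at the reference dictionary and at small perturbations of it gives a rough numerical check of whether the reference looks like a local minimum for a given sample.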

artificial intelligence, identifiability, machine learning, (16 more...)

arXiv.org Machine Learning

1505.04363

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.46)

From Dependence to Causation

Lopez-Paz, David

arXiv.org Machine Learning Jul-12-2016

Machine learning is the science of discovering statistical dependencies in data, and the use of those dependencies to perform predictions. During the last decade, machine learning has made spectacular progress, surpassing human performance in complex tasks such as object recognition, car driving, and computer gaming. However, the central role of prediction in machine learning hinders progress towards general-purpose artificial intelligence. As one way forward, we argue that causal inference is a fundamental component of human intelligence, yet ignored by learning algorithms. Causal inference is the problem of uncovering the cause-effect relationships between the variables of a data generating system. Causal structures provide understanding about how these systems behave under changing, unseen environments. In turn, knowledge about these causal dynamics allows one to answer "what if" questions, describing the potential responses of the system under hypothetical manipulations and interventions. Thus, understanding cause and effect is one step from machine learning towards machine reasoning and machine intelligence. But currently available causal inference algorithms operate in specific regimes, and rely on assumptions that are difficult to verify in practice. This thesis advances the art of causal inference in three different ways. First, we develop a framework for the study of statistical dependence based on copulas and random features. Second, we build on this framework to interpret the problem of causal inference as the task of distribution classification, yielding a family of novel causal inference algorithms. Third, we discover causal structures in convolutional neural network features using our algorithms. The algorithms presented in this thesis are scalable, exhibit strong theoretical guarantees, and achieve state-of-the-art performance in a variety of real-world benchmarks.

artificial intelligence, bayesian inference, machine learning, (21 more...)

arXiv.org Machine Learning

1607.033

Country:

Europe (1.00)
North America > United States > California (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Education (1.00)
Leisure & Entertainment (0.87)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (1.00)
(3 more...)

The Oracle of Arithmetic Works Best Without Writing Down a Thing

WIRED Jul-3-2016, 12:45:24 GMT

In 2010, a startling rumor filtered through the number theory community and reached Jared Weinstein. Apparently, some graduate student at the University of Bonn in Germany had written a paper that redid "Harris-Taylor"--a 288-page book dedicated to a single impenetrable proof in number theory--in only 37 pages. The 22-year-old student, Peter Scholze, had found a way to sidestep one of the most complicated parts of the proof, which deals with a sweeping connection between number theory and geometry. "It was just so stunning for someone so young to have done something so revolutionary," said Weinstein, a 34-year-old number theorist now at Boston University. Mathematicians at the University of Bonn, who made Scholze a full professor just two years later, were already aware of his extraordinary mathematical mind.

artificial intelligence, perfectoid space, scholze, (17 more...)

WIRED

Country:

Europe > Germany (0.25)
North America > United States > Michigan (0.04)

Industry: Education > Educational Setting > Higher Education (0.55)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.92)

Expectation propagation for continuous time stochastic processes

Cseke, Botond, Schnoerr, David, Opper, Manfred, Sanguinetti, Guido

arXiv.org Machine Learning Jun-28-2016

Physical and technological processes frequently exhibit intrinsic stochasticity. The main mathematical framework to describe and reason about such systems is provided by the theory of continuous time (Markovian) stochastic processes. Such processes have been well studied in chemical physics for several decades as models of chemical reactions at very low concentrations [e.g., Gardiner, 1985]. More recently, the theory has found novel and diverse areas of application including systems biology at the single cell level [Wilkinson, 2011], ecology [Volkov et al., 2007] and performance modelling in computer systems [Hillston, 2005], to name but a few. The popularity of the approach has been greatly enhanced by the availability of efficient and accurate simulation algorithms [Gillespie, 1977; Gillespie et al., 2013], which permit a numerical solution of medium-sized systems within a reasonable time frame. As with most of science, many of the application domains of continuous time stochastic processes are becoming increasingly data-rich, creating a critical demand for inference algorithms which can use data to calibrate the models and analyse the uncertainty in the predictions. This raises new challenges and opportunities for statistics and machine learning, and has motivated the development of several algorithms for efficient inference in these systems. In this paper, we focus on the Bayesian approach, and formulate the inverse problem in terms of obtaining an approximation to a posterior distribution over the stochastic process, given observations of the system and using existing scientific information to build a prior model of the process.
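For readers unfamiliar with the simulation algorithms cited here, the following is a minimal sketch of the Gillespie direct method applied to a toy birth-death process. The process, the rate parameters `birth` and `death`, and the function name are illustrative assumptions; this is forward simulation only, not the expectation propagation inference scheme the paper develops.

```python
import numpy as np


def gillespie_birth_death(x0, birth, death, t_end, rng):
    """Simulate one path of a birth-death process up to time t_end.

    Assumed toy model: births occur at constant rate `birth`,
    deaths at rate `death * x` where x is the current population.
    """
    t, x = 0.0, x0
    times, states = [t], [x]
    while True:
        rates = np.array([birth, death * x])
        total = rates.sum()
        if total <= 0:
            break  # no reaction can fire
        t += rng.exponential(1.0 / total)  # waiting time to the next event
        if t > t_end:
            break
        # pick which reaction fires, proportionally to its rate
        x += 1 if rng.random() < rates[0] / total else -1
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)
```

When the population reaches zero the death rate vanishes, so only a birth can fire and the state never becomes negative.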

approximation, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

doi: 10.1088/1751-8113/49/49/494002

1512.06098

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Linear Algebra Mathematics MIT OpenCourseWare

#artificialintelligence Jun-12-2016, 05:00:39 GMT

This course covers matrix theory and linear algebra, emphasizing topics useful in other disciplines such as physics, economics and social sciences, natural sciences, and engineering. It parallels the combination of theory and applications in Professor Strang's textbook Introduction to Linear Algebra. This course has been designed for independent study. It provides everything you will need to understand the concepts covered in the course.

artificial intelligence, linear algebra mathematics mit opencourseware

#artificialintelligence

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)

Network analytics: more than pretty pictures

@machinelearnbot Jun-11-2016, 03:20:18 GMT

Network analysis is a rapidly growing analytics domain propelled by the explosion of interest in social networking. The methods rest upon much older foundations in the realms of statistics and social science. Euler's graph theory was proposed in the early 18th century, and Moreno established the foundations for social network analysis (SNA) in the 1930s. One of the exciting aspects of network analysis is the ability to generate elaborate and insightful visualizations. Indeed, this can be a valuable tool for discovery and pattern identification.

artificial intelligence, social media, visualization, (9 more...)

@machinelearnbot

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.38)

Selection of resources to learn Artificial Intelligence / Machine Learning / Statistical Inference…

#artificialintelligence Jun-5-2016, 20:05:23 GMT

This is a very incomplete and subjective selection of resources to learn about the algorithms and maths of Artificial Intelligence (AI) / Machine Learning (ML) / Statistical Inference (SI) / Deep Learning (DL) / Reinforcement Learning (RL) -- for beginners. It is not an exhaustive list and only contains some of the learning materials that I have personally completed so that I can include brief personal comments on them. It is also by no means the best path to follow (nowadays most MOOCs have full paths all the way from basic statistics and linear algebra to ML/DL). But this is the path I took and in a sense it's a partial documentation of my personal journey into DL (actually I bounced around all of these back and forth like crazy). As someone who has no formal background in Computer Science (but has been programming for many years), the language, notation and concepts of ML/SI/DL and even CS were completely alien to me, and the learning curve was not only steep, but vertical, treacherous and slippery like ice.

artificial intelligence, machine learning, selection, (6 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.52)

Industry: Education (0.78)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.38)

A Sharp Bound on the Computation-Accuracy Tradeoff for Majority Voting Ensembles

Lopes, Miles E.

arXiv.org Machine Learning Jun-3-2016

When random forests are used for binary classification, an ensemble of $t=1,2,\dots$ randomized classifiers is generated, and the predictions of the classifiers are aggregated by majority vote. Due to the randomness in the algorithm, there is a natural tradeoff between statistical performance and computational cost. On one hand, as $t$ increases, the (random) prediction error of the ensemble tends to decrease and stabilize. On the other hand, larger ensembles require greater computational cost for training and making new predictions. The present work offers a new approach for quantifying this tradeoff: Given a fixed training set $\mathcal{D}$, let the random variables $\text{Err}_{t,0}$ and $\text{Err}_{t,1}$ denote the class-wise prediction error rates of a randomly generated ensemble of size $t$. As $t\to\infty$, we provide a general bound on the "algorithmic variance", $\text{var}(\text{Err}_{t,l}|\mathcal{D})\leq \frac{f_l(1/2)^2}{4t}+o(\frac{1}{t})$, where $l\in\{0,1\}$, and $f_l$ is a density function that arises from the ensemble method. Conceptually, this result is somewhat surprising, because $\text{var}(\text{Err}_{t,l}|\mathcal{D})$ describes how $\text{Err}_{t,l}$ varies over repeated runs of the algorithm, and yet, the formula leads to a method for bounding $\text{var}(\text{Err}_{t,l}|\mathcal{D})$ with a single ensemble. The bound is also sharp in the sense that it is attained by an explicit family of randomized classifiers. With regard to the task of estimating $f_l(1/2)$, the presence of the ensemble leads to a unique twist on the classical setup of non-parametric density estimation --- wherein the effects of sample size and computational cost are intertwined. In particular, we propose an estimator for $f_l(1/2)$, and derive an upper bound on its MSE that matches "standard optimal non-parametric rates" when $t$ is sufficiently large.
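As a small illustration of the two ingredients named in this abstract, the sketch below aggregates 0/1 votes by majority and evaluates the leading term $f_l(1/2)^2/(4t)$ of the quoted bound. The function names and the tie-breaking rule are assumptions made for illustration; estimating $f_l(1/2)$ itself is the subject of the paper and is not attempted here.

```python
import numpy as np


def majority_vote(votes):
    """Aggregate a (t, n) array of 0/1 votes into n majority predictions.

    Assumption: ties (possible only for even t) are broken toward class 0.
    """
    votes = np.asarray(votes)
    t = votes.shape[0]
    return (2 * votes.sum(axis=0) > t).astype(int)


def variance_bound_leading_term(f_half, t):
    """Leading term f_l(1/2)^2 / (4 t) of the bound; the o(1/t) remainder is dropped."""
    return f_half ** 2 / (4.0 * t)
```

Because the leading term decays like $1/t$, doubling the ensemble size halves this part of the bound on the algorithmic variance.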

artificial intelligence, ensemble, machine learning, (17 more...)

arXiv.org Machine Learning

1303.0727

Country:

North America > United States > Wisconsin (0.04)
North America > United States > California > Yolo County > Davis (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area (0.98)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)

Recycling Randomness with Structure for Sublinear time Kernel Expansions

Choromanski, Krzysztof, Sindhwani, Vikas

arXiv.org Machine Learning May-29-2016

We propose a scheme for recycling Gaussian random vectors into structured matrices to approximate various kernel functions in sublinear time via random embeddings. Our framework includes the Fastfood construction of Le et al. (2013) as a special case, but also extends to Circulant, Toeplitz and Hankel matrices, and the broader family of structured matrices that are characterized by the concept of low-displacement rank. We introduce notions of coherence and graph-theoretic structural constants that control the approximation quality, and prove unbiasedness and low-variance properties of random feature maps that arise within our framework. For the case of low-displacement matrices, we show how the degree of structure and randomness can be controlled to reduce statistical variance at the cost of increased computation and storage requirements. Empirical results strongly support our theory and justify the use of a broader family of structured matrices for scaling up kernel methods using random features.
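The idea of recycling one Gaussian vector into a structured projection can be sketched for the circulant case. The code below is an illustrative assumption, not the authors' construction: it builds random cosine features for a Gaussian kernel from a single vector `g`, applying the implied circulant matrix through the FFT. The names `circulant_features`, `b` and `sigma` are hypothetical.

```python
import numpy as np


def circulant_matvec(g, x):
    """Multiply the circulant matrix with first column g by x using the FFT."""
    return np.fft.ifft(np.fft.fft(g) * np.fft.fft(x)).real


def circulant_features(X, g, b, sigma):
    """Random cosine features whose projection rows are cyclic shifts of g.

    X: (n, d) inputs; g: (d,) Gaussian vector; b: (d,) phases in [0, 2*pi);
    sigma: assumed Gaussian kernel bandwidth.
    """
    d = g.shape[0]
    # row i of `proj` is the circulant matrix applied to X[i], scaled by 1/sigma
    proj = np.fft.ifft(np.fft.fft(g) * np.fft.fft(X, axis=1), axis=1).real / sigma
    return np.sqrt(2.0 / d) * np.cos(proj + b)
```

Each row of the circulant matrix is a cyclic rearrangement of `g`, so it is marginally a standard Gaussian vector, but the rows are dependent. The FFT applies all `d` projections in O(d log d) operations rather than the O(d^2) of a dense matrix.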

artificial intelligence, machine learning, matrix, (17 more...)

arXiv.org Machine Learning

1605.09049

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
(2 more...)

A simple and provable algorithm for sparse diagonal CCA

Asteris, Megasthenis, Kyrillidis, Anastasios, Koyejo, Oluwasanmi, Poldrack, Russell

arXiv.org Machine Learning May-28-2016

Given two sets of variables, derived from a common set of samples, sparse Canonical Correlation Analysis (CCA) seeks linear combinations of a small number of variables in each set, such that the induced canonical variables are maximally correlated. Sparse CCA is NP-hard. We propose a novel combinatorial algorithm for sparse diagonal CCA, i.e., sparse CCA under the additional assumption that variables within each set are standardized and uncorrelated. Our algorithm operates on a low rank approximation of the input data and its computational complexity scales linearly with the number of input variables. It is simple to implement, and parallelizable. In contrast to most existing approaches, our algorithm administers precise control on the sparsity of the extracted canonical vectors, and comes with theoretical data-dependent global approximation guarantees that hinge on the spectrum of the input data. Finally, it can be straightforwardly adapted to other constrained variants of CCA enforcing structure beyond sparsity. We empirically evaluate the proposed scheme and apply it to a real neuroimaging dataset to investigate associations between brain activity and behavior measurements.
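Under the diagonal assumption the within-set covariances are identities, so the problem can be read as maximizing $x^\top A y$ over unit-norm vectors with at most $s_x$ and $s_y$ nonzeros, where $A$ is the cross-covariance matrix. The sketch below is a simple alternating hard-thresholding heuristic for that problem. It is not the low-rank combinatorial algorithm proposed in the paper and carries none of its guarantees; all names are hypothetical.

```python
import numpy as np


def hard_threshold(v, s):
    """Keep the s >= 1 largest-magnitude entries of v, zero the rest, normalize."""
    out = np.zeros_like(v, dtype=float)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out


def sparse_diagonal_cca(A, s_x, s_y, n_iter=50, seed=0):
    """Alternating heuristic for max x^T A y with unit-norm, sparse x and y.

    Each half-step is the exact maximizer for one vector with the other fixed,
    so the objective never decreases, but only a local optimum is reached.
    """
    rng = np.random.default_rng(seed)
    y = hard_threshold(rng.standard_normal(A.shape[1]), s_y)
    for _ in range(n_iter):
        x = hard_threshold(A @ y, s_x)
        y = hard_threshold(A.T @ x, s_y)
    return x, y, float(x @ A @ y)
```

The result depends on the random starting point, so in practice one would rerun with several seeds and keep the best value.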

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1605.08961

Country:

North America > United States > Illinois (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.88)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.88)
