Goto

Collaborating Authors

 Mathematical & Statistical Methods


How and Why: Decorrelate Time Series

@machinelearnbot

When dealing with time series, the first step consists in isolating trends and periodicites. Once this is done, we are left with a normalized time series, and studying the auto-correlation structure is the next step, called model fitting. The purpose is to check whether the underlying data follows some well known stochastic process with a similar auto-correlation structure, such as ARMA processes, using tools such as Box and Jenkins. Once a fit with a specific model is found, model parameters can be estimated and used to make predictions. A deeper investigation consists in isolating the auto-correlations to see whether the remaining values, once decorrelated, behave like white noise, or not.


Querying Partially Labelled Data to Improve a K-nn Classifier

AAAI Conferences

When learning from instances whose output labels may be partial, the problem of knowing which of these output labels should be made precise to improve the accuracy of predictions arises. This problem can be seen as the intersection of two tasks: the one of learning from partial labels and the one of active learning, where the goal is to provide the labels of additional instances to improve the model accuracy. In this paper, we propose querying strategies of partial labels for the well-known K-nn classifier. We propose different criteria of increasing complexity, using among other things the amount of ambiguity that partial labels introduce in the K-nn decision rule. We then show that our strategies usually outperform simple baseline schemes, and that more complex strategies provide a faster improvement of the model accuracies.


Lifted Inference for Convex Quadratic Programs

AAAI Conferences

Symmetry is the essential element of lifted inferencethat has recently demonstrated the possibility to perform very efficient inference in highly-connected, but symmetric probabilistic models. This raises the question, whether this holds for optimization problems in general.Here we show that for a large classof optimization methods this is actually the case.Specifically, we introduce the concept of fractionalsymmetries of convex quadratic programs (QPs),which lie at the heart of many AI and machine learning approaches,and exploit it to lift, i.e., to compress QPs.These lifted QPs can then be tackled with the usual optimization toolbox (off-the-shelf solvers, cutting plane algorithms,stochastic gradients etc.). If the original QP exhibitssymmetry, then the lifted one will generallybe more compact, and hence more efficient to solve.


The Symbolic Interior Point Method

AAAI Conferences

Numerical optimization is arguably the most prominent computational framework in machine learning and AI. It can be seen as an assembly language for hard combinatorial problems ranging from classification and regression in learning, to computing optimal policies and equilibria in decision theory, to entropy minimization in information sciences. Unfortunately, specifying such problems in complex domains involving relations, objects and other logical dependencies is cumbersome at best, requiring considerable expert knowledge, and solvers require models to be painstakingly reduced to standard forms. To overcome this, we introduce a rich modeling framework for optimization problems that allows convenient codification of symbolic structure. Rather than reducing this symbolic structure to a sparse or dense matrix, we represent and exploit it directly using algebraic decision diagrams (ADDs). Combining efficient ADD-based matrix-vector algebra with a matrix-free interior-point method, we develop an engine that can fully leverage the structure of symbolic representations to solve convex linear and quadratic optimization problems. We demonstrate the flexibility of the resulting symbolic-numeric optimizer on decision making and compressed sensing tasks with millions of non-zero entries.


Going Beyond Primal Treewidth for (M)ILP

AAAI Conferences

Integer Linear Programming (ILP) and its mixed variant (MILP) are archetypical examples of NP-complete optimization problems which have a wide range of applications in various areas of artificial intelligence. However, we still lack a thorough understanding of which structural restrictions make these problems tractable. Here we focus on structure captured via so-called decompositional parameters, which have been highly successful in fields such as boolean satisfiability and constraint satisfaction but have not yet reached their full potential in the ILP setting. In particular, primal treewidth (an established decompositional parameter) can only be algorithmically exploited to solve ILP under restricted circumstances. Our main contribution is the introduction and algorithmic exploitation of two new decompositional parameters for ILP and MILP. The first, torso-width, is specifically tailored to the linear programming setting and is the first decompositional parameter which can also be used for MILP. The latter, incidence treewidth, is a concept which originates from boolean satisfiability but has not yet been used in the ILP setting; here we obtain a full complexity landscape mapping the precise conditions under which incidence treewidth can be used to obtain efficient algorithms. Both of these parameters overcome previous shortcomings of primal treewidth for ILP in unique ways, and consequently push the frontiers of tractability for these important problems.


Introduction to Number Theory: Fascinating Facts and Conjectures about Primes and Other Special Numbers

@machinelearnbot

I discuss here off-the-beaten-path beautiful, even spectacular results from number theory: not just about prime numbers, but also about related problems such as integers that are sum of two squares. The connection between these numbers and prime numbers will appear later in this article. A few important unsolved mathematical conjectures are presented in a unified approach, and some new research material is also introduced, especially an attempt at generalizing and unifying concepts related to data set density and limiting distributions. The approach is very applied, focusing on algorithms, simulations, and big data, to help discover fascinating results. Even though some of the most exciting topics of mathematics are discussed here (including fundamental, century-old problems still unresolved as well as brand new hypotheses), most of the article can be understood by the layman. Among other things, you will learn some new ways to estimate Pi based on non-traditional experiments, or how a conjecture for prime numbers somehow generalizes to apply to Fibonacci numbers as well.


SciPy Tutorial: Linear Algebra

#artificialintelligence

An array is, structurally speaking, nothing but pointers. It contains information about the raw data, how to locate an element and how to interpret an element. The memory address and strides are important when you dive deeper into the lower-level details of arrays, while the data type and shape are things that beginners should surely know and understand. Two other attributes that you might want to consider are the data and size, which allow you to gather even more information on your array. Refresh the usage of the ndarray attributes in the following DataCamp Light chunk.


Free Book: Probability and Statistics Cookbook - DataScienceCentral.com

#artificialintelligence

The format is very similar to a BIG cheat sheet. It is based on literature and in-class material from courses of the statistics department at the University of California in Berkeley but also influenced by other sources . To read the PDF version, click here.


Distributed Inexact Damped Newton Method: Data Partitioning and Work-Balancing

AAAI Conferences

In this paper, we study inexact damped Newton method implemented in a distributed environment. We are motivated by the original DiSCO algorithm [Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss, Yuchen Zhang and Lin Xiao, 2015].We show that this algorithm may not scale well and propose algorithmic modifications which lead to fewer communications and better load-balancing between nodes. Those modifications lead to a more efficient algorithm with better scaling. This was made possibly by introducing our new pre-conditioner which is specially designed so that the preconditioning step can be solved exactly and efficiently.Numerical experiments for minimization of regularized empirical loss with a 273GB instance shows the efficiency of proposed algorithm.


Edge-exchangeable graphs and sparsity (NIPS 2016)

arXiv.org Machine Learning

Many popular network models rely on the assumption of (vertex) exchangeability, in which the distribution of the graph is invariant to relabelings of the vertices. However, the Aldous-Hoover theorem guarantees that these graphs are dense or empty with probability one, whereas many real-world graphs are sparse. We present an alternative notion of exchangeability for random graphs, which we call edge exchangeability, in which the distribution of a graph sequence is invariant to the order of the edges. We demonstrate that edge-exchangeable models, unlike models that are traditionally vertex exchangeable, can exhibit sparsity. To do so, we outline a general framework for graph generative models; by contrast to the pioneering work of Caron and Fox (2015), models within our framework are stationary across steps of the graph sequence. In particular, our model grows the graph by instantiating more latent atoms of a single random measure as the dataset size increases, rather than adding new atoms to the measure.