 Computational Learning Theory


The Mathematics of Machine Learning

#artificialintelligence

In the last few months, several people have contacted me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I have observed that some lack the mathematical intuition and framework needed to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of easy-to-use machine and deep learning packages such as scikit-learn, Weka, TensorFlow, R-caret, etc. Machine Learning theory is a field at the intersection of statistics, probability, computer science, and algorithms; it arises from learning iteratively from data and finding hidden insights that can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary to grasp the inner workings of the algorithms and to get good results.


The State of Enterprise Machine Learning

#artificialintelligence

For a topic that generates so much interest, it is surprisingly difficult to find a concise definition of machine learning that satisfies everyone. Complicating things further is the fact that much of machine learning, at least in terms of its enterprise value, looks somewhat like existing analytics and business intelligence tools. To set the course for this three-part series that puts the scope of machine learning into enterprise context, we define machine learning as software that extracts high-value knowledge from data with little or no human supervision. Academics who work in formal machine learning theory may object to a definition that limits machine learning to software. In the enterprise, however, machine learning is software.


Machine Learning Algorithm : ensemble (part 7 of 12)

#artificialintelligence

In machine learning and computational learning theory, LogitBoost is a boosting algorithm formulated by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The original paper casts the AdaBoost algorithm into a statistical framework. Specifically, if one considers AdaBoost as a generalized additive model and then applies the cost functional of logistic regression, one can derive the LogitBoost algorithm. LogitBoost can be seen as a convex optimization. Bootstrap Aggregation (or Bagging for short) is a simple and very powerful ensemble method.
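To make the bagging idea mentioned above concrete, here is a minimal sketch in plain Python. It is not taken from the article: the helper names (`fit_stump`, `predict_stump`, `bagging_fit`, `bagging_predict`), the one-dimensional threshold classifier used as the base learner, and the toy data are all assumptions chosen for illustration. Each base model is trained on a bootstrap sample (drawn with replacement), and predictions are combined by majority vote.

```python
import random
from collections import Counter

def predict_stump(stump, x):
    """Predict a 0/1 label from a (threshold, polarity) pair."""
    threshold, polarity = stump
    return int(x > threshold) if polarity == 1 else int(x <= threshold)

def fit_stump(points):
    """Fit a 1-D threshold classifier by exhaustive search over candidate thresholds."""
    xs = sorted({x for x, _ in points})
    # Candidates: midpoints between consecutive distinct x values, plus both extremes.
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    best_errors, best_stump = None, None
    for threshold in candidates:
        for polarity in (1, -1):
            stump = (threshold, polarity)
            errors = sum(1 for x, y in points if predict_stump(stump, x) != y)
            if best_errors is None or errors < best_errors:
                best_errors, best_stump = errors, stump
    return best_stump

def bagging_fit(points, n_models=25, seed=0):
    """Train n_models stumps, each on its own bootstrap sample of the data."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]  # sampling with replacement
        models.append(fit_stump(sample))
    return models

def bagging_predict(models, x):
    """Combine the base models by majority vote."""
    votes = Counter(predict_stump(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Toy data (an assumption): label is 1 when x >= 5.0, for x in 0.0, 0.1, ..., 9.9.
data = [(i / 10, int(i >= 50)) for i in range(100)]
models = bagging_fit(data)
high_vote = bagging_predict(models, 8.0)
low_vote = bagging_predict(models, 2.0)
```

An odd number of base models is used so that a binary majority vote can never tie. In practice the base learner is usually a decision tree rather than a single threshold.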


Machine Learning Theory - Part 2: Generalization Bounds

#artificialintelligence

Last time we concluded by noticing that minimizing the empirical risk (or the training error) is not in itself a solution to the learning problem; it can only be considered a solution if we can guarantee that the difference between the training error and the generalization error (also called the generalization gap) is small enough. That is, if the probability of a large gap is small, we can guarantee that the difference between the errors is not much, and hence the learning problem can be solved. In this part we'll start investigating that probability in depth and see if it can indeed be small, but before starting you should note that I skipped a lot of the mathematical proofs here. You'll often see phrases like "It can be proved that …", "One can prove …", "It can be shown that …", etc., without the actual proof being given. This is to make the post easier to read and to focus all the effort on the conceptual understanding of the subject.
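As a rough illustration of the probability discussed above, the following sketch simulates the gap between training error and true error for a single fixed hypothesis and compares how often the gap exceeds a tolerance with Hoeffding's inequality, 2·exp(−2ε²m). The excerpt does not say which bound the post uses, so Hoeffding's inequality, the function names, and the chosen numbers (true error 0.3, sample size 100, tolerance 0.1) are assumptions made for this example. The bound shown holds for one hypothesis fixed in advance, not for a hypothesis selected after seeing the data.

```python
import math
import random

def gap_exceeds_frequency(true_error, m, eps, trials=2000, seed=0):
    """Estimate P(|training error - true error| > eps) for one fixed hypothesis.

    Each trial draws m independent examples; the hypothesis errs on each
    example with probability true_error.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(trials):
        mistakes = sum(1 for _ in range(m) if rng.random() < true_error)
        training_error = mistakes / m
        if abs(training_error - true_error) > eps:
            exceed += 1
    return exceed / trials

def hoeffding_bound(m, eps):
    """Hoeffding's upper bound on the probability that the gap exceeds eps."""
    return 2 * math.exp(-2 * eps ** 2 * m)

freq = gap_exceeds_frequency(true_error=0.3, m=100, eps=0.1)
bound = hoeffding_bound(m=100, eps=0.1)
print(f"simulated frequency: {freq:.4f}, Hoeffding bound: {bound:.4f}")
```

The simulated frequency should sit well below the bound, since Hoeffding's inequality is a worst-case guarantee. Increasing `m` shrinks the bound exponentially, which is the sense in which the gap "can be small".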



Machine Learning Theory - Part 1: Introduction

#artificialintelligence

Now we can use the function f(x), which we call the target function, as the proxy for the conditional distribution.


Optimal learning with Bernstein Online Aggregation

arXiv.org Machine Learning

We introduce a new recursive aggregation procedure called Bernstein Online Aggregation (BOA). The exponential weights include an accuracy term and a second order term that is a proxy of the quadratic variation as in Hazan and Kale (2010). This second term stabilizes the procedure, which is optimal in different senses. We first obtain optimal regret bounds in the deterministic context. Then, an adaptive version is the first exponential weights algorithm that exhibits a second order bound with excess losses, a type of bound that first appeared in Gaillard et al. (2014). The second order bounds in the deterministic context are extended to a general stochastic context using the cumulative predictive risk. Such conversion provides the main result of the paper, an inequality of a novel type comparing the procedure with any deterministic aggregation procedure for an integrated criterion. Then we obtain an observable estimate of the excess risk of the BOA procedure. To assert the optimality, we finally consider the iid case for strongly convex and Lipschitz continuous losses and we prove that the optimal rate of aggregation of Tsybakov (2003) is achieved. The batch version of the BOA procedure is then the first adaptive explicit algorithm that satisfies an optimal oracle inequality with high probability.
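The abstract describes exponential weights with an accuracy term plus a second order term. A simplified sketch of one such update is given below. It is an assumption-laden illustration, not the paper's exact procedure: the function name `second_order_weights_update`, the fixed learning rate `eta`, the use of each expert's loss minus the weighted average loss as the "excess loss", and the toy loss values are all chosen here for illustration, and the paper's actual update and tuning may differ.

```python
import math

def second_order_weights_update(weights, losses, eta):
    """One round of exponential weights with a second-order correction (illustrative).

    excess_j = loss_j - (weighted average loss) is the accuracy term;
    the factor (1 + eta * excess_j) adds a squared term eta^2 * excess_j^2
    to the exponent, which acts as the second-order penalty.
    """
    average_loss = sum(w * l for w, l in zip(weights, losses))
    excess = [l - average_loss for l in losses]
    raw = [w * math.exp(-eta * e * (1 + eta * e)) for w, e in zip(weights, excess)]
    total = sum(raw)
    return [r / total for r in raw]  # renormalize so the weights sum to 1

# Toy run (assumed data): three experts with constant losses over 20 rounds.
weights = [1 / 3, 1 / 3, 1 / 3]
for losses in [[0.1, 0.5, 0.9]] * 20:
    weights = second_order_weights_update(weights, losses, eta=0.5)
print(weights)
```

With these toy losses the weight concentrates on the expert with the smallest loss, as expected from any exponential weights scheme; the second-order term only changes how quickly that happens.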


Refined Error Bounds for Several Learning Algorithms

arXiv.org Machine Learning

This article studies the achievable guarantees on the error rates of certain learning algorithms, with particular focus on refining logarithmic factors. Many of the results are based on a general technique for obtaining bounds on the error rates of sample-consistent classifiers with monotonic error regions, in the realizable case. We prove bounds of this type expressed in terms of either the VC dimension or the sample compression size. This general technique also enables us to derive several new bounds on the error rates of general sample-consistent learning algorithms, as well as refined bounds on the label complexity of the CAL active learning algorithm. Additionally, we establish a simple necessary and sufficient condition for the existence of a distribution-free bound on the error rates of all sample-consistent learning rules, converging at a rate inversely proportional to the sample size. We also study learning in the presence of classification noise, deriving a new excess error rate guarantee for general VC classes under Tsybakov's noise condition, and establishing a simple and general necessary and sufficient condition for the minimax excess risk under bounded noise to converge at a rate inversely proportional to the sample size.


European Commission : CORDIS : News and Events : How maggots are influencing the future of robotics

#artificialintelligence

What can software designers and ICT specialists learn from maggots? Quite a lot, it would appear. Through understanding how complex learning processes in simple organisms work, EU-funded scientists hope to usher in an era of self-learning robots and predictive computing. Even with limited brain power, an organism can choose the right thing to do in response to external stimuli, which is something that current computational learning theory cannot fully account for.

Learning from maggots

The EU-funded MINIMAL project, launched in 2014, has focused on the learning processes in a relatively simple animal, the fruit fly larva (maggot).

