 Computational Learning Theory


The Mathematics of Machine Learning

#artificialintelligence

In the last few months, several people have contacted me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I've observed that some of them lack the mathematical intuition and framework needed to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of easy-to-use machine and deep learning packages such as scikit-learn, Weka, TensorFlow, etc. Machine Learning theory is a field that intersects statistics, probability, computer science and algorithmics; it arises from learning iteratively from data and finding hidden insights that can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary to grasp the inner workings of the algorithms and to get good results.


A Subsequence Interleaving Model for Sequential Pattern Mining

arXiv.org Machine Learning

Recent sequential pattern mining methods have used the minimum description length (MDL) principle to define an encoding scheme which describes an algorithm for mining the most compressing patterns in a database. We present a novel subsequence interleaving model based on a probabilistic model of the sequence database, which allows us to search for the most compressing set of patterns without designing a specific encoding scheme. Our proposed algorithm is able to efficiently mine the most relevant sequential patterns and rank them using an associated measure of interestingness. The efficient inference in our model is a direct result of our use of a structural expectation-maximization framework, in which the expectation step takes the form of a submodular optimization problem subject to a coverage constraint. We show on both synthetic and real-world datasets that our model mines a set of sequential patterns with low spuriousness and redundancy, and with high interpretability and usefulness in real-world applications. Furthermore, we demonstrate that the quality of the patterns from our approach is comparable to, if not better than, existing state-of-the-art sequential pattern mining algorithms.


The State of Enterprise Machine Learning

#artificialintelligence

For a topic that generates so much interest, it is surprisingly difficult to find a concise definition of machine learning that satisfies everyone. Complicating things further is the fact that much of machine learning, at least in terms of its enterprise value, looks somewhat like existing analytics and business intelligence tools. To set the course for this three-part series that puts the scope of machine learning into enterprise context, we define machine learning as software that extracts high-value knowledge from data with little or no human supervision. Academics who work in formal machine learning theory may object to a definition that limits machine learning to software. In the enterprise, however, machine learning is software.


Cross: Efficient Low-rank Tensor Completion

arXiv.org Machine Learning

The completion of tensors, or high-order arrays, has attracted significant attention in recent research. Current literature on tensor completion primarily focuses on recovery from a set of uniformly randomly measured entries, and the required number of measurements to achieve recovery is not guaranteed to be optimal. In addition, the implementation of some previous methods is NP-hard. In this article, we propose a framework for low-rank tensor completion via a novel tensor measurement scheme we name Cross. The proposed procedure is efficient and easy to implement. In particular, we show that a third-order tensor of Tucker rank-$(r_1, r_2, r_3)$ in $p_1$-by-$p_2$-by-$p_3$ dimensional space can be recovered from as few as $r_1r_2r_3 + r_1(p_1-r_1) + r_2(p_2-r_2) + r_3(p_3-r_3)$ noiseless measurements, which matches the sample complexity lower bound. In the case of noisy measurements, we also develop a theoretical upper bound and the matching minimax lower bound for recovery error over certain classes of low-rank tensors for the proposed procedure. The results can be further extended to fourth- or higher-order tensors. Simulation studies show that the method performs well under a variety of settings. Finally, the procedure is illustrated through a real dataset in neuroimaging.
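
To get a feel for the measurement count quoted in the abstract above, the formula can be evaluated directly. The following is only an arithmetic sketch: the function name `cross_measurement_count` and its argument layout are hypothetical choices for illustration, not part of the paper or of any library, and it does not implement the Cross procedure itself.

```python
def cross_measurement_count(ranks, dims):
    """Evaluate r1*r2*r3 + sum_k r_k * (p_k - r_k) for a third-order tensor.

    ranks: Tucker rank (r1, r2, r3); dims: tensor dimensions (p1, p2, p3).
    This only evaluates the count stated in the abstract.
    """
    if len(ranks) != 3 or len(dims) != 3:
        raise ValueError("expected three ranks and three dimensions")
    for r, p in zip(ranks, dims):
        if not 0 < r <= p:
            raise ValueError("each rank must satisfy 0 < r_k <= p_k")
    r1, r2, r3 = ranks
    core_term = r1 * r2 * r3
    arm_terms = sum(r * (p - r) for r, p in zip(ranks, dims))
    return core_term + arm_terms


if __name__ == "__main__":
    ranks, dims = (2, 2, 2), (10, 10, 10)
    needed = cross_measurement_count(ranks, dims)
    total = dims[0] * dims[1] * dims[2]
    print(f"{needed} measurements out of {total} entries")
```

For a 10-by-10-by-10 tensor of Tucker rank (2, 2, 2), this gives 8 + 3 * 16 = 56 measurements out of 1000 entries.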


The Mathematics of Machine Learning

#artificialintelligence

In the last few months, several people have contacted me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I have observed that some of them lack the mathematical intuition and framework needed to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of easy-to-use machine and deep learning packages such as scikit-learn, Weka, TensorFlow, R-caret, etc. Machine Learning theory is a field that intersects statistics, probability, computer science and algorithmics; it arises from learning iteratively from data and finding hidden insights that can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary to grasp the inner workings of the algorithms and to get good results.


Machine Learning Algorithm : ensemble (part 7 of 12)

#artificialintelligence

In machine learning and computational learning theory, LogitBoost is a boosting algorithm formulated by Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The original paper casts the AdaBoost algorithm into a statistical framework. Specifically, if one considers AdaBoost as a generalized additive model and then applies the cost functional of logistic regression, one can derive the LogitBoost algorithm. LogitBoost can be seen as a convex optimization. Bootstrap Aggregation (or Bagging for short) is a simple and very powerful ensemble method.
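
Bagging, mentioned at the end of the snippet above, can be sketched in a few lines: train each model on a bootstrap sample (drawn with replacement) and combine predictions by majority vote. This is a minimal illustration, not the article's code; the one-dimensional threshold "stump" base learner and the names `fit_stump`, `bagging_fit` and `bagging_predict` are assumptions made for the example.

```python
import random
from collections import Counter


def fit_stump(xs, ys):
    """Fit a 1-D threshold classifier with binary labels 0/1 (hypothetical base learner).

    Tries every observed value as threshold t and both label orientations,
    and keeps the combination with the fewest training errors.
    """
    best = None
    for t in sorted(set(xs)):
        for left, right in ((0, 1), (1, 0)):
            errors = sum((right if x > t else left) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, t, left, right = best
    return lambda x: right if x > t else left


def bagging_fit(xs, ys, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n indices with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the individual predictions by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    ys = [0, 0, 0, 0, 1, 1, 1, 1]
    models = bagging_fit(xs, ys)
    print(bagging_predict(models, 0.05), bagging_predict(models, 0.95))
```

The vote averages out the variation between models trained on different resamples, which is the usual motivation given for bagging.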


Machine Learning Theory - Part 2: Generalization Bounds

#artificialintelligence

Last time we concluded by noticing that minimizing the empirical risk (or the training error) is not in itself a solution to the learning problem; it can only be considered a solution if we can guarantee that the difference between the training error and the generalization error (which is also called the generalization gap) is small enough. That is, if the probability of a large gap is small, we can guarantee that the two errors do not differ by much, and hence the learning problem can be solved. In this part we'll start investigating that probability in depth and see whether it can indeed be small, but before starting you should note that I skipped a lot of the mathematical proofs here. You'll often see phrases like "It can be proved that …", "One can prove …", "It can be shown that …", etc., without the actual proof being given. This is to make the post easier to read and to focus all the effort on the conceptual understanding of the subject.
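
The probability of a large generalization gap can be made concrete with a small simulation. The sketch below rests on stated assumptions that are not taken from the post: a single hypothesis fixed in advance, whose mistake on each example is an independent coin flip with a known true risk. For that simplified case, Hoeffding's inequality bounds the probability by 2·exp(−2nε²). The function names are hypothetical.

```python
import math
import random


def gap_exceed_frequency(true_risk, n_samples, epsilon, n_trials=2000, seed=0):
    """Monte Carlo estimate of P(|empirical risk - true risk| > epsilon).

    Assumes ONE fixed hypothesis whose error on each example is an
    independent Bernoulli(true_risk) draw.
    """
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_trials):
        mistakes = sum(rng.random() < true_risk for _ in range(n_samples))
        empirical_risk = mistakes / n_samples
        if abs(empirical_risk - true_risk) > epsilon:
            exceed += 1
    return exceed / n_trials


def hoeffding_bound(n_samples, epsilon):
    """Hoeffding upper bound 2*exp(-2*n*eps^2), capped at 1."""
    return min(1.0, 2.0 * math.exp(-2.0 * n_samples * epsilon ** 2))


if __name__ == "__main__":
    for n in (50, 100, 500):
        freq = gap_exceed_frequency(true_risk=0.3, n_samples=n, epsilon=0.1)
        print(n, freq, hoeffding_bound(n, 0.1))
```

Both the simulated frequency and the bound shrink as the sample size grows. Note that this covers only a hypothesis chosen before seeing the data; a hypothesis selected by training depends on the sample, so this simple bound does not apply to it directly.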



Machine Learning Theory - Part 1: Introduction

#artificialintelligence

Now we can use the function f(x), which we call the target function, as the proxy for the conditional distribution.