Computational Learning Theory
Multi-step learning and underlying structure in statistical models
In multi-step learning, where a final learning task is accomplished via a sequence of intermediate learning tasks, the intuition is that successive steps or levels transform the initial data into representations more and more "suited" to the final learning task. A related principle arises in transfer learning, where Baxter (2000) proposed a theoretical framework to study how learning multiple tasks transforms the inductive bias of a learner. The most widespread multi-step learning approach is semi-supervised learning (SSL) with two steps: unsupervised, then supervised. Several authors (Castelli-Cover, 1996; Balcan-Blum, 2005; Niyogi, 2008; Ben-David et al., 2008; Urner et al., 2011) have analyzed SSL, with Balcan-Blum (2005) proposing a version of the PAC learning framework augmented by a "compatibility function" to link concept class and unlabeled data distribution. We propose to analyze SSL and other multi-step learning approaches, much in the spirit of Baxter's framework, by defining a learning problem generatively as a joint statistical model on $X \times Y$. This determines in a natural way the class of conditional distributions that are possible with each marginal, and amounts to an abstract form of compatibility function. It also allows us to analyze both discrete and non-discrete settings. As a tool for our analysis, we define a notion of $\gamma$-uniform shattering for statistical models. We use this to give conditions on the marginal and conditional models which imply an advantage for multi-step learning approaches. In particular, we recover a more general version of a result of Poggio et al. (2012): under mild hypotheses, a multi-step approach which learns features invariant under successive factors of a finite group of invariances has sample complexity requirements that are additive rather than multiplicative in the size of the subgroups.
Supervised learning through the lens of compression
David, Ofir, Moran, Shay, Yehudayoff, Amir
This work continues the study of the relationship between sample compression schemes and statistical learning, which has been mostly investigated within the framework of binary classification. We first extend the investigation to multiclass categorization: we prove that in this case learnability is equivalent to compression of logarithmic sample size and that the uniform convergence property implies compression of constant size. We use the compressibility-learnability equivalence (i) to show that, for multiclass categorization, PAC and agnostic PAC learnability are equivalent, and (ii) to derive a compactness theorem for learnability. We then consider supervised learning under general loss functions: we show that in this case, in order to maintain the compressibility-learnability equivalence, it is necessary to consider an approximate variant of compression. We use it to show that PAC and agnostic PAC are not equivalent, even when the loss function has only three values.
Interaction Screening: Efficient and Sample-Optimal Learning of Ising Models
Vuffray, Marc, Misra, Sidhant, Lokhov, Andrey, Chertkov, Michael
We consider the problem of learning the underlying graph of an unknown Ising model on p spins from a collection of i.i.d. samples generated from the model. We suggest a new estimator that is computationally efficient and requires a number of samples that is near-optimal with respect to the previously established information-theoretic lower bound. Our statistical estimator has a physical interpretation in terms of "interaction screening". The estimator is consistent and is efficiently implemented using convex optimization. We prove that with appropriate regularization, the estimator recovers the underlying graph using a number of samples that is logarithmic in the system size p and exponential in the maximum coupling intensity and maximum node degree.
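The abstract above does not spell out the estimator. As a rough illustration only, the sketch below evaluates an objective of the form $\frac{1}{n}\sum_{k} \exp(-\sum_{i \neq u} \theta_{ui}\, \sigma_u^{(k)} \sigma_i^{(k)})$ for a single spin $u$, plus an $\ell_1$ penalty. This form is our recollection of the interaction screening objective and should be treated as an assumption, not as the authors' exact definition. The function names, the parameter `lam`, and the encoding of spins as +1/-1 tuples are hypothetical choices made for this example.

```python
import math

def interaction_screening_objective(samples, u, theta_u):
    """Empirical mean of exp(-s[u] * sum_{i != u} theta_u[i] * s[i]).

    samples : list of spin configurations, each a tuple of +1/-1 values
    u       : index of the spin whose neighborhood is being estimated
    theta_u : candidate couplings between spin u and every spin
              (the entry at index u is ignored)
    Assumed form of the objective; see the lead-in above.
    """
    total = 0.0
    for s in samples:
        field = sum(theta_u[i] * s[i] for i in range(len(s)) if i != u)
        total += math.exp(-s[u] * field)
    return total / len(samples)

def l1_penalized_objective(samples, u, theta_u, lam):
    """Objective plus an l1 penalty of weight lam on the couplings of spin u."""
    penalty = lam * sum(abs(t) for i, t in enumerate(theta_u) if i != u)
    return interaction_screening_objective(samples, u, theta_u) + penalty

# Two perfectly aligned spins: a positive candidate coupling lowers the objective.
aligned = [(1, 1), (-1, -1)]
print(interaction_screening_objective(aligned, 0, [0.0, 0.5]))
print(interaction_screening_objective(aligned, 0, [0.0, -0.5]))
```

Both functions are convex in `theta_u` (a sum of exponentials of linear functions plus an absolute-value penalty), which is consistent with the abstract's remark that the estimator can be implemented with convex optimization. The minimization itself is left out of this sketch.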
On the Recursive Teaching Dimension of VC Classes
Chen, Xi, Chen, Xi, Cheng, Yu, Tang, Bo
The recursive teaching dimension (RTD) of a concept class $C \subseteq \{0, 1\}^n$, introduced by Zilles et al. [ZLHZ11], is a complexity parameter measured by the worst-case number of labeled examples needed to learn any target concept of $C$ in the recursive teaching model. In this paper, we study the quantitative relation between RTD and the well-known learning complexity measure VC dimension (VCD), and improve the best known upper and (worst-case) lower bounds on the recursive teaching dimension with respect to the VC dimension. Given a concept class $C \subseteq \{0, 1\}^n$ with $VCD(C) = d$, we first show that $RTD(C)$ is at most $d 2^{d+1}$. This is the first upper bound for $RTD(C)$ that depends only on $VCD(C)$, independent of the size of the concept class $|C|$ and its domain size $n$. Before our work, the best known upper bound for $RTD(C)$ was $O(d 2^d \log \log |C|)$, obtained by Moran et al. [MSWY15]. We remove the $\log \log |C|$ factor. We also improve the lower bound on the worst-case ratio of $RTD(C)$ to $VCD(C)$. We present a family of classes $\{ C_k \}_{k \ge 1}$ with $VCD(C_k) = 3k$ and $RTD(C_k)=5k$, which implies that the ratio of $RTD(C)$ to $VCD(C)$ in the worst case can be as large as $5/3$. Before our work, the largest ratio known was $3/2$, as obtained by Kuhlmann [Kuh99]. Since then, no finite concept class $C$ has been known to satisfy $RTD(C) > (3/2) VCD(C)$.
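To make the two quantities compared above concrete, here is a minimal brute-force sketch that computes $VCD(C)$ and $RTD(C)$ for a small finite class $C \subseteq \{0, 1\}^n$. It assumes the usual definitions: a set of coordinates is shattered when all $2^d$ patterns appear on it, and the recursive teaching model repeatedly removes the concepts that are currently easiest to teach. The function names are hypothetical, and the search is exponential, so it is only meant for toy classes.

```python
from itertools import combinations

def vc_dimension(concepts, n):
    """Largest d such that some set of d coordinates is shattered."""
    best = 0
    for d in range(1, n + 1):
        shattered = any(
            len({tuple(c[i] for i in idx) for c in concepts}) == 2 ** d
            for idx in combinations(range(n), d)
        )
        if not shattered:
            break  # subsets of shattered sets are shattered, so we can stop
        best = d
    return best

def teaching_dimension(c, concepts, n):
    """Smallest number of labeled coordinates that single out c within concepts."""
    others = [h for h in concepts if h != c]
    for k in range(n + 1):
        for idx in combinations(range(n), k):
            if all(any(h[i] != c[i] for i in idx) for h in others):
                return k
    raise ValueError("concepts must be distinct")

def recursive_teaching_dimension(concepts, n):
    """Peel off the easiest-to-teach concepts; return the largest cost seen."""
    remaining = set(concepts)
    rtd = 0
    while remaining:
        td = {c: teaching_dimension(c, remaining, n) for c in remaining}
        easiest = min(td.values())
        rtd = max(rtd, easiest)
        remaining -= {c for c, v in td.items() if v == easiest}
    return rtd

# Toy class: the empty concept plus the three singletons over n = 3 points.
singletons = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
print(vc_dimension(singletons, 3), recursive_teaching_dimension(singletons, 3))
```

In this toy class the empty concept alone needs three examples to be taught, but once the singletons are removed it needs none, which is why the recursive measure can be much smaller than the plain teaching dimension.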
The real prerequisite for machine learning isn't math, it's data analysis - SHARP SIGHT LABS
When beginners get started with machine learning, the inevitable question is "what are the prerequisites? What do I need to know to get started?" A list like this is enough to intimidate anyone but a person with an advanced math degree. It's unfortunate, because I think a lot of beginners lose heart and are scared away by this advice. If you're intimidated by the math, I have some good news for you: in order to get started building machine learning models (as opposed to doing machine learning theory), you need less math background than you think (and almost certainly less math than you've been told that you need).
Intelligent Things It's all about machine learning
Evolving from the study of pattern recognition and computational learning theory in artificial intelligence, machine learning explores software algorithms that can learn from, and make predictions on, volumes of data. Simply stated... Machine learning helps humans make data-driven decisions. Machine learning offers practical solutions that can maximize resource utilization, prolong the lifespan of IoT sensors, platforms and networks, and enable dynamic services architecture. Our connected world is increasingly dependent on big data -- at rest, and in years to come, streaming fast data -- in motion. With real-time predictive models, once a streaming fast data point has been observed it might never be seen again.
Artificial Intelligence and Machine Learning in Big Data and IoT: The Market for Data Capture …
NEW YORK, Dec. 16, 2016 /PRNewswire/ Overview: More than 50% of enterprise IT organizations are experimenting with Artificial Intelligence (AI) in various forms such as Machine Learning, Deep Learning, Computer Vision, Image Recognition, Voice Recognition, Artificial Neural Networks, and more. AI is not a single technology but a convergence of various technologies, statistical models, algorithms, and approaches. Machine Learning is a sub-field of computer science that evolved from the study of pattern recognition and computational learning theory in AI. Every large corporation collects and maintains a huge amount of human-oriented data associated with its customers including their preferences, purchases, habits, and other personal information. As the Internet of Things (IoT) progresses, there will be an increasingly large amount of unstructured machine data.
Data Science & Machine Learning Training Workshop
Data Science Middle East Foundation, in partnership with EVERATI, is running a 3-day training workshop series across the Middle East to get you started on your data science and machine learning journey, as you learn how to use data and science to deliver insights, value and innovation. The Data Science and Machine Learning workshop is a 3-day practical training program that offers an applied introduction to data science industry practices and models of machine learning. The workshop has a strong focus on gaining hands-on experience implementing algorithms and building predictive models on real datasets. By the end of the workshop, participants will be ready to implement machine learning algorithms on their own data and immediately generate business value. The workshop will take participants through the conceptual and applied foundations of the subject.
Machine Learning Theory - Part 3: Regularization and the Bias-variance Trade-off
In the first part we explored the statistical model underlying the machine learning problem, and used it to formalize the problem in terms of obtaining the minimum generalization error. By noting that we cannot directly evaluate the generalization error of an ML model, we continued in the second part by establishing a theory that relates this elusive generalization error to another error metric that we can actually evaluate, which is the empirical error. That is: the generalization error (or the risk) $R(h)$ is bounded by the empirical risk (or the training error) plus a term that depends on the complexity (or the richness) of the hypothesis space $\mathcal{H}$, the dataset size $N$, and the degree of certainty $1 - \delta$ about the bound. Starting from this part, and based on this simplified theoretical result, we'll begin to draw some practical concepts for the process of solving the ML problem. We'll start by trying to get more intuition about why a more complex hypothesis space is bad.
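The snippet above does not give the bound itself. As one standard instance of the shape it describes, consider a finite hypothesis space and a loss bounded in $[0, 1]$: Hoeffding's inequality with a union bound gives, with probability at least $1 - \delta$, $R(h) \le R_{emp}(h) + \sqrt{(\ln|\mathcal{H}| + \ln(2/\delta)) / (2N)}$ for every $h \in \mathcal{H}$. The series may well use a different (for example VC-based) complexity term, so the finite-class form here is an assumption made for illustration, and the function name is hypothetical.

```python
import math

def finite_class_gap(num_hypotheses, n_samples, delta):
    """Width of the Hoeffding + union-bound term for a finite hypothesis space.

    Assumes a loss bounded in [0, 1] and i.i.d. samples; this is an
    illustrative stand-in for the complexity term described in the text.
    """
    return math.sqrt(
        (math.log(num_hypotheses) + math.log(2.0 / delta)) / (2.0 * n_samples)
    )

# The gap widens with a richer hypothesis space and narrows with more data.
for size in (10, 1_000, 1_000_000):
    print(size, round(finite_class_gap(size, n_samples=1_000, delta=0.05), 4))
```

Holding $N$ and $\delta$ fixed, a richer hypothesis space loosens the guarantee, which is the sense in which complexity is "bad" in the paragraph above.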
The Mathematics of Machine Learning
In the last few months, I have had several people contact me about their enthusiasm for venturing into the world of data science and using Machine Learning (ML) techniques to probe statistical regularities and build impeccable data-driven products. However, I've observed that some actually lack the necessary mathematical intuition and framework to get useful results. This is the main reason I decided to write this blog post. Recently, there has been an upsurge in the availability of many easy-to-use machine and deep learning packages such as scikit-learn, Weka, Tensorflow etc. Machine Learning theory is a field that intersects statistical, probabilistic, computer science and algorithmic aspects arising from learning iteratively from data and finding hidden insights which can be used to build intelligent applications. Despite the immense possibilities of Machine and Deep Learning, a thorough mathematical understanding of many of these techniques is necessary for a good grasp of the inner workings of the algorithms and getting good results.