Inductive Learning
Resource Constrained Structured Prediction
Bolukbasi, Tolga, Chang, Kai-Wei, Wang, Joseph, Saligrama, Venkatesh
We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach seeks to adaptively generate computationally costly features at test time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition (OCR) and dependency parsing, and show strong reductions in feature cost without degrading accuracy.
Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs
Osokin, Anton, Alayrac, Jean-Baptiste, Lukasewitz, Isabella, Dokania, Puneet K., Lacoste-Julien, Simon
In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013), recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by BCFW reveal the block suboptimality, which can be used as an adaptive criterion. First, we sample objects at each iteration of BCFW in an adaptive non-uniform way via gap-based sampling. Second, we incorporate pairwise and away-step variants of Frank-Wolfe into the block-coordinate setting. Third, we cache oracle calls with a cache-hit criterion based on the block gaps. Fourth, we provide the first method to compute an approximate regularization path for SSVM. Finally, we provide an exhaustive empirical evaluation of all our methods on four structured prediction datasets.
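The gap-based sampling idea can be illustrated with a minimal sketch. It assumes that each block keeps a non-negative gap estimate and that a block is drawn with probability proportional to that estimate; the function name `sample_block` and the uniform fallback are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import List


def sample_block(gaps: List[float], rng: random.Random) -> int:
    """Pick a block index with probability proportional to its gap estimate.

    Assumption for this sketch: gap estimates are non-negative.
    """
    total = sum(gaps)
    if total <= 0.0:
        # All gap estimates are zero: fall back to uniform sampling.
        return rng.randrange(len(gaps))
    return rng.choices(range(len(gaps)), weights=gaps, k=1)[0]


# Hypothetical gap estimates for three blocks.
rng = random.Random(0)
gaps = [0.0, 0.1, 0.9]
counts = [0, 0, 0]
for _ in range(10_000):
    counts[sample_block(gaps, rng)] += 1
```

Blocks with a larger estimated gap are visited more often, while a block whose estimate is zero is never drawn unless every estimate is zero.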
Artificial Intelligence and the Future of Work
How can Artificial Intelligence (AI) help companies operate in the 21st century? How might it impact organisations and employees? AI has been around for years, but now it seems that it is taking the business world by storm. According to software startup advisor Steve Ardire, it will fundamentally reshape organisations. "Human capital will start to shift from mundane tasks and transactions to higher-order and creative work. Along the way, we will see massive businesses where the technology transforms specific job functions," he tells me.
Muffled Semi-Supervised Learning
Balsubramani, Akshay, Freund, Yoav
We explore a novel approach to semi-supervised learning. This approach is contrary to the common approach in that the unlabeled examples serve to "muffle," rather than enhance, the guidance provided by the labeled examples. We provide several variants of the basic algorithm and show experimentally that they can achieve significantly higher AUC than boosted trees, random forests and logistic regression when unlabeled examples are available.
Valiance Improving Predictions with Ensemble Model
"Alone we can do so little and together we can do much" โ a phrase from Helen Keller during 50's is a reflection of achievements and successful stories in real life scenarios from decades. Same thing applies to most of the cases from innovation with big impacts and with advanced technologies world. The machine Learning domain is also in the same race to make predictions and classification in a more accurate way using so called ensemble method and it is proved that ensemble modeling offers one of the most convincing way to build highly accurate predictive models. Ensemble methods are learning models that achieve performance by combining the opinions of multiple learners. Typically, an ensemble model is a supervised learning technique for combining multiple weak learners or models to produce a strong learner with the concept of Bagging and Boosting for data sampling.
A Theory of Formal Synthesis via Inductive Learning
Jha, Susmit, Seshia, Sanjit A.
Formal synthesis is the process of generating a program satisfying a high-level formal specification. In recent times, effective formal synthesis methods have been proposed based on the use of inductive learning. We refer to this class of methods that learn programs from examples as formal inductive synthesis. In this paper, we present a theoretical framework for formal inductive synthesis. We discuss how formal inductive synthesis differs from traditional machine learning. We then describe oracle-guided inductive synthesis (OGIS), a framework that captures a family of synthesizers that operate by iteratively querying an oracle. An instance of OGIS that has had much practical impact is counterexample-guided inductive synthesis (CEGIS). We present a theoretical characterization of CEGIS for learning any program that computes a recursive language. In particular, we analyze the relative power of CEGIS variants where the types of counterexamples generated by the oracle vary. We also consider the impact of bounded versus unbounded memory available to the learning algorithm. In the special case where the universe of candidate programs is finite, we relate the speed of convergence to the notion of teaching dimension studied in machine learning theory. Altogether, the results of the paper take a first step towards a theoretical foundation for the emerging field of formal inductive synthesis.
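The counterexample-guided loop can be sketched on a toy problem with a finite universe of candidates. Everything here is an assumption made for illustration: candidate programs are thresholds `t` computing `x >= t`, the specification is a Python predicate, and the verification oracle checks a finite input domain exhaustively. It is not the formal framework of the paper.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[int, bool]


def cegis(
    candidates: Sequence[int],
    spec: Callable[[int], bool],
    domain: Sequence[int],
) -> Optional[int]:
    """Search for a threshold t such that (x >= t) == spec(x) on the whole domain."""
    examples: List[Example] = []
    # Each counterexample rules out at least the current candidate,
    # so len(candidates) + 1 rounds are always enough.
    for _ in range(len(candidates) + 1):
        # Learner: pick the first candidate consistent with all examples so far.
        consistent = [
            t for t in candidates if all((x >= t) == y for x, y in examples)
        ]
        if not consistent:
            return None  # no candidate can satisfy the specification
        t = consistent[0]
        # Oracle: look for an input on which the candidate violates the spec.
        cex = next((x for x in domain if (x >= t) != spec(x)), None)
        if cex is None:
            return t  # verified on the whole domain
        examples.append((cex, spec(cex)))
    return None


found = cegis(range(0, 10), lambda x: x >= 5, range(-2, 12))
unrealizable = cegis(range(0, 10), lambda x: x % 2 == 0, range(-2, 12))
```

In the first call the learner proposes thresholds in increasing order and each counterexample eliminates the current one until the correct threshold is verified; in the second call no threshold matches the specification, so the loop reports failure.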
Variable Sequence Lengths in TensorFlow
I recently wrote a guide on recurrent networks in TensorFlow. That covered the basics but often we want to learn on sequences of variable lengths, possibly even within the same batch of training examples. In this post, I will explain how to use variable length sequences in TensorFlow and what implications they have on your model. Since TensorFlow unfolds our recurrent network for a given number of steps, we can only feed sequences of that shape to the network. We also want the input to have a fixed size so that we can represent a training batch as a single tensor of shape batch_size x max_length x frame_size.
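To make the fixed-shape requirement concrete, here is a minimal sketch in plain Python rather than TensorFlow. It assumes that shorter sequences are padded with all-zero frames and that the true lengths are kept alongside the batch; the helper `pad_batch` is a hypothetical name, not part of any library.

```python
from typing import List, Tuple

Frame = List[float]


def pad_batch(
    sequences: List[List[Frame]], frame_size: int
) -> Tuple[List[List[Frame]], List[int]]:
    """Pad every sequence with zero frames up to the longest length in the batch."""
    lengths = [len(seq) for seq in sequences]
    max_length = max(lengths)
    padded = [
        seq + [[0.0] * frame_size for _ in range(max_length - len(seq))]
        for seq in sequences
    ]
    return padded, lengths


# Two sequences of different lengths, each frame has frame_size = 2.
batch = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0]],
]
padded, lengths = pad_batch(batch, frame_size=2)
```

The padded batch now has shape batch_size x max_length x frame_size (here 2 x 3 x 2), and `lengths` records how many frames of each sequence are real data.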
The one technology that's causing Google to rethink "everything" - SHARP SIGHT LABS
On Google's recent Q3 earnings call, Google's CEO, Sundar Pichai, said that one "transformative" technology is causing Google to rethink "how we're doing everything." There's a single technology that's causing Google to rethink the way it does everything. The same technology is in the process of transforming many of the biggest names in tech -- Facebook, Amazon, Netflix, UBER, Twitter -- not to mention smaller, up-and-coming startups. Entrepreneur and thought leader Peter Diamandis says that it will "do more to improve healthcare than all the biological sciences combined" and will generate large amounts of wealth and abundance. Billionaire venture capitalist Vinod Khosla agrees, saying that over the next 50 years, it will drive abundance, transform industries, and impact almost every part of society.
Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation
Traditional graph-based semi-supervised learning (SSL) approaches, even though widely applied, are not suited for massive data and large label scenarios since they scale linearly with the number of edges $|E|$ and distinct labels $m$. To deal with the large label size problem, recent works propose sketch-based methods to approximate the distribution on labels per node thereby achieving a space reduction from $O(m)$ to $O(\log m)$, under certain conditions. In this paper, we present a novel streaming graph-based SSL approximation that captures the sparsity of the label distribution and ensures the algorithm propagates labels accurately, and further reduces the space complexity per node to $O(1)$. We also provide a distributed version of the algorithm that scales well to large data sizes. Experiments on real-world datasets demonstrate that the new method achieves better performance than existing state-of-the-art algorithms with significant reduction in memory footprint. We also study different graph construction mechanisms for natural language applications and propose a robust graph augmentation strategy trained using state-of-the-art unsupervised deep learning architectures that yields further significant quality gains.
Empirical Similarity for Absent Data Generation in Imbalanced Classification
When the training data in a two-class classification problem is overwhelmed by one class, most classification techniques fail to correctly identify the data points belonging to the underrepresented class. We propose Similarity-based Imbalanced Classification (SBIC), which learns patterns in the training data based on an empirical similarity function. To take the imbalanced structure of the training data into account, SBIC utilizes the concept of absent data, i.e. data from the minority class which can help better find the boundary between the two classes. SBIC simultaneously optimizes the weights of the empirical similarity function and finds the locations of absent data points. As such, SBIC uses an embedded mechanism for synthetic data generation which does not modify the training dataset, but alters the algorithm to suit imbalanced datasets. Therefore, SBIC uses the ideas of both major schools of thought in imbalanced classification: like cost-sensitive approaches, SBIC operates on an algorithm level to handle imbalanced structures; and, similar to synthetic data generation approaches, it utilizes the properties of unobserved data points from the minority class. The application of SBIC to imbalanced datasets suggests it is comparable to, and in some cases outperforms, other commonly used classification techniques for imbalanced datasets.