Education
Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer
Isele, David, Rostami, Mohammad, Eaton, Eric
Knowledge transfer between tasks can improve the performance of learned models, but requires an accurate estimate of the inter-task relationships to identify the relevant knowledge to transfer. These inter-task relationships are typically estimated based on training data for each task, which is inefficient in lifelong learning settings where the goal is to learn each consecutive task rapidly from as little data as possible. To reduce this burden, we develop a lifelong learning method based on coupled dictionary learning that utilizes high-level task descriptions to model the inter-task relationships. We show that using task descriptors improves the performance of the learned task policies, providing both theoretical justification for the benefit and empirical demonstration of the improvement across a variety of learning problems. Given only the descriptor for a new task, the lifelong learner is also able to accurately predict a model for the new task through zero-shot learning using the coupled dictionary, eliminating the need to gather training data before addressing the task.
High-dimensional dynamics of generalization error in neural networks
Advani, Madhu S., Saxe, Andrew M.
We perform an average-case analysis of the generalization dynamics of large neural networks trained using gradient descent. We study the practically relevant "high-dimensional" regime where the number of free parameters in the network is on the order of or even larger than the number of examples in the dataset. Using random matrix theory and exact solutions in linear models, we derive the generalization error and training error dynamics of learning and analyze how they depend on the dimensionality of data and signal-to-noise ratio of the learning problem. We find that the dynamics of gradient descent learning naturally protect against overtraining and overfitting in large networks. Overtraining is worst at intermediate network sizes, when the effective number of free parameters equals the number of samples, and thus can be reduced by making a network smaller or larger. Additionally, in the high-dimensional regime, low generalization error requires starting with small initial weights. We then turn to non-linear neural networks, and show that making networks very large does not harm their generalization performance. On the contrary, it can in fact reduce overtraining, even without early stopping or regularization of any sort. We identify two novel phenomena underlying this behavior in overcomplete models: first, there is a frozen subspace of the weights in which no learning occurs under gradient descent; and second, the statistical properties of the high-dimensional regime yield better-conditioned input correlations which protect against overtraining. We demonstrate that naive application of worst-case theories such as Rademacher complexity is inaccurate in predicting the generalization performance of deep neural networks, and derive an alternative bound which incorporates the frozen subspace and conditioning effects and qualitatively matches the behavior observed in simulation.
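The frozen-subspace effect mentioned in this abstract can be seen in a toy linear model. The following is a minimal sketch, not the authors' code: the function name `train_linear`, the two-parameter setup, the learning rate, and the step count are all assumptions chosen for illustration. With one training example and two weights, gradient descent on squared error only moves the weight along the input direction, so the orthogonal weight keeps whatever value it started with.

```python
def train_linear(w_init, xs, ys, lr=0.1, steps=200):
    """Batch gradient descent on mean squared error for the model y = w . x.

    Illustrative helper (hypothetical name), not taken from the paper.
    """
    w = list(w_init)
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for j, xj in enumerate(x):
                # The gradient is a multiple of x, so it is zero in every
                # direction orthogonal to the training inputs.
                grad[j] += 2.0 * err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


# One example, two parameters: more parameters than samples.
xs = [(1.0, 0.0)]
ys = [3.0]

w_small = train_linear([0.0, 0.0], xs, ys)  # small initial weights
w_large = train_linear([0.0, 5.0], xs, ys)  # large initial weight in the frozen direction

# Both runs fit the training example, but the second coordinate never moves.
# On a test input (0, 1), w_small predicts 0.0 while w_large predicts 5.0.
```

Both runs reach the same training error, yet they disagree on any test input with a component in the untouched direction, which is one way to read the abstract's remark that low generalization error requires small initial weights in this regime.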
Fast and Strong Convergence of Online Learning Algorithms
In this paper, we study an online learning algorithm without explicit regularization terms. The algorithm is essentially a stochastic gradient descent scheme in a reproducing kernel Hilbert space (RKHS). The polynomially decaying step size in each iteration can play the role of regularization to ensure the generalization ability of the online learning algorithm. We develop a novel capacity-dependent analysis of the performance of the last iterate of the online learning algorithm. The contribution of this paper is two-fold. First, our analysis leads to a convergence rate in the standard mean square distance that is the best obtained so far. Second, we establish, for the first time, the strong convergence of the last iterate with polynomially decaying step sizes in the RKHS norm. We demonstrate that the theoretical analysis established in this paper fully exploits the fine structure of the underlying RKHS, and thus can lead to sharp error estimates for the online learning algorithm.
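The scheme this abstract describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the squared loss, a Gaussian kernel of width 0.5, a step size of the form eta_t = eta1 * t^(-theta), and a synthetic sine target. The names `gaussian_kernel`, `online_kernel_sgd`, `eta1`, and `theta` are hypothetical. Each sample is seen once, no regularization term is added, and the returned function is the last iterate.

```python
import math
import random


def gaussian_kernel(a, b, width=0.5):
    """Gaussian (RBF) kernel on the real line; the width is an assumption."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def online_kernel_sgd(stream, eta1=0.5, theta=0.5, kernel=gaussian_kernel):
    """Unregularized online least squares in an RKHS.

    The iterate is stored as f = sum_i coeffs[i] * kernel(centers[i], .).
    Update: f_{t+1} = f_t - eta_t * (f_t(x_t) - y_t) * kernel(x_t, .),
    with the polynomially decaying step size eta_t = eta1 * t ** (-theta).
    """
    centers, coeffs = [], []
    for t, (x, y) in enumerate(stream, start=1):
        pred = sum(c * kernel(z, x) for z, c in zip(centers, coeffs))
        eta = eta1 * t ** (-theta)
        centers.append(x)
        coeffs.append(-eta * (pred - y))

    def f(x):
        return sum(c * kernel(z, x) for z, c in zip(centers, coeffs))

    return f


# Synthetic data (an assumption): noisy samples of sin(pi * x) on [-1, 1].
rng = random.Random(0)


def sample():
    x = rng.uniform(-1.0, 1.0)
    return x, math.sin(math.pi * x) + rng.gauss(0.0, 0.1)


stream = [sample() for _ in range(500)]
f_last = online_kernel_sgd(stream)  # last iterate after one pass
```

Storing one kernel center per sample makes each step cost time linear in the number of samples seen so far, which is acceptable for a sketch but would need a budget or approximation scheme on long streams.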
Gaussian Processes for Data-Efficient Learning in Robotics and Control
Deisenroth, Marc Peter, Fox, Dieter, Rasmussen, Carl Edward
Autonomous learning has been a promising direction in control and robotics for more than a decade, since data-driven learning reduces the amount of engineering knowledge that is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this article, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.
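To make the idea of a probabilistic transition model concrete, here is a minimal sketch of Gaussian process regression on a one-dimensional toy system. It is not the authors' method: the squared-exponential kernel with fixed hyperparameters, the noise level, the toy data, and the names `rbf`, `solve`, and `gp_predict` are all assumptions, and a real model would learn its hyperparameters from data. The point is only that the prediction comes with a variance that grows away from observed states, which is the uncertainty a planner can take into account.

```python
import math


def rbf(a, b, length=1.0, signal=1.0):
    """Squared-exponential kernel with fixed (assumed) hyperparameters."""
    return signal * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict(xs, ys, x_star, noise=1e-4):
    """Posterior mean and variance of the latent function at x_star."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_star) for a in xs]
    alpha = solve(K, ys)    # (K + noise * I)^-1 y
    v = solve(K, k_star)    # (K + noise * I)^-1 k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * vi for k, vi in zip(k_star, v))
    return mean, var


# Toy transition data (an assumption): observed states and state changes.
states = [0.0, 1.0, 2.0]
deltas = [0.1, 0.3, 0.2]

near_mean, near_var = gp_predict(states, deltas, 1.0)   # at an observed state
far_mean, far_var = gp_predict(states, deltas, 10.0)    # far from all data
```

Near an observed state the variance is close to the noise level, while far from the data it returns to the prior variance, so a planner that propagates this variance is discouraged from trusting the model where it has no evidence.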
From artificial intelligence to design thinking: How reskilling is changing Indian IT landscape
Reskilling is the buzzword in the IT sector. With the sector seeing huge churn due to automation and protectionism in the western markets, industry lobby group Nasscom's president R Chandrashekhar told employees in May: Re-skill or perish. The sector is seeing layoffs and voluntary severances. Companies' hiring is on the decline. One estimate even puts the likely job loss at a whopping 2 lakh over the next three years. And in that, the sector is class agnostic.
AI Frontiers Conference
RedMarlin: RedMarlin protects brands from online infringement and abuse. Every year, companies lose billions of dollars to counterfeiting and millions to phishing and other online scams infringing on their brand. RedMarlin monitors the internet, detects fake websites in real time using Computer Vision, NLP and Deep Learning, and achieves the fastest takedowns, so brands can focus on their core business. Presenter: Abhishek Dubey, Co-founder & CEO; Shashi Prakash, Co-founder & Chief Scientist
TrueShelf: TrueShelf is an AI-powered adaptive learning platform that understands the relationships between thousands of math skills, adaptively generates an unlimited number of problems, and helps students learn mathematics rigorously by providing them instant granular feedback. TrueShelf analyzes students' abilities and identifies mistakes, from conceptual mistakes to application mistakes, and even silly mistakes, then guides them in a personalized manner.
Does L&D Need Data Science
Some time ago I wrote some articles here relating to data and L&D. I ended one of those articles with a question: does L&D need data science? I intend to answer that question with this post. Without wasting much time, I will say that the answer is yes. Now before you roll your eyes, let's answer another question: what really is data science?
Scalable programming with Scala and Spark - Udemy
This team has decades of practical experience in working with Java and with billions of rows of data. If you are an analyst or a data scientist, you're used to having multiple systems for working with data. With Spark, you have a single engine where you can explore and play with large amounts of data, run machine learning algorithms and then use the same system to productionize your code. Scala: Scala is a general purpose programming language - like Java or C++. Its functional programming nature and the availability of a REPL environment make it particularly suited for a distributed computing framework like Spark.
Teaching Cars To See -- Advanced Lane Detection Using Computer Vision
Identifying lanes on the road is a common task performed by all human drivers to ensure their vehicles are within lane constraints when driving, so as to make sure traffic is smooth and minimise chances of collisions with other cars in nearby lanes. Similarly, it is a critical task for an autonomous vehicle to perform. It turns out that recognising lane markings on roads is possible using well known computer vision techniques. We will cover how to use various techniques to identify and draw the inside of a lane, compute lane curvature, and even estimate the vehicle's position relative to the center of the lane. The first step we will take is to find the calibration matrix, along with distortion coefficients for the camera that was used to take pictures of the road.
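The curvature and position estimates mentioned above reduce to short formulas once a lane line has been fitted. The sketch below is an assumption about how such a step could look, not the article's code: it assumes each lane line has already been fitted as a second-order polynomial x = a*y^2 + b*y + c in metric units, that the camera sits on the vehicle's centre line, and that a pixel-to-metre scale is known. The names `radius_of_curvature`, `lane_offset`, and `metres_per_pixel` are hypothetical.

```python
def radius_of_curvature(a, b, y):
    """Radius of curvature of the curve x = a*y**2 + b*y + c at height y.

    Uses the standard formula R = (1 + (dx/dy)**2) ** 1.5 / |d2x/dy2|.
    The coefficient a must be non-zero (a straight line has infinite radius).
    """
    return (1.0 + (2.0 * a * y + b) ** 2) ** 1.5 / abs(2.0 * a)


def lane_offset(left_x, right_x, image_width, metres_per_pixel):
    """Signed offset of the camera from the lane centre, in metres.

    left_x and right_x are the pixel columns of the two lane lines at the
    bottom of the image. A positive result means the image centre lies to
    the right of the lane centre. Assumes a centre-mounted camera.
    """
    lane_centre = (left_x + right_x) / 2.0
    return (image_width / 2.0 - lane_centre) * metres_per_pixel


# A parabola with a = 0.001 approximates a circle of radius about 500 near y = 0.
radius = radius_of_curvature(0.001, 0.0, 0.0)

# Lane lines at columns 300 and 900 in a 1280-pixel-wide image, with an
# assumed scale of 0.005 metres per pixel.
offset = lane_offset(300.0, 900.0, 1280, 0.005)
```

Evaluating the radius at the row closest to the vehicle, usually the bottom of the image, gives the curvature of the road where the car currently is.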
On formalizing fairness in prediction with machine learning
Machine learning algorithms for prediction are increasingly being used in critical decisions affecting human lives. Various fairness formalizations, with no firm consensus yet, are employed to prevent such algorithms from systematically discriminating against people based on certain attributes protected by law. The aim of this article is to survey how fairness is formalized in the machine learning literature for the task of prediction and present these formalizations with their corresponding notions of distributive justice from the social sciences literature. We provide theoretical as well as empirical critiques of these notions from the social sciences literature and explain how these critiques limit the suitability of the corresponding fairness formalizations to certain domains. We also suggest two notions of distributive justice which address some of these critiques and discuss avenues for prospective fairness formalizations.