Goto

Collaborating Authors

 Education


YellowFin and the Art of Momentum Tuning

arXiv.org Artificial Intelligence

Hyperparameter tuning is one of the most time-consuming workloads in deep learning. State-of-the-art optimizers, such as AdaGrad, RMSProp and Adam, reduce this labor by adaptively tuning an individual learning rate for each variable. Recently researchers have shown renewed interest in simpler methods like momentum SGD as they may yield better test metrics. Motivated by this trend, we ask: can simple adaptive methods based on SGD perform as well or better? We revisit the momentum SGD algorithm and show that hand-tuning a single learning rate and momentum makes it competitive with Adam. We then analyze its robustness to learning rate misspecification and objective curvature variation. Based on these insights, we design YellowFin, an automatic tuner for momentum and learning rate in SGD. YellowFin optionally uses a negative-feedback loop to compensate for the momentum dynamics in asynchronous settings on the fly. We empirically show that YellowFin can converge in fewer iterations than Adam on ResNets and LSTMs for image recognition, language modeling and constituency parsing, with a speedup of up to 3.28x in synchronous and up to 2.69x in asynchronous settings.


Crowd ideation of supervised learning problems

arXiv.org Artificial Intelligence

Crowdsourcing is an important avenue for collecting machine learning data, but crowdsourcing can go beyond simple data collection by employing the creativity and wisdom of crowd workers. Yet crowd participants are unlikely to be experts in statistics or predictive modeling, and it is not clear how well non-experts can contribute creatively to the process of machine learning. Here we study an end-to-end crowdsourcing algorithm where groups of non-expert workers propose supervised learning problems, rank and categorize those problems, and then provide data to train predictive models on those problems. Problem proposal includes and extends feature engineering because workers propose the entire problem, not only the input features but also the target variable. We show that workers without machine learning experience can collectively construct useful datasets and that predictive models can be learned on these datasets. In our experiments, the problems proposed by workers covered a broad range of topics, from politics and current events to problems capturing health behavior, demographics, and more. Workers also favored questions showing positively correlated relationships, which has interesting implications given many supervised learning methods perform as well with strong negative correlations. Proper instructions are crucial for non-experts, so we also conducted a randomized trial to understand how different instructions may influence the types of problems proposed by workers. In general, shifting the focus of machine learning tasks from designing and training individual predictive models to problem proposal allows crowdsourcers to design requirements for problems of interest and then guide workers towards contributing to the most suitable problems.


Overcoming catastrophic forgetting with hard attention to the task

arXiv.org Artificial Intelligence

Catastrophic forgetting occurs when a neural network loses the information learned in a previous task after training on subsequent tasks. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning. A hard attention mask is learned concurrently to every task, through stochastic gradient descent, and previous masks are exploited to condition such learning. We show that the proposed mechanism is effective for reducing catastrophic forgetting, cutting current rates by 45 to 80%. We also show that it is robust to different hyperparameter choices, and that it offers a number of monitoring capabilities. The approach features the possibility to control both the stability and compactness of the learned knowledge, which we believe makes it also attractive for online learning or network compression applications.


Interpretable and Pedagogical Examples

arXiv.org Artificial Intelligence

Teachers intentionally pick the most informative examples to show their students. However, if the teacher and student are neural networks, the examples that the teacher network learns to give, although effective at teaching the student, are typically uninterpretable. We show that training the student and teacher iteratively, rather than jointly, can produce interpretable teaching strategies. We evaluate interpretability by (1) measuring the similarity of the teacher's emergent strategies to intuitive strategies in each domain and (2) conducting human experiments to evaluate how effective the teacher's strategies are at teaching humans. We show that the teacher network learns to select or generate interpretable, pedagogical examples to teach rule-based, probabilistic, boolean, and hierarchical concepts.


Cluster Analysis and Unsupervised Machine Learning in Python

@machinelearnbot

Cluster analysis is a staple of unsupervised machine learning and data science. It is very useful for data mining and big data because it automatically finds patterns in the data, without the need for labels, unlike supervised machine learning. In a real-world environment, you can imagine that a robot or an artificial intelligence won't always have access to the optimal answer, or maybe there isn't an optimal correct answer. You'd want that robot to be able to explore the world on its own, and learn things just by looking for patterns. Do you ever wonder how we get the data that we use in our supervised machine learning algorithms?


Time Series Analysis: A Primer

@machinelearnbot

What is a Time Series? Many data sets are cross-sectional and represent a single slice of time. However, we also have data collected over many periods - weekly sales data, for instance. This is an example of time series data. Time series analysis is a specialized branch of statistics used extensively in fields such as Econometrics and Operations Research.


Keras LSTM tutorial - How to easily build a powerful deep learning language model - Adventures in Machine Learning

@machinelearnbot

In previous posts, I introduced Keras for building convolutional neural networks and performing word embedding. The next natural step is to talk about implementing recurrent neural networks in Keras. In a previous tutorial of mine, I gave a very comprehensive introduction to recurrent neural networks and long short term memory (LSTM) networks, implemented in TensorFlow. In this tutorial, I'll concentrate on creating LSTM networks in Keras, briefly giving a recap or overview of how LSTMs work. In this Keras LSTM tutorial, we'll implement a sequence-to-sequence text prediction model by utilizing a large text data set called the PTB corpus. All the code in this tutorial can be found on this site's Github repository. Recommended online course: If you are more of a video course learner, I'd recommend this inexpensive Udemy course to learn more about Keras and LSTM networks: Zero to Deep Learning with Python and Keras A LSTM network is a kind of recurrent neural network.


How to spot a machine learning opportunity

#artificialintelligence

When it comes to AI adoption, the adage "the future is already here, it's just not evenly distributed" applies. There's not only a shortage of data scientists, but also a shortage of business stakeholders willing or accustomed to identifying problems capable of being solved with AI. That's the premise of a recent Harvard Business Review article by Kathryn Hume, "How to spot a machine learning opportunity, even if you aren't a data scientist." Hume makes the case that having an intuition for how machine learning algorithms work will be an important business skill in the foreseeable future. Fortunately, this skill can be learned.


Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner

arXiv.org Artificial Intelligence

During their first years of life, infants learn the language(s) of their environment at an amazing speed despite large cross cultural variations in amount and complexity of the available language input. Understanding this simple fact still escapes current cognitive and linguistic theories. Recently, spectacular progress in the engineering science, notably, machine learning and wearable technology, offer the promise of revolutionizing the study of cognitive development. Machine learning offers powerful learning algorithms that can achieve human-like performance on many linguistic tasks. Wearable sensors can capture vast amounts of data, which enable the reconstruction of the sensory experience of infants in their natural environment. The project of 'reverse engineering' language development, i.e., of building an effective system that mimics infant's achievements appears therefore to be within reach. Here, we analyze the conditions under which such a project can contribute to our scientific understanding of early language development. We argue that instead of defining a sub-problem or simplifying the data, computational models should address the full complexity of the learning situation, and take as input the raw sensory signals available to infants. This implies that (1) accessible but privacy-preserving repositories of home data be setup and widely shared, and (2) models be evaluated at different linguistic levels through a benchmark of psycholinguist tests that can be passed by machines and humans alike, (3) linguistically and psychologically plausible learning architectures be scaled up to real data using probabilistic/optimization principles from machine learning. We discuss the feasibility of this approach and present preliminary results.


'Assassin's Creed Origins' virtual tours can actually teach history

Engadget

The Assassin's Creed series is known for its vast and richly detailed historical environments, and well... lots of murder. What you might not realize is just how much work goes into making these virtual windows into the past somewhat realistic. That's something Ubisoft is aiming to highlight with Assassin's Creed Origins' Discovery Tour. You can think of it as a museum-like experience set within the game's meticulous rendition of ancient Egypt. To turn one of the most popular gaming franchises in the world into a truly useful educational tool.