AITopics

We consider two questions at the heart of machine learning; how can we predict if a minimum will generalize to the test set, and why does stochastic gradient descent find minima that generalize well? Our work responds to Zhang et al. (2016), who showed deep neural networks can easily memorize randomly labeled training data, despite generalizing well on real labels of the same inputs. We show that the same phenomenon occurs in small linear models. These observations are explained by the Bayesian evidence, which penalizes sharp minima but is invariant to model parameterization. We also demonstrate that, when one holds the learning rate fixed, there is an optimum batch size which maximizes the test set accuracy. We propose that the noise introduced by small mini-batches drives the parameters towards minima whose evidence is large. Interpreting stochastic gradient descent as a stochastic differential equation, we identify the "noise scale" $g = \epsilon (\frac{N}{B} - 1) \approx \epsilon N/B$, where $\epsilon$ is the learning rate, $N$ the training set size and $B$ the batch size. Consequently the optimum batch size is proportional to both the learning rate and the size of the training set, $B_{opt} \propto \epsilon N$. We verify these predictions empirically.

artificial intelligence, batch size, machine learning, (16 more...)

1710.06451

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Zhang, Jian, Mitliagkas, Ioannis

YellowFin and the Art of Momentum Tuning

Hyperparameter tuning is one of the most time-consuming workloads in deep learning. State-of-the-art optimizers, such as AdaGrad, RMSProp and Adam, reduce this labor by adaptively tuning an individual learning rate for each variable. Recently researchers have shown renewed interest in simpler methods like momentum SGD as they may yield better test metrics. Motivated by this trend, we ask: can simple adaptive methods based on SGD perform as well or better? We revisit the momentum SGD algorithm and show that hand-tuning a single learning rate and momentum makes it competitive with Adam. We then analyze its robustness to learning rate misspecification and objective curvature variation. Based on these insights, we design YellowFin, an automatic tuner for momentum and learning rate in SGD. YellowFin optionally uses a negative-feedback loop to compensate for the momentum dynamics in asynchronous settings on the fly. We empirically show that YellowFin can converge in fewer iterations than Adam on ResNets and LSTMs for image recognition, language modeling and constituency parsing, with a speedup of up to 3.28x in synchronous and up to 2.69x in asynchronous settings.

artificial intelligence, machine learning, yellowfin, (16 more...)

1706.03471

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Education (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fisac, Jaime F., Liu, Chang, Hamrick, Jessica B., Sastry, S. Shankar, Hedrick, J. Karl, Griffiths, Thomas L., Dragan, Anca D.

Generating Plans that Predict Themselves

Collaboration requires coordination, and we coordinate by anticipating our teammates' future actions and adapting to their plan. In some cases, our teammates' actions early on can give us a clear idea of what the remainder of their plan is, i.e. what action sequence we should expect. In others, they might leave us less confident, or even lead us to the wrong conclusion. Our goal is for robot actions to fall in the first category: we want to enable robots to select their actions in such a way that human collaborators can easily use them to correctly anticipate what will follow. While previous work has focused on finding initial plans that convey a set goal, here we focus on finding two portions of a plan such that the initial portion conveys the final one. We introduce $t$-\ACty{}: a measure that quantifies the accuracy and confidence with which human observers can predict the remaining robot plan from the overall task goal and the observed initial $t$ actions in the plan. We contribute a method for generating $t$-predictable plans: we search for a full plan that accomplishes the task, but in which the first $t$ actions make it as easy as possible to infer the remaining ones. The result is often different from the most efficient plan, in which the initial actions might leave a lot of ambiguity as to how the task will be completed. Through an online experiment and an in-person user study with physical robots, we find that our approach outperforms a traditional efficiency-based planner in objective and subjective collaboration metrics.

artificial intelligence, planning & scheduling, robot, (18 more...)

1802.0525

Country: North America > United States > California (0.28)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.50)
Research Report > Experimental Study (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.68)

Barros, Gabriella A. B., Green, Michael Cerny, Liapis, Antonios, Togelius, Julian

Who Killed Albert Einstein? From Open Data to Murder Mystery Games

This paper presents a framework for generating adventure games from open data. Focusing on the murder mystery type of adventure games, the generator is able to transform open data from Wikipedia articles, OpenStreetMap and images from Wikimedia Commons into WikiMysteries. Every WikiMystery game revolves around the murder of a person with a Wikipedia article and populates the game with suspects who must be arrested by the player if guilty of the murder or absolved if innocent. Starting from only one person as the victim, an extensive generative pipeline finds suspects, their alibis, and paths connecting them from open data, transforms open data into cities, buildings, non-player characters, locks and keys and dialog options. The paper describes in detail each generative step, provides a specific playthrough of one WikiMystery where Albert Einstein is murdered, and evaluates the outcomes of games generated for the 100 most influential people of the 20th century.

einstein, evolutionary algorithm, machine learning, (22 more...)

1802.05219

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games > Computer Games (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

Bloch, Isabelle, Lang, Jérôme, Pérez, Ramón Pino, Uzcátegui, Carlos

Morphologic for knowledge dynamics: revision, fusion, abduction

An explanatory relation is a binary relation where the intended meaning of α γ is "γ is a preferred explanation of α". In [37], a set of postulates that should be satisfied by preferred explanatory relations was proposed and discussed. The aim of this section is threefold: first, to propose very natural explanatory relations using morphologic that in some cases are computationally tractable; secondly, to examine the adequacy of logical postulates proposed in [37], and thirdly, the discovery of new logical properties for explanatory reasoning. Morphologic allows us to define the most central part of a formula, according to the fundamental principles of this theory (see e.g.

artificial intelligence, belief revision, logic & formal reasoning, (19 more...)

1802.05142

Country:

Europe (1.00)
North America > United States (0.67)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Belief Revision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.68)

Crowd ideation of supervised learning problems

Bagrow, James P.

Crowdsourcing is an important avenue for collecting machine learning data, but crowdsourcing can go beyond simple data collection by employing the creativity and wisdom of crowd workers. Yet crowd participants are unlikely to be experts in statistics or predictive modeling, and it is not clear how well non-experts can contribute creatively to the process of machine learning. Here we study an end-to-end crowdsourcing algorithm where groups of non-expert workers propose supervised learning problems, rank and categorize those problems, and then provide data to train predictive models on those problems. Problem proposal includes and extends feature engineering because workers propose the entire problem, not only the input features but also the target variable. We show that workers without machine learning experience can collectively construct useful datasets and that predictive models can be learned on these datasets. In our experiments, the problems proposed by workers covered a broad range of topics, from politics and current events to problems capturing health behavior, demographics, and more. Workers also favored questions showing positively correlated relationships, which has interesting implications given many supervised learning methods perform as well with strong negative correlations. Proper instructions are crucial for non-experts, so we also conducted a randomized trial to understand how different instructions may influence the types of problems proposed by workers. In general, shifting the focus of machine learning tasks from designing and training individual predictive models to problem proposal allows crowdsourcers to design requirements for problems of interest and then guide workers towards contributing to the most suitable problems.

artificial intelligence, machine learning, social media, (18 more...)

1802.05101

Country: North America > United States > Vermont (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.88)

Industry:

Health & Medicine (0.93)
Education > Focused Education > Special Education (0.61)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Pappalardo, Luca, Cintia, Paolo, Ferragina, Paolo, Pedreschi, Dino, Giannotti, Fosca

PlayeRank: Multi-dimensional and role-aware rating of soccer player performance

The problem of rating the performance of soccer players is attracting the interest of many companies, websites, and the scientific community, thanks to the availability of massive data capturing all the events generated during a game (e.g., tackles, passes, shots, etc.). Existing approaches fail to fully exploit the richness of the available data and lack of a proper validation. In this paper, we design and implement {\sf PlayeRank}, a data-driven framework that offers a principled multi-dimensional and role-aware evaluation of the performance of soccer players. We validate the framework through an experimental analysis advised by soccer experts, based on a massive dataset of millions of events pertaining four seasons of the five prominent European leagues. Experiments show that {\sf PlayeRank} is robust in agreeing with the experts' evaluation of players, significantly improving the state of the art. We also explore an application of PlayeRank --- i.e. searching players --- by introducing a special form of spatial query on the soccer field. This shows its flexibility and efficiency, which makes it worth to be used in the design of a scalable platform for soccer analytics.

data mining, machine learning, playerank, (20 more...)

1802.04987

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Information Management (0.68)

Directly Estimating the Variance of the {\lambda}-Return Using Temporal-Difference Methods

Sherstan, Craig, Bennett, Brendan, Young, Kenny, Ashley, Dylan R., White, Adam, White, Martha, Sutton, Richard S.

This paper investigates estimating the variance of a temporal-difference learning agent's update target. Most reinforcement learning methods use an estimate of the value function, which captures how good it is for the agent to be in a particular state and is mathematically expressed as the expected sum of discounted future rewards (called the return). These values can be straightforwardly estimated by averaging batches of returns using Monte Carlo methods. However, if we wish to update the agent's value estimates during learning--before terminal outcomes are observed--we must use a different estimation target called the {\lambda}-return, which truncates the return with the agent's own estimate of the value function. Temporal difference learning methods estimate the expected {\lambda}-return for each state, allowing these methods to update online and incrementally, and in most cases achieve better generalization error and faster learning than Monte Carlo methods. Naturally one could attempt to estimate higher-order moments of the {\lambda}-return. This paper is about estimating the variance of the {\lambda}-return. Prior work has shown that given estimates of the variance of the {\lambda}-return, learning systems can be constructed to (1) mitigate risk in action selection, and (2) automatically adapt the parameters of the learning process itself to improve performance. Unfortunately, existing methods for estimating the variance of the {\lambda}-return are complex and not well understood empirically. We contribute a method for estimating the variance of the {\lambda}-return directly using policy evaluation methods from reinforcement learning. Our approach is significantly simpler than prior methods that independently estimate the second moment of the {\lambda}-return. Empirically our new approach behaves at least as well as existing approaches, but is generally more robust.

machine learning, reinforcement learning, variance, (16 more...)

1801.08287

Country:

Asia (0.68)
North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Serrà, Joan, Surís, Dídac, Miron, Marius, Karatzoglou, Alexandros

Overcoming catastrophic forgetting with hard attention to the task

Catastrophic forgetting occurs when a neural network loses the information learned in a previous task after training on subsequent tasks. This problem remains a hurdle for artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without affecting the current task's learning. A hard attention mask is learned concurrently to every task, through stochastic gradient descent, and previous masks are exploited to condition such learning. We show that the proposed mechanism is effective for reducing catastrophic forgetting, cutting current rates by 45 to 80%. We also show that it is robust to different hyperparameter choices, and that it offers a number of monitoring capabilities. The approach features the possibility to control both the stability and compactness of the learned knowledge, which we believe makes it also attractive for online learning or network compression applications.

artificial intelligence, machine learning, overcoming catastrophic forgetting, (15 more...)

1801.01423

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Milli, Smitha, Abbeel, Pieter, Mordatch, Igor

Interpretable and Pedagogical Examples

Teachers intentionally pick the most informative examples to show their students. However, if the teacher and student are neural networks, the examples that the teacher network learns to give, although effective at teaching the student, are typically uninterpretable. We show that training the student and teacher iteratively, rather than jointly, can produce interpretable teaching strategies. We evaluate interpretability by (1) measuring the similarity of the teacher's emergent strategies to intuitive strategies in each domain and (2) conducting human experiments to evaluate how effective the teacher's strategies are at teaching humans. We show that the teacher network learns to select or generate interpretable, pedagogical examples to teach rule-based, probabilistic, boolean, and hierarchical concepts.

artificial intelligence, machine learning, student, (15 more...)

1711.00694

Genre: Research Report > Experimental Study (0.68)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)