Collaborating Authors

 wolpert


Learning to Abstract Visuomotor Mappings using Meta-Reinforcement Learning

arXiv.org Artificial Intelligence

We investigated the human capacity to acquire multiple visuomotor mappings for de novo skills. Using a grid navigation paradigm, we tested whether contextual cues, implemented as different "grid worlds", allow participants to learn two distinct key-mappings more efficiently. Our results indicate that when contextual information is provided, task performance is significantly better. The same held true for meta-reinforcement learning agents that differed in whether or not they received contextual information when performing the task. We evaluated their accuracy in predicting human performance in the task and analyzed their internal representations. The results indicate that contextual cues allow the formation of separate representations in space and time when using different visuomotor mappings, whereas their absence favors sharing one representation. While both strategies can allow learning of multiple visuomotor mappings, we showed that contextual cues provide a computational advantage in terms of how many mappings can be learned.


The Implications of the No-Free-Lunch Theorems for Meta-induction

arXiv.org Artificial Intelligence

The important recent book by G. Schurz appreciates that the no-free-lunch theorems (NFL) have major implications for the problem of (meta) induction. Here I review the NFL theorems, emphasizing that they do not only concern the case where there is a uniform prior -- they prove that there are "as many priors" (loosely speaking) for which any induction algorithm $A$ out-generalizes some induction algorithm $B$ as vice-versa. Importantly though, in addition to the NFL theorems, there are many free lunch theorems. In particular, the NFL theorems can only be used to compare the marginal expected performance of an induction algorithm $A$ with the marginal expected performance of an induction algorithm $B$. There is a rich set of free lunches which instead concern the statistical correlations among the generalization errors of induction algorithms. As I describe, the meta-induction algorithms that Schurz advocates as a "solution to Hume's problem" are just an example of such a free lunch based on correlations among the generalization errors of induction algorithms. I end by pointing out that the prior that Schurz advocates, which is uniform over bit frequencies rather than bit patterns, is contradicted by thousands of experiments in statistical physics and by the great success of the maximum entropy procedure in inductive inference.


The Importance Of No Free Lunch Theorems In Deep Learning

#artificialintelligence

"The no free lunch theorem calls for prudency when solving ML problems by requiring that you test multiple algorithms and solutions with a clear mind and without prejudice." In a paper titled, 'The Lack of A Priori Distinctions Between Learning Algorithms', that dates back to 1996, David Wolpert explored the following questions: He showed that for any two algorithms, A and B, there are as many scenarios where A will perform worse than B as there are instances where A will outperform B. In short, for all possible problems, average performance of both the algorithms is the same. Although the no free lunch theorem by Wolpert has a more theoretical than practical appeal, there are some implications that should still be taken into account by everyone working with machine learning algorithms. These theorems prove that under a uniform distribution over search problems or learning problems, all algorithms perform equally. Search and learning are key aspects of ML and the NFL theorems have something to deliver here.


What is important about the No Free Lunch theorems?

arXiv.org Machine Learning

The No Free Lunch theorems prove that under a uniform distribution over induction problems (search problems or learning problems), all induction algorithms perform equally. As I discuss in this chapter, the importance of the theorems arises by using them to analyze scenarios involving non-uniform distributions, and to compare different algorithms, without any assumption about the distribution over problems at all. In particular, the theorems prove that anti-cross-validation (choosing among a set of candidate algorithms based on which has the worst out-of-sample behavior) performs as well as cross-validation, unless one makes an assumption -- which has never been formalized -- about how the distribution over induction problems, on the one hand, is related to the set of algorithms one is choosing among using (anti-)cross-validation, on the other. In addition, they establish strong caveats concerning the significance of the many results in the literature which establish the strength of a particular algorithm without assuming a particular distribution. They also motivate a "dictionary" between supervised learning and blackbox optimization, which allows one to "translate" techniques from supervised learning into the domain of blackbox optimization, thereby strengthening blackbox optimization algorithms. In addition to these topics, I also briefly discuss their implications for philosophy of science.


Some observations concerning Off Training Set (OTS) error

arXiv.org Machine Learning

A new measure of generalisation error called Off Training Set (OTS) error was introduced recently in [Wolpert, 1996b, Wolpert, 1996a]. Under quite weak assumptions it was shown that with respect to OTS error there are no a priori distinctions between learning algorithms, at least if it is assumed that the target functions are uniformly distributed. Thus, as far as OTS error is concerned, an algorithm that minimizes error on the training set will do no better than random guessing. If OTS error accurately models the concept of generalization then this is a depressing conclusion indeed. However, in this paper it is argued that OTS error does not model what is normally meant by generalization error. In particular, it is shown that the assumptions underlying one of the main "no free lunch" (NFL) theorems (theorem 2) in [Wolpert, 1996b] imply that the distributions used to generate training data and testing data have disjoint supports. Thus, training a neural network to recognise faces by showing it images of handwritten characters is the kind of learning problem covered by the NFL theorem.
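
The claim that no learner beats random guessing on OTS error under uniformly distributed targets can be checked by brute force on a toy problem. The sketch below is an illustration only, not taken from the paper: the 8-point domain, the 5/3 train/test split, and the two learners (`majority_learner`, `anti_majority_learner`) are all assumptions chosen to keep the enumeration small.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy setup: 8 input points, binary labels.
DOMAIN = list(range(8))
TRAIN_X = DOMAIN[:5]   # inputs seen during training
TEST_X = DOMAIN[5:]    # off-training-set inputs (disjoint from TRAIN_X)


def majority_learner(train_labels):
    """Predict the most common training label everywhere (ties go to 1)."""
    guess = 1 if 2 * sum(train_labels) >= len(train_labels) else 0
    return lambda x: guess


def anti_majority_learner(train_labels):
    """Predict the opposite of what majority_learner would predict."""
    guess = 1 - majority_learner(train_labels)(None)
    return lambda x: guess


def mean_ots_error(learner):
    """Average OTS error over every possible target function on DOMAIN."""
    total = Fraction(0)
    count = 0
    # Uniform distribution over targets: enumerate all 2**8 labelings.
    for target in product([0, 1], repeat=len(DOMAIN)):
        hypothesis = learner([target[x] for x in TRAIN_X])
        errors = sum(hypothesis(x) != target[x] for x in TEST_X)
        total += Fraction(errors, len(TEST_X))
        count += 1
    return total / count


print(mean_ots_error(majority_learner))       # 1/2
print(mean_ots_error(anti_majority_learner))  # 1/2
```

Both learners average exactly 1/2 because, for any fixed training labels, the labels on the off-training-set points still range uniformly over every combination. The sketch also makes the abstract's objection concrete: training and test inputs never overlap here.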


There is No Free Lunch in Data Science

#artificialintelligence

During your adventures in machine learning, you may have already come across the "No Free Lunch" Theorem. Borrowing its name from the adage "there ain't no such thing as a free lunch," the mathematical folklore theorem describes the phenomenon that there is no single algorithm that is best suited for all possible scenarios and data sets. There are, generally speaking, two No Free Lunch (NFL) theorems: one for machine learning and one for search and optimization. These two theorems are related and tend to be bundled into one general axiom (the folklore theorem). Although many different researchers have contributed to the collective publications on the No Free Lunch theorems, the most prevalent name associated with these works is David Wolpert.


Automate Stacking In Python: How to Boost Your Performance While Saving Time

#artificialintelligence

Utilizing stacking (stacked generalization) is a very hot topic when it comes to pushing your machine learning algorithm to new heights. For instance, most if not all winning Kaggle submissions nowadays make use of some form of stacking or a variation of it. First introduced in the 1992 paper Stacked Generalization by David Wolpert, its main purpose is to reduce the generalization error. According to Wolpert, stacked generalizations can be understood "as a more sophisticated version of cross-validation". While Wolpert himself noted at the time that large parts of stacked generalizations are "black art", it seems that building larger and larger stacked generalizations wins over smaller stacked generalizations.
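
To make the link between stacking and cross-validation concrete, here is a minimal pure-Python sketch of the general idea, not the article's own code. Everything in it is an assumption for illustration: the two base regressors (`fit_mean`, `fit_line`), the interleaved fold assignment, and the two-weight least-squares combiner. Base models are fitted on the kept folds, their predictions on the held-out folds become the training inputs of a second-level combiner, and the base models are then refitted on all the data.

```python
def fit_mean(xs, ys):
    """Base model 1: always predict the mean training target."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Base model 2: ordinary least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)


def out_of_fold_predictions(fitters, xs, ys, k=4):
    """For each point, predict it with models trained without its fold."""
    n = len(xs)
    preds = [[0.0] * len(fitters) for _ in range(n)]
    for fold in range(k):
        held = [i for i in range(n) if i % k == fold]
        kept = [i for i in range(n) if i % k != fold]
        for j, fit in enumerate(fitters):
            model = fit([xs[i] for i in kept], [ys[i] for i in kept])
            for i in held:
                preds[i][j] = model(xs[i])
    return preds


def fit_two_weight_combiner(preds, ys):
    """Least-squares weights (w0, w1) for y ~ w0 * p0 + w1 * p1."""
    a = sum(p[0] * p[0] for p in preds)
    b = sum(p[0] * p[1] for p in preds)
    d = sum(p[1] * p[1] for p in preds)
    u = sum(p[0] * y for p, y in zip(preds, ys))
    v = sum(p[1] * y for p, y in zip(preds, ys))
    det = a * d - b * b
    if abs(det) < 1e-12:  # degenerate case: fall back to a plain average
        return (0.5, 0.5)
    return ((u * d - b * v) / det, (a * v - b * u) / det)


def fit_stack(xs, ys, k=4):
    """Train the combiner on out-of-fold predictions, then refit the bases."""
    fitters = [fit_mean, fit_line]
    weights = fit_two_weight_combiner(
        out_of_fold_predictions(fitters, xs, ys, k), ys
    )
    models = [fit(xs, ys) for fit in fitters]
    return lambda x: sum(w * m(x) for w, m in zip(weights, models))


xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 for x in xs]
stack = fit_stack(xs, ys)
print(stack(20.0))  # close to 41.0: the combiner puts its weight on the line
```

Training the combiner on held-out predictions rather than in-sample ones is what keeps it from simply rewarding whichever base model overfits the most.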


Brains, cancer and computers

AITopics Original Links

The race is on to apply machine learning to biology. The starting gun was fired in 2002 when research company Correlogic stunned the medical world with the announcement of a vastly improved test for detecting ovarian cancer. The new test was simple - a few drops of blood are all that's required - yet reliable. What made it truly remarkable was that the test was discovered by machine. This formed a key theme at this month's International Joint Conference on AI (IJCAI) at Edinburgh.


Can't Tickle Yourself? That's a Good Thing

AITopics Original Links

As a child, my brother would frequently challenge me to a game he called punch-for-punch. He'd let me hit him in the arm if he could hit me back just as hard. It wasn't a long game; dull punches soon became bruising wallops. Being several years younger and many pounds lighter, I'd often concede quickly, fearing the next blow that, despite the game's equally-hard rule, always felt more forceful than the last. The thing is, my brother and I were both playing by the rules--at least, we thought we were.


Motor Simulation via Coupled Internal Models Using Sequential Monte Carlo

AAAI Conferences

We describe a generative Bayesian model for action understanding in which inverse-forward internal model pairs are considered "hypotheses" of plausible action goals that are explored in parallel via an approximate inference mechanism based on sequential Monte Carlo methods. The reenactment of internal model pairs can be considered a form of motor simulation, which supports both perceptual prediction and action understanding at the goal level. However, this procedure is generally considered to be computationally inefficient. We present a model that dynamically reallocates computational resources to more accurate internal models depending on both the available prior information and the prediction error of the inverse-forward models, and which leads to successful action recognition. We present experimental results that test the robustness and efficiency of our model in real-world scenarios.