Europe
Epsilon–First Policies for Budget–Limited Multi-Armed Bandits
Tran-Thanh, Long (University of Southampton) | Chapman, Archie (University of Southampton) | Cote, Enrique Munoz de (University of Southampton) | Rogers, Alex (University of Southampton) | Jennings, Nicholas R. (University of Southampton)
We introduce the budget–limited multi–armed bandit (MAB), which captures situations where a learner’s actions are costly and constrained by a fixed budget that is incommensurable with the rewards earned from the bandit machine, and then describe a first algorithm for solving it. Since the learner has a budget, the problem’s duration is finite. Consequently an optimal exploitation policy is not to pull the optimal arm repeatedly, but to pull the combination of arms that maximises the agent’s total reward within the budget. As such, the rewards for all arms must be estimated, because any of them may appear in the optimal combination. This difference from existing MABs means that new approaches to maximising the total reward are required. To this end, we propose an epsilon–first algorithm, in which the first epsilon of the budget is used solely to learn the arms’ rewards (exploration), while the remaining 1 − epsilon is used to maximise the received reward based on those estimates (exploitation). We derive bounds on the algorithm’s loss for generic and uniform exploration methods, and compare its performance with traditional MAB algorithms under various distributions of rewards and costs, showing that it outperforms the others by up to 50%.
Symbolic Dynamic Programming for First-order POMDPs
Sanner, Scott (NICTA and ANU) | Kersting, Kristian (Fraunhofer IAIS)
Partially-observable Markov decision processes (POMDPs) provide a powerful model for sequential decision-making problems with partially-observed state and are known to have (approximately) optimal dynamic programming solutions. Much work in recent years has focused on improving the efficiency of these dynamic programming algorithms by exploiting symmetries and factored or relational representations. In this work, we show that it is also possible to exploit the full expressive power of first-order quantification to achieve state, action, and observation abstraction in a dynamic programming solution to relationally specified POMDPs. Among the advantages of this approach are the ability to maintain compact value function representations, abstract over the space of potentially optimal actions, and automatically derive compact conditional policy trees that minimally partition relational observation spaces according to distinctions that have an impact on policy values. This is the first lifted relational POMDP solution that can optimally accommodate actions with a potentially infinite relational space of observation outcomes.
Probabilistic Plan Recognition Using Off-the-Shelf Classical Planners
Ramírez, Miguel (Universitat Pompeu Fabra) | Geffner, Hector (ICREA &)
Plan recognition is the problem of inferring the goals and plans of an agent after observing its behavior. Recently, it has been shown that this problem can be solved efficiently, without the need of a plan library, using slightly modified planning algorithms. In this work, we extend this approach to the more general problem of probabilistic plan recognition where a probability distribution over the set of goals is sought under the assumptions that actions have deterministic effects and both agent and observer have complete information about the initial state. We show that this problem can be solved efficiently using classical planners provided that the probability of a partially observed execution given a goal is defined in terms of the cost difference of achieving the goal under two conditions: complying with the observations, and not complying with them. This cost, and hence the posterior goal probabilities, are computed by means of two calls to a classical planner that no longer has to be modified in any way. A number of examples is considered to illustrate the quality, flexibility, and scalability of the approach.
Planning in Dynamic Environments: Extending HTNs with Nonlinear Continuous Effects
Molineaux, Matthew (Knexus Research Corporation) | Klenk, Matthew (Naval Research Laboratory) | Aha, David (Naval Research Laboratory)
Planning in dynamic continuous environments requires reasoning about nonlinear continuous effects, which previous Hierarchical Task Network (HTN) planners do not support. In this paper, we extend an existing HTN planner with a new state projection algorithm. To our knowledge, this is the first HTN planner that can reason about nonlinear continuous effects. We use a wait action to instruct this planner to consider continuous effects in a given state. We also introduce a new planning domain to demonstrate the benefits of planning with nonlinear continuous effects. We compare our approach with a linear continuous effects planner and a discrete effects HTN planner on a benchmark domain, which reveals that its additional costs are largely mitigated by domain knowledge. Finally, we present an initial application of this algorithm in a practical domain, a Navy training simulation, illustrating the utility of this approach for planning in dynamic continuous environments.
SAP Speaks PDDL
Hoffmann, Joerg (INRIA) | Weber, Ingo (University of New South Wales) | Kraft, Frank Michael (SAP)
In several application areas for Planning, in particular helping with the creation of new processes in Business Process Management (BPM), a major obstacle lies in the modeling. Obtaining a suitable model to plan with is often prohibitively complicated and/or costly. Our core observation in this work is that, for software-architectural purposes, SAP is already using a model that is essentially a variant of PDDL. That model describes the behavior of Business Objects, in terms of status variables and how they are affected by system transactions. We show herein that one can leverage the model to obtain (a) a promising BPM planning application which incurs hardly any modeling costs, and (b) an interesting planning benchmark. We design a suitable planning formalism and an adaptation of FF, and we perform large-scale experiments. Our prototype is part of a research extension to the SAP NetWeaver platform.
Using Closed Captions as Supervision for Video Activity Recognition
Gupta, Sonal (Stanford University) | Mooney, Raymond J. (University of Texas at Austin)
Recognizing activities in real-world videos is a difficult problem exacerbated by background clutter, changes in camera angle & zoom, and rapid camera movements. Large corpora of labeled videos can be used to train automated activity recognition systems, but this requires expensive human labor and time. This paper explores how closed captions that naturally accompany many videos can act as weak supervision that allows automatically collecting "labeled" data for activity recognition. We show that such an approach can improve activity retrieval in soccer videos. Our system requires no manual labeling of video clips and needs minimal human supervision. We also present a novel caption classifier that uses additional linguistic information to determine whether a specific comment refers to an ongoing activity. We demonstrate that combining linguistic analysis and automatically trained activity recognizers can significantly improve the precision of video retrieval.
Forest-Based Semantic Role Labeling
Xiong, Hao (Chinese Academy of Sciences) | Mi, Haitao (Chinese Academy of Sciences) | Liu, Yang (Chinese Academy of Sciences) | Liu, Qun (Chinese Academy of Sciences)
Parsing plays an important role in semantic role labeling (SRL) because most SRL systems infer semantic relations from 1-best parses. Therefore, parsing errors inevitably lead to labeling mistakes. To alleviate this problem, we propose to use packed forest, which compactly encodes all parses for a sentence. We design an algorithm to exploit exponentially many parses to learn semantic relations efciently. Experimental results on the CoNLL-2005 shared task show that using forests achieves an absolute improvement of 1.2% in terms of F1 score over using 1-best parses and 0.6% over using 50-best parses.
Kernelized Sorting for Natural Language Processing
Jagaralmudi, Jagadeesh (University of Utah) | Juarez, Seth (University of Utah) | Daume, Hal (University of Utah)
We further develop Object matching, or alignment, is an underlying problem a semi-supervised "bootstrapping" variant of kernelized for many natural language processing tasks, including document sorting that addresses the problem of noise. We compare alignment (Vu, Aw, and Zhang 2009), sentence alignment kernelized sorting with matching canonical correlation (Gale and Church 1991; Rapp 1999) and transliteration analysis (MCCA) (Haghighi et al. 2008) on a wide variety mining (Hermjakob, Knight, and Daumé III 2008; of tasks and data sets and show that these strategies are Udupa et al. 2009). For example, in document alignment, sufficient to turn kernelized sorting from an approach with we have English documents (objects) and French documents highly unpredictable performance into a viable approach for (objects) and our goal is to discover a matching between NLP problems.
A Temporal Proof System for General Game Playing
Thielscher, Michael (The University of New South Wales) | Voigt, Sebastian (Dresden University of Technology)
A general game player is a system that understands the rules of unknown games and learns to play these games well without human intervention. A major challenge for research in General Game Playing is to endow a player with the ability to extract and prove game-specific knowledge from the mere game rules. We define a formal language to express temporally extended — yet local — properties of games. We also develop a provably correct proof theory for this language using the paradigm of Answer Set Programming, and we report on experiments with a practical implementation of this proof system in combination with a successful general game player.
User-Specific Learning for Recognizing a Singer's Intended Pitch
Guillory, Andrew (University of Washington) | Basu, Sumit (Microsoft Research) | Morris, Dan (Microsoft Research)
We consider the problem of automatic vocal melody transcription: translating an audio recording of a sung melody into a musical score. While previous work has focused on finding the closest notes to the singer's tracked pitch, we instead seek to recover the melody the singer intended to sing. Often, the melody a singer intended to sing differs from what they actually sang; our hypothesis is that this occurs in a singer-specific way. For example, a given singer may often be flat in certain parts of her range, or another may have difficulty with certain intervals. We thus pursue methods for singer-specific training which use learning to combine different methods for pitch prediction. In our experiments with human subjects, we show that via a short training procedure we can learn a singer-specific pitch predictor and significantly improve transcription of intended pitch over other methods. For an average user, our method gives a 20 to 30 percent reduction in pitch classification errors with respect to a baseline method which is comparable to commercial voice transcription tools. For some users, we achieve even more dramatic reductions. Our best results come from a combination of singer-specific-learning with non-singer-specific feature selection. We also discuss the implications of our work for training more general control signals. We make our experimental data available to allow others to replicate or extend our results.