Mean field for Markov Decision Processes: from Discrete to Continuous Optimization
Gast, Nicolas, Gaujal, Bruno, Boudec, Jean-Yves Le
We study the convergence of Markov Decision Processes made of a large number of objects to optimization problems on ordinary differential equations (ODEs). We show that the optimal reward of such a Markov Decision Process, satisfying a Bellman equation, converges to the solution of a continuous Hamilton-Jacobi-Bellman (HJB) equation based on the mean field approximation of the Markov Decision Process. We give bounds on the difference of the rewards, and a constructive algorithm for deriving an approximating solution to the Markov Decision Process from a solution of the HJB equations. We illustrate the method on three examples pertaining respectively to investment strategies, population dynamics control, and scheduling in queues. The examples illustrate and justify the construction of the controlled ODE and show the gain obtained by solving a continuous HJB equation rather than a large discrete Bellman equation.
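Schematically, and in generic notation of ours rather than the paper's (the exact scaling and technical assumptions are more delicate), the two objects being connected are a discrete Bellman equation on the empirical measure of the N objects and a continuous HJB equation on the limiting ODE:

```latex
% Bellman optimality equation for the N-object system (generic form),
% where m is the state of the empirical measure M^N_t:
V^N_t(m) \;=\; \max_{a}\Big\{ r(m,a)
  + \mathbb{E}\big[\, V^N_{t+1}(M^N_{t+1}) \,\big|\, M^N_t = m,\ a \,\big] \Big\}

% Mean-field limit: a controlled ODE \dot{m} = f(m,a) whose value
% function v solves the HJB equation
\frac{\partial v}{\partial t}(t,m)
  \;+\; \max_{a}\Big\{ r(m,a) + \nabla_m v(t,m)\cdot f(m,a) \Big\} \;=\; 0
```

The practical gain claimed in the abstract is that the HJB equation lives on the (fixed-dimension) state of the ODE rather than on the exponentially large discrete state space.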
PAC-Bayesian Analysis of Martingales and Multiarmed Bandits
Seldin, Yevgeny, Laviolette, François, Shawe-Taylor, John, Peters, Jan, Auer, Peter
We present two alternative ways to apply PAC-Bayesian analysis to sequences of dependent random variables. The first is based on a new lemma that makes it possible to bound expectations of convex functions of certain dependent random variables by expectations of the same functions of independent Bernoulli random variables. This lemma provides an alternative tool to the Hoeffding-Azuma inequality for bounding the concentration of martingale values. Our second approach is based on integrating the Hoeffding-Azuma inequality with PAC-Bayesian analysis. We also introduce a way to apply PAC-Bayesian analysis in situations of limited feedback. We combine the new tools to derive PAC-Bayesian generalization and regret bounds for the multiarmed bandit problem. Although our regret bound is not yet as tight as state-of-the-art regret bounds based on other well-established techniques, our results significantly expand the range of potential applications of PAC-Bayesian analysis and introduce a new analysis tool to reinforcement learning and many other fields where martingales and limited feedback are encountered.
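For reference, the Hoeffding-Azuma inequality that both approaches build on, in its standard one-sided form:

```latex
% Hoeffding–Azuma: if (M_i)_{i \ge 0} is a martingale with bounded
% differences |M_i - M_{i-1}| \le c_i almost surely, then for all t > 0
\Pr\big( M_n - M_0 \ge t \big)
  \;\le\; \exp\!\left( \frac{-t^2}{\,2\sum_{i=1}^{n} c_i^2\,} \right)
```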
Probabilistic Inference from Arbitrary Uncertainty using Mixtures of Factorized Generalized Gaussians
Garrido, M. C., Lopez-de-Teruel, P. E., Ruiz, A.
This paper presents a general and efficient framework for probabilistic inference and learning from arbitrary uncertain information. It exploits the calculation properties of finite mixture models, conjugate families and factorization. Both the joint probability density of the variables and the likelihood function of the (objective or subjective) observation are approximated by a special mixture model, in such a way that any desired conditional distribution can be directly obtained without numerical integration. We have developed an extended version of the expectation maximization (EM) algorithm to estimate the parameters of mixture models from uncertain training examples (indirect observations). As a consequence, any piece of exact or uncertain information about both input and output values is consistently handled in the inference and learning stages. This ability, extremely useful in certain situations, is not found in most alternative methods. The proposed framework is formally justified from standard probabilistic principles and illustrative examples are provided in the fields of nonparametric pattern classification, nonlinear regression and pattern completion. Finally, experiments on a real application and comparative results over standard databases provide empirical evidence of the utility of the method in a wide range of applications.
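As background for the extended EM algorithm described above, the following is a minimal sketch of the *standard* EM updates for a one-dimensional Gaussian mixture with exact observations; the paper's contribution (handling uncertain, indirect observations) is not reproduced here, and all names below are our own:

```python
import numpy as np

def em_gmm_1d(x, k=2, iters=100):
    """Standard EM for a 1-D Gaussian mixture with exact observations.

    Background sketch only: the paper extends these updates to
    uncertain training examples, which this code does not do.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    w = np.full(k, 1.0 / k)                          # mixing weights
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)    # spread-out initial means
    var = np.full(k, x.var() + 1e-6)                 # initial variances
    for _ in range(iters):
        # E-step: responsibilities r[i, j] = P(component j | x_i)
        sq = (x[:, None] - mu[None, :]) ** 2
        log_p = np.log(w) - 0.5 * (np.log(2 * np.pi * var) + sq / var)
        log_p -= log_p.max(axis=1, keepdims=True)    # numerical stability
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        nk = r.sum(axis=0)
        w = nk / n
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu[None, :]) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var
```

On well-separated data, e.g. two clusters around 0 and 10, the recovered means approach the cluster centers; the paper's version replaces the exact observations `x` with (possibly uncertain) likelihood functions.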
Adding Abstractive Reflection to a Tutorial Dialog System
Ward, Arthur (University of Pittsburgh) | Litman, Diane (University of Pittsburgh)
In this work we hypothesize that giving students a reflective reading after spoken dialog tutoring in qualitative physics will improve learning. The reading is designed to help students compare similar aspects of previously tutored problems and to abstract their commonalities. We also hypothesize that student motivation will affect how well the text is processed, and so influence learning. We find that the beneficial effects of the reflective text significantly interact with motivation, such that moderately motivated students learn significantly more from the reflective text than from a non-reflective control text. Students with lower or higher motivation did not benefit from the reflective text. These results demonstrate that implicit reflection can improve learning after dialog tutoring with a qualitative physics tutor. They further demonstrate that this result can be obtained with a reflective/abstractive text without recourse to dialog, and that the effectiveness of the text is sensitive to the motivation level of the student.
How Artefacts Influence the Construction of Communications and Contexts during Collaboration in an Agile Software Development Team
Abdullah, Nik Nailah Binti (Mimos Berhad Company) | Sharp, Helen (The Open University) | Honiden, Shinichi (National Institute of Informatics)
There has been a shift in cognition research toward considering agents as situated in their specific context, as it was realized that people are strongly affected by, and possibly dependent on, their environment (Susi & Ziemke, 2001). With this shift of focus, new interactive theories of cognition have emerged; these theories, such as situated cognition (Clancey, 1997) and distributed cognition (Hutchins, 1999), are noted for their emphasis on the relationship between cognition and the environment. We used a stimulus and response method (Binti Abdullah et al., 2010) to uncover correlation patterns of the physical artefact-communication during specific contexts of communication. We found preliminary empirical evidence that physical artefacts influence the communication process in a mutually constraining relationship with the contexts, where the context is made up of the team's practice, including how they collaborate, the physical setting, situations, and participation roles.
Active and Interactive Discovery of Goal Selection Knowledge
Powell, Jay (Indiana University) | Molineaux, Matthew (Knexus Research Corporation) | Aha, David William (Naval Research Laboratory)
If given manually-crafted goal selection knowledge, goal reasoning agents can dynamically determine which goals they should achieve in complex environments. These agents should instead learn goal selection knowledge through expert interaction. We describe T-ARTUE, a goal reasoning agent that performs case-based active and interactive learning to discover goal selection knowledge. We also report tests of its performance in a complex environment. We found that, under some conditions, T-ARTUE can quickly learn goal selection knowledge.
Happy Movie: A Group Recommender Application in Facebook
Quijano-Sánchez, Lara (Universidad Complutense de Madrid) | Recio-Garcia, Juan A. (Universidad Complutense de Madrid) | Díaz-Agudo, Belén (Universidad Complutense de Madrid) | Jimenez-Diaz, Guillermo (Universidad Complutense de Madrid)
In this paper we introduce our recommender Happy Movie, a Facebook application that recommends movies to groups. The system exploits information about the social relationships and behaviour of its users to provide better recommendations. Our previous work showed that social factors improve recommendation results; however, obtaining the social information required users to fill in many questionnaires, so we have moved to a social network environment where this information is readily available.
What Determines Difficulty of Transport Puzzles?
Jarušek, Petr (Masaryk University Brno) | Pelánek, Radek (Masaryk University Brno)
What determines the difficulty of solving a problem? Although this question has been studied before, we found examples showing large differences in problem difficulty that are not explained by concepts identified in previous research. These differences are caused mainly by the structure of a problem's state space and cannot be easily captured by static metrics such as the size of the state space or the length of a solution. To address these unexplained differences, we propose a computational model of human problem-solving behaviour. We evaluate the model over a large-scale dataset (hundreds of hours of problem solving, more than 100 problem instances) for three transport puzzles (Sokoban, Rush Hour, and the Replacement puzzle).
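The static metrics the abstract refers to (state-space size, shortest-solution length) can be computed by exhaustive search over any puzzle given as a successor function; a minimal sketch, with a toy interface of our own rather than the authors' tooling:

```python
from collections import deque

def state_space_metrics(start, successors, is_goal):
    """Breadth-first search over a puzzle's state space.

    Returns (number of reachable states, length of a shortest solution,
    or None if no goal is reachable) -- the kind of static metrics that,
    per the paper, do not fully explain human difficulty.
    """
    dist = {start: 0}           # state -> distance from start
    queue = deque([start])
    solution_len = None
    while queue:
        s = queue.popleft()
        # BFS pops states in order of nondecreasing distance, so the
        # first goal popped gives a shortest solution length.
        if solution_len is None and is_goal(s):
            solution_len = dist[s]
        for t in successors(s):
            if t not in dist:
                dist[t] = dist[s] + 1
                queue.append(t)
    return len(dist), solution_len

# Example: a token at position 0 must reach 5, moving +/-1 on 0..5.
size, sol = state_space_metrics(
    0,
    lambda s: [t for t in (s - 1, s + 1) if 0 <= t <= 5],
    lambda s: s == 5,
)
```

Here `size` is 6 and `sol` is 5; real transport puzzles like Sokoban simply need a richer state encoding and successor function.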
Scheduling an Aircraft Repair Shop
Bajestani, Maliheh Aramon (University of Toronto) | Beck, J. Christopher (University of Toronto)
We address a scheduling problem in the context of military aircraft maintenance where the goal is to meet the aircraft requirements for a number of missions in the presence of breakdowns. The assignment of aircraft to a mission must consider the requirements for the mission, the probability of aircraft failure, and the capacity of the repair shop that maintains the aircraft. Therefore, a solution both assigns aircraft to missions and schedules the repair shop to meet the assignments. We propose a dispatching heuristic algorithm; three complete approaches based on mixed integer programming, constraint programming, and logic-based Benders decomposition; and a hybrid heuristic-complete approach. Experiments demonstrate that the logic-based Benders variation combining mixed integer programming and constraint programming outperforms the other approaches, that the dispatching heuristic can feasibly schedule the repair shop in a very short time, and that using the dispatching solution as a bound marginally improves the complete approaches.
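To give a flavor of the dispatching idea, here is a toy greedy rule of our own (not the paper's algorithm): assign each mission the available aircraft least likely to fail, leaving the rest free for the repair shop.

```python
def dispatch(missions, aircraft_fail_prob):
    """Toy greedy dispatcher (illustrative only, not the paper's heuristic).

    missions: list of (mission_name, n_aircraft_required) pairs.
    aircraft_fail_prob: dict mapping aircraft name -> failure probability.
    Returns (assignment, leftover) or None if a mission cannot be staffed.
    """
    # Consider aircraft in order of increasing failure probability.
    pool = sorted(aircraft_fail_prob, key=aircraft_fail_prob.get)
    assignment = {}
    for name, need in missions:
        if len(pool) < need:
            return None                  # infeasible under this greedy rule
        assignment[name] = pool[:need]   # most reliable aircraft first
        pool = pool[need:]
    return assignment, pool              # leftover aircraft go to the shop

fleet = {"A1": 0.05, "A2": 0.30, "A3": 0.10, "A4": 0.20}
plan, free = dispatch([("m1", 2), ("m2", 1)], fleet)
```

Here `m1` gets the two most reliable aircraft (`A1`, `A3`), `m2` gets `A4`, and `A2` remains free for maintenance. The paper's complete approaches additionally schedule the repair shop subject to its capacity, which this sketch ignores.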
Planning and Acting in Incomplete Domains
Weber, Christopher (Utah State University) | Bryce, Daniel (Utah State University)
Engineering complete planning domain descriptions is often very costly because of human error or lack of domain knowledge. Learning complete domain descriptions is also very challenging because many features are irrelevant to achieving the goals and data may be scarce. We present a planner and agent that respectively plan and act in incomplete domains by i) synthesizing plans to avoid execution failure due to ignorance of the domain model, and ii) passively learning about the domain model during execution to improve later re-planning attempts. Our planner DeFault is the first to reason about a domain’s incompleteness to avoid potential plan failure. DeFault computes failure explanations for each action and state in the plan and counts the number of interpretations of the incomplete domain where failure will occur. We show that DeFault performs best by counting prime implicants (failure diagnoses) rather than propositional models. Our agent Goalie learns about the preconditions and effects of incompletely-specified actions while monitoring its state and, in conjunction with DeFault plan failure explanations, can diagnose past and future action failures. We show that by reasoning about incompleteness (as opposed to ignoring it) Goalie fails and re-plans less and executes fewer actions.
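The model-counting measure mentioned above can be illustrated with a small enumeration sketch (interface and names are ours; the paper's preferred measure counts prime implicants rather than models, which is not shown here):

```python
from itertools import product

def count_failure_models(features, fails):
    """Count interpretations of an incomplete domain in which a plan fails.

    features: the unknown domain propositions, e.g. "action a might
              really have precondition p".
    fails:    predicate mapping a truth assignment (dict) to whether
              the plan fails under that interpretation.
    Enumeration sketch of the propositional-model-counting variant
    discussed in the abstract; exponential in len(features).
    """
    count = 0
    for values in product([False, True], repeat=len(features)):
        interpretation = dict(zip(features, values))
        if fails(interpretation):
            count += 1
    return count

# Hypothetical example: the plan fails if an unknown precondition is
# real OR an unknown effect is actually absent -- 3 of 4 interpretations.
n = count_failure_models(
    ["p_real", "e_absent"],
    lambda i: i["p_real"] or i["e_absent"],
)
```

A planner in this spirit would prefer, among candidate plans, the one whose failure explanation holds in the fewest interpretations.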