Towards More Sample Efficiency in Reinforcement Learning with Data Augmentation
Lin, Yijiong, Huang, Jiancong, Zimmer, Matthieu, Rojas, Juan, Weng, Paul
In this framework, the robot learning problem corresponds to an RL problem that aims at obtaining a policy π: S × G → A such that the expected discounted sum of rewards is maximized for any given goal. When the reward function is sparse, as assumed here, this RL problem is particularly hard to solve. In particular, we consider here reward functions of the following form: R(s, a, s′, g) = 1[d(s′, g) ≤ ε_R] − 1, where 1 is the indicator function, d is a distance, and ε_R > 0 is a fixed threshold. To tackle this issue, Andrychowicz et al. [2017] proposed hindsight experience replay (HER), which is based on the following principle: any trajectory that failed to reach its goal still carries useful information; it has at least reached the states along its path. Using this natural and powerful idea, the replay memory can be augmented with failed trajectories by changing their goals in hindsight.
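As an illustration of this relabeling principle, the sketch below rewrites a failed trajectory with the goal it actually achieved, under the sparse reward defined above (the data layout and function names are illustrative assumptions, not the paper's code):

```python
import numpy as np

def sparse_reward(achieved_goal, goal, eps=0.05):
    # R(s, a, s', g) = 1[d(s', g) <= eps] - 1, with d the Euclidean distance
    d = np.linalg.norm(np.asarray(achieved_goal) - np.asarray(goal))
    return float(d <= eps) - 1.0

def her_relabel(trajectory):
    """Hindsight relabeling ("final" strategy): replace the original goal
    with the goal actually achieved at the end of the trajectory.

    trajectory: list of (state, action, next_state, achieved_goal) tuples.
    At least the last relabeled transition now earns the non-penalty reward,
    so even a failed trajectory yields a useful learning signal.
    """
    new_goal = trajectory[-1][3]
    return [(s, a, s2, new_goal, sparse_reward(achieved, new_goal))
            for s, a, s2, achieved in trajectory]
```

The relabeled transitions are simply added to the replay buffer alongside the original ones.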
Invariant Transform Experience Replay
Lin, Yijiong, Huang, Jiancong, Zimmer, Matthieu, Rojas, Juan, Weng, Paul
Yijiong Lin 1, Jiancong Huang 1, Matthieu Zimmer 2, Juan Rojas 1, Paul Weng 2

Abstract -- Deep reinforcement learning (DRL) is a promising approach for adaptive robot control, but its application to robotics is currently hindered by high sample requirements. We propose two novel data augmentation techniques for DRL, based on invariant transformations of trajectories, in order to reuse observed interactions more efficiently. The first, called Kaleidoscope Experience Replay, exploits reflectional symmetries, while the second, called Goal-augmented Experience Replay, takes advantage of lax goal definitions. On the Fetch tasks from OpenAI Gym, our experimental results show a large increase in learning speed.

I. INTRODUCTION

Deep reinforcement learning (DRL) has demonstrated great promise in recent years [1], [2]. However, despite being shown to be a viable approach in robotics [3], [4], DRL still suffers from low sample efficiency in practice, an acute issue in robot learning. Given how critical this issue is, many diverse solutions have been proposed. For brevity, we only recall those most related to our work.
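To give a concrete sense of the reflection idea behind Kaleidoscope Experience Replay, the sketch below mirrors stored transitions across a vertical plane of the workspace; the coordinate layout (3D positions and displacement actions) is an assumption for illustration, not the paper's actual state encoding:

```python
import numpy as np

def mirror_point(p, plane_x=0.0):
    """Reflect a 3D position across the vertical plane x = plane_x."""
    q = np.array(p, dtype=float)
    q[0] = 2.0 * plane_x - q[0]
    return q

def mirror_displacement(d):
    """Reflect a 3D displacement (e.g., an end-effector action): flip its x-component."""
    q = np.array(d, dtype=float)
    q[0] = -q[0]
    return q

def kaleidoscope_augment(transitions, plane_x=0.0):
    """Mirror (state, action, next_state, goal) transitions across x = plane_x.

    When the task dynamics are symmetric with respect to this plane, the
    mirrored transitions are valid experience and can be replayed as if
    they had been observed, effectively doubling the usable data.
    """
    return [(mirror_point(s, plane_x), mirror_displacement(a),
             mirror_point(s2, plane_x), mirror_point(g, plane_x))
            for s, a, s2, g in transitions]
```

When the task is symmetric with respect to several planes, each transition can be reflected across every plane (and their compositions), further multiplying the amount of reusable experience.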
Fairness in Reinforcement Learning
Weng, Paul
Decision support systems (e.g., for ecological conservation) and autonomous systems (e.g., adaptive controllers in smart cities) are starting to be deployed in real applications. Although their operation often impacts many users or stakeholders, fairness is generally not taken into account in their design, which can lead to completely unfair outcomes for some users or stakeholders. To tackle this issue, we advocate the use of social welfare functions that encode fairness, and we present this novel general problem in the context of (deep) reinforcement learning, although it could possibly be extended to other machine learning tasks.
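As one concrete example of such a social welfare function (not necessarily the instantiation used in the paper), the generalized Gini function applies decreasing weights to utilities sorted from worst-off to best-off, so improving the worst-off user raises welfare the most:

```python
import numpy as np

def ggf(utilities, weights):
    """Generalized Gini social welfare function.

    Sorts the users' utilities in increasing order and takes a weighted
    sum with decreasing weights, so the worst-off users count the most.
    """
    return float(np.dot(weights, np.sort(utilities)))

# Three users; decreasing weights emphasize the worst-off user.
weights = np.array([0.6, 0.3, 0.1])
print(ggf(np.array([1.0, 5.0, 9.0]), weights))  # unequal allocation -> 3.0
print(ggf(np.array([4.0, 5.0, 6.0]), weights))  # fairer, same total -> 4.5
```

Both allocations have the same total utility (15), but the fairer one receives strictly higher welfare, which is exactly the behavior one wants an RL agent to optimize for.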
Exploiting the sign of the advantage function to learn deterministic policies in continuous domains
Zimmer, Matthieu, Weng, Paul
In the context of learning deterministic policies in continuous domains, we revisit an approach that was first proposed in the Continuous Actor Critic Learning Automaton (CACLA) and later extended in the Neural Fitted Actor Critic (NFAC). This approach is based on a policy update different from that of the deterministic policy gradient (DPG). Previous work has observed its excellent performance empirically, but a theoretical justification is lacking. To fill this gap, we provide a theoretical explanation that motivates this unorthodox policy update by relating it to another update and making explicit the objective function of the latter. We furthermore discuss in depth the properties of these updates to gain a deeper understanding of the overall approach. In addition, we extend it and propose a new trust-region algorithm, Penalized NFAC (PeNFAC). Finally, we experimentally demonstrate on several classic control problems that it surpasses state-of-the-art algorithms for learning deterministic policies.
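To make the policy update concrete: whereas DPG follows the action-value gradient, the CACLA-style update takes a regression step toward an explored action only when that action's advantage is positive. A minimal sketch for a linear deterministic policy (the linear parameterization and learning rate are illustrative assumptions):

```python
import numpy as np

def cacla_update(W, features, explored_action, advantage, lr=1e-2):
    """CACLA-style actor update for a linear deterministic policy a = W @ phi(s).

    Unlike DPG, which follows the action-value gradient, this update performs
    a regression step toward the explored action, but only when the sign of
    the advantage indicates the exploration improved on the current policy.
    """
    if advantage <= 0.0:
        return W                                  # negative advantage: no update
    error = explored_action - W @ features        # regression target: explored action
    return W + lr * np.outer(error, features)     # gradient step on the squared error
```

Note that the step size does not scale with the magnitude of the advantage; only its sign is used, which is the property the paper's analysis focuses on.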
Optimizing Quantiles in Preference-Based Markov Decision Processes
Gilbert, Hugo (Pierre and Marie Curie University) | Weng, Paul (Sun Yat-sen University) | Xu, Yan (Carnegie Mellon University)
In the Markov decision process (MDP) model, policies are usually evaluated by their expected cumulative reward. As this decision criterion is not always suitable, we propose in this paper an algorithm for computing a policy that is optimal for the quantile criterion. Both finite and infinite horizons are considered. Finally, we experimentally evaluate our approach on random MDPs and on a data center control problem.
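To see why the expectation criterion is not always suitable, a quantile criterion can rank policies differently; a small numerical illustration with made-up return samples:

```python
import numpy as np

returns_a = np.array([0.0] * 5 + [100.0] * 5)   # risky policy: all-or-nothing
returns_b = np.array([40.0] * 10)               # safe policy: steady returns

# The expected cumulative reward prefers the risky policy...
print(returns_a.mean(), returns_b.mean())        # 50.0 vs 40.0
# ...but the 0.25-quantile criterion prefers the safe one.
print(np.quantile(returns_a, 0.25), np.quantile(returns_b, 0.25))  # 0.0 vs 40.0
```

Optimizing a lower quantile thus amounts to guaranteeing a return level with high probability, which is attractive in risk-sensitive settings such as data center control.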
Solving MDPs with Skew Symmetric Bilinear Utility Functions
Gilbert, Hugo (Sorbonne Universités, UPMC University of Paris 06, UMR 7606, LIP6 and CNRS, UMR 7606, LIP6) | Spanjaard, Olivier (Sorbonne Universités, UPMC University of Paris 06, UMR 7606, LIP6 and CNRS, UMR 7606, LIP6) | Viappiani, Paolo (Sorbonne Universités, UPMC University of Paris 06, UMR 7606, LIP6 and CNRS, UMR 7606, LIP6) | Weng, Paul (SYSU-CMU Joint Institute of Engineering, Guangzhou and SYSU-CMU Shunde International Joint Research Institute, Shunde)
In this paper, we adopt Skew Symmetric Bilinear (SSB) utility functions to compare policies in Markov Decision Processes (MDPs). By considering pairs of alternatives, SSB utility theory generalizes von Neumann and Morgenstern's expected utility (EU) theory to encompass rational decision behaviors that EU cannot accommodate. We provide a game-theoretic analysis of the problem of identifying an SSB-optimal policy in finite-horizon MDPs and propose an algorithm based on a double oracle approach for computing an optimal (possibly randomized) policy. Finally, we present and discuss experimental results where SSB-optimal policies are computed for a popular TV contest according to several instantiations of SSB utility functions.
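As a sketch of the game-theoretic view, an SSB-optimal (possibly randomized) policy can be seen as a maximin strategy of a symmetric zero-sum game whose skew-symmetric payoff matrix compares policies pairwise. The snippet below solves such a matrix game by linear programming; the matrix and setup are illustrative assumptions, and the paper's double oracle approach would generate this matrix incrementally rather than enumerating all policies:

```python
import numpy as np
from scipy.optimize import linprog

def maximin_mixed_strategy(phi):
    """Solve the symmetric zero-sum game with skew-symmetric payoff matrix phi.

    phi[i, j] = SSB value of (deterministic) policy i against policy j; the
    returned mixed strategy over policies is such that no alternative policy
    is strictly preferred to it.
    """
    n = phi.shape[0]
    # Variables: mixed strategy x (n values) and game value v.
    # Maximize v subject to (phi.T @ x)_j >= v for all j, sum(x) = 1, x >= 0.
    c = np.zeros(n + 1); c[-1] = -1.0                  # linprog minimizes, so use -v
    A_ub = np.hstack([-phi.T, np.ones((n, 1))])        # v - (phi.T @ x)_j <= 0
    b_ub = np.zeros(n)
    A_eq = np.hstack([np.ones((1, n)), np.zeros((1, 1))])
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:n], res.x[-1]

# Skew-symmetric example: preferences form a cycle (0 beats 1 beats 2 beats 0),
# so the SSB-optimal policy must randomize.
phi = np.array([[ 0.0,  1.0, -1.0],
                [-1.0,  0.0,  1.0],
                [ 1.0, -1.0,  0.0]])
x, v = maximin_mixed_strategy(phi)
print(x, v)   # roughly [1/3, 1/3, 1/3], value 0
```

The cyclic example makes the key point of the paper's setting: unlike EU, SSB preferences can be intransitive, so an optimal policy may need to be randomized.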
Reports of the AAAI 2014 Conference Workshops
Albrecht, Stefano V. (University of Edinburgh) | Barreto, André M. S. (Brazilian National Laboratory for Scientific Computing) | Braziunas, Darius (Kobo Inc.) | Buckeridge, David L. (McGill University) | Cuayáhuitl, Heriberto (Heriot-Watt University) | Dethlefs, Nina (Heriot-Watt University) | Endres, Markus (University of Augsburg) | Farahmand, Amir-massoud (Carnegie Mellon University) | Fox, Mark (University of Toronto) | Frommberger, Lutz (University of Bremen) | Ganzfried, Sam (Carnegie Mellon University) | Gil, Yolanda (University of Southern California) | Guillet, Sébastien (Université du Québec à Chicoutimi) | Hunter, Lawrence E. (University of Colorado School of Medicine) | Jhala, Arnav (University of California Santa Cruz) | Kersting, Kristian (Technical University of Dortmund) | Konidaris, George (Massachusetts Institute of Technology) | Lecue, Freddy (IBM Research) | McIlraith, Sheila (University of Toronto) | Natarajan, Sriraam (Indiana University) | Noorian, Zeinab (University of Saskatchewan) | Poole, David (University of British Columbia) | Ronfard, Rémi (University of Grenoble) | Saffiotti, Alessandro (Orebro University) | Shaban-Nejad, Arash (McGill University) | Srivastava, Biplav (IBM Research) | Tesauro, Gerald (IBM Research) | Uceda-Sosa, Rosario (IBM Research) | Broeck, Guy Van den (Katholieke Universiteit Leuven) | Otterlo, Martijn van (Radboud University Nijmegen) | Wallace, Byron C. (University of Texas) | Weng, Paul (Pierre and Marie Curie University) | Wiens, Jenna (University of Michigan) | Zhang, Jie (Nanyang Technological University)
The AAAI-14 Workshop program was held Sunday and Monday, July 27–28, 2014, at the Québec City Convention Centre in Québec, Canada. The AAAI-14 workshop program included fifteen workshops covering a wide range of topics in artificial intelligence. The titles of the workshops were AI and Robotics; Artificial Intelligence Applied to Assistive Technologies and Smart Environments; Cognitive Computing for Augmented Human Intelligence; Computer Poker and Imperfect Information; Discovery Informatics; Incentives and Trust in Electronic Communities; Intelligent Cinematography and Editing; Machine Learning for Interactive Systems: Bridging the Gap between Perception, Action and Communication; Modern Artificial Intelligence for Health Analytics; Multiagent Interaction without Prior Coordination; Multidisciplinary Workshop on Advances in Preference Handling; Semantic Cities -- Beyond Open Data to Models, Standards and Reasoning; Sequential Decision Making with Big Data; Statistical Relational AI; and The World Wide Web and Public Health Intelligence. This article presents short summaries of those events.
Preface
Braziunas, Darius (Kobo Inc.) | Endres, Markus (University of Augsburg) | Venable, K. Brent (Tulane University) | Weng, Paul (Université Pierre et Marie Curie) | Xia, Lirong (Rensselaer Polytechnic Institute)
Nearly all areas of artificial intelligence deal with choice situations and can thus benefit from computational methods for handling preferences. Moreover, social choice methods are also of key importance in computational domains such as multiagent systems. This broadened scope of preferences leads to new types of preference models, new problems for applying preference structures, and new kinds of benefits. Preferences are an inherently multidisciplinary topic, of interest to economists, computer scientists, operations researchers, mathematicians, and more. The workshop on Advances in Preference Handling promotes this broadened scope of preference handling and seeks to improve the overall understanding of the benefits of preferences for such tasks. Another important goal is to provide cross-fertilization between different fields.
Axiomatic Foundations for a Class of Generalized Expected Utility: Algebraic Expected Utility
Weng, Paul
In this paper, we provide two axiomatizations of algebraic expected utility, a particular generalized expected utility, in a von Neumann-Morgenstern setting, i.e., the uncertainty representation is supposed to be given and is here described by a plausibility measure valued on a semiring, which may be partially ordered. We show that axioms identical to those for expected utility entail that preferences are represented by an algebraic expected utility. This algebraic approach unifies many previous proposals (expected utility, binary possibilistic utility, ...) in the same general framework and proves that the obtained utility enjoys the same nice features as expected utility: linearity, dynamic consistency, autoduality of the underlying uncertainty measure, autoduality of the decision criterion, and the possibility of modeling the decision maker's attitude toward uncertainty.
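For intuition (an illustrative sketch, not the paper's formal construction), algebraic expected utility replaces the (+, ×) of the standard expectation with the operations (⊕, ⊗) of a semiring; instantiating them as (max, min) over [0, 1] recovers optimistic possibilistic utility:

```python
from functools import reduce

def algebraic_eu(plausibilities, utilities, oplus, otimes):
    """Algebraic expected utility: combine each outcome's plausibility and
    utility with otimes, then aggregate across outcomes with oplus."""
    terms = [otimes(p, u) for p, u in zip(plausibilities, utilities)]
    return reduce(oplus, terms)

# Standard expected utility: oplus = +, otimes = *.
print(algebraic_eu([0.5, 0.5], [0.0, 1.0],
                   lambda a, b: a + b, lambda a, b: a * b))   # 0.5

# Optimistic possibilistic utility: oplus = max, otimes = min over [0, 1].
print(algebraic_eu([0.3, 1.0], [0.9, 0.2], max, min))          # 0.3
```

The same aggregation formula thus covers both instantiations, which is the unification the paper's axiomatization makes precise.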