Sim2real transfer is primarily concerned with transferring policies trained in simulation to potentially noisy real-world environments. A common problem in sim2real transfer is estimating the real-world environmental parameters to which the simulated environment should be grounded. Although existing methods such as Domain Randomisation (DR) can produce robust policies by sampling from a distribution of parameters during training, there is no established method for identifying the parameters of the corresponding distribution for a given real-world setting. In this work, we propose Uncertainty-aware policy search (UncAPS), in which we use a Universal Policy Network (UPN) to store simulation-trained task-specific policies across the full range of environmental parameters and then employ robust Bayesian optimisation to craft robust policies for the given environment by combining relevant UPN policies in a DR-like fashion. Such policy-driven grounding is expected to be more efficient, as it estimates only task-relevant sets of parameters. Further, we account for the estimation uncertainties in the search process to produce policies that are robust against both aleatoric and epistemic uncertainties. We empirically evaluate our approach in a range of noisy, continuous control environments and show its improved performance compared to competing baselines.
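As a rough illustration of the Domain Randomisation idea mentioned above (drawing a fresh set of environmental parameters from a distribution for each training episode), a minimal Python sketch follows. The parameter names (`friction`, `mass`), their ranges, the uniform distribution, and the helper names are all hypothetical and are not taken from the paper; the sketch does not implement UncAPS or the UPN.

```python
import random

# Hypothetical parameter ranges; the abstract does not specify any.
PARAM_RANGES = {"friction": (0.5, 1.5), "mass": (0.8, 1.2)}


def sample_env_params(rng, ranges=PARAM_RANGES):
    """Draw one set of simulator parameters, uniformly within each range."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


def domain_randomised_episodes(num_episodes, seed=0):
    """Yield a freshly sampled parameter set for every training episode."""
    rng = random.Random(seed)
    for _ in range(num_episodes):
        yield sample_env_params(rng)


# Each entry would configure the simulator for one training episode.
episodes = list(domain_randomised_episodes(5))
```

In an actual training loop, each sampled dictionary would be passed to the simulator before the episode is rolled out; choosing the ranges for a given real-world setting is exactly the open problem the abstract describes.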
In this chapter, we provide a review of conversational agents (CAs), discussing chatbots, intended for casual conversation with a user, as well as task-oriented agents that generally engage in discussions intended to reach one or several specific goals, often (but not always) within a specific domain. We also consider the concept of embodied conversational agents, briefly reviewing aspects such as character animation and speech processing. The many different approaches for representing dialogue in CAs are discussed in some detail, along with methods for evaluating such agents, emphasizing the important topics of accountability and interpretability. A brief historical overview is given, followed by an extensive overview of various applications, especially in the fields of health and education. We end the chapter by discussing benefits and potential risks regarding the societal impact of current and future CA technology.
Bisimulation metrics define a distance measure between states of a Markov decision process (MDP) based on a comparison of reward sequences. Due to this property they provide theoretical guarantees in value function approximation. In this work we first prove that bisimulation metrics can be defined via any $p$-Wasserstein metric for $p\geq 1$. Then we describe an approximate policy iteration (API) procedure that uses $\epsilon$-aggregation with $\pi$-bisimulation and prove performance bounds for continuous state spaces. We bound the difference between $\pi$-bisimulation metrics in terms of the change in the policies themselves. Based on these theoretical results, we design an API($\alpha$) procedure that employs conservative policy updates and enjoys better performance bounds than the naive API approach. In addition, we propose a novel trust region approach which circumvents the requirement to explicitly solve a constrained optimization problem. Finally, we provide experimental evidence of improved stability compared to non-conservative alternatives in simulated continuous control.
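To make the notion of a conservative policy update concrete, the sketch below mixes the current policy with a candidate policy using a step size `alpha`, in the style of classical conservative policy iteration. This mixture form, the tabular representation, and all names here are assumptions for illustration; the abstract does not state the exact update used by API($\alpha$).

```python
def conservative_update(pi_old, pi_candidate, alpha):
    """Return the mixture (1 - alpha) * pi_old + alpha * pi_candidate.

    Policies are dicts mapping a state to a list of action probabilities.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return {
        state: [
            (1 - alpha) * p_old + alpha * p_new
            for p_old, p_new in zip(pi_old[state], pi_candidate[state])
        ]
        for state in pi_old
    }


# Hypothetical two-state, two-action example.
pi_old = {"s0": [0.5, 0.5], "s1": [0.9, 0.1]}
pi_candidate = {"s0": [1.0, 0.0], "s1": [0.0, 1.0]}
pi_new = conservative_update(pi_old, pi_candidate, alpha=0.2)
```

A small `alpha` keeps the new policy close to the old one, which is the sense in which the update is conservative; `alpha = 1` recovers the non-conservative step that simply adopts the candidate policy.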
An unaddressed challenge in human-AI coordination is to enable AI agents to exploit the semantic relationships between the features of actions and the features of observations. Humans take advantage of these relationships in highly intuitive ways. For instance, in the absence of a shared language, we might point to the object we desire or hold up our fingers to indicate how many objects we want. To address this challenge, we investigate the effect of network architecture on the propensity of learning algorithms to exploit these semantic relationships. Across a procedurally generated coordination task, we find that attention-based architectures that jointly process a featurized representation of observations and actions have a better inductive bias for zero-shot coordination. Through fine-grained evaluation and scenario analysis, we show that the resulting policies are human-interpretable. Moreover, such agents coordinate with people without training on any human data.
Reasoning about uncertainty is vital in many real-life autonomous systems. However, current state-of-the-art planning algorithms either cannot reason about uncertainty explicitly or do so only at a high computational cost. Here, we focus on making informed decisions efficiently, using reward functions that explicitly deal with uncertainty. We formulate an approximation, namely an abstract observation model, that uses an aggregation scheme to alleviate computational costs. We derive bounds on the expected information-theoretic reward function and, as a consequence, on the value function. We then propose a method to refine the aggregation to achieve identical action selection in a fraction of the computational time.
How can we plan efficiently in a large and complex environment when the time budget is limited? Given the original simulator of the environment, which may be computationally very demanding, we propose to learn online an approximate but much faster simulator that improves over time. To plan reliably and efficiently while the approximate simulator is learning, we develop a method that adaptively decides which simulator to use for every simulation, based on a statistic that measures the accuracy of the approximate simulator. This allows us to use the approximate simulator to replace the original simulator for faster simulations when it is accurate enough under the current context, thus trading off simulation speed and accuracy. Experimental [...]
However, there are three main limitations of this "two-phase" paradigm, where a simulator is learned offline and then used as-is for online simulation and planning. First, no planning is possible until the offline learning phase finishes, which can take a long time. Second, the separation of learning and planning raises a question on what data collection policy should be used during training to ensure good online prediction during planning. We empirically demonstrate that when the training data is collected by a uniform random policy, the learned influence predictors can perform poorly during online planning, due to distribution shift. Third, completely replacing the original simulator with the approximate one after training implies a risk of poor planning performance in certain situations, which is hard to detect in advance.
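The adaptive choice between the original and the approximate simulator described above can be sketched as follows. This is only an illustration under stated assumptions: the accuracy statistic (an exponential moving average of exact agreement), the threshold, and the class and attribute names are hypothetical, and the source does not specify the statistic it actually uses. Unlike a full method, this sketch stops checking the approximate simulator once it is trusted.

```python
class AdaptiveSimulator:
    """Pick a simulator per call, based on a running accuracy estimate."""

    def __init__(self, original, approximate, threshold=0.9, decay=0.5):
        self.original = original        # slow but exact simulator
        self.approximate = approximate  # fast, learned simulator
        self.threshold = threshold      # hypothetical trust threshold
        self.decay = decay              # smoothing factor of the estimate
        self.accuracy = 0.0             # start pessimistic
        self.calls = {"original": 0, "approximate": 0}

    def step(self, state, action):
        if self.accuracy >= self.threshold:
            # Trusted: use the fast simulator only.
            self.calls["approximate"] += 1
            return self.approximate(state, action)
        # Not trusted yet: use the original simulator and score the
        # approximate one against its output.
        self.calls["original"] += 1
        truth = self.original(state, action)
        agree = 1.0 if self.approximate(state, action) == truth else 0.0
        self.accuracy = self.decay * self.accuracy + (1 - self.decay) * agree
        return truth


def exact(state, action):
    return state + action


def inaccurate(state, action):
    return state + action + 1


good = AdaptiveSimulator(exact, exact)
bad = AdaptiveSimulator(exact, inaccurate)
good_out = [good.step(s, 1) for s in range(30)]
bad_out = [bad.step(s, 1) for s in range(30)]
```

With an accurate approximate simulator, the wrapper switches to it after a few checked calls; with an inaccurate one, every call falls back to the original simulator, so the returned transitions stay correct in both cases.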
Vogelstein, Joshua T., Verstynen, Timothy, Kording, Konrad P., Isik, Leyla, Krakauer, John W., Etienne-Cummings, Ralph, Ogburn, Elizabeth L., Priebe, Carey E., Burns, Randal, Kutten, Kwame, Knierim, James J., Potash, James B., Hartung, Thomas, Smirnova, Lena, Worley, Paul, Savonenko, Alena, Phillips, Ian, Miller, Michael I., Vidal, Rene, Sulam, Jeremias, Charles, Adam, Cowan, Noah J., Bichuch, Maxim, Venkataraman, Archana, Li, Chen, Thakor, Nitish, Kebschull, Justus M., Albert, Marilyn, Xu, Jinchong, Shuler, Marshall Hussain, Caffo, Brian, Ratnanather, Tilak, Geisa, Ali, Roh, Seung-Eon, Yezerets, Eva, Madhyastha, Meghana, How, Javier J., Tomita, Tyler M., Dey, Jayanta, Huang, Ningyuan, Shin, Jong M., Kinfu, Kaleab Alemayehu, Chaudhari, Pratik, Baker, Ben, Schapiro, Anna, Jayaraman, Dinesh, Eaton, Eric, Platt, Michael, Ungar, Lyle, Wehbe, Leila, Kepecs, Adam, Christensen, Amy, Osuagwu, Onyema, Brunton, Bing, Mensh, Brett, Muotri, Alysson R., Silva, Gabriel, Puppo, Francesca, Engert, Florian, Hillman, Elizabeth, Brown, Julia, White, Chris, Yang, Weiwei
Research on both natural intelligence (NI) and artificial intelligence (AI) generally assumes that the future resembles the past: intelligent agents or systems (what we call 'intelligence') observe and act on the world, then use this experience to act on future experiences of the same kind. We call this 'retrospective learning'. For example, an intelligence may see a set of pictures of objects, along with their names, and learn to name them. A retrospective learning intelligence would merely be able to name more pictures of the same objects. We argue that this is not what true intelligence is about. In many real-world problems, both NIs and AIs will have to learn for an uncertain future. Both must update their internal models to be useful for future tasks, such as naming fundamentally new objects and using these objects effectively in a new context or to achieve previously unencountered goals. This ability to learn for the future we call 'prospective learning'. We articulate four relevant factors that jointly define prospective learning. Continual learning enables intelligences to remember those aspects of the past that they believe will be most useful in the future. Prospective constraints (including biases and priors) help the intelligence find general solutions that will be applicable to future problems. Curiosity motivates taking actions that inform future decision making, including in previously unmet situations. Causal estimation enables learning the structure of relations that guide choosing actions for specific outcomes, even when the specific action-outcome contingencies have never been observed before. We argue that a paradigm shift from retrospective to prospective learning will enable the communities that study intelligence to unite and overcome existing bottlenecks to more effectively explain, augment, and engineer intelligences.
Opponent modeling is the ability to use prior knowledge and observations in order to predict the behavior of an opponent. This survey presents a comprehensive overview of existing opponent modeling techniques for adversarial domains, many of which must address stochastic, continuous, or concurrent actions, and sparse, partially observable payoff structures. We discuss all the components of opponent modeling systems, including feature extraction, learning algorithms, and strategy abstractions. These discussions lead us to propose a new form of analysis for describing and predicting the evolution of game states over time. We then introduce a new framework that facilitates method comparison, analyze a representative selection of techniques using the proposed framework, and highlight common trends among recently proposed methods. Finally, we list several open problems and discuss future research directions inspired by AI research on opponent modeling and related research in other disciplines.
This is Part II of the two-part comprehensive survey devoted to a computing framework most commonly known under the names Hyperdimensional Computing and Vector Symbolic Architectures (HDC/VSA). Both names refer to a family of computational models that use high-dimensional distributed representations and rely on the algebraic properties of their key operations to incorporate the advantages of structured symbolic representations and vector distributed representations. Holographic Reduced Representations is an influential HDC/VSA model that is well known in the machine learning domain and often used to refer to the whole family. However, for the sake of consistency, we use HDC/VSA to refer to the area. Part I of this survey covered foundational aspects of the area, such as the historical context leading to the development of HDC/VSA, key elements of any HDC/VSA model, known HDC/VSA models, and the transformation of input data of various types into high-dimensional vectors suitable for HDC/VSA. This second part surveys existing applications, the role of HDC/VSA in cognitive computing and architectures, as well as directions for future work. Most of the applications lie within the machine learning/artificial intelligence domain; however, we also cover other applications to provide a thorough picture. The survey is written to be useful for both newcomers and practitioners.
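As a small illustration of the "key operations" on high-dimensional vectors referred to above, the sketch below uses random bipolar vectors with elementwise multiplication as binding and an elementwise majority sign as bundling, which is one common choice among HDC/VSA models. The dimensionality, the role and filler names, and the helper functions are assumptions made for this example and are not taken from the survey.

```python
import random

DIM = 10_000  # hypothetical dimensionality


def random_hv(rng, dim=DIM):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def bind(a, b):
    """Binding by elementwise multiplication (self-inverse for bipolar vectors)."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vectors):
    """Bundling by elementwise majority sign (ties, if any, go to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*vectors)]


def similarity(a, b):
    """Normalised dot product, in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
colour, shape, size = (random_hv(rng) for _ in range(3))  # roles
red, circle, small = (random_hv(rng) for _ in range(3))   # fillers

# A record of three role-filler pairs held in a single vector.
record = bundle(bind(colour, red), bind(shape, circle), bind(size, small))

# Unbinding with a role yields a noisy copy of the matching filler.
recovered = bind(record, colour)
sim_red = similarity(recovered, red)
sim_circle = similarity(recovered, circle)
```

Here `sim_red` is clearly positive while `sim_circle` stays near zero, so the filler bound to `colour` can be identified by comparing `recovered` against the known filler vectors.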
Petropoulos, Fotios, Apiletti, Daniele, Assimakopoulos, Vassilios, Babai, Mohamed Zied, Barrow, Devon K., Taieb, Souhaib Ben, Bergmeir, Christoph, Bessa, Ricardo J., Bijak, Jakub, Boylan, John E., Browell, Jethro, Carnevale, Claudio, Castle, Jennifer L., Cirillo, Pasquale, Clements, Michael P., Cordeiro, Clara, Oliveira, Fernando Luiz Cyrino, De Baets, Shari, Dokumentov, Alexander, Ellison, Joanne, Fiszeder, Piotr, Franses, Philip Hans, Frazier, David T., Gilliland, Michael, Gönül, M. Sinan, Goodwin, Paul, Grossi, Luigi, Grushka-Cockayne, Yael, Guidolin, Mariangela, Guidolin, Massimo, Gunter, Ulrich, Guo, Xiaojia, Guseo, Renato, Harvey, Nigel, Hendry, David F., Hollyman, Ross, Januschowski, Tim, Jeon, Jooyoung, Jose, Victor Richmond R., Kang, Yanfei, Koehler, Anne B., Kolassa, Stephan, Kourentzes, Nikolaos, Leva, Sonia, Li, Feng, Litsiou, Konstantia, Makridakis, Spyros, Martin, Gael M., Martinez, Andrew B., Meeran, Sheik, Modis, Theodore, Nikolopoulos, Konstantinos, Önkal, Dilek, Paccagnini, Alessia, Panagiotelis, Anastasios, Panapakidis, Ioannis, Pavía, Jose M., Pedio, Manuela, Pedregal, Diego J., Pinson, Pierre, Ramos, Patrícia, Rapach, David E., Reade, J. James, Rostami-Tabar, Bahman, Rubaszek, Michał, Sermpinis, Georgios, Shang, Han Lin, Spiliotis, Evangelos, Syntetos, Aris A., Talagala, Priyanga Dilini, Talagala, Thiyanga S., Tashman, Len, Thomakos, Dimitrios, Thorarinsdottir, Thordis, Todini, Ezio, Arenas, Juan Ramón Trapero, Wang, Xiaoqian, Winkler, Robert L., Yusupova, Alisa, Ziel, Florian
Forecasting has always been at the forefront of decision making and planning. The uncertainty that surrounds the future is both exciting and challenging, with individuals and organisations seeking to minimise risks and maximise utilities. The large number of forecasting applications calls for a diverse set of forecasting methods to tackle real-life challenges. This article provides a non-systematic review of the theory and the practice of forecasting. We provide an overview of a wide range of theoretical, state-of-the-art models, methods, principles, and approaches to prepare, produce, organise, and evaluate forecasts. We then demonstrate how such theoretical concepts are applied in a variety of real-life contexts. We do not claim that this review is an exhaustive list of methods and applications. However, we hope that our encyclopedic presentation will offer a point of reference for the rich work that has been undertaken over the last decades, with some key insights for the future of forecasting theory and practice. Given its encyclopedic nature, the intended mode of reading is non-linear. We offer cross-references to allow readers to navigate through the various topics. We complement the theoretical concepts and applications covered with extensive lists of free or open-source software implementations and publicly available databases.