Agents
Fairness, Welfare, and Equity in Personalized Pricing
We study the interplay of fairness, welfare, and equity considerations in personalized pricing based on customer features. Sellers are increasingly able to conduct price personalization based on predictive modeling of demand conditional on covariates: setting customized interest rates, targeted discounts of consumer goods, and personalized subsidies of scarce resources with positive externalities like vaccines and bed nets. These different application areas may lead to different concerns around fairness, welfare, and equity on different objectives: price burdens on consumers, price envy, firm revenue, access to a good, equal access, and distributional consequences when the good in question further impacts downstream outcomes of interest. We conduct a comprehensive literature review in order to disentangle these different normative considerations and propose a taxonomy of different objectives with mathematical definitions. We focus on observational metrics that do not assume access to an underlying valuation distribution which is either unobserved due to binary feedback or ill-defined due to overriding […]
Studying the case of personalized pricing is conceptually challenging because prices are a shared tool in drastically different domains: we consider lending/insurance, consumer goods, and public provision. A crucial distinction is between value-based pricing, which offers different prices to customers based on their estimated willingness to pay, and risk-based pricing, which offers different prices to customers based on their estimated costs, as in lending and insurance [34]. While discrimination law is strongest in insurance and lending, in lending, discrimination concerns often arise from individual agents providing offers from an actuarially fair securitized rate sheet [9]. In particular, distributional concerns regarding price optimization reflect overall concern for differentially adept/prepared/educated negotiating customers in insurance and lending, but slight optimism in value-based pricing since low-income individuals may be more price-sensitive [9]. Hence, the majority of our analysis will focus on value-based pricing, which lends itself more readily to price optimization.
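As an illustration of what an "observational" metric can look like, the sketch below computes one simple quantity that needs only the prices actually offered and a group label per customer, with no valuation distribution. This particular metric (the gap in mean offered price between two groups) and the function name `mean_price_gap` are assumptions chosen for illustration; they are not taken from the paper's taxonomy.

```python
from statistics import mean


def mean_price_gap(prices, groups, group_a, group_b):
    """Mean offered price of group_a minus that of group_b (hypothetical metric).

    Uses only observed prices and group labels; no willingness-to-pay
    or valuation distribution is assumed.
    """
    a = [p for p, g in zip(prices, groups) if g == group_a]
    b = [p for p, g in zip(prices, groups) if g == group_b]
    if not a or not b:
        raise ValueError("each group needs at least one observed price")
    return mean(a) - mean(b)


# Toy data: four customers, two groups.
prices = [10.0, 12.0, 8.0, 9.0]
groups = ["A", "A", "B", "B"]
print(mean_price_gap(prices, groups, "A", "B"))  # 2.5
```

A positive value means group A is, on average, offered higher prices than group B; whether such a gap is a burden or a benefit depends on the domain distinctions discussed above.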
2020 in Review: 10 Nonfiction Books AI Experts Enjoyed Reading
The long-anticipated revision of Artificial Intelligence: A Modern Approach explores the full breadth and depth of the field of artificial intelligence (AI). The 4th Edition brings readers up to date on the latest technologies, presents concepts in a more unified manner, and offers new or expanded coverage of machine learning, deep learning, transfer learning, multiagent systems, robotics, natural language processing, causality, probabilistic programming, privacy, fairness, and safe AI.
Whom to Test? Active Sampling Strategies for Managing COVID-19
Wang, Yingfei, Yahav, Inbal, Padmanabhan, Balaji
This paper presents methods to choose individuals to test for infection during a pandemic such as COVID-19, characterized by high contagion and the presence of asymptomatic carriers. The smart-testing ideas presented here are motivated by active learning and multi-armed bandit techniques in machine learning. Our active sampling method works in conjunction with quarantine policies, can handle different objectives, and is dynamic and adaptive in the sense that it continually adapts to changes in real-time data. The bandit algorithm uses contact tracing, location-based sampling, and random sampling in order to select specific individuals to test. Using a data-driven agent-based model simulating New York City, we show that the algorithm samples individuals to test in a manner that rapidly traces infected individuals. Experiments also suggest that smart testing can significantly reduce death rates compared to current methods such as testing symptomatic individuals with or without contact tracing.
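To make the bandit framing concrete, the sketch below treats each sampling strategy (contact tracing, location-based, random) as an arm and rewards an arm when the person it selects tests positive. It uses plain epsilon-greedy as a stand-in; the paper's actual bandit algorithm, its reward definition, and the positivity rates in `rates` are not given in the abstract and are hypothetical here.

```python
import random


class EpsilonGreedy:
    """Epsilon-greedy bandit; each arm is a strategy for choosing whom to test."""

    def __init__(self, arms, epsilon=0.1, seed=0):
        self.arms = list(arms)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {a: 0 for a in self.arms}
        self.values = {a: 0.0 for a in self.arms}  # running mean reward per arm

    def select(self):
        untried = [a for a in self.arms if self.counts[a] == 0]
        if untried:  # try every strategy once first
            return untried[0]
        if self.rng.random() < self.epsilon:  # explore
            return self.rng.choice(self.arms)
        return max(self.arms, key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def simulate(rates, rounds=5000, epsilon=0.1, seed=0):
    """Reward 1 if the tested person is positive, with hypothetical per-strategy rates."""
    bandit = EpsilonGreedy(rates, epsilon=epsilon, seed=seed)
    env = random.Random(seed + 1)
    for _ in range(rounds):
        arm = bandit.select()
        reward = 1.0 if env.random() < rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit.counts


# Hypothetical probabilities that a person chosen by each strategy is infected.
rates = {"contact_tracing": 0.5, "location_based": 0.1, "random": 0.02}
counts = simulate(rates)
print(counts)
```

Over many rounds most tests go to the strategy with the highest observed positivity, while the epsilon share of random choices keeps re-checking the others, which is what lets such a scheme adapt when real-time data shift.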
Hierarchical Planning for Resource Allocation in Emergency Response Systems
Pettet, Geoffrey, Mukhopadhyay, Ayan, Kochenderfer, Mykel, Dubey, Abhishek
A classical problem in city-scale cyber-physical systems (CPS) is resource allocation under uncertainty. Spatio-temporal resource allocation is used, for example, to distribute electric scooters across urban areas, place charging stations for vehicles, and design efficient on-demand transit. Typically, such problems are modeled as Markov (or semi-Markov) decision processes. While online, offline, and decentralized methodologies have been used to tackle such problems, none of these approaches scales well to large decision problems. We create a general approach to hierarchical planning that leverages structure in city-level CPS problems to tackle resource allocation under uncertainty. We use emergency response as a case study and show how a large resource allocation problem can be split into smaller problems. We then create a principled framework for solving the smaller problems and tackling the interaction between them. Finally, we use real-world data from a major metropolitan area in the United States to validate our approach. Our experiments show that the proposed approach outperforms state-of-the-art approaches used in the field of emergency response.
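The idea of splitting one large allocation into smaller coupled problems can be sketched with two levels: a top level that divides responders among regions, and independent low-level problems that place each region's share at its stations. This is only a structural illustration under assumed rules (largest-remainder apportionment by total demand, then greedy placement); the paper's planners work on (semi-)Markov decision processes, which this sketch does not model, and all names and numbers are hypothetical.

```python
from math import floor


def apportion(total, weights):
    """Top level: split `total` units across regions in proportion to weights
    using the largest-remainder method."""
    s = sum(weights.values())
    quotas = {r: total * w / s for r, w in weights.items()}
    alloc = {r: floor(q) for r, q in quotas.items()}
    leftover = total - sum(alloc.values())
    # Remaining units go to the regions with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda r: quotas[r] - alloc[r], reverse=True)
    for r in by_remainder[:leftover]:
        alloc[r] += 1
    return alloc


def place_in_region(n, station_demand):
    """Low level: repeatedly add a unit to the station with the highest
    demand per assigned unit."""
    assigned = {s: 0 for s in station_demand}
    for _ in range(n):
        best = max(station_demand, key=lambda s: station_demand[s] / (assigned[s] + 1))
        assigned[best] += 1
    return assigned


def hierarchical_allocate(total, regions):
    """Solve the top-level split, then each regional subproblem independently."""
    region_demand = {r: sum(st.values()) for r, st in regions.items()}
    top = apportion(total, region_demand)
    return {r: place_in_region(top[r], regions[r]) for r in regions}


# Hypothetical expected incident rates per station.
regions = {"north": {"n1": 5.0, "n2": 3.0}, "south": {"s1": 2.0}}
print(hierarchical_allocate(4, regions))
# {'north': {'n1': 2, 'n2': 1}, 'south': {'s1': 1}}
```

The design point is that each regional subproblem only sees its own stations, so the low-level work grows with region size rather than city size; the price is that interactions between regions must be handled at the top level.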
Distributed Adaptive Control: An ideal Cognitive Architecture candidate for managing a robotic recycling plant
Guerrero-Rosado, Oscar, Verschure, Paul
In the past decade, society has experienced notable growth in a variety of technological areas. However, the Fourth Industrial Revolution has not yet been embraced. Industry 4.0 poses several challenges, including the need for new architectural models to tackle the uncertainty that open environments represent for cyber-physical systems (CPS). Waste Electrical and Electronic Equipment (WEEE) recycling plants are one such open environment. Here, CPSs must work harmoniously in a changing environment, interacting with similar and dissimilar CPSs, and adaptively collaborating with human workers. In this paper, we argue that Distributed Adaptive Control (DAC) theory is a suitable Cognitive Architecture for managing a recycling plant. Specifically, a recursive implementation of DAC (spanning both single-agent and large-scale levels) is proposed to meet the expected demands of the European Project HR-Recycler. Additionally, to provide a realistic benchmark for future implementations of the recursive DAC, a micro-recycling plant prototype is presented.
Keywords: Cognitive Architecture, Distributed Adaptive Control, Recycling Plant, Navigation, Motor Control, Human-Robot Interaction.
Awareness Logic: A Kripke-based Rendition of the Heifetz-Meier-Schipper Model
Belardinelli, Gaia, Rendsvig, Rasmus K.
Heifetz, Meier and Schipper (HMS) present a lattice model of awareness. The HMS model is syntax-free, which precludes the simple option to rely on formal language to induce lattices, and represents uncertainty and unawareness with one entangled construct, making it difficult to assess the properties of either. Here, we present a model based on a lattice of Kripke models, induced by atom subset inclusion, in which uncertainty and unawareness are separate. We show the models to be equivalent by defining transformations between them which preserve formula satisfaction, and obtain completeness through our and HMS' results.
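The separation of uncertainty and unawareness can be illustrated with a toy evaluator: the lattice nodes are subsets of atoms ordered by inclusion, each node restricts one Kripke model to its atoms, and a formula mentioning an atom outside the node is treated as undefined, while knowledge is decided only by the accessibility relation. This is a minimal sketch of the general idea, not the authors' construction; the tuple encoding of formulas and all function names are hypothetical.

```python
from itertools import combinations

# Hypothetical formula encoding: ("atom", "p"), ("not", f), ("and", f, g), ("K", f)


def atoms_of(f):
    """Set of atoms mentioned in formula f."""
    if f[0] == "atom":
        return {f[1]}
    return set().union(*(atoms_of(g) for g in f[1:]))


def lattice_nodes(atoms):
    """Every subset of the atoms; ordering them by subset inclusion gives the lattice."""
    atoms = sorted(atoms)
    return [frozenset(c) for k in range(len(atoms) + 1) for c in combinations(atoms, k)]


def holds(model, node, world, f):
    """Evaluate f at `world` in the restriction of `model` to the atoms in `node`.

    Returns None if f mentions an atom outside `node` (unawareness: undefined).
    Otherwise the accessibility relation alone decides K (uncertainty).
    """
    if not atoms_of(f) <= node:
        return None
    _, rel, val = model
    kind = f[0]
    if kind == "atom":
        return f[1] in val[world]
    if kind == "not":
        return not holds(model, node, world, f[1])
    if kind == "and":
        return holds(model, node, world, f[1]) and holds(model, node, world, f[2])
    if kind == "K":
        return all(holds(model, node, v, f[1]) for v in rel[world])
    raise ValueError(f"unknown connective: {kind}")


# Model = (worlds, accessibility relation, valuation).
model = (
    {"w1", "w2"},
    {"w1": {"w1", "w2"}, "w2": {"w2"}},
    {"w1": {"p", "q"}, "w2": {"p"}},
)
p, q = ("atom", "p"), ("atom", "q")
print(len(lattice_nodes({"p", "q"})))                       # 4
print(holds(model, frozenset({"p"}), "w1", ("K", p)))       # True
print(holds(model, frozenset({"p"}), "w1", ("K", q)))       # None
print(holds(model, frozenset({"p", "q"}), "w1", ("K", q)))  # False
```

At node {p} the agent is unaware of q, so "K q" is undefined rather than false; at node {p, q} it becomes false because of genuine uncertainty between w1 and w2.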
Cooperative Policy Learning with Pre-trained Heterogeneous Observation Representations
Shi, Wenlei, Wei, Xinran, Zhang, Jia, Ni, Xiaoyuan, Jiang, Arthur, Bian, Jiang, Liu, Tie-Yan
Multi-agent reinforcement learning (MARL) has been increasingly explored to learn cooperative policies that maximize a global reward. Many existing studies use graph neural networks (GNNs) in MARL to propagate critical collaborative information over the interaction graph built upon inter-connected agents. Nevertheless, the vanilla GNN approach has substantial shortcomings in complex real-world scenarios, since the generic message-passing mechanism is ineffective between heterogeneous vertices, and simple message aggregation functions cannot accurately model the combinatorial interactions among multiple neighbors. While adopting more complex GNN models with more informative message passing and aggregation mechanisms can benefit heterogeneous vertex representations and cooperative policy learning, it can also increase the training difficulty of MARL and require stronger and more direct reward signals than the original global reward. To address these challenges, we propose a new cooperative learning framework with pre-trained heterogeneous observation representations. In particular, we employ an encoder-decoder based graph attention to learn the intricate interactions and heterogeneous representations that can be more easily leveraged by MARL. Moreover, we design a pre-training procedure with a local actor-critic algorithm to ease the difficulty of cooperative policy learning. Extensive experiments over real-world scenarios demonstrate that our new approach can significantly outperform existing MARL baselines as well as operations research solutions that are widely used in industry.
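The core operation behind graph attention, as opposed to fixed mean or sum aggregation, is that each agent weights its neighbors' messages by learned relevance scores. The sketch below shows single-head scaled dot-product attention of one agent over its neighbors in plain Python; it is a generic illustration, not the paper's encoder-decoder architecture, and heterogeneous vertices (which could, for instance, use type-specific projections) are not modeled. The function name `attend` and the toy vectors are hypothetical.

```python
import math


def attend(query, neighbor_keys, neighbor_values):
    """Single-head scaled dot-product attention of one agent over its neighbors.

    Returns the attention weights and the aggregated message.
    """
    d = len(query)
    scores = [sum(qi * ki for qi, ki in zip(query, key)) / math.sqrt(d)
              for key in neighbor_keys]
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(neighbor_values[0])
    message = [sum(w * v[i] for w, v in zip(weights, neighbor_values)) for i in range(dim)]
    return weights, message


# Toy example: the first neighbor's key matches the query, so it gets more weight.
weights, message = attend(
    query=[1.0, 0.0],
    neighbor_keys=[[1.0, 0.0], [0.0, 1.0]],
    neighbor_values=[[1.0, 0.0], [0.0, 1.0]],
)
print(weights, message)
```

Because the weights depend on the query-key match, the aggregated message changes with context, which is the property a plain averaging aggregator lacks.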
SPOTTER: Extending Symbolic Planning Operators through Targeted Reinforcement Learning
Sarathy, Vasanth, Kasenberg, Daniel, Goel, Shivam, Sinapov, Jivko, Scheutz, Matthias
Symbolic planning models allow decision-making agents to sequence actions in arbitrary ways to achieve a variety of goals in dynamic domains. However, they are typically handcrafted and tend to require precise formulations that are not robust to human error. Reinforcement learning (RL) approaches do not require such models, and instead learn domain dynamics by exploring the environment and collecting rewards. However, RL approaches tend to require millions of episodes of experience and often learn policies that are not easily transferable to other tasks. In this paper, we address one aspect of the open problem of integrating these approaches: how can decision-making agents resolve discrepancies in their symbolic planning models while attempting to accomplish goals? We propose an integrated framework named SPOTTER that uses RL to augment and support ("spot") a planning agent by discovering new operators needed by the agent to accomplish goals that are initially unreachable for the agent. SPOTTER outperforms pure-RL approaches while also discovering transferable symbolic knowledge and does not require supervision, successful plan traces or any a priori knowledge about the missing planning operator.
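The situation SPOTTER targets, a goal that is unreachable with the agent's current operators until a new one is discovered, can be made concrete with a STRIPS-style breadth-first planner. The sketch below shows only the symbolic side: with the known operators the search returns `None`, and adding one hypothetical operator (standing in for what RL exploration might discover) makes a plan appear. The domain, the operator names, and the helper `op` are all invented for illustration; SPOTTER's RL component is not shown.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, NamedTuple, Optional


class Operator(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def op(name: str, pre: Iterable[str] = (), add: Iterable[str] = (),
       delete: Iterable[str] = ()) -> Operator:
    """Convenience constructor for a STRIPS-style operator."""
    return Operator(name, frozenset(pre), frozenset(add), frozenset(delete))


def plan(init, goal, operators) -> Optional[List[str]]:
    """Breadth-first forward search; returns operator names, or None if unreachable."""
    start, goal = frozenset(init), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:
            return path
        for o in operators:
            if o.pre <= state:
                nxt = (state - o.delete) | o.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [o.name]))
    return None


# Hypothetical domain: a stuck door blocks the goal.
known = [
    op("move_to_door", pre={"at_start"}, add={"at_door"}, delete={"at_start"}),
    op("open_door", pre={"at_door", "door_unstuck"}, add={"door_open"}),
]
init, goal = {"at_start", "door_stuck"}, {"door_open"}
print(plan(init, goal, known))  # None: no known operator yields door_unstuck

learned = op("nudge_door", pre={"at_door", "door_stuck"},
             add={"door_unstuck"}, delete={"door_stuck"})
print(plan(init, goal, known + [learned]))
# ['move_to_door', 'nudge_door', 'open_door']
```

Once expressed as preconditions and effects, the new operator is ordinary symbolic knowledge that the planner can reuse for other goals, which is the transferability the abstract emphasizes.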
Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach
Mazzi, Giulio, Castellini, Alberto, Farinelli, Alessandro
Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes (POMDPs). The online nature of this method supports scalability by avoiding a complete policy representation. However, the lack of an explicit representation hinders interpretability. In this work, we propose a methodology based on Satisfiability Modulo Theories (SMT) for analyzing POMCP policies by inspecting their traces, namely sequences of belief-action-observation triplets generated by the algorithm. The proposed method explores local properties of policy behavior to identify unexpected decisions. We propose an iterative process of trace analysis consisting of three main steps: (i) the definition of a question by means of a parametric logical formula describing (probabilistic) relationships between beliefs and actions; (ii) the generation of an answer by computing the parameters of the logical formula that maximize the number of satisfied clauses (solving a MAX-SMT problem); and (iii) the analysis of the generated logical formula and the related decision boundaries to identify unexpected decisions made by POMCP with respect to the original question. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and on a real-world problem related to mobile robot navigation. Results show that the approach can exploit human knowledge of the domain, outperforming state-of-the-art anomaly detection methods in identifying unexpected decisions. An improvement of up to 47% in Area Under the Curve (AUC) was achieved in our tests.
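Steps (ii) and (iii) can be illustrated on a one-parameter question. Suppose, hypothetically, the question is "the policy opens the right door exactly when the belief that the tiger is on the left is at least x". The paper solves such problems with a MAX-SMT solver; the sketch below instead uses exhaustive search over candidate thresholds, which is sufficient for a single threshold because only the observed belief values (plus one value above them) can change which steps are satisfied. Trace steps that violate the best-fitting rule are reported as unexpected. The rule template, action names, and trace are invented for illustration.

```python
def fit_threshold(trace):
    """trace: list of (belief_left, action) pairs.

    Hypothetical rule template:  action == "open_right"  <=>  belief_left >= x.
    Returns the x satisfying the most steps and the indices of violating steps.
    """
    beliefs = sorted({b for b, _ in trace})
    # Only observed beliefs (and one value above all of them) give distinct outcomes.
    candidates = beliefs + [beliefs[-1] + 1.0]

    def satisfied(x, b, a):
        return (a == "open_right") == (b >= x)

    def score(x):
        return sum(satisfied(x, b, a) for b, a in trace)

    best_x = max(candidates, key=score)
    unexpected = [i for i, (b, a) in enumerate(trace) if not satisfied(best_x, b, a)]
    return best_x, unexpected


trace = [
    (0.5, "listen"), (0.6, "listen"), (0.7, "listen"),
    (0.8, "open_right"), (0.9, "open_right"),
    (0.55, "open_right"),  # opens the door with a low belief
]
print(fit_threshold(trace))  # (0.8, [5])
```

Here the fitted boundary is 0.8, and step 5 is flagged because the policy opened the door with a belief well below the threshold that explains every other step.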