Noothigattu, Ritesh
A Voting-Based System for Ethical Decision Making
Noothigattu, Ritesh, Gaikwad, Snehalkumar 'Neil' S., Awad, Edmond, Dsouza, Sohan, Rahwan, Iyad, Ravikumar, Pradeep, Procaccia, Ariel D.
The problem of ethical decision making, which has long been a grand challenge for AI [23], has recently caught the public imagination. Perhaps its best-known manifestation is a modern variant of the classic trolley problem [10]: An autonomous vehicle has a brake failure, leading to an accident with inevitably tragic consequences; due to the vehicle's superior perception and computation capabilities, it can make an informed decision. Should it stay its course and hit a wall, killing its three passengers, one of whom is a young girl? Or swerve and kill a male athlete and his dog, who are crossing the street on a red light? A notable paper by Bonnefon et al. [2] has shed some light on how people address such questions, and even former US President Barack Obama has weighed in.
Envy-Free Classification
Balcan, Maria-Florina, Dick, Travis, Noothigattu, Ritesh, Procaccia, Ariel D.
In classic fair division problems such as cake cutting and rent division, envy-freeness requires that each individual (weakly) prefer his allocation to anyone else's. On a conceptual level, we argue that envy-freeness also provides a compelling notion of fairness for classification tasks. Our technical focus is the generalizability of envy-free classification, i.e., understanding whether a classifier that is envy-free on a sample would be almost envy-free with respect to the underlying distribution with high probability. Our main result establishes that a small sample is sufficient to achieve such guarantees, when the classifier in question is a mixture of deterministic classifiers that belong to a family of low Natarajan dimension.
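As a rough illustration of the envy-freeness notion above, the sketch below measures the largest empirical envy of a randomized classifier on a sample. The setup is an assumption for illustration, not the paper's formalism: utilities are a matrix u[i, k] (individual i's utility for outcome k), and the randomized classifier is a mixture of deterministic classifiers with mixing weights w.

```python
import numpy as np

# Hypothetical setup: n individuals, each with a feature vector X[i] and a
# utility u[i, k] for receiving outcome k. The randomized classifier is a
# mixture of deterministic classifiers h_1..h_m with mixing weights w.

def empirical_envy(X, u, classifiers, w):
    """Return the largest empirical envy on the sample:
    max over pairs (i, j) of E[u_i(h(x_j))] - E[u_i(h(x_i))],
    where h is drawn from the mixture (classifiers, w)."""
    n = len(X)
    # expected utility individual i derives from the assignment given to j
    exp_util = np.zeros((n, n))
    for h, wk in zip(classifiers, w):
        labels = [h(x) for x in X]          # deterministic outcomes under h
        for i in range(n):
            for j in range(n):
                exp_util[i, j] += wk * u[i, labels[j]]
    own = np.diag(exp_util)                 # each individual's own utility
    envy = exp_util - own[:, None]          # envy[i, j] = utility gap
    np.fill_diagonal(envy, -np.inf)         # ignore self-comparisons
    return envy.max()
```

Under this reading, the abstract's generalization result concerns when a small value of this quantity on a sample carries over, with high probability, to the underlying distribution.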
Interpretable Multi-Objective Reinforcement Learning through Policy Orchestration
Noothigattu, Ritesh, Bouneffouf, Djallel, Mattei, Nicholas, Chandra, Rachita, Madan, Piyush, Varshney, Kush, Campbell, Murray, Singh, Moninder, Rossi, Francesca
Autonomous cyber-physical agents and systems play an increasingly large role in our lives. To ensure that agents behave in ways aligned with the values of the societies in which they operate, we must develop techniques that allow these agents to not only maximize their reward in an environment, but also to learn and follow the implicit constraints of society. These constraints and norms can come from any number of sources including regulations, business process guidelines, laws, ethical principles, social norms, and moral values. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations of the task, and reinforcement learning to learn to maximize the environment rewards. More precisely, we assume that an agent can observe traces of behavior of members of the society but has no access to the explicit set of constraints that give rise to the observed behavior. Inverse reinforcement learning is used to learn such constraints, which are then combined with a possibly orthogonal value function through the use of a contextual bandit-based orchestrator that picks a contextually appropriate choice between the two policies (constraint-based and environment reward-based) when taking actions. The contextual bandit orchestrator allows the agent to mix policies in novel ways, taking the best actions from either a reward maximizing or constrained policy. In addition, the orchestrator is transparent about which policy is employed at each time step. We test our algorithms using a Pac-Man domain and show that the agent is able to learn to act optimally, act within the demonstrated constraints, and mix these two functions in complex ways.
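A minimal sketch of the orchestration idea, not the paper's exact algorithm: a contextual bandit (here LinUCB, a standard choice) decides at each step which of the two pretrained policies acts. The names pi_reward, pi_constraint, and featurize are hypothetical placeholders.

```python
import numpy as np

class LinUCBOrchestrator:
    """LinUCB over two arms: arm 0 = reward policy, arm 1 = constraint policy."""
    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(2)]    # per-arm covariance
        self.b = [np.zeros(dim) for _ in range(2)]  # per-arm reward sums

    def choose(self, context):
        scores = []
        for arm in range(2):
            A_inv = np.linalg.inv(self.A[arm])
            theta = A_inv @ self.b[arm]             # ridge estimate of arm value
            ucb = theta @ context + self.alpha * np.sqrt(context @ A_inv @ context)
            scores.append(ucb)
        return int(np.argmax(scores))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context

# Hypothetical environment loop, one step:
#   ctx = featurize(state)
#   arm = orchestrator.choose(ctx)
#   action = pi_reward(state) if arm == 0 else pi_constraint(state)
#   ... execute action, observe reward, then orchestrator.update(arm, ctx, reward)
```

Note how this setup yields the transparency the abstract mentions: the chosen arm is directly observable at every time step, so one can always say which policy was in control.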
Choosing How to Choose Papers
Noothigattu, Ritesh, Shah, Nihar B., Procaccia, Ariel D.
It is common to see a handful of reviewers reject a highly novel paper, because they view, say, extensive experiments as far more important than novelty, whereas the community as a whole would have embraced the paper. More generally, the disparate mapping of criteria scores to final recommendations by different reviewers is a major source of inconsistency in peer review. In this paper we present a framework --- based on $L(p,q)$-norm empirical risk minimization --- for learning the community's aggregate mapping. We draw on computational social choice to identify desirable values of $p$ and $q$; specifically, we characterize $p=q=1$ as the only choice that satisfies three natural axiomatic properties. Finally, we implement and apply our approach to reviews from IJCAI 2017.
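The sketch below spells out the $L(p,q)$-norm empirical risk suggested by the abstract, under assumed data shapes: X[i, j] holds reviewer i's criteria-score vector for paper j and y[i, j] the corresponding overall recommendation, with a linear mapping f. For the axiomatically characterized case p = q = 1, the objective reduces to the sum of absolute errors, fit here by plain subgradient descent.

```python
import numpy as np

def lpq_risk(w, X, y, p=1, q=1):
    """L(p,q)-norm risk: outer p-norm over reviewers of the inner q-norm
    over papers of |f(x_ij) - y_ij|. Shapes: X (m, n, d), y (m, n), w (d,)."""
    residuals = np.abs(X @ w - y)                           # (m, n)
    per_reviewer = (residuals ** q).sum(axis=1) ** (1.0 / q)
    return ((per_reviewer ** p).sum()) ** (1.0 / p)

def fit_l11(X, y, steps=2000, lr=0.01):
    """Subgradient descent on the p = q = 1 objective (sum of absolute errors)."""
    w = np.zeros(X.shape[-1])
    for t in range(steps):
        # subgradient of sum_ij |w.x_ij - y_ij| with respect to w
        grad = np.einsum('ij,ijk->k', np.sign(X @ w - y), X)
        w -= lr / np.sqrt(t + 1) * grad
    return w
```

This is only a sketch of the objective; the paper's framework and its axiomatic analysis of $p$ and $q$ are developed over general hypothesis classes, not just the linear map assumed here.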
Weighted Voting Via No-Regret Learning
Haghtalab, Nika (Carnegie Mellon University) | Noothigattu, Ritesh (Carnegie Mellon University) | Procaccia, Ariel D. (Carnegie Mellon University)
Voting systems typically treat all voters equally. We argue that perhaps they should not: Voters who have supported good choices in the past should be given higher weight than voters who have supported bad ones. To develop a formal framework for desirable weighting schemes, we draw on no-regret learning. Specifically, given a voting rule, we wish to design a weighting scheme such that applying the voting rule, with voters weighted by the scheme, leads to choices that are almost as good as those endorsed by the best voter in hindsight. We derive possibility and impossibility results for the existence of such weighting schemes, depending on whether the voting rule and the weighting scheme are deterministic or randomized, as well as on the social choice axioms satisfied by the voting rule.
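As an illustration of the kind of weighting scheme the abstract describes (not the paper's construction), the sketch below pairs weighted plurality with the classic Hedge multiplicative-weights update: after each round, voters whose endorsed alternative incurred a high loss are downweighted. Per-alternative losses in [0, 1], revealed after each vote, are an assumption of this setup.

```python
import numpy as np

def weighted_plurality(weights, top_choices, n_alternatives):
    """Aggregate: pick the alternative with the highest total voter weight."""
    totals = np.zeros(n_alternatives)
    for w, c in zip(weights, top_choices):
        totals[c] += w
    return int(np.argmax(totals))

def hedge_update(weights, top_choices, losses, eta=0.1):
    """Downweight voters whose endorsed alternative incurred a high loss."""
    voter_losses = losses[np.asarray(top_choices)]   # loss of each voter's pick
    new_w = weights * np.exp(-eta * voter_losses)
    return new_w / new_w.sum()                       # renormalize
```

Under these assumptions, Hedge's standard guarantee bounds the weighted rule's cumulative loss relative to the best single voter in hindsight, which is the no-regret benchmark the abstract refers to; the paper's results concern when such guarantees are achievable for given voting rules and axioms.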
A Voting-Based System for Ethical Decision Making
Noothigattu, Ritesh (Carnegie Mellon University) | Gaikwad, Snehalkumar S. (Massachusetts Institute of Technology) | Awad, Edmond (Massachusetts Institute of Technology) | Dsouza, Sohan (Massachusetts Institute of Technology) | Rahwan, Iyad (Massachusetts Institute of Technology) | Ravikumar, Pradeep (Carnegie Mellon University) | Procaccia, Ariel D. (Carnegie Mellon University)
We present a general approach to automating ethical decisions, drawing on machine learning and computational social choice. In a nutshell, we propose to learn a model of societal preferences, and, when faced with a specific ethical dilemma at runtime, efficiently aggregate those preferences to identify a desirable choice. We provide a concrete algorithm that instantiates our approach; some of its crucial steps are informed by a new theory of swap-dominance efficient voting rules. Finally, we implement and evaluate a system for ethical decision making in the autonomous vehicle domain, using preference data collected from 1.3 million people through the Moral Machine website.
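A hedged sketch of the runtime aggregation step only; the paper's actual pipeline (learned preference models per voter and swap-dominance efficient rules) is richer than this. Here each voter's learned preferences are summarized as linear utility weights over alternative features, an assumption for illustration, and a simple pairwise-majority (Copeland-style) winner stands in for the aggregation rule.

```python
import numpy as np

def aggregate_choice(voter_weights, alternatives):
    """voter_weights: (n_voters, d) learned utility weights, one row per voter.
    alternatives: (n_alts, d) feature vectors of the current dilemma's options.
    Returns the index of the Copeland winner under the learned utilities."""
    utils = voter_weights @ alternatives.T            # (n_voters, n_alts)
    n_voters, n_alts = utils.shape
    copeland = np.zeros(n_alts)
    for a in range(n_alts):
        for b in range(n_alts):
            # a scores a point if a strict majority of voters prefer a to b
            if a != b and (utils[:, a] > utils[:, b]).sum() > n_voters / 2:
                copeland[a] += 1
    return int(np.argmax(copeland))
```

The point of the sketch is the division of labor the abstract describes: the expensive learning happens offline, so that at runtime a dilemma reduces to scoring a handful of alternatives and aggregating.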
OGA-UCT: On-the-Go Abstractions in UCT
Anand, Ankit (Indian Institute of Technology, Delhi) | Noothigattu, Ritesh (Indian Institute of Technology, Delhi) | Mausam (Indian Institute of Technology, Delhi) | Singla, Parag (Indian Institute of Technology, Delhi)
Recent work has begun exploring the value of domain abstractions in Monte-Carlo Tree Search (MCTS) algorithms for probabilistic planning. These algorithms automatically aggregate symmetric search nodes (states or state-action pairs), saving valuable planning time. Existing algorithms alternate between two phases: (1) abstraction computation, for computing node aggregations, and (2) modified MCTS that uses aggregate nodes. We believe that these algorithms do not achieve the full potential of abstractions because of the disjoint phases – e.g., it can take a while to recover from erroneous abstractions, or to compute better abstractions based on newly found knowledge. In response, we propose On-the-Go Abstractions (OGA), a novel approach in which abstraction computation is tightly integrated into the MCTS algorithm. We implement these on top of UCT and name the resulting algorithm OGA-UCT. It has several desirable properties, including (1) rapid use of new information in modifying existing abstractions, (2) elimination of the expensive batch abstraction computation phase, and (3) focusing abstraction computation on important parts of the sampled search space. We experimentally compare OGA-UCT against ASAP-UCT, a recent state-of-the-art MDP algorithm, as well as the vanilla UCT algorithm. We find that OGA-UCT is robust across a suite of planning competition and other MDP domains, and obtains up to 28% quality improvements.
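To make the "on-the-go" idea concrete, here is a minimal sketch of UCT in which state-action nodes that map to the same abstraction key share visit counts and value estimates, so new information propagates across symmetric nodes immediately rather than waiting for a batch abstraction phase. The environment interface (actions, step, is_terminal) and the abstraction_key function are assumptions for illustration; in OGA-UCT the abstraction itself is also refined during search, which this fixed key does not capture.

```python
import math
from collections import defaultdict

class OGAUCTSketch:
    def __init__(self, env, abstraction_key, c=1.4):
        self.env, self.key, self.c = env, abstraction_key, c
        self.N = defaultdict(int)     # visits per abstract (state, action) node
        self.Q = defaultdict(float)   # mean return per abstract node

    def select(self, state):
        """UCB1 action selection over abstract node statistics."""
        acts = self.env.actions(state)
        total = sum(self.N[self.key(state, a)] for a in acts) + 1
        def ucb(a):
            k = self.key(state, a)
            if self.N[k] == 0:
                return float('inf')   # try unvisited abstract nodes first
            return self.Q[k] + self.c * math.sqrt(math.log(total) / self.N[k])
        return max(acts, key=ucb)

    def simulate(self, state, depth=50):
        """One UCT rollout; returns the sampled return from this state."""
        if depth == 0 or self.env.is_terminal(state):
            return 0.0
        a = self.select(state)
        next_state, r = self.env.step(state, a)
        ret = r + self.simulate(next_state, depth - 1)
        k = self.key(state, a)        # updates land on the *abstract* node,
        self.N[k] += 1                # so all symmetric nodes benefit at once
        self.Q[k] += (ret - self.Q[k]) / self.N[k]
        return ret
```

The design choice the sketch highlights is that aggregation lives in the statistics table rather than in a separate precomputed mapping, which is what lets abstraction updates take effect within the same search.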