AITopics

2007.00777

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.96)

Tubella, Andrea Aler, Theodorou, Andreas, Dignum, Virginia, Michael, Loizos

Contestable Black Boxes

The right to contest a decision with consequences on individuals or the society is a well-established democratic right. Despite this right also being explicitly included in GDPR in reference to automated decision-making, its study seems to have received much less attention in the AI literature compared, for example, to the right for explanation. This paper investigates the type of assurances that are needed in the contesting process when algorithmic black boxes are involved, opening new questions about the interplay of contestability and explainability. We argue that specialised complementary methodologies to evaluate automated decision-making in the case of a particular decision being contested need to be developed. Further, we propose a combination of well-established software engineering and rule-based approaches as a possible socio-technical solution to the issue of contestability, one of the new democratic challenges posed by the automation of decision making.

artificial intelligence, compliance contract, requirement, (16 more...)

doi: 10.1007/978-3-030-57977-7_12

2006.05133

Country:

Europe > Middle East > Cyprus (0.05)
North America > United States > Virginia (0.05)
Europe > Sweden > Västerbotten County > Umeå (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.89)
Transportation > Air (0.87)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.70)

Junges, Sebastian, Jansen, Nils, Seshia, Sanjit A.

Enforcing Almost-Sure Reachability in POMDPs

Partially-Observable Markov Decision Processes (POMDPs) are a well-known formal model for planning scenarios where agents operate under limited information about their environment. In safety-critical domains, the agent must adhere to a policy satisfying certain behavioral constraints. We study the problem of synthesizing policies that almost-surely reach some goal state while a set of bad states is never visited. In particular, we present an iterative symbolic approach that computes a winning region, that is, a set of system configurations such that all policies that stay within this set are guaranteed to satisfy the constraints. The approach generalizes and improves previous work in terms of scalability and efficacy, as demonstrated in the empirical evaluation. Additionally, we show the applicability to safe exploration by restricting agent behavior to these winning regions.

artificial intelligence, machine learning, specification, (18 more...)

2007.00085

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Reinforcement Learning based Control of Imitative Policies for Near-Accident Driving

Cao, Zhangjie, Bıyık, Erdem, Wang, Woodrow Z., Raventos, Allan, Gaidon, Adrien, Rosman, Guy, Sadigh, Dorsa

Autonomous driving has achieved significant progress in recent years, but autonomous cars are still unable to tackle high-risk situations where a potential accident is likely. In such near-accident scenarios, even a minor change in the vehicle's actions may result in drastically different consequences. To avoid unsafe actions in near-accident scenarios, we need to fully explore the environment. However, reinforcement learning (RL) and imitation learning (IL), two widely-used policy learning methods, cannot model rapid phase transitions and are not scalable to fully cover all the states. To address driving in near-accident scenarios, we propose a hierarchical reinforcement and imitation learning (H-ReIL) approach that consists of low-level policies learned by IL for discrete driving modes, and a high-level policy learned by RL that switches between different driving modes. Our approach exploits the advantages of both IL and RL by integrating them into a unified learning framework. Experimental results and user studies suggest our approach can achieve higher efficiency and safety compared to other methods. Analyses of the policies demonstrate our high-level policy appropriately switches between different low-level policies in near-accident driving situations.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2007.00178

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)

Genre: Research Report > Experimental Study (0.66)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

R2-B2: Recursive Reasoning-Based Bayesian Optimization for No-Regret Learning in Games

Dai, Zhongxiang, Chen, Yizhou, Low, Kian Hsiang, Jaillet, Patrick, Ho, Teck-Hua

This paper presents a recursive reasoning formalism of Bayesian optimization (BO) to model the reasoning process in the interactions between boundedly rational, self-interested agents with unknown, complex, and costly-to-evaluate payoff functions in repeated games, which we call Recursive Reasoning-Based BO (R2-B2). Our R2-B2 algorithm is general in that it does not constrain the relationship among the payoff functions of different agents and can thus be applied to various types of games such as constant-sum, general-sum, and common-payoff games. We prove that by reasoning at level 2 or more and at one level higher than the other agents, our R2-B2 agent can achieve faster asymptotic convergence to no regret than that without utilizing recursive reasoning. We also propose a computationally cheaper variant of R2-B2 called R2-B2-Lite at the expense of a weaker convergence guarantee. The performance and generality of our R2-B2 algorithm are empirically demonstrated using synthetic games, adversarial machine learning, and multi-agent reinforcement learning.

artificial intelligence, deep learning, machine learning, (16 more...)

2006.16679

Country:

Europe > Austria > Vienna (0.14)
Asia > Singapore (0.04)
North America > United States > Massachusetts (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Erskine, Jordan, Lehnert, Chris

Developing cooperative policies for multi-stage tasks

arXiv.org Machine LearningJun-30-2020

This paper proposes the Cooperative Soft Actor Critic (CSAC) method of enabling consecutive reinforcement learning agents to cooperatively solve a long time horizon multi-stage task. This method is achieved by modifying the policy of each agent to maximise both the current and next agent's critic. Cooperatively maximising each agent's critic allows each agent to take actions that are beneficial for its task as well as subsequent tasks. Using this method in a multi-room maze domain, the cooperative policies were able to outperform both uncooperative policies as well as a single agent trained across the entire domain. CSAC achieved a success rate of at least 20\% higher than the uncooperative policies, and converged on a solution at least 4 times faster than the single agent.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

2007.00203

Country:

Oceania > Australia > Queensland > Brisbane (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceJun-29-2020

The Evolutionary Dynamics of Independent Learning Agents in Population Games

Hu, Shuyue, Leung, Chin-Wing, Leung, Ho-fung, Soh, Harold

Understanding the evolutionary dynamics of reinforcement learning under multi-agent settings has long remained an open problem. While previous works primarily focus on 2-player games, we consider population games, which model the strategic interactions of a large population comprising small and anonymous agents. This paper presents a formal relation between stochastic processes and the dynamics of independent learning agents who reason based on the reward signals. Using a master equation approach, we provide a novel unified framework for characterising population dynamics via a single partial differential equation (Theorem 1). Through a case study involving Cross learning agents, we illustrate that Theorem 1 allows us to identify qualitatively different evolutionary dynamics, to analyse steady states, and to gain insights into the expected behaviour of a population. In addition, we present extensive experimental results validating that Theorem 1 holds for a variety of learning methods and population games.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2006.16068

Country:

Asia > Singapore (0.14)
Europe > Netherlands (0.04)
Europe > Belgium (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Overview (0.46)
Research Report (0.40)

Industry:

Leisure & Entertainment > Games > Computer Games (0.69)
Transportation (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

arXiv.org Artificial IntelligenceJun-29-2020

Human Trust-based Feedback Control: Dynamically varying automation transparency to optimize human-machine interactions

Akash, Kumar, McMahon, Griffon, Reid, Tahira, Jain, Neera

Human trust in automation plays an essential role in interactions between humans and automation. While a lack of trust can lead to a human's disuse of automation, over-trust can result in a human trusting a faulty autonomous system which could have negative consequences for the human. Therefore, human trust should be calibrated to optimize human-machine interactions with respect to context-specific performance objectives. In this article, we present a probabilistic framework to model and calibrate a human's trust and workload dynamics during his/her interaction with an intelligent decision-aid system. This calibration is achieved by varying the automation's transparency--the amount and utility of information provided to the human. The parameterization of the model is conducted using behavioral data collected through human-subject experiments, and three feedback control policies are experimentally validated and compared against a non-adaptive decision-aid system. The results show that human-automation team performance can be optimized when the transparency is dynamically updated based on the proposed control policy. This framework is a first step toward widespread design and implementation of real-time adaptive automation for use in human-machine interactions. Automation has become prevalent in the everyday lives of humans. However, despite significant technological advancements, human supervision and intervention are still necessary in almost all sectors of automation, ranging from manufacturing and transportation to disaster-management and healthcare [1]. Therefore, we expect that the future will be built around human-agent collectives [2] that will require efficient and successful interaction and coordination between humans and machines. It is well established that to achieve this coordination, human trust in automation plays a central role [3]-[5]. For example, the benefits of automation are lost when humans override automation due to a fundamental lack of trust [3], [5], and accidents may occur due to human mistrust in such systems [6]. Therefore, trust should be appropriately calibrated to avoid disuse or misuse of automation [4].

automation, participant, transparency, (16 more...)

doi: 10.1109/MCS.2020.3019151

2006.16353

Country:

North America > United States > Massachusetts > Middlesex County > Lowell (0.14)
Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
(8 more...)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (0.68)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)

Kakish, Zahi M., Elamvazhuthi, Karthik, Berman, Spring

Using Reinforcement Learning to Herd a Robotic Swarm to a Target Distribution

arXiv.org Artificial IntelligenceJun-29-2020

In this paper, we present a reinforcement learning approach to designing a control policy for a "leader'' agent that herds a swarm of "follower'' agents, via repulsive interactions, as quickly as possible to a target probability distribution over a strongly connected graph. The leader control policy is a function of the swarm distribution, which evolves over time according to a mean-field model in the form of an ordinary difference equation. The dependence of the policy on agent populations at each graph vertex, rather than on individual agent activity, simplifies the observations required by the leader and enables the control strategy to scale with the number of agents. Two Temporal-Difference learning algorithms, SARSA and Q-Learning, are used to generate the leader control policy based on the follower agent distribution and the leader's location on the graph. A simulation environment corresponding to a grid graph with 4 vertices was used to train and validate the control policies for follower agent populations ranging from 10 to 100. Finally, the control policies trained on 100 simulated agents were used to successfully redistribute a physical swarm of 10 small robots to a target distribution among 4 spatial regions.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2006.15807

Country: North America > United States > Arizona (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.54)

Seifi, Akram, Ehteram, Mohammad, Singh, Vijay P., Mosavi, Amir

Modeling and Uncertainty Analysis of Groundwater Level Using Six Evolutionary Optimization Algorithms Hybridized with ANFIS, SVM, and ANN

arXiv.org Machine LearningJun-28-2020

In the present study, six meta-heuristic schemes are hybridized with artificial neural network (ANN), adaptive neuro-fuzzy interface system (ANFIS), and support vector machine (SVM), to predict monthly groundwater level (GWL), evaluate uncertainty analysis of predictions and spatial variation analysis. The six schemes, including grasshopper optimization algorithm (GOA), cat swarm optimization (CSO), weed algorithm (WA), genetic algorithm (GA), krill algorithm (KA), and particle swarm optimization (PSO), were used to hybridize for improving the performance of ANN, SVM, and ANFIS models. Groundwater level (GWL) data of Ardebil plain (Iran) for a period of 144 months were selected to evaluate the hybrid models. The pre-processing technique of principal component analysis (PCA) was applied to reduce input combinations from monthly time series up to 12-month prediction intervals. The results showed that the ANFIS-GOA was superior to the other hybrid models for predicting GWL in the first piezometer and third piezometer in the testing stage. The performance of hybrid models with optimization algorithms was far better than that of classical ANN, ANFIS, and SVM models without hybridization. The percent of improvements in the ANFIS-GOA versus standalone ANFIS in piezometer 10 were 14.4%, 3%, 17.8%, and 181% for RMSE, MAE, NSE, and PBIAS in the training stage and 40.7%, 55%, 25%, and 132% in testing stage, respectively. The improvements for piezometer 6 in train step were 15%, 4%, 13%, and 208% and in the test step were 33%, 44.6%, 16.3%, and 173%, respectively, that clearly confirm the superiority of developed hybridization schemes in GWL modeling. Uncertainty analysis showed that ANFIS-GOA and SVM had, respectively, the best and worst performances among other models. In general, GOA enhanced the accuracy of the ANFIS, ANN, and SVM models.

algorithm, artificial intelligence, optimization problem, (18 more...)

arXiv.org Machine Learning

doi: 10.3390/su12104023

2006.16848

Country:

Asia > Middle East > Iran (0.34)
Europe > Germany (0.14)
North America > United States > Texas > Brazos County > College Station (0.14)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Energy > Oil & Gas (0.92)
Automobiles & Trucks (0.92)
Water & Waste Management > Water Management > Water Supplies & Services (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
(2 more...)