AITopics

2007.09868

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceJul-18-2020

Quick Question: Interrupting Users for Microtasks with Reinforcement Learning

Ho, Bo-Jhang, Balaji, Bharathan, Koseoglu, Mehmet, Sandha, Sandeep, Pei, Siyou, Srivastava, Mani

Human attention is a scarce resource in modern computing. A multitude of microtasks vie for user attention to crowdsource information, perform momentary assessments, personalize services, and execute actions with a single touch. A lot gets done when these tasks take up the invisible free moments of the day. However, an interruption at an inappropriate time degrades productivity and causes annoyance. Prior works have exploited contextual cues and behavioral data to identify interruptibility for microtasks with much success. With Quick Question, we explore use of reinforcement learning (RL) to schedule microtasks while minimizing user annoyance and compare its performance with supervised learning. We model the problem as a Markov decision process and use Advantage Actor Critic algorithm to identify interruptible moments based on context and history of user interactions. In our 5-week, 30-participant study, we compare the proposed RL algorithm against supervised learning methods. While the mean number of responses between both methods is commensurate, RL is more effective at avoiding dismissal of notifications and improves user experience over time.

machine learning, notification, reinforcement learning, (16 more...)

2007.09515

Country: North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.93)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

arXiv.org Artificial IntelligenceJul-17-2020

Decentralized Deep Reinforcement Learning for Network Level Traffic Signal Control

Guo, Jin

In this thesis, I propose a family of fully decentralized deep multi-agent reinforcement learning (MARL) algorithms to achieve high, real-time performance in network-level traffic signal control. In this approach, each intersection is modeled as an agent that plays a Markovian Game against the other intersection nodes in a traffic signal network modeled as an undirected graph, to approach the optimal reduction in delay. Following Partially Observable Markov Decision Processes (POMDPs), there are 3 levels of communication schemes between adjacent learning agents: independent deep Q-leaning (IDQL), shared states reinforcement learning (S2RL) and a shared states & rewards version of S2RL--S2R2L. In these 3 variants of decentralized MARL schemes, individual agent trains its local deep Q network (DQN) separately, enhanced by convergence-guaranteed techniques like double DQN, prioritized experience replay, multi-step bootstrapping, etc. To test the performance of the proposed three MARL algorithms, a SUMO-based simulation platform is developed to mimic the traffic evolution of the real world. Fed with random traffic demand between permitted OD pairs, a 4x4 Manhattan-style grid network is set up as the testbed, two different vehicle arrival rates are generated for model training and testing. The experiment results show that S2R2L has a quicker convergence rate and better convergent performance than IDQL and S2RL in the training process. Moreover, three MARL schemes all reveal exceptional generalization abilities. Their testing results surpass the benchmark Max Pressure (MP) algorithm, under the criteria of average vehicle delay, network-level queue length and fuel consumption rate. Notably, S2R2L has the best testing performance of reducing 34.55% traffic delay and dissipating 10.91% queue length compared with MP.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2007.03433

Country:

North America > United States > Texas (0.04)
North America > United States > New Jersey (0.04)
North America > United States > California > Yolo County > Davis (0.04)
(4 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.34)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Palomares, Ivan, Porcel, Carlos, Pizzato, Luiz, Guy, Ido, Herrera-Viedma, Enrique

Reciprocal Recommender Systems: Analysis of State-of-Art Literature, Challenges and Opportunities on Social Recommendation

arXiv.org Artificial IntelligenceJul-17-2020

Many social services including online dating, social media, recruitment and online learning, largely rely on \matching people with the right people". The success of these services and the user experience with them often depends on their ability to match users. Reciprocal Recommender Systems (RRS) arose to facilitate this process by identifying users who are a potential match for each other, based on information provided by them. These systems are inherently more complex than user-item recommendation approaches and unidirectional user recommendation services, since they need to take into account both users' preferences towards each other in the recommendation process. This entails not only predicting accurate preference estimates as classical recommenders do, but also defining adequate fusion processes for aggregating user-to-user preferential information. The latter is a crucial and distinctive, yet barely investigated aspect in RRS research. This paper presents a snapshot analysis of the extant literature to summarize the state-of-the-art RRS research to date, focusing on the fundamental features that differentiate RRSs from other classes of recommender systems. Following this, we discuss the challenges and opportunities for future research on RRSs, with special focus on (i) fusion strategies to account for reciprocity and (ii) emerging application domains related to social recommendation.

artificial intelligence, machine learning, social media, (19 more...)

2007.1612

Country:

South America > Argentina > Patagonia > Río Negro Province > Viedma (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)
(9 more...)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (0.46)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
Education > Educational Setting > Online (1.00)
Health & Medicine (0.88)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Nair, Yash, Doshi-Velez, Finale

PAC Bounds for Imitation and Model-based Batch Learning of Contextual Markov Decision Processes

arXiv.org Machine LearningJul-17-2020

We consider the problem of batch multi-task reinforcement learning with observed context descriptors, motivated by its application to personalized medical treatment. In particular, we study two general classes of learning algorithms: direct policy learning (DPL), an imitation-learning based approach which learns from expert trajectories, and model-based learning. First, we derive sample complexity bounds for DPL, and then show that model-based learning from expert actions can, even with a finite model class, be impossible. After relaxing the conditions under which the model-based approach is expected to learn by allowing for greater coverage of state-action space, we provide sample complexity bounds for model-based learning with finite model classes, showing that there exist model classes with sample complexity exponential in their statistical complexity. We then derive a sample complexity upper bound for model-based learning based on a measure of concentration of the data distribution. Our results give formal justification for imitation learning over model-based learning in this setting.

hypothesis class, learning, sample complexity, (13 more...)

2006.06352

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Instructional Material > Course Syllabus & Notes (0.86)
Research Report (0.84)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.82)

Winterer, Leonore, Wimmer, Ralf, Jansen, Nils, Becker, Bernd

Strengthening Deterministic Policies for POMDPs

arXiv.org Artificial IntelligenceJul-16-2020

The synthesis problem for partially observable Markov decision processes (POMDPs) is to compute a policy that satisfies a given specification. Such policies have to take the full execution history of a POMDP into account, rendering the problem undecidable in general. A common approach is to use a limited amount of memory and randomize over potential choices. Yet, this problem is still NP-hard and often computationally intractable in practice. A restricted problem is to use neither history nor randomization, yielding policies that are called stationary and deterministic. Previous approaches to compute such policies employ mixed-integer linear programming (MILP). We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints. It is able to handle an arbitrary number of such specifications. Yet, randomization and memory are often mandatory to achieve satisfactory policies. First, we extend our encoding to deliver a restricted class of randomized policies. Second, based on the results of the original MILP, we employ a preprocessing of the POMDP to encompass memory-based decisions. The advantages of our approach over state-of-the-art POMDP solvers lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications. The latter point allows taking trade-offs between performance and safety aspects of typical POMDP examples into account. We show the effectiveness of our method on a broad range of benchmarks.

artificial intelligence, machine learning, pomdp, (18 more...)

2007.08351

Country:

North America > United States (0.46)
North America > Canada > Ontario > Toronto (0.14)
Europe > Germany > Baden-Württemberg > Freiburg (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial IntelligenceJul-16-2020

Reinforcement Learning-Enabled Decision-Making Strategies for a Vehicle-Cyber-Physical-System in Connected Environment

Liu, Teng, Tang, Xiaolin, Zhang, Jinwei, Li, Wenbo, Deng, Zejian, Yang, Yalian

As a typical vehicle-cyber-physical-system (V-CPS), connected automated vehicles attracted more and more attention in recent years. This paper focuses on discussing the decision-making (DM) strategy for autonomous vehicles in a connected environment. First, the highway DM problem is formulated, wherein the vehicles can exchange information via wireless networking. Then, two classical reinforcement learning (RL) algorithms, Q-learning and Dyna, are leveraged to derive the DM strategies in a predefined driving scenario. Finally, the control performance of the derived DM policies in safety and efficiency is analyzed. Furthermore, the inherent differences of the RL algorithms are embodied and discussed in DM strategies.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2007.09101

Country:

Asia > China > Chongqing Province > Chongqing (0.06)
North America > Canada > Ontario (0.05)

Genre: Research Report (0.50)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)
Transportation > Passenger (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.69)

arXiv.org Artificial IntelligenceJul-16-2020

Lifted Inference in 2-Variable Markov Logic Networks with Function and Cardinality Constraints Using Discrete Fourier Transform

Kuzelka, Ondrej

Markov logic networks (MLNs, Richardson and Domingos, 2006) are a statistical relational learning [Getoor and Taskar, 2007] framework for probabilistic modelling of complex relational structures such as social and biological networks, molecules etc. In general, inference in MLNs is intractable. Lifted inference refers to a set of methods developed in the literature which exploit symmetries for making probabilistic inference more tractable, e.g.

constraint, data quality, machine learning, (17 more...)

2006.03432

Country: Europe > Czechia > Prague (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Data Science > Data Quality > Data Transformation (0.88)

arXiv.org Machine LearningJul-15-2020

Information Freshness-Aware Task Offloading in Air-Ground Integrated Edge Computing Systems

Chen, Xianfu, Wu, Celimuge, Chen, Tao, Liu, Zhi, Zhang, Honggang, Bennis, Mehdi, Liu, Hang, Ji, Yusheng

This paper studies the problem of information freshness-aware task offloading in an air-ground integrated multi-access edge computing system, which is deployed by an infrastructure provider (InP). A third-party real-time application service provider provides computing services to the subscribed mobile users (MUs) with the limited communication and computation resources from the InP based on a long-term business agreement. Due to the dynamic characteristics, the interactions among the MUs are modelled by a non-cooperative stochastic game, in which the control policies are coupled and each MU aims to selfishly maximize its own expected long-term payoff. To address the Nash equilibrium solutions, we propose that each MU behaves in accordance with the local system states and conjectures, based on which the stochastic game is transformed into a single-agent Markov decision process. Moreover, we derive a novel online deep reinforcement learning (RL) scheme that adopts two separate double deep Q-networks for each MU to approximate the Q-factor and the post-decision Q-factor. Using the proposed deep RL scheme, each MU in the system is able to make decisions without a priori statistical knowledge of dynamics. Numerical experiments examine the potentials of the proposed scheme in balancing the age of information and the energy consumption.

computation task, mec system, mus, (10 more...)

2007.10129

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
Europe > Finland > Northern Ostrobothnia > Oulu (0.04)
(18 more...)

Genre: Research Report (1.00)

Industry:

Telecommunications (1.00)
Information Technology > Networks (0.34)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Hoppe, Sabrina, Toussaint, Marc

Qgraph-bounded Q-learning: Stabilizing Model-Free Off-Policy Deep Reinforcement Learning

arXiv.org Machine LearningJul-15-2020

In state of the art model-free off-policy deep reinforcement learning, a replay memory is used to store past experience and derive all network updates. Even if both state and action spaces are continuous, the replay memory only holds a finite number of transitions. We represent these transitions in a data graph and link its structure to soft divergence. By selecting a subgraph with a favorable structure, we construct a simplified Markov Decision Process for which exact Q-values can be computed efficiently as more data comes in. The subgraph and its associated Q-values can be represented as a QGraph. We show that the Q-value for each transition in the simplified MDP is a lower bound of the Q-value for the same transition in the original continuous Q-learning problem. By using these lower bounds in temporal difference learning, our method QG-DDPG is less prone to soft divergence and exhibits increased sample efficiency while being more robust to hyperparameters. QGraphs also retain information from transitions that have already been overwritten in the replay memory, which can decrease the algorithm's sensitivity to the replay memory capacity.

machine learning, reinforcement learning, transition, (13 more...)

2007.07582

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)