AITopics

2106.07106

Genre: Research Report > Promising Solution (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.97)

arXiv.org Artificial IntelligenceJun-13-2021

Contingency-Aware Influence Maximization: A Reinforcement Learning Approach

Chen, Haipeng, Qiu, Wei, Ou, Han-Ching, An, Bo, Tambe, Milind

The influence maximization (IM) problem aims at finding a subset of seed nodes in a social network that maximize the spread of influence. In this study, we focus on a sub-class of IM problems, where whether the nodes are willing to be the seeds when being invited is uncertain, called contingency-aware IM. Such contingency aware IM is critical for applications for non-profit organizations in low resource communities (e.g., spreading awareness of disease prevention). Despite the initial success, a major practical obstacle in promoting the solutions to more communities is the tremendous runtime of the greedy algorithms and the lack of high performance computing (HPC) for the non-profits in the field -- whenever there is a new social network, the non-profits usually do not have the HPCs to recalculate the solutions. Motivated by this and inspired by the line of works that use reinforcement learning (RL) to address combinatorial optimization on graphs, we formalize the problem as a Markov Decision Process (MDP), and use RL to learn an IM policy over historically seen networks, and generalize to unseen networks with negligible runtime at test phase. To fully exploit the properties of our targeted problem, we propose two technical innovations that improve the existing methods, including state-abstraction and theoretically grounded reward shaping. Empirical results show that our method achieves influence as high as the state-of-the-art methods for contingency-aware IM, while having negligible runtime at test phase.

graph, marginal contribution, node, (14 more...)

2106.07039

Country: Asia > India (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology (0.87)
Health & Medicine > Therapeutic Area > Immunology (0.50)
Health & Medicine > Public Health (0.48)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Li, Mengmeng, Sutter, Tobias, Kuhn, Daniel

Distributionally Robust Optimization with Markovian Data

arXiv.org Machine LearningJun-12-2021

We study a stochastic program where the probability distribution of the uncertain problem parameters is unknown and only indirectly observed via finitely many correlated samples generated by an unknown Markov chain with $d$ states. We propose a data-driven distributionally robust optimization model to estimate the problem's objective function and optimal solution. By leveraging results from large deviations theory, we derive statistical guarantees on the quality of these estimators. The underlying worst-case expectation problem is nonconvex and involves $\mathcal O(d^2)$ decision variables. Thus, it cannot be solved efficiently for large $d$. By exploiting the structure of this problem, we devise a customized Frank-Wolfe algorithm with convex direction-finding subproblems of size $\mathcal O(d)$. We prove that this algorithm finds a stationary point efficiently under mild conditions. The efficiency of the method is predicated on a dimensionality reduction enabled by a dual reformulation. Numerical experiments indicate that our approach has better computational and statistical properties than the state-of-the-art methods.

ambiguity, distributionally robust optimization, relative entropy, (12 more...)

2106.06741

Country: Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Agarwal, Mridul, Bai, Qinbo, Aggarwal, Vaneet

Markov Decision Processes with Long-Term Average Constraints

arXiv.org Artificial IntelligenceJun-11-2021

We consider the problem of constrained Markov Decision Process (CMDP) where an agent interacts with a unichain Markov Decision Process. At every interaction, the agent obtains a reward. Further, there are $K$ cost functions. The agent aims to maximize the long-term average reward while simultaneously keeping the $K$ long-term average costs lower than a certain threshold. In this paper, we propose CMDP-PSRL, a posterior sampling based algorithm using which the agent can learn optimal policies to interact with the CMDP. Further, for MDP with $S$ states, $A$ actions, and diameter $D$, we prove that following CMDP-PSRL algorithm, the agent can bound the regret of not accumulating rewards from optimal policy by $\Tilde{O}(poly(DSA)\sqrt{T})$. Further, we show that the violations for any of the $K$ constraints is also bounded by $\Tilde{O}(poly(DSA)\sqrt{T})$. To the best of our knowledge, this is the first work which obtains a $\Tilde{O}(\sqrt{T})$ regret bounds for ergodic MDPs with long-term average constraints.

algorithm, equation, optimal policy, (13 more...)

2106.0668

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Amani, Sanae, Thrampoulidis, Christos, Yang, Lin F.

Safe Reinforcement Learning with Linear Function Approximation

arXiv.org Machine LearningJun-11-2021

Safety in reinforcement learning has become increasingly important in recent years. Yet, existing solutions either fail to strictly avoid choosing unsafe actions, which may lead to catastrophic results in safety-critical systems, or fail to provide regret guarantees for settings where safety constraints need to be learned. In this paper, we address both problems by first modeling safety as an unknown linear cost function of states and actions, which must always fall below a certain threshold. We then present algorithms, termed SLUCB-QVI and RSLUCB-QVI, for episodic Markov decision processes (MDPs) with linear function approximation. We show that SLUCB-QVI and RSLUCB-QVI, while with \emph{no safety violation}, achieve a $\tilde{\mathcal{O}}\left(\kappa\sqrt{d^3H^3T}\right)$ regret, nearly matching that of state-of-the-art unsafe algorithms, where $H$ is the duration of each episode, $d$ is the dimension of the feature mapping, $\kappa$ is a constant characterizing the safety constraints, and $T$ is the total number of action plays. We further present numerical simulations that corroborate our theoretical findings.

constraint, safety constraint, slucb-qvi, (13 more...)

2106.06239

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada > British Columbia (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Anand, Nishanth, Precup, Doina

Preferential Temporal Difference Learning

arXiv.org Artificial IntelligenceJun-11-2021

Temporal-Difference (TD) learning is a general and very useful tool for estimating the value function of a given policy, which in turn is required to find good policies. Generally speaking, TD learning updates states whenever they are visited. When the agent lands in a state, its value can be used to compute the TD-error, which is then propagated to other states. However, it may be interesting, when computing updates, to take into account other information than whether a state is visited or not. For example, some states might be more important than others (such as states which are frequently seen in a successful trajectory). Or, some states might have unreliable value estimates (for example, due to partial observability or lack of data), making their values less desirable as targets. We propose an approach to re-weighting states used in TD updates, both when they are the input and when they provide the target for the update. We prove that our approach converges with linear function approximation and illustrate its desirable empirical behaviour compared to other TD-style methods.

algorithm, fo state mse, mean error auc, (12 more...)

2106.06508

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Massachusetts (0.04)
North America > United States > California > San Francisco County > San Francisco (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Kozuno, Tadashi, Ménard, Pierre, Munos, Rémi, Valko, Michal

Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall

arXiv.org Machine LearningJun-11-2021

We study the problem of learning a Nash equilibrium (NE) in an imperfect information game (IIG) through self-play. Precisely, we focus on two-player, zero-sum, episodic, tabular IIG under the perfect-recall assumption where the only feedback is realizations of the game (bandit feedback). In particular, the dynamic of the IIG is not known -- we can only access it by sampling or interacting with a game simulator. For this learning setting, we provide the Implicit Exploration Online Mirror Descent (IXOMD) algorithm. It is a model-free algorithm with a high-probability bound on the convergence rate to the NE of order $1/\sqrt{T}$ where $T$ is the number of played games. Moreover, IXOMD is computationally efficient as it needs to perform the updates only along the sampled trajectory.

algorithm, information, sandholm, (14 more...)

2106.06279

Country:

North America > United States (0.14)
North America > Canada > Alberta (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Saxony-Anhalt > Magdeburg (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)

arXiv.org Artificial IntelligenceJun-11-2021

Verified Synthesis of Optimal Safety Controllers for Human-Robot Collaboration

Gleirscher, Mario, Calinescu, Radu, Douthwaite, James, Lesage, Benjamin, Paterson, Colin, Aitken, Jonathan, Alexander, Rob, Law, James

We present a tool-supported approach for the synthesis, verification and validation of the control software responsible for the safety of the human-robot interaction in manufacturing processes that use collaborative robots. In human-robot collaboration, software-based safety controllers are used to improve operational safety, e.g., by triggering shutdown mechanisms or emergency stops to avoid accidents. Complex robotic tasks and increasingly close human-robot interaction pose new challenges to controller developers and certification authorities. Key among these challenges is the need to assure the correctness of safety controllers under explicit (and preferably weak) assumptions. Our controller synthesis, verification and validation approach is informed by the process, risk analysis, and relevant safety regulations for the target application. Controllers are selected from a design space of feasible controllers according to a set of optimality criteria, are formally verified against correctness criteria, and are translated into executable code and validated in a digital twin. The resulting controller can detect the occurrence of hazards, move the process into a safe state, and, in certain circumstances, return the process to an operational state from which it can resume its original task. We show the effectiveness of our software engineering approach through a case study involving the development of a safety controller for a manufacturing work cell equipped with a collaborative robot.

controller, operator, safety controller, (15 more...)

doi: 10.1016/j.scico.2022.102809

2106.06604

Country:

Europe > Germany > Bremen > Bremen (0.27)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Singapore (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (1.00)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Robots > Humanoid Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Lee, Eric Hans, Eriksson, David, Perrone, Valerio, Seeger, Matthias

A Nonmyopic Approach to Cost-Constrained Bayesian Optimization

arXiv.org Machine LearningJun-10-2021

Bayesian optimization (BO) is a popular method for optimizing expensive-to-evaluate black-box functions. BO budgets are typically given in iterations, which implicitly assumes each evaluation has the same cost. In fact, in many BO applications, evaluation costs vary significantly in different regions of the search space. In hyperparameter optimization, the time spent on neural network training increases with layer size; in clinical trials, the monetary cost of drug compounds vary; and in optimal control, control actions have differing complexities. Cost-constrained BO measures convergence with alternative cost metrics such as time, money, or energy, for which the sample efficiency of standard BO methods is ill-suited. For cost-constrained BO, cost efficiency is far more important than sample efficiency. In this paper, we formulate cost-constrained BO as a constrained Markov decision process (CMDP), and develop an efficient rollout approximation to the optimal CMDP policy that takes both the cost and future iterations into account. We validate our method on a collection of hyperparameter optimization problems as well as a sensor set selection application.

neural network, optimization problem, upstream oil & gas, (20 more...)

2106.06079

Genre:

Research Report > New Finding (0.66)
Research Report > Experimental Study (0.48)

Industry: Energy > Oil & Gas > Upstream (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

arXiv.org Machine LearningJun-10-2021

Temporal and Object Quantification Networks

Mao, Jiayuan, Luo, Zhezheng, Gan, Chuang, Tenenbaum, Joshua B., Wu, Jiajun, Kaelbling, Leslie Pack, Ullman, Tomer D.

We present Temporal and Object Quantification Networks (TOQ-Nets), a new class of neuro-symbolic networks with a structural bias that enables them to learn to recognize complex relational-temporal events. This is done by including reasoning layers that implement finite-domain quantification over objects and time. The structure allows them to generalize directly to input instances with varying numbers of objects in temporal sequences of varying lengths. We evaluate TOQ-Nets on input domains that require recognizing event-types in terms of complex temporal relational patterns. We demonstrate that TOQ-Nets can generalize from small amounts of data to scenarios containing more objects than were present during training and to temporal warpings of input sequences.

reasoning layer, toq-net, trajectory, (16 more...)

2106.05891

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games (0.46)
Leisure & Entertainment > Sports (0.31)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Robots (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)