AITopics | Undirected Networks

Collaborating Authors

Undirected Networks

News Overviews Instructional Materials AI-Alerts Classics

Interpreting systems as solving POMDPs: a step towards a formal understanding of agency

arXiv.org Artificial IntelligenceSep-4-2022

Under what circumstances can a system be said to have beliefs and goals, and how do such agency-related features relate to its physical state? Recent work has proposed a notion of interpretation map, a function that maps the state of a system to a probability distribution representing its beliefs about an external world. Such a map is not completely arbitrary, as the beliefs it attributes to the system must evolve over time in a manner that is consistent with Bayes' theorem, and consequently the dynamics of a system constrain its possible interpretations. Here we build on this approach, proposing a notion of interpretation not just in terms of beliefs but in terms of goals and actions. To do this we make use of the existing theory of partially observable Markov processes (POMDPs): we say that a system can be interpreted as a solution to a POMDP if it not only admits an interpretation map describing its beliefs about the hidden state of a POMDP but also takes actions that are optimal according to its belief state. An agent is then a system together with an interpretation of this system as a POMDP solution. Although POMDPs are not the only possible formulation of what it means to have a goal, this nevertheless represents a step towards a more general formal definition of what it means for a system to be an agent.

interpretation, pomdp, stochastic moore machine, (13 more...)

arXiv.org Artificial Intelligence

2209.01619

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Inference and dynamic decision-making for deteriorating systems with probabilistic dependencies through Bayesian networks and deep reinforcement learning

Morato, Pablo G., Andriotis, Charalampos P., Papakonstantinou, Konstantinos G., Rigo, Philippe

arXiv.org Artificial IntelligenceSep-2-2022

In the context of modern environmental and societal concerns, there is an increasing demand for methods able to identify management strategies for civil engineering systems, minimizing structural failure risks while optimally planning inspection and maintenance (I&M) processes. Most available methods simplify the I&M decision problem to the component level due to the computational complexity associated with global optimization methodologies under joint system-level state descriptions. In this paper, we propose an efficient algorithmic framework for inference and decision-making under uncertainty for engineering systems exposed to deteriorating environments, providing optimal management strategies directly at the system level. In our approach, the decision problem is formulated as a factored partially observable Markov decision process, whose dynamics are encoded in Bayesian network conditional structures. The methodology can handle environments under equal or general, unequal deterioration correlations among components, through Gaussian hierarchical structures and dynamic Bayesian networks. In terms of policy optimization, we adopt a deep decentralized multi-agent actor-critic (DDMAC) reinforcement learning approach, in which the policies are approximated by actor neural networks guided by a critic network. By including deterioration dependence in the simulated environment, and by formulating the cost model at the system level, DDMAC policies intrinsically consider the underlying system-effects. This is demonstrated through numerical experiments conducted for both a 9-out-of-10 system and a steel frame under fatigue deterioration. Results demonstrate that DDMAC policies offer substantial benefits when compared to state-of-the-art heuristic approaches. The inherent consideration of system-effects by DDMAC strategies is also interpreted based on the learned policies.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.ress.2023.109144

2209.01092

Country:

Europe (1.00)
North America > United States (0.92)

Genre: Research Report > New Finding (0.34)

Industry:

Energy > Oil & Gas (1.00)
Materials > Construction Materials (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Learning Practical Communication Strategies in Cooperative Multi-Agent Reinforcement Learning

Hu, Diyi, Zhang, Chi, Prasanna, Viktor, Krishnamachari, Bhaskar

arXiv.org Artificial IntelligenceSep-2-2022

In Multi-Agent Reinforcement Learning, communication is critical to encourage cooperation among agents. Communication in realistic wireless networks can be highly unreliable due to network conditions varying with agents' mobility, and stochasticity in the transmission process. We propose a framework to learn practical communication strategies by addressing three fundamental questions: (1) When: Agents learn the timing of communication based on not only message importance but also wireless channel conditions. (2) What: Agents augment message contents with wireless network measurements to better select the game and communication actions. (3) How: Agents use a novel neural message encoder to preserve all information from received messages, regardless of the number and order of messages. Simulating standard benchmarks under realistic wireless network settings, we show significant improvements in game performance, convergence speed and communication efficiency compared with state-of-the-art.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2209.01288

Country: North America > United States > California (0.14)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.47)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

plingo: A system for probabilistic reasoning in clingo based on lpmln

Hahn, Susana, Janhunen, Tomi, Kaminski, Roland, Romero, Javier, Rühling, Nicolas, Schaub, Torsten

arXiv.org Artificial IntelligenceSep-2-2022

We present plingo, an extension of the ASP system clingo with various probabilistic reasoning modes. Plingo is centered upon LP^MLN, a probabilistic extension of ASP based on a weight scheme from Markov Logic. This choice is motivated by the fact that the core probabilistic reasoning modes can be mapped onto optimization problems and that LP^MLN may serve as a middle-ground formalism connecting to other probabilistic approaches. As a result, plingo offers three alternative frontends, for LP^MLN, P-log, and ProbLog. The corresponding input languages and reasoning modes are implemented by means of clingo's multi-shot and theory solving capabilities. The core of plingo amounts to a re-implementation of LP^MLN in terms of modern ASP technology, extended by an approximation technique based on a new method for answer set enumeration in the order of optimality. We evaluate plingo's performance empirically by comparing it to other probabilistic systems.

plingo, probability, stable model, (16 more...)

arXiv.org Artificial Intelligence

2206.11515

Country:

North America > United States > Texas (0.04)
Europe > Germany > Brandenburg > Potsdam (0.04)
Europe > Finland > Pirkanmaa > Tampere (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Cooperative Online Learning in Stochastic and Adversarial MDPs

Lancewicki, Tal, Rosenberg, Aviv, Mansour, Yishay

arXiv.org Artificial IntelligenceSep-1-2022

We study cooperative online learning in stochastic and adversarial Markov decision process (MDP). That is, in each episode, $m$ agents interact with an MDP simultaneously and share information in order to minimize their individual regret. We consider environments with two types of randomness: \emph{fresh} -- where each agent's trajectory is sampled i.i.d, and \emph{non-fresh} -- where the realization is shared by all agents (but each agent's trajectory is also affected by its own actions). More precisely, with non-fresh randomness the realization of every cost and transition is fixed at the start of each episode, and agents that take the same action in the same state at the same time observe the same cost and next state. We thoroughly analyze all relevant settings, highlight the challenges and differences between the models, and prove nearly-matching regret lower and upper bounds. To our knowledge, we are the first to consider cooperative reinforcement learning (RL) with either non-fresh randomness or in adversarial MDPs.

agent, cooperative online learning, learning, (13 more...)

arXiv.org Artificial Intelligence

2201.1317

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Maryland > Baltimore (0.04)
(2 more...)

Genre: Research Report (0.81)

Industry: Education > Educational Setting > Online (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Add feedback

On Almost-Sure Intention Deception Planning that Exploits Imperfect Observers

Fu, Jie

arXiv.org Artificial IntelligenceSep-1-2022

Intention deception involves computing a strategy which deceives the opponent into a wrong belief about the agent's intention or objective. This paper studies a class of probabilistic planning problems with intention deception and investigates how a defender's limited sensing modality can be exploited by an attacker to achieve its attack objective almost surely (with probability one) while hiding its intention. In particular, we model the attack planning in a stochastic system modeled as a Markov decision process (MDP). The attacker is to reach some target states while avoiding unsafe states in the system and knows that his behavior is monitored by a defender with partial observations. Given partial state observations for the defender, we develop qualitative intention deception planning algorithms that construct attack strategies to play against an action-visible defender and an action-invisible defender, respectively. The synthesized attack strategy not only ensures the attack objective is satisfied almost surely but also deceives the defender into believing that the observed behavior is generated by a normal/legitimate user and thus failing to detect the presence of an attack. We show the proposed algorithms are correct and complete and illustrate the deceptive planning methods with examples.

attacker, defender, objective, (15 more...)

arXiv.org Artificial Intelligence

2209.00573

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Greece (0.04)

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Add feedback

Monocular Camera-based Complex Obstacle Avoidance via Efficient Deep Reinforcement Learning

Ding, Jianchuan, Gao, Lingping, Liu, Wenxi, Piao, Haiyin, Pan, Jia, Du, Zhenjun, Yang, Xin, Yin, Baocai

arXiv.org Artificial IntelligenceSep-1-2022

Abstract--Deep reinforcement learning has achieved great success in laser-based collision avoidance works because the laser can sense accurate depth information without too much redundant data, which can maintain the robustness of the algorithm when it is migrated from the simulation environment to the real world. However, high-cost laser devices are not only difficult to deploy for a large scale of robots but also demonstrate unsatisfactory robustness towards the complex obstacles, including irregular obstacles, e.g., tables, chairs, and shelves, as well as complex ground and special materials. In this paper, we propose a novel monocular camera-based complex obstacle avoidance framework. Particularly, we innovatively transform the captured RGB images to pseudo-laser measurements for efficient deep reinforcement learning. Compared to the traditional laser measurement captured at a certain height that only contains one-dimensional distance information away from the neighboring obstacles, our proposed pseudo-laser measurement fuses the depth and semantic information of the captured RGB image, which makes our method effective for complex obstacles. We also design a feature extraction guidance module to weight the input pseudo-laser measurement, and the agent has more reasonable attention for the current state, which is conducive to improving the accuracy and efficiency of the obstacle avoidance policy. Besides, we adaptively add the synthesized noise to the laser measurement during the training stage to decrease the simto-real gap and increase the robustness of our model in the real environment. Finally, the experimental results show that our framework achieves state-of-the-art performance in several virtual and real-world scenarios. J. Ding is with the School of Computer Science, Dalian University Figure 1. One-dimensional laser sensors have low robustness to the complex of Technology, Dalian 116024, China, and also with Hebei University of obstacles of certain types.

information, obstacle, pseudo-laser measurement, (17 more...)

arXiv.org Artificial Intelligence

2209.00296

Country:

Asia > China > Liaoning Province > Dalian (0.45)
Asia > China > Hong Kong (0.04)
Asia > China > Liaoning Province > Shenyang (0.04)
(6 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology (0.68)
Education (0.66)
Transportation (0.50)
Leisure & Entertainment > Games > Computer Games (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Add feedback

Partial Counterfactual Identification for Infinite Horizon Partially Observable Markov Decision Process

Sidharta, Aditya Kelvianto

arXiv.org Artificial IntelligenceAug-31-2022

This paper investigates the problem of bounding possible output from a counterfactual query given a set of observational data. While various works of literature have described methodologies to generate efficient algorithms that provide an optimal bound for the counterfactual query, all of them assume a finite-horizon causal diagram. This paper aims to extend the previous work by modifying Q-learning algorithm to provide informative bounds of a causal query given an infinite-horizon causal diagram. Through simulations, our algorithms are proven to perform better compared to existing algorithm.

algorithm, q-learning, q-learning algorithm, (13 more...)

arXiv.org Artificial Intelligence

2209.00137

Country:

North America > United States > California (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-based Beam Search

Wang, Xiao, Chen, Zhe, Jiang, Bo, Tang, Jin, Luo, Bin, Tao, Dacheng

arXiv.org Artificial IntelligenceAug-30-2022

To track the target in a video, current visual trackers usually adopt greedy search for target object localization in each frame, that is, the candidate region with the maximum response score will be selected as the tracking result of each frame. However, we found that this may be not an optimal choice, especially when encountering challenging tracking scenarios such as heavy occlusion and fast motion. To address this issue, we propose to maintain multiple tracking trajectories and apply beam search strategy for visual tracking, so that the trajectory with fewer accumulated errors can be identified. Accordingly, this paper introduces a novel multi-agent reinforcement learning based beam search tracking strategy, termed BeamTracking. It is mainly inspired by the image captioning task, which takes an image as input and generates diverse descriptions using beam search algorithm. Accordingly, we formulate the tracking as a sample selection problem fulfilled by multiple parallel decision-making processes, each of which aims at picking out one sample as their tracking result in each frame. Each maintained trajectory is associated with an agent to perform the decision-making and determine what actions should be taken to update related information. When all the frames are processed, we select the trajectory with the maximum accumulated score as the tracking result. Extensive experiments on seven popular tracking benchmark datasets validated the effectiveness of the proposed algorithm.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TIP.2022.3208437

2205.09676

Country:

Asia > China > Anhui Province > Hefei (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
Europe > United Kingdom (0.04)
(2 more...)

Genre: Overview (0.93)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Add feedback

Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning

Hu, Pihe, Pan, Ling, Chen, Yu, Fang, Zhixuan, Huang, Longbo

arXiv.org Artificial IntelligenceAug-30-2022

Multi-user delay constrained scheduling is important in many real-world applications including wireless communication, live streaming, and cloud computing. Yet, it poses a critical challenge since the scheduler needs to make real-time decisions to guarantee the delay and resource constraints simultaneously without prior information of system dynamics, which can be time-varying and hard to estimate. Moreover, many practical scenarios suffer from partial observability issues, e.g., due to sensing noise or hidden correlation. To tackle these challenges, we propose a deep reinforcement learning (DRL) algorithm, named Recurrent Softmax Delayed Deep Double Deterministic Policy Gradient ($\mathtt{RSD4}$), which is a data-driven method based on a Partially Observed Markov Decision Process (POMDP) formulation. $\mathtt{RSD4}$ guarantees resource and delay constraints by Lagrangian dual and delay-sensitive queues, respectively. It also efficiently tackles partial observability with a memory mechanism enabled by the recurrent neural network (RNN) and introduces user-level decomposition and node-level merging to ensure scalability. Extensive experiments on simulated/real-world datasets demonstrate that $\mathtt{RSD4}$ is robust to system dynamics and partially observable environments, and achieves superior performances over existing DRL and non-DRL-based methods.

algorithm, rsd4, scheduling, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3492866.3549712

2208.14074

Country:

Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Industry:

Telecommunications (0.67)
Transportation (0.46)
Leisure & Entertainment (0.46)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Add feedback