AITopics

2304.0993

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > Austria (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(5 more...)

Genre: Research Report (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Software > Programming Languages (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

arXiv.org Artificial Intelligence, Apr-19-2023

Toward multi-target self-organizing pursuit in a partially observable Markov game

Sun, Lijun, Chang, Yu-Cheng, Lyu, Chao, Shi, Ye, Shi, Yuhui, Lin, Chin-Teng

The multiple-target self-organizing pursuit (SOP) problem has wide applications and has been considered a challenging self-organization game for distributed systems, in which intelligent agents cooperatively pursue multiple dynamic targets with partial observations. This work proposes a framework for decentralized multi-agent systems to improve the implicit coordination capabilities in search and pursuit. We model a self-organizing system as a partially observable Markov game (POMG) characterized by large scale, decentralization, partial observation, and the absence of communication. The proposed distributed algorithm, fuzzy self-organizing cooperative coevolution (FSC2), is then leveraged to resolve the three challenges in multi-target SOP: distributed self-organizing search (SOS), distributed task allocation, and distributed single-target pursuit. FSC2 includes a coordinated multi-agent deep reinforcement learning (MARL) method that enables homogeneous agents to learn natural SOS patterns. Additionally, we propose a fuzzy-based distributed task allocation method, which locally decomposes multi-target SOP into several single-target pursuit problems. The cooperative coevolution principle is employed to coordinate distributed pursuers for each single-target pursuit problem. Therefore, the uncertainties of inherent partial observation and distributed decision-making in the POMG can be alleviated. The experimental results demonstrate that by decomposing the SOP task, FSC2 achieves superior performance compared with other implicit coordination policies fully trained by general MARL algorithms. The scalability of FSC2 is demonstrated: up to 2048 FSC2 agents perform efficient multi-target SOP with almost 100 percent capture rates. Empirical analyses and ablation studies verify the interpretability, rationality, and effectiveness of component algorithms in FSC2.

multi-target self-organizing pursuit, observable markov game

doi: 10.1016/j.ins.2023.119475

2206.1233

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.60)

Wang, Yizheng, Zechner, Markus, Wen, Gege, Corso, Anthony Louis, Mern, John Michael, Kochenderfer, Mykel J., Caers, Jef Karel

Optimizing Carbon Storage Operations for Long-Term Safety

To combat global warming and mitigate the risks associated with climate change, carbon capture and storage (CCS) has emerged as a crucial technology. However, safely sequestering CO2 in geological formations for long-term storage presents several challenges. In this study, we address these issues by modeling the decision-making process for carbon storage operations as a partially observable Markov decision process (POMDP). We solve the POMDP using belief state planning to optimize injector and monitoring well locations, with the goal of maximizing stored CO2 while maintaining safety. Empirical results in simulation demonstrate that our approach is effective in ensuring safe long-term carbon storage operations. We showcase the flexibility of our approach by introducing three different monitoring strategies and examining their impact on decision quality. Additionally, we introduce a neural network surrogate model for the POMDP decision-making process to handle the complex dynamics of the multi-phase flow. We also investigate the effects of different fidelity levels of the surrogate model on decision qualities.

artificial intelligence, co2, machine learning, (16 more...)

2304.09352

Country: North America > United States > California > Santa Clara County (0.15)

Genre: Research Report > New Finding (0.48)

Industry:

Energy > Renewable (1.00)
Energy > Oil & Gas > Upstream (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Ezugwu, Absalom E., Oyelade, Olaide N., Ikotun, Abiodun M., Agushaka, Jeffery O., Ho, Yuh-Shan

Machine Learning Research Trends in Africa: A 30 Years Overview with Bibliometric Analysis Review

The machine learning (ML) paradigm has gained much popularity today. Its algorithmic models are employed in every field, such as natural language processing, pattern recognition, object detection, image recognition, earth observation and many other research areas. In fact, machine learning technologies and their inevitable impact suffice in many technological transformation agendas currently being propagated by many nations, for which the already yielded benefits are outstanding. From a regional perspective, several studies have shown that machine learning technology can help address some of Africa's most pervasive problems, such as poverty alleviation, improving education, delivering quality healthcare services, and addressing sustainability challenges like food security and climate change. In this state-of-the-art paper, a critical bibliometric analysis study is conducted, coupled with an extensive literature survey on recent developments and associated applications in machine learning research with a perspective on Africa. The presented bibliometric analysis study consists of 2761 machine learning-related documents, of which 89% were articles with at least 482 citations published in 903 journals during the past three decades. Furthermore, the collated documents were retrieved from the Science Citation Index EXPANDED, comprising research publications from 54 African countries between 1993 and 2021. The bibliometric study shows the visualization of the current landscape and future trends in machine learning research and its application to facilitate future collaborative research and knowledge exchange among authors from different research institutions scattered across the African continent.

artificial intelligence, machine learning, pattern recognition, (18 more...)

2304.07542

Country:

Europe > Germany (0.14)
Africa > Sudan (0.14)
Africa > East Africa (0.14)
(83 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Law Enforcement & Public Safety (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
(16 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
(3 more...)

Provably Efficient Offline Reinforcement Learning with Trajectory-Wise Reward

Xu, Tengyu, Wang, Yue, Zou, Shaofeng, Liang, Yingbin

The remarkable success of reinforcement learning (RL) heavily relies on observing the reward of every visited state-action pair. In many real-world applications, however, an agent can observe only a score that represents the quality of the whole trajectory, which is referred to as the trajectory-wise reward. In such a situation, it is difficult for standard RL methods to make good use of the trajectory-wise reward, and large bias and variance errors can be incurred in policy evaluation. In this work, we propose a novel offline RL algorithm, called Pessimistic vAlue iteRaTion with rEward Decomposition (PARTED), which decomposes the trajectory return into per-step proxy rewards via least-squares-based reward redistribution, and then performs pessimistic value iteration based on the learned proxy reward. To ensure the value functions constructed by PARTED are always pessimistic with respect to the optimal ones, we design a new penalty term to offset the uncertainty of the proxy reward. For general episodic MDPs with large state space, we show that PARTED with overparameterized neural network function approximation achieves an $\tilde{\mathcal{O}}(D_{\text{eff}}H^2/\sqrt{N})$ suboptimality, where $H$ is the length of episode, $N$ is the total number of samples, and $D_{\text{eff}}$ is the effective dimension of the neural tangent kernel matrix. To further illustrate the result, we show that PARTED achieves an $\tilde{\mathcal{O}}(dH^3/\sqrt{N})$ suboptimality with linear MDPs, where $d$ is the feature dimension, which matches the rate under neural network function approximation when $D_{\text{eff}}=dH$. To the best of our knowledge, PARTED is the first offline RL algorithm that is provably efficient in general MDPs with trajectory-wise reward.
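
The statement that the linear-MDP rate matches the neural-network rate can be checked by direct substitution. The short derivation below uses only the two bounds and the condition $D_{\text{eff}}=dH$ quoted in the abstract; it is a sanity check on the stated rates, not part of the paper's proof.

$$
\tilde{\mathcal{O}}\!\left(\frac{D_{\text{eff}}\,H^2}{\sqrt{N}}\right)\Bigg|_{D_{\text{eff}}=dH}
= \tilde{\mathcal{O}}\!\left(\frac{(dH)\,H^2}{\sqrt{N}}\right)
= \tilde{\mathcal{O}}\!\left(\frac{d\,H^3}{\sqrt{N}}\right)
$$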

artificial intelligence, machine learning, reinforcement learning, (12 more...)

2206.06426

Country:

North America > Canada > Alberta (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Andres, Alain, Schäfer, Lukas, Villar-Rodriguez, Esther, Albrecht, Stefano V., Del Ser, Javier

Using Offline Data to Speed-up Reinforcement Learning in Procedurally Generated Environments

One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered level) of available offline trajectories on the effectiveness of both approaches. Across four well-known sparse reward tasks in the MiniGrid environment, we find that using IL for pre-training and concurrently during online RL training both consistently improve the sample-efficiency while converging to optimal policies. Furthermore, we show that pre-training a policy from as few as two trajectories can make the difference between learning an optimal policy at the end of online training and not learning at all. Our findings motivate the widespread adoption of IL for pre-training and concurrent IL in procedurally generated environments whenever offline trajectories are available or can be generated.

machine learning, reinforcement learning, trajectory, (17 more...)

2304.09825

Country:

Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
Europe > Spain > Basque Country > Biscay Province > Bilbao (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Setting > Online (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Mezghani, Lina, Bojanowski, Piotr, Alahari, Karteek, Sukhbaatar, Sainbayar

Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions

The success of transformer models trained with a language modeling objective brings a promising opportunity to the reinforcement learning framework. Decision Transformer (Chen et al., 2021) is a step towards this direction, showing how to train transformers with a similar next-step prediction objective on offline data. Another important development in this area is the recent emergence of large-scale datasets collected from the internet, such as the ones composed of tutorial videos with captions where people talk about what they are doing. To take advantage of this language component, we propose a novel method for unifying language reasoning with actions in a single policy. Specifically, we augment a transformer policy with word outputs, so it can generate textual captions interleaved with actions. When tested on the most challenging task in BabyAI, with captions describing next subgoals, our reasoning policy consistently outperforms the caption-free baseline.

large language model, machine learning, trajectory, (18 more...)

2304.11063

Country: Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.05)

Genre:

Instructional Material (1.00)
Research Report (0.84)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

A Fully Polynomial Time Approximation Scheme for Constrained MDPs and Stochastic Shortest Path under Local Transitions

Khonji, Majid

The fixed-horizon constrained Markov Decision Process (C-MDP) is a well-known model for planning in stochastic environments under operating constraints. Chance-Constrained MDP (CC-MDP) is a variant that allows bounding the probability of constraint violation, which is desired in many safety-critical applications. CC-MDP can also model a class of MDPs, called Stochastic Shortest Path (SSP), under dead-ends, where there is a trade-off between the probability-to-goal and cost-to-goal. This work studies the structure of (C)C-MDP, particularly an important variant that involves local transition. In this variant, the state reachability exhibits a certain degree of locality and independence from the remaining states. More precisely, the number of states, at a given time, that share some reachable future states is always constant. (C)C-MDP under local transition is NP-Hard even for a planning horizon of two. In this work, we propose a fully polynomial-time approximation scheme for (C)C-MDP that computes (near) optimal deterministic policies. Such an algorithm is among the best approximation algorithms attainable in theory and gives insights into the approximability of constrained MDP and its variants.

algorithm, artificial intelligence, machine learning, (17 more...)

2204.0478

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States (0.04)

Genre: Research Report (0.50)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.46)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.46)

Nyberg, Jakob, Johnson, Pontus

Training Automated Defense Strategies Using Graph-based Cyber Attack Simulations

arXiv.org Artificial Intelligence, Apr-17-2023

We implemented and evaluated an automated cyber defense agent. The agent takes security alerts as input and uses reinforcement learning to learn a policy for executing predefined defensive measures. The defender policies were trained in an environment intended to simulate a cyber attack. In the simulation, an attacking agent attempts to capture targets in the environment, while the defender attempts to protect them by enabling defenses. The environment was modeled using attack graphs based on the Meta Attack Language. We assumed that defensive measures have downtime costs, meaning that the defender agent was penalized for using them. We also assumed that the environment was equipped with an imperfect intrusion detection system that occasionally produces erroneous alerts based on the environment state. To evaluate the setup, we trained the defensive agent with different volumes of intrusion detection system noise. We also trained agents with different attacker strategies and graph sizes. In experiments, the defensive agent using policies trained with reinforcement learning outperformed agents using heuristic policies. Experiments also demonstrated that the policies could generalize across different attacker strategies. However, the performance of the learned policies decreased as the attack graphs increased in size.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

2304.11084

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial Intelligence, Apr-16-2023

Markov Observation Models

Kouritzin, Michael A.

Herein, the Hidden Markov Model is expanded to allow for Markov chain observations. In particular, the observations are assumed to be a Markov chain whose one-step transition probabilities depend upon the hidden Markov chain. An Expectation-Maximization analog to the Baum-Welch algorithm is developed for this more general model to estimate the transition probabilities for both the hidden state and the observations, as well as to estimate the probabilities for the initial joint hidden-state-observation distribution. A belief state or filter recursion to track the hidden state then arises from the calculations of this Expectation-Maximization algorithm. A dynamic programming analog to the Viterbi algorithm is also developed to estimate the most likely sequence of hidden states given the sequence of observations.
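
To make the filter recursion mentioned above concrete, here is a minimal sketch of one way such a belief update could look. It is not the paper's algorithm: the function names, the table layout, and in particular the choice that the observation transition from `y_prev` to `y_curr` is conditioned on the current hidden state are assumptions made for illustration, and the tables are taken as known rather than estimated by Expectation-Maximization.

```python
def filter_step(belief, P, Q, y_prev, y_curr):
    """One belief update for a hidden chain with Markov-chain observations.

    Assumed (hypothetical) conventions:
      belief[i]  = P(X_{n-1} = i | observations so far)
      P[i][j]    = P(X_n = j | X_{n-1} = i)
      Q[j][a][b] = P(Y_n = b | Y_{n-1} = a, X_n = j)
    """
    n_states = len(belief)
    unnormalized = []
    for j in range(n_states):
        # Predict: push the previous belief through the hidden transition.
        predicted = sum(belief[i] * P[i][j] for i in range(n_states))
        # Correct: weight by the observed observation-to-observation move.
        unnormalized.append(predicted * Q[j][y_prev][y_curr])
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observed transition has zero probability under the model")
    return [u / total for u in unnormalized]


def run_filter(initial_belief, P, Q, observations):
    """Apply filter_step along consecutive observation pairs."""
    belief = list(initial_belief)
    for y_prev, y_curr in zip(observations, observations[1:]):
        belief = filter_step(belief, P, Q, y_prev, y_curr)
    return belief


if __name__ == "__main__":
    # Toy tables: two hidden states, two observation symbols (made-up numbers).
    P = [[0.9, 0.1], [0.2, 0.8]]
    Q = [
        [[0.8, 0.2], [0.6, 0.4]],  # hidden state 0 pulls observations toward 0
        [[0.3, 0.7], [0.1, 0.9]],  # hidden state 1 pulls observations toward 1
    ]
    print(run_filter([0.5, 0.5], P, Q, [0, 1, 1]))
```

With an ordinary Hidden Markov Model the correction factor would depend only on `y_curr`; indexing `Q` by the previous observation as well is what distinguishes this sketch from the standard forward filter.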

algorithm, artificial intelligence, machine learning, (15 more...)

2208.06368

Country:

North America > Canada > Alberta (0.14)
North America > Canada > Saskatchewan > Saskatoon (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(4 more...)

Genre:

Research Report (0.50)
Instructional Material (0.46)

Industry: Banking & Finance > Trading (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)