AITopics

2407.19236

Country:

Europe > United Kingdom (0.14)
North America > United States > New York (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)

Avalos, Raphael, Bargiacchi, Eugenio, Nowé, Ann, Roijers, Diederik M., Oliehoek, Frans A.

Online Planning in POMDPs with State-Requests

arXiv.org Artificial Intelligence Jul-26-2024

In key real-world problems, full state information is sometimes available but only at a high cost, like activating precise yet energy-intensive sensors or consulting humans, thereby compelling the agent to operate under partial observability. For this scenario, we propose AEMS-SR (Anytime Error Minimization Search with State Requests), a principled online planning algorithm tailored for POMDPs with state requests. By representing the search space as a graph instead of a tree, AEMS-SR avoids the exponential growth of the search space originating from state requests. Theoretical analysis demonstrates AEMS-SR's $\varepsilon$-optimality, ensuring solution quality, while empirical evaluations illustrate its effectiveness compared with AEMS and POMCP, two SOTA online planning algorithms. AEMS-SR enables efficient planning in domains characterized by partial observability and costly state requests, offering practical benefits across various applications.

aem-sr, algorithm, pb 0, (15 more...)

2407.18812

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Netherlands > South Holland > Delft (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Belgium > Flanders (0.04)

Genre: Research Report (0.63)

Industry:

Law (0.47)
Energy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Machine Learning Jul-26-2024

Reinforcement Learning for Sustainable Energy: A Survey

Ponse, Koen, Kleuker, Felix, Fejér, Márton, Serra-Gómez, Álvaro, Plaat, Aske, Moerland, Thomas

The transition to sustainable energy is a key challenge of our time, requiring modifications in the entire pipeline of energy production, storage, transmission, and consumption. At every stage, new sequential decision-making challenges emerge, ranging from the operation of wind farms to the management of electrical grids or the scheduling of electric vehicle charging stations. All such problems are well suited for reinforcement learning, the branch of machine learning that learns behavior from data. Therefore, numerous studies have explored the use of reinforcement learning for sustainable energy. This paper surveys this literature with the intention of bridging both the underlying research communities: energy and machine learning. After a brief introduction of both fields, we systematically list relevant sustainability challenges, how they can be modeled as a reinforcement learning problem, and what solution approaches currently exist in the literature. Afterwards, we zoom out and identify overarching reinforcement learning themes that appear throughout sustainability, such as multi-agent, offline, and safe reinforcement learning. Lastly, we also cover standardization of environments, which will be crucial for connecting both research fields, and highlight potential directions for future work. In summary, this survey provides an extensive overview of reinforcement learning methods for sustainable energy, which may play a vital role in the energy transition.

learning, reinforcement, reinforcement learning, (11 more...)

2407.18597

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Overview (1.00)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)
Energy > Renewable (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Ishida, Shu, Henriques, João F.

SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments

arXiv.org Artificial Intelligence Jul-26-2024

This work compares ways of extending Reinforcement Learning algorithms to Partially Observed Markov Decision Processes (POMDPs) with options. One view of options is as temporally extended action, which can be realized as a memory that allows the agent to retain historical information beyond the policy's context window. While option assignment could be handled using heuristics and hand-crafted objectives, learning temporally consistent options and associated sub-policies without explicit supervision is a challenge. Two algorithms, PPOEM and SOAP, are proposed and studied in depth to address this problem. PPOEM applies the forward-backward algorithm (for Hidden Markov Models) to optimize the expected returns for an option-augmented policy. However, this learning approach is unstable during on-policy rollouts. It is also unsuited for learning causal policies without the knowledge of future trajectories, since option assignments are optimized for offline sequences where the entire episode is available. As an alternative approach, SOAP evaluates the policy gradient for an optimal option assignment. It extends the concept of the generalized advantage estimation (GAE) to propagate option advantages through time, which is an analytical equivalent to performing temporal back-propagation of option policy gradients. This option policy is only conditional on the history of the agent, not future actions. Evaluated against competing baselines, SOAP exhibited the most robust performance, correctly discovering options for POMDP corridor environments, as well as on standard benchmarks including Atari and MuJoCo, outperforming PPOEM, as well as LSTM and Option-Critic baselines. The open-sourced code is available at https://github.com/shuishida/SoapRL.
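The abstract says SOAP extends generalized advantage estimation (GAE). As background only, here is a minimal sketch of standard GAE, not of SOAP's option-advantage propagation. The function name `gae_advantages` and the defaults for `gamma` and `lam` are illustrative assumptions, not taken from the paper or its repository.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard GAE over one trajectory (illustrative, not SOAP).

    `values` must hold len(rewards) + 1 entries: one value estimate per
    visited state plus a bootstrap value for the state after the last step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]  # TD error
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With `lam=0` this reduces to the one-step TD error, and with `lam=1` it becomes the discounted sum of all later TD errors.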

agent, algorithm, objective, (9 more...)

2407.18913

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

McDonald, Curtis, Barron, Andrew R

Log-Concave Coupling for Sampling Neural Net Posteriors

arXiv.org Machine Learning Jul-26-2024

In this work, we present a sampling algorithm for single hidden layer neural networks. This algorithm is built upon a recursive series of Bayesian posteriors using a method we call Greedy Bayes. Sampling of the Bayesian posterior for neuron weight vectors $w$ of dimension $d$ is challenging because of its multimodality. Our algorithm to tackle this problem is based on a coupling of the posterior density for $w$ with an auxiliary random variable $\xi$. The resulting reverse conditional $w|\xi$ of neuron weights given the auxiliary random variable is shown to be log concave. In the construction of the posterior distributions we provide some freedom in the choice of the prior. In particular, for Gaussian priors on $w$ with suitably small variance, the resulting marginal density of the auxiliary variable $\xi$ is proven to be strictly log concave for all dimensions $d$. For a uniform prior on the unit $\ell_1$ ball, evidence is given that the density of $\xi$ is again strictly log concave for sufficiently large $d$. The score of the marginal density of the auxiliary random variable $\xi$ is determined by an expectation over $w|\xi$ and thus can be computed by various rapidly mixing Markov Chain Monte Carlo methods. Moreover, the computation of the score of $\xi$ permits methods of sampling $\xi$ by a stochastic diffusion (Langevin dynamics) with drift function built from this score. With such dynamics, information-theoretic methods pioneered by Bakry and Emery show that accurate sampling of $\xi$ is obtained rapidly when its density is indeed strictly log-concave. After that, one more draw from $w|\xi$ produces neuron weights $w$ whose marginal distribution is the desired posterior.
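To illustrate the Langevin dynamics mentioned in the abstract, here is a toy sketch of the unadjusted Langevin algorithm on a one-dimensional Gaussian, a strictly log-concave target whose score is known in closed form. It is not the paper's sampler, in which the score of $\xi$ is an expectation over $w|\xi$ estimated by MCMC. The names `langevin_samples` and `gaussian_score`, the step size, and the target mean are illustrative assumptions.

```python
import math
import random


def gaussian_score(x, mean=2.0):
    """Score (gradient of the log density) of a unit-variance Gaussian."""
    return -(x - mean)


def langevin_samples(score, x0, step, n_steps, rng):
    """Unadjusted Langevin algorithm in one dimension.

    Each update adds a drift along the score plus Gaussian noise scaled
    by sqrt(2 * step).
    """
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x + step * score(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


rng = random.Random(0)
chain = langevin_samples(gaussian_score, x0=0.0, step=0.1, n_steps=20000, rng=rng)
kept = chain[1000:]  # drop an initial burn-in segment
print(sum(kept) / len(kept))  # sample mean, expected to lie near the target mean 2.0
```

Because the discretization is not corrected by an accept/reject step, the chain's stationary variance is slightly larger than the target's. A smaller step size reduces that bias at the cost of slower mixing.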

concave, log concave, posterior, (14 more...)

2407.18802

Country:

Europe > Greece > Attica > Athens (0.05)
North America > United States > Connecticut > New Haven County > New Haven (0.04)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Bianco, Francesca, Rigato, Silvia, Filippetti, Maria Laura, Ognibene, Dimitri

Learning mental states estimation through self-observation: a developmental synergy between intentions and beliefs representations in a deep-learning model of Theory of Mind

Theory of Mind (ToM), the ability to attribute beliefs, intentions, or mental states to others, is a crucial feature of human social interaction. In complex environments, where the human sensory system reaches its limits, behaviour is strongly driven by our beliefs about the state of the world around us. Accessing others' mental states, e.g., beliefs and intentions, allows for more effective social interactions in natural contexts. Yet, these variables are not directly observable, making understanding ToM a challenging quest of interest for different fields, including psychology, machine learning and robotics. In this paper, we contribute to this topic by showing a developmental synergy between learning to predict low-level mental states (e.g., intentions, goals) and attributing high-level ones (i.e., beliefs). Specifically, we assume that learning beliefs attribution can occur by observing one's own decision processes involving beliefs, e.g., in a partially observable environment. Using a simple feed-forward deep learning model, we show that, when learning to predict others' intentions and actions, more accurate predictions can be acquired earlier if beliefs attribution is learnt simultaneously. Furthermore, we show that the learning performance improves even when observed actors have a different embodiment than the observer and the gain is higher when observing beliefs-driven chunks of behaviour. We propose that our computational approach can inform the understanding of human social cognitive development and be relevant for the design of future adaptive social robots able to autonomously understand, assist, and learn from human interaction partners in novel natural environments and tasks.

actor, agent, architecture, (16 more...)

2407.18022

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > United Kingdom > England > Essex > Colchester (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Ivanov, Dima, Dütting, Paul, Talgam-Cohen, Inbal, Wang, Tonghan, Parkes, David C.

Principal-Agent Reinforcement Learning

Contracts are the economic framework which allows a principal to delegate a task to an agent -- despite misaligned interests, and even without directly observing the agent's actions. In many modern reinforcement learning settings, self-interested agents learn to perform a multi-stage task delegated to them by a principal. We explore the significant potential of utilizing contracts to incentivize the agents. We model the delegated task as an MDP, and study a stochastic game between the principal and agent where the principal learns what contracts to use, and the agent learns an MDP policy in response. We present a learning-based algorithm for optimizing the principal's contracts, which provably converges to the subgame-perfect equilibrium of the principal-agent game. A deep RL implementation allows us to apply our method to very large MDPs with unknown transition dynamics. We extend our approach to multiple agents, and demonstrate its relevance to resolving a canonical sequential social dilemma with minimal intervention to agent rewards.

agent, contract, q-function, (15 more...)

2407.18074

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Jordan (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre: Research Report > New Finding (0.92)

Industry:

Education (0.67)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)

van Vuren, Thomas, Cronk, Thomas, Sanders, Jaron

Estimating the number of clusters of a Block Markov Chain

arXiv.org Machine Learning Jul-25-2024

Clustering algorithms frequently require the number of clusters to be chosen in advance, but it is usually not clear how to do this. To tackle this challenge when clustering within sequential data, we present a method for estimating the number of clusters when the data is a trajectory of a Block Markov Chain. Block Markov Chains are Markov Chains that exhibit a block structure in their transition matrix. The method considers a matrix that counts the number of transitions between different states within the trajectory, and transforms this into a spectral embedding whose dimension is set via singular value thresholding. The number of clusters is subsequently estimated via density-based clustering of this spectral embedding, an approach inspired by literature on the Stochastic Block Model. By leveraging and augmenting recent results on the spectral concentration of random matrices with Markovian dependence, we show that the method is asymptotically consistent - in spite of the dependencies between the count matrix's entries, and even when the count matrix is sparse. We also present a numerical evaluation of our method, and compare it to alternatives.
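The first step described in the abstract, counting transitions between states along a trajectory, can be sketched as follows. This covers only the count matrix. The spectral embedding, singular value thresholding, and density-based clustering steps are omitted. The function name `transition_counts` and the integer state encoding `0..n_states-1` are illustrative assumptions.

```python
def transition_counts(trajectory, n_states):
    """Count observed transitions in a trajectory of integer states.

    Entry [i][j] is the number of times state i is immediately
    followed by state j.
    """
    counts = [[0] * n_states for _ in range(n_states)]
    # Pair each state with its successor along the trajectory.
    for src, dst in zip(trajectory, trajectory[1:]):
        counts[src][dst] += 1
    return counts


counts = transition_counts([0, 1, 0, 2, 0, 1], n_states=3)
print(counts)  # [[0, 2, 1], [1, 0, 0], [1, 0, 0]]
```

A trajectory of length T yields T - 1 counted transitions, so the matrix is sparse whenever the trajectory is short relative to the number of state pairs. The abstract states that the method is shown to be consistent even in that sparse regime.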

algorithm, algorithm 1, algorithm 2, (17 more...)

2407.18287

Country:

Europe > Netherlands > North Brabant > Eindhoven (0.04)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences

Dai, Runpeng, Wang, Jianing, Zhou, Fan, Luo, Shikai, Qin, Zhiwei, Shi, Chengchun, Zhu, Hongtu

Off-policy evaluation (OPE) is widely applied in sectors such as pharmaceuticals and e-commerce to evaluate the efficacy of novel products or policies from offline datasets. This paper introduces a causal deepset framework that relaxes several key structural assumptions, primarily the mean-field assumption, prevalent in existing OPE methodologies that handle spatio-temporal interference. These traditional assumptions frequently prove inadequate in real-world settings, thereby restricting the capability of current OPE methods to effectively address complex interference effects. In response, we advocate for the implementation of the permutation invariance (PI) assumption. This innovative approach enables the data-driven, adaptive learning of the mean-field function, offering a more flexible estimation method beyond conventional averaging. Furthermore, we present novel algorithms that incorporate the PI assumption into OPE and thoroughly examine their theoretical foundations. Our numerical analyses demonstrate that this novel approach yields significantly more precise estimations than existing baseline algorithms, thereby substantially improving the practical applicability and effectiveness of OPE methodologies.

assumption, estimator, interference, (12 more...)

2407.1791

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre:

Overview > Innovation (0.87)
Research Report > Promising Solution (0.54)

Industry:

Health & Medicine (1.00)
Transportation > Ground > Road (0.93)
Transportation > Passenger (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)

Kaada, Soumeya, Tran, Dinh-Hieu, Van Huynh, Nguyen, Morel, Marie-Line Alberi, Jelassi, Sofiene, Rubino, Gerardo

Multi-Agent Deep Reinforcement Learning for Resilience Optimization in 5G RAN

Resilience is defined as the ability of a network to resist, adapt, and quickly recover from disruptions, and to continue to maintain an acceptable level of services from users' perspective. With the advent of future radio networks, including advanced 5G and upcoming 6G, critical services become integral to future networks, requiring uninterrupted service delivery for end users. Unfortunately, with the growing network complexity, user mobility and diversity, it becomes challenging to scale current resilience management techniques that rely on local optimizations to large dense network deployments. This paper aims to address this problem by globally optimizing the resilience of a dense multi-cell network based on multi-agent deep reinforcement learning. Specifically, our proposed solution can dynamically tilt cell antennas and reconfigure transmit power to mitigate outages and increase both coverage and service availability. A multi-objective optimization problem is formulated to simultaneously satisfy resiliency constraints while maximizing the service quality in the network area in order to minimize the impact of outages on neighbouring cells. Extensive simulations then demonstrate that with our proposed solution, the average service availability in terms of user throughput can be increased by up to 50-60%, while coverage availability reaches 99% in the best cases.

optimization, outage, resilience, (15 more...)

2407.18066

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > France > Brittany > Ille-et-Vilaine > Rennes (0.04)
Oceania > Australia (0.04)

Genre: Research Report (1.00)

Industry:

Telecommunications (1.00)
Information Technology > Networks (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)