Goto

Collaborating Authors

 Markov Models


Twin Sorting Dynamic Programming Assisted User Association and Wireless Bandwidth Allocation for Hierarchical Federated Learning

arXiv.org Artificial Intelligence

In this paper, we study user association and wireless bandwidth allocation for a hierarchical federated learning system that consists of mobile users, edge servers, and a cloud server. To minimize the length of a global round in hierarchical federated learning with equal bandwidth allocation, we formulate a combinatorial optimization problem. We design the twin sorting dynamic programming (TSDP) algorithm that obtains a globally optimal solution in polynomial time when there are two edge servers. In addition, we put forward the TSDP-assisted algorithm for user association when there are three or more edge servers. Furthermore, given a user association matrix, we formulate and solve a convex optimization problem for optimal wireless bandwidth allocation. Simulation results show that the proposed approach outperforms a number of alternative schemes.


Pessimistic Iterative Planning for Robust POMDPs

arXiv.org Artificial Intelligence

Robust partially observable Markov decision processes (robust POMDPs) extend classical POMDPs to handle additional uncertainty on the transition and observation probabilities via so-called uncertainty sets. Policies for robust POMDPs must not only be memory-based to account for partial observability but also robust against model uncertainty to account for the worst-case instances from the uncertainty sets. We propose the pessimistic iterative planning (PIP) framework, which finds robust memory-based policies for robust POMDPs. PIP alternates between two main steps: (1) selecting an adversarial (non-robust) POMDP via worst-case probability instances from the uncertainty sets; and (2) computing a finite-state controller (FSC) for this adversarial POMDP. We evaluate the performance of this FSC on the original robust POMDP and use this evaluation in step (1) to select the next adversarial POMDP. Within PIP, we propose the rFSCNet algorithm. In each iteration, rFSCNet finds an FSC through a recurrent neural network trained using supervision policies optimized for the adversarial POMDP. The empirical evaluation in four benchmark environments showcases improved robustness against a baseline method in an ablation study and competitive performance compared to a state-of-the-art robust POMDP solver.


Heavy-Ball Momentum Accelerated Actor-Critic With Function Approximation

arXiv.org Artificial Intelligence

By using an parametric value function to replace the Monte-Carlo rollouts for value estimation, the actor-critic (AC) algorithms can reduce the variance of stochastic policy gradient so that to improve the convergence rate. While existing works mainly focus on analyzing convergence rate of AC algorithms under Markovian noise, the impacts of momentum on AC algorithms remain largely unexplored. In this work, we first propose a heavy-ball momentum based advantage actor-critic (\mbox{HB-A2C}) algorithm by integrating the heavy-ball momentum into the critic recursion that is parameterized by a linear function. When the sample trajectory follows a Markov decision process, we quantitatively certify the acceleration capability of the proposed HB-A2C algorithm. Our theoretical results demonstrate that the proposed HB-A2C finds an $\epsilon$-approximate stationary point with $\oo{\epsilon^{-2}}$ iterations for reinforcement learning tasks with Markovian noise. Moreover, we also reveal the dependence of learning rates on the length of the sample trajectory. By carefully selecting the momentum factor of the critic recursion, the proposed HB-A2C can balance the errors introduced by the initialization and the stoschastic approximation.


A Factored MDP Approach To Moving Target Defense With Dynamic Threat Modeling and Cost Efficiency

arXiv.org Artificial Intelligence

Moving Target Defense (MTD) has emerged as a proactive and dynamic framework to counteract evolving cyber threats. Traditional MTD approaches often rely on assumptions about the attackers knowledge and behavior. However, real-world scenarios are inherently more complex, with adaptive attackers and limited prior knowledge of their payoffs and intentions. This paper introduces a novel approach to MTD using a Markov Decision Process (MDP) model that does not rely on predefined attacker payoffs. Our framework integrates the attackers real-time responses into the defenders MDP using a dynamic Bayesian Network. By employing a factored MDP model, we provide a comprehensive and realistic system representation. We also incorporate incremental updates to an attack response predictor as new data emerges. This ensures an adaptive and robust defense mechanism. Additionally, we consider the costs of switching configurations in MTD, integrating them into the reward structure to balance execution and defense costs. We first highlight the challenges of the problem through a theoretical negative result on regret. However, empirical evaluations demonstrate the frameworks effectiveness in scenarios marked by high uncertainty and dynamically changing attack landscapes.


The computational power of a human society: a new model of social evolution

arXiv.org Artificial Intelligence

Social evolutionary theory seeks to explain increases in the scale and complexity of human societies, from origins to present. Over the course of the twentieth century, social evolutionary theory largely fell out of favor as a way of investigating human history, just as advances in complex systems science and computer science saw the emergence of powerful new conceptions of complex systems, and in particular new methods of measuring complexity. We propose that these advances in our understanding of complex systems and computer science should be brought to bear on our investigations into human history. To that end, we present a new framework for modeling how human societies co-evolve with their biotic environments, recognizing that both a society and its environment are computers. This leads us to model the dynamics of each of those two systems using the same, new kind of computational machine, which we define here. For simplicity, we construe a society as a set of interacting occupations and technologies. Similarly, under such a model, a biotic environment is a set of interacting distinct ecological and climatic processes. This provides novel ways to characterize social complexity, which we hope will cast new light on the archaeological and historical records. Our framework also provides a natural way to formalize both the energetic (thermodynamic) costs required by a society as it runs, and the ways it can extract thermodynamic resources from the environment in order to pay for those costs -- and perhaps to grow with any left-over resources.


Exploring Latent Space for Generating Peptide Analogs Using Protein Language Models

arXiv.org Artificial Intelligence

Generating peptides with desired properties is crucial for drug discovery and biotechnology. Traditional sequence-based and structure-based methods often require extensive datasets, which limits their effectiveness. In this study, we proposed a novel method that utilized autoencoder shaped models to explore the protein embedding space, and generate novel peptide analogs by leveraging protein language models. The proposed method requires only a single sequence of interest, avoiding the need for large datasets. Our results show significant improvements over baseline models in similarity indicators of peptide structures, descriptors and bioactivities. The proposed method validated through Molecular Dynamics simulations on TIGIT inhibitors, demonstrates that our method produces peptide analogs with similar yet distinct properties, highlighting its potential to enhance peptide screening processes.


Experimental evaluation of offline reinforcement learning for HVAC control in buildings

arXiv.org Artificial Intelligence

Reinforcement learning (RL) techniques have been increasingly investigated for dynamic HVAC control in buildings. However, most studies focus on exploring solutions in online or off-policy scenarios without discussing in detail the implementation feasibility or effectiveness of dealing with purely offline datasets or trajectories. The lack of these works limits the real-world deployment of RL-based HVAC controllers, especially considering the abundance of historical data. To this end, this paper comprehensively evaluates the strengths and limitations of state-of-the-art offline RL algorithms by conducting analytical and numerical studies. The analysis is conducted from two perspectives: algorithms and dataset characteristics. As a prerequisite, the necessity of applying offline RL algorithms is first confirmed in two building environments. The ability of observation history modeling to reduce violations and enhance performance is subsequently studied. Next, the performance of RL-based controllers under datasets with different qualitative and quantitative conditions is investigated, including constraint satisfaction and power consumption. Finally, the sensitivity of certain hyperparameters is also evaluated. The results indicate that datasets of a certain suboptimality level and relatively small scale can be utilized to effectively train a well-performed RL-based HVAC controller. Specifically, such controllers can reduce at most 28.5% violation ratios of indoor temperatures and achieve at most 12.1% power savings compared to the baseline controller. In summary, this paper presents our well-structured investigations and new findings when applying offline reinforcement learning to building HVAC systems.


Lifelong Reinforcement Learning via Neuromodulation

arXiv.org Artificial Intelligence

Navigating multiple tasks$\unicode{x2014}$for instance in succession as in continual or lifelong learning, or in distributions as in meta or multi-task learning$\unicode{x2014}$requires some notion of adaptation. Evolution over timescales of millennia has imbued humans and other animals with highly effective adaptive learning and decision-making strategies. Central to these functions are so-called neuromodulatory systems. In this work we introduce an abstract framework for integrating theories and evidence from neuroscience and the cognitive sciences into the design of adaptive artificial reinforcement learning algorithms. We give a concrete instance of this framework built on literature surrounding the neuromodulators Acetylcholine (ACh) and Noradrenaline (NA), and empirically validate the effectiveness of the resulting adaptive algorithm in a non-stationary multi-armed bandit problem. We conclude with a theory-based experiment proposal providing an avenue to link our framework back to efforts in experimental neuroscience.


Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space

arXiv.org Artificial Intelligence

In a multi-agent system (MAS), action semantics indicates the different influences of agents' actions toward other entities, and can be used to divide agents into groups in a physically heterogeneous MAS. Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter-sharing across different types of heterogeneous agents without careful discrimination of different action semantics. This common implementation decreases the cooperation and coordination between agents in complex situations. However, fully independent agent parameters dramatically increase the computational cost and training difficulty. In order to benefit from the usage of different action semantics while also maintaining a proper parameter-sharing structure, we introduce the Unified Action Space (UAS) to fulfill the requirement. The UAS is the union set of all agent actions with different semantics. All agents first calculate their unified representation in the UAS, and then generate their heterogeneous action policies using different available-action-masks. To further improve the training of extra UAS parameters, we introduce a Cross-Group Inverse (CGI) loss to predict other groups' agent policies with the trajectory information. As a universal method for solving the physically heterogeneous MARL problem, we implement the UAS adding to both value-based and policy-based MARL algorithms, and propose two practical algorithms: U-QMIX and U-MAPPO. Experimental results in the SMAC environment prove the effectiveness of both U-QMIX and U-MAPPO compared with several state-of-the-art MARL methods.


Robust Deep Reinforcement Learning for Inverter-based Volt-Var Control in Partially Observable Distribution Networks

arXiv.org Artificial Intelligence

Inverter-based volt-var control is studied in this paper. One key issue in DRL-based approaches is the limited measurement deployment in active distribution networks, which leads to problems of a partially observable state and unknown reward. To address those problems, this paper proposes a robust DRL approach with a conservative critic and a surrogate reward. The conservative critic utilizes the quantile regression technology to estimate conservative state-action value function based on the partially observable state, which helps to train a robust policy; the surrogate rewards of power loss and voltage violation are designed that can be calculated from the limited measurements. The proposed approach optimizes the power loss of the whole network and the voltage profile of buses with measurable voltages while indirectly improving the voltage profile of other buses. Extensive simulations verify the effectiveness of the robust DRL approach in different limited measurement conditions, even when only the active power injection of the root bus and less than 10% of bus voltages are measurable.