q-table
Inference of Deterministic Finite Automata via Q-Learning
Hosseinkhani, Elaheh, Leucker, Martin
Traditional approaches to inference of deterministic finite-state automata (DFA) stem from symbolic AI, including both active learning methods (e.g., Angluin's L* algorithm and its variants) and passive techniques (e.g., Biermann and Feldman's method, RPNI). Meanwhile, sub-symbolic AI, particularly machine learning, offers alternative paradigms for learning from data, such as supervised, unsupervised, and reinforcement learning (RL). This paper investigates the use of Q-learning, a well-known reinforcement learning algorithm, for the passive inference of deterministic finite automata. It builds on the core insight that the learned Q-function, which maps state-action pairs to rewards, can be reinterpreted as the transition function of a DFA over a finite domain. This provides a novel bridge between sub-symbolic learning and symbolic representations. The paper demonstrates how Q-learning can be adapted for automaton inference and provides an evaluation on several examples.
- Oceania > Australia (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- (3 more...)
(P)rior(D)yna(F)low: A Priori Dynamic Workflow Construction via Multi-Agent Collaboration
Lin, Yi, Zhao, Lujin, Shi, Yijie
Recent studies have shown that carefully designed workflows coordinating large language models(LLMs) significantly enhance task-solving capabilities compared to using a single model. While an increasing number of works focus on autonomous workflow construction, most existing approaches rely solely on historical experience, leading to limitations in efficiency and adaptability. We argue that while historical experience is valuable, workflow construction should also flexibly respond to the unique characteristics of each task. To this end, we propose an a priori dynamic framework for automated workflow construction. Our framework first leverages Q-table learning to optimize the decision space, guiding agent decisions and enabling effective use of historical experience. At the same time, agents evaluate the current task progress and make a priori decisions regarding the next executing agent, allowing the system to proactively select the more suitable workflow structure for each given task. Additionally, we incorporate mechanisms such as cold-start initialization, early stopping, and pruning to further improve system efficiency. Experimental evaluations on four benchmark datasets demonstrate the feasibility and effectiveness of our approach. Compared to state-of-the-art baselines, our method achieves an average improvement of 4.05%, while reducing workflow construction and inference costs to only 30.68%-48.31% of those required by existing methods.
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Workflow (1.00)
- Research Report (1.00)
Dilution, Diffusion and Symbiosis in Spatial Prisoner's Dilemma with Reinforcement Learning
Mangold, Gustavo C., Fernandes, Heitor C. M., Vainstein, Mendeli H.
Recent studies in the spatial prisoner's dilemma games with reinforcement learning have shown that static agents can learn to cooperate through a diverse sort of mechanisms, including noise injection, different types of learning algorithms and neighbours' payoff knowledge. In this work, using an independent multi-agent Q-learning algorithm, we study the effects of dilution and mobility in the spatial version of the prisoner's dilemma. Within this setting, different possible actions for the algorithm are defined, connecting with previous results on the classical, non-reinforcement learning spatial prisoner's dilemma, showcasing the versatility of the algorithm in modeling different game-theoretical scenarios and the benchmarking potential of this approach. As a result, a range of effects is observed, including evidence that games with fixed update rules can be qualitatively equivalent to those with learned ones, as well as the emergence of a symbiotic mutualistic effect between populations that forms when multiple actions are defined.
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- North America > United States > Michigan (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Late Breaking Results: Breaking Symmetry- Unconventional Placement of Analog Circuits using Multi-Level Multi-Agent Reinforcement Learning
Maji, Supriyo, Zhao, Linran, Poddar, Souradip, Pan, David Z.
Layout-dependent effects (LDEs) significantly impact analog circuit performance. Traditionally, designers have relied on symmetric placement of circuit components to mitigate variations caused by LDEs. However, due to non-linear nature of these effects, conventional methods often fall short. We propose an objective-driven, multi-level, multi-agent Q-learning framework to explore unconventional design space of analog layout, opening new avenues for optimizing analog circuit performance. Our approach achieves better variation performance than the state-of-the-art layout techniques. Notably, this is the first application of multi-agent RL in analog layout automation. The proposed approach is compared with non-ML approach based on simulated annealing.
Reinforcement Learning for Quantum Circuit Design: Using Matrix Representations
Wang, Zhiyuan, Feng, Chunlin, Poon, Christopher, Huang, Lijian, Zhao, Xingjian, Ma, Yao, Fu, Tianfan, Liu, Xiao-Yang
Quantum computing promises advantages over classical computing. The manufacturing of quantum hardware is in the infancy stage, called the Noisy Intermediate-Scale Quantum (NISQ) era. A major challenge is automated quantum circuit design that map a quantum circuit to gates in a universal gate set. In this paper, we present a generic MDP modeling and employ Q-learning and DQN algorithms for quantum circuit design. By leveraging the power of deep reinforcement learning, we aim to provide an automatic and scalable approach over traditional hand-crafted heuristic methods.
AutoRestTest: A Tool for Automated REST API Testing Using LLMs and MARL
Stennett, Tyler, Kim, Myeongsoo, Sinha, Saurabh, Orso, Alessandro
As REST APIs have become widespread in modern web services, comprehensive testing of these APIs has become increasingly crucial. Due to the vast search space consisting of operations, parameters, and parameter values along with their complex dependencies and constraints, current testing tools suffer from low code coverage, leading to suboptimal fault detection. To address this limitation, we present a novel tool, AutoRestTest, which integrates the Semantic Operation Dependency Graph (SODG) with Multi-Agent Reinforcement Learning (MARL) and large language models (LLMs) for effective REST API testing. AutoRestTest determines operation-dependent parameters using the SODG and employs five specialized agents (operation, parameter, value, dependency, and header) to identify dependencies of operations and generate operation sequences, parameter combinations, and values. AutoRestTest provides a command-line interface and continuous telemetry on successful operation count, unique server errors detected, and time elapsed. Upon completion, AutoRestTest generates a detailed report highlighting errors detected and operations exercised. In this paper, we introduce our tool and present preliminary results.
- North America > United States > New York > New York County > New York City (0.05)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
An Open-source Sim2Real Approach for Sensor-independent Robot Navigation in a Grid
Abrar, Murad Mehrab, Mondal, Souryadeep, Hickner, Michelle
This paper presents a Sim2Real (Simulation to Reality) approach to bridge the gap between a trained agent in a simulated environment and its real-world implementation in navigating a robot in a similar setting. Specifically, we focus on navigating a quadruped robot in a real-world grid-like environment inspired by the Gymnasium Frozen Lake -- a highly user-friendly and free Application Programming Interface (API) to develop and test Reinforcement Learning (RL) algorithms. We detail the development of a pipeline to transfer motion policies learned in the Frozen Lake simulation to a physical quadruped robot, thus enabling autonomous navigation and obstacle avoidance in a grid without relying on expensive localization and mapping sensors. The work involves training an RL agent in the Frozen Lake environment and utilizing the resulting Q-table to control a 12 Degrees-of-Freedom (DOF) quadruped robot. In addition to detailing the RL implementation, inverse kinematics-based quadruped gaits, and the transfer policy pipeline, we open-source the project on GitHub and include a demonstration video of our Sim2Real transfer approach. This work provides an accessible, straightforward, and low-cost framework for researchers, students, and hobbyists to explore and implement RL-based robot navigation in real-world grid environments.
- North America > United States > Washington > King County > Seattle (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Multi-Agent Q-Learning for Real-Time Load Balancing User Association and Handover in Mobile Networks
Alizadeh, Alireza, Lim, Byungju, Vu, Mai
As next generation cellular networks become denser, associating users with the optimal base stations at each time while ensuring no base station is overloaded becomes critical for achieving stable and high network performance. We propose multi-agent online Q-learning (QL) algorithms for performing real-time load balancing user association and handover in dense cellular networks. The load balancing constraints at all base stations couple the actions of user agents, and we propose two multi-agent action selection policies, one centralized and one distributed, to satisfy load balancing at every learning step. In the centralized policy, the actions of UEs are determined by a central load balancer (CLB) running an algorithm based on swapping the worst connection to maximize the total learning reward. In the distributed policy, each UE takes an action based on its local information by participating in a distributed matching game with the BSs to maximize the local reward. We then integrate these action selection policies into an online QL algorithm that adapts in real-time to network dynamics including channel variations and user mobility, using a reward function that considers a handover cost to reduce handover frequency. The proposed multi-agent QL algorithm features low-complexity and fast convergence, outperforming 3GPP max-SINR association. Both policies adapt well to network dynamics at various UE speed profiles from walking, running, to biking and suburban driving, illustrating their robustness and real-time adaptability.
- North America > United States (0.28)
- Asia > South Korea > Busan > Busan (0.04)
- Asia > Middle East > Yemen > Amanat Al Asimah > Sanaa (0.04)
- Research Report (0.64)
- Workflow (0.46)
- Telecommunications (1.00)
- Energy > Power Industry (1.00)
- Leisure & Entertainment > Games > Computer Games (0.75)
Decoding fairness: a reinforcement learning perspective
Zheng, Guozhong, Zhang, Jiqiang, Ou, Xin, Deng, Shengfeng, Chen, Li
Behavioral experiments on the ultimatum game (UG) reveal that we humans prefer fair acts, which contradicts the prediction made in orthodox Economics. Existing explanations, however, are mostly attributed to exogenous factors within the imitation learning framework. Here, we adopt the reinforcement learning paradigm, where individuals make their moves aiming to maximize their accumulated rewards. Specifically, we apply Q-learning to UG, where each player is assigned two Q-tables to guide decisions for the roles of proposer and responder. In a two-player scenario, fairness emerges prominently when both experiences and future rewards are appreciated. In particular, the probability of successful deals increases with higher offers, which aligns with observations in behavioral experiments. Our mechanism analysis reveals that the system undergoes two phases, eventually stabilizing into fair or rational strategies. These results are robust when the rotating role assignment is replaced by a random or fixed manner, or the scenario is extended to a latticed population. Our findings thus conclude that the endogenous factor is sufficient to explain the emergence of fairness, exogenous factors are not needed.
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- Asia > China > Shaanxi Province > Xi'an (0.04)
- Asia > China > Ningxia Hui Autonomous Region > Yinchuan (0.04)
Towards Measuring Goal-Directedness in AI Systems
Recent advances in deep learning have brought attention to the possibility of creating advanced, general AI systems that outperform humans across many tasks. However, if these systems pursue unintended goals, there could be catastrophic consequences. A key prerequisite for AI systems pursuing unintended goals is whether they will behave in a coherent and goal-directed manner in the first place, optimizing for some unknown goal; there exists significant research trying to evaluate systems for said behaviors. However, the most rigorous definitions of goal-directedness we currently have are difficult to compute in real-world settings. Drawing upon this previous literature, we explore policy goal-directedness within reinforcement learning (RL) environments. In our findings, we propose a different family of definitions of the goal-directedness of a policy that analyze whether it is well-modeled as near-optimal for many (sparse) reward functions. We operationalize this preliminary definition of goal-directedness and test it in toy Markov decision process (MDP) environments. Furthermore, we explore how goal-directedness could be measured in frontier large-language models (LLMs). Our contribution is a definition of goal-directedness that is simpler and more easily computable in order to approach the question of whether AI systems could pursue dangerous goals. We recommend further exploration of measuring coherence and goal-directedness, based on our findings.
- North America > United States > California > Alameda County > Berkeley (0.14)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > Dominican Republic (0.04)