AITopics | q-table

Inference of Deterministic Finite Automata via Q-Learning

Hosseinkhani, Elaheh, Leucker, Martin

arXiv.org Artificial Intelligence, Oct-21-2025

Traditional approaches to inference of deterministic finite-state automata (DFA) stem from symbolic AI, including both active learning methods (e.g., Angluin's L* algorithm and its variants) and passive techniques (e.g., Biermann and Feldman's method, RPNI). Meanwhile, sub-symbolic AI, particularly machine learning, offers alternative paradigms for learning from data, such as supervised, unsupervised, and reinforcement learning (RL). This paper investigates the use of Q-learning, a well-known reinforcement learning algorithm, for the passive inference of deterministic finite automata. It builds on the core insight that the learned Q-function, which maps state-action pairs to rewards, can be reinterpreted as the transition function of a DFA over a finite domain. This provides a novel bridge between sub-symbolic learning and symbolic representations. The paper demonstrates how Q-learning can be adapted for automaton inference and provides an evaluation on several examples.
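
For readers new to the term, here is a minimal sketch of the standard tabular Q-learning update. It is not the paper's inference procedure: the function name `q_update`, the parameters `alpha` and `gamma`, and the two-symbol alphabet are illustrative assumptions. The table is keyed by (state, action), which is the same shape as a DFA transition table keyed by (state, input symbol); that shared shape is the correspondence the abstract refers to.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one standard Q-learning update to the table `q` in place."""
    # Value of the best action available in the successor state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted lookahead.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction `alpha` towards the target.
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-symbol alphabet playing the role of the action set.
ACTIONS = ["a", "b"]
q_table = defaultdict(float)  # unseen (state, action) pairs default to 0.0

first = q_update(q_table, 0, "a", 1.0, 1, ACTIONS)
second = q_update(q_table, 0, "a", 1.0, 1, ACTIONS)
print(first, second)
```

With all entries starting at zero, the first update moves the entry for `(0, "a")` halfway to the reward of 1.0, and the second update halves the remaining gap.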

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2510.17386

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

(P)rior(D)yna(F)low: A Priori Dynamic Workflow Construction via Multi-Agent Collaboration

Lin, Yi, Zhao, Lujin, Shi, Yijie

arXiv.org Artificial Intelligence, Sep-19-2025

Recent studies have shown that carefully designed workflows coordinating large language models (LLMs) significantly enhance task-solving capabilities compared to using a single model. While an increasing number of works focus on autonomous workflow construction, most existing approaches rely solely on historical experience, leading to limitations in efficiency and adaptability. We argue that while historical experience is valuable, workflow construction should also flexibly respond to the unique characteristics of each task. To this end, we propose an a priori dynamic framework for automated workflow construction. Our framework first leverages Q-table learning to optimize the decision space, guiding agent decisions and enabling effective use of historical experience. At the same time, agents evaluate the current task progress and make a priori decisions regarding the next executing agent, allowing the system to proactively select the more suitable workflow structure for each given task. Additionally, we incorporate mechanisms such as cold-start initialization, early stopping, and pruning to further improve system efficiency. Experimental evaluations on four benchmark datasets demonstrate the feasibility and effectiveness of our approach. Compared to state-of-the-art baselines, our method achieves an average improvement of 4.05%, while reducing workflow construction and inference costs to only 30.68%-48.31% of those required by existing methods.

large language model, machine learning, node, (19 more...)

2509.14547

Country:

Asia (0.46)
North America > Mexico (0.28)

Genre:

Workflow (1.00)
Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Dilution, Diffusion and Symbiosis in Spatial Prisoner's Dilemma with Reinforcement Learning

Mangold, Gustavo C., Fernandes, Heitor C. M., Vainstein, Mendeli H.

arXiv.org Artificial Intelligence, Jul-8-2025

Recent studies of spatial prisoner's dilemma games with reinforcement learning have shown that static agents can learn to cooperate through a diverse set of mechanisms, including noise injection, different types of learning algorithms and neighbours' payoff knowledge. In this work, using an independent multi-agent Q-learning algorithm, we study the effects of dilution and mobility in the spatial version of the prisoner's dilemma. Within this setting, different possible actions for the algorithm are defined, connecting with previous results on the classical, non-reinforcement learning spatial prisoner's dilemma, showcasing the versatility of the algorithm in modeling different game-theoretical scenarios and the benchmarking potential of this approach. As a result, a range of effects is observed, including evidence that games with fixed update rules can be qualitatively equivalent to those with learned ones, as well as the emergence of a symbiotic mutualistic effect between populations that forms when multiple actions are defined.

machine learning, reinforcement, reinforcement learning, (18 more...)

2507.02211

Country: South America > Brazil (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Reinforcement Learning for Quantum Circuit Design: Using Matrix Representations

Wang, Zhiyuan, Feng, Chunlin, Poon, Christopher, Huang, Lijian, Zhao, Xingjian, Ma, Yao, Fu, Tianfan, Liu, Xiao-Yang

arXiv.org Artificial Intelligence, Jan-27-2025

Quantum computing promises advantages over classical computing. The manufacturing of quantum hardware is still in its infancy, a stage known as the Noisy Intermediate-Scale Quantum (NISQ) era. A major challenge is automated quantum circuit design that maps a quantum circuit to gates in a universal gate set. In this paper, we present a generic MDP model and employ Q-learning and DQN algorithms for quantum circuit design. By leveraging the power of deep reinforcement learning, we aim to provide an automatic and scalable approach over traditional hand-crafted heuristic methods.

cnot 01, machine learning, reinforcement learning, (17 more...)

2501.16509

Genre: Research Report (0.50)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

AutoRestTest: A Tool for Automated REST API Testing Using LLMs and MARL

Stennett, Tyler, Kim, Myeongsoo, Sinha, Saurabh, Orso, Alessandro

arXiv.org Artificial Intelligence, Jan-15-2025

As REST APIs have become widespread in modern web services, comprehensive testing of these APIs has become increasingly crucial. Due to the vast search space consisting of operations, parameters, and parameter values along with their complex dependencies and constraints, current testing tools suffer from low code coverage, leading to suboptimal fault detection. To address this limitation, we present a novel tool, AutoRestTest, which integrates the Semantic Operation Dependency Graph (SODG) with Multi-Agent Reinforcement Learning (MARL) and large language models (LLMs) for effective REST API testing. AutoRestTest determines operation-dependent parameters using the SODG and employs five specialized agents (operation, parameter, value, dependency, and header) to identify dependencies of operations and generate operation sequences, parameter combinations, and values. AutoRestTest provides a command-line interface and continuous telemetry on successful operation count, unique server errors detected, and time elapsed. Upon completion, AutoRestTest generates a detailed report highlighting errors detected and operations exercised. In this paper, we introduce our tool and present preliminary results.

autoresttest, dependency, operation, (14 more...)

2501.086

Country:

North America > United States > New York > New York County > New York City (0.05)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

An Open-source Sim2Real Approach for Sensor-independent Robot Navigation in a Grid

Abrar, Murad Mehrab, Mondal, Souryadeep, Hickner, Michelle

arXiv.org Artificial Intelligence, Jan-6-2025

This paper presents a Sim2Real (Simulation to Reality) approach to bridge the gap between a trained agent in a simulated environment and its real-world implementation in navigating a robot in a similar setting. Specifically, we focus on navigating a quadruped robot in a real-world grid-like environment inspired by the Gymnasium Frozen Lake -- a highly user-friendly and free Application Programming Interface (API) to develop and test Reinforcement Learning (RL) algorithms. We detail the development of a pipeline to transfer motion policies learned in the Frozen Lake simulation to a physical quadruped robot, thus enabling autonomous navigation and obstacle avoidance in a grid without relying on expensive localization and mapping sensors. The work involves training an RL agent in the Frozen Lake environment and utilizing the resulting Q-table to control a 12 Degrees-of-Freedom (DOF) quadruped robot. In addition to detailing the RL implementation, inverse kinematics-based quadruped gaits, and the transfer policy pipeline, we open-source the project on GitHub and include a demonstration video of our Sim2Real transfer approach. This work provides an accessible, straightforward, and low-cost framework for researchers, students, and hobbyists to explore and implement RL-based robot navigation in real-world grid environments.
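
As a rough illustration of what "training an RL agent and using the resulting Q-table" can look like, the sketch below learns a Q-table on a small deterministic grid and then reads off a greedy path. It is not the authors' pipeline and does not call Gymnasium: the 4x4 map layout, the action numbering (0 left, 1 down, 2 right, 3 up), the random start states, the purely random exploration, and all function names are assumptions made so the example stays self-contained.

```python
import random

# Assumed 4x4 layout: S = start, F = frozen (safe), H = hole, G = goal.
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
N = 4
MOVES = {0: (0, -1), 1: (1, 0), 2: (0, 1), 3: (-1, 0)}  # left, down, right, up


def step(state, action):
    """Deterministic transition; moving into a wall leaves the agent in place."""
    row, col = divmod(state, N)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), N - 1)
    col = min(max(col + d_col, 0), N - 1)
    tile = GRID[row][col]
    # Reward 1.0 only on the goal tile; holes and the goal end the episode.
    return row * N + col, float(tile == "G"), tile in "HG"


def train(episodes=5000, gamma=0.9, alpha=1.0, seed=0):
    """Tabular Q-learning with a uniformly random behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * 4 for _ in range(N * N)]
    starts = [s for s in range(N * N) if GRID[s // N][s % N] not in "HG"]
    for _ in range(episodes):
        state = rng.choice(starts)  # random start state (a simplification)
        for _ in range(100):  # cap on episode length
            action = rng.randrange(4)
            nxt, reward, done = step(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state = nxt
    return q


def greedy_path(q, start=0, max_steps=100):
    """Follow the highest-valued action in each state, as a controller would."""
    path, state = [start], start
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path


q_table = train()
print(greedy_path(q_table))
```

A learning rate of 1.0 is only reasonable here because the transitions are deterministic; a slippery or noisy grid would need a smaller value. On a physical robot, each entry of the greedy path would still have to be translated into a gait command, which this sketch does not attempt.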

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2411.03494

Country: North America > United States > Washington > King County > Seattle (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.90)

Multi-Agent Q-Learning for Real-Time Load Balancing User Association and Handover in Mobile Networks

Alizadeh, Alireza, Lim, Byungju, Vu, Mai

arXiv.org Artificial Intelligence, Dec-22-2024

As next-generation cellular networks become denser, associating users with the optimal base stations at each time while ensuring no base station is overloaded becomes critical for achieving stable and high network performance. We propose multi-agent online Q-learning (QL) algorithms for performing real-time load balancing user association and handover in dense cellular networks. The load balancing constraints at all base stations couple the actions of user agents, and we propose two multi-agent action selection policies, one centralized and one distributed, to satisfy load balancing at every learning step. In the centralized policy, the actions of user equipments (UEs) are determined by a central load balancer (CLB) running an algorithm based on swapping the worst connection to maximize the total learning reward. In the distributed policy, each UE takes an action based on its local information by participating in a distributed matching game with the base stations (BSs) to maximize the local reward. We then integrate these action selection policies into an online QL algorithm that adapts in real-time to network dynamics including channel variations and user mobility, using a reward function that considers a handover cost to reduce handover frequency. The proposed multi-agent QL algorithm features low complexity and fast convergence, outperforming 3GPP max-SINR association. Both policies adapt well to network dynamics at various UE speed profiles, from walking and running to biking and suburban driving, illustrating their robustness and real-time adaptability.
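
One simple way a reward can "consider a handover cost" is to subtract a fixed penalty whenever the chosen base station differs from the previous one. The sketch below assumes exactly that form; the function name `handover_reward`, the `rate` argument, and the penalty value are hypothetical and are not taken from the paper.

```python
def handover_reward(rate, previous_bs, chosen_bs, handover_cost=0.25):
    """Return the achieved rate minus a penalty if the user switched base station.

    `previous_bs` is None for a user that was not yet associated, in which
    case no penalty applies.
    """
    switched = previous_bs is not None and chosen_bs != previous_bs
    return rate - (handover_cost if switched else 0.0)


# Staying on base station 3 keeps the full reward; moving to 4 costs 0.25.
stay = handover_reward(1.0, previous_bs=3, chosen_bs=3)
move = handover_reward(1.0, previous_bs=3, chosen_bs=4)
print(stay, move)
```

Under this assumption, a switch is only worthwhile when the new base station improves the rate by more than the penalty, which is how such a term would discourage frequent handovers.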

algorithm, association vector, user association, (16 more...)

2412.19835

Country:

North America > United States (0.28)
Asia > South Korea > Busan > Busan (0.04)
Asia > Middle East > Yemen > Amanat Al Asimah > Sanaa (0.04)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry:

Telecommunications (1.00)
Energy > Power Industry (1.00)
Leisure & Entertainment > Games > Computer Games (0.75)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Decoding fairness: a reinforcement learning perspective

Zheng, Guozhong, Zhang, Jiqiang, Ou, Xin, Deng, Shengfeng, Chen, Li

arXiv.org Artificial Intelligence, Dec-19-2024

Behavioral experiments on the ultimatum game (UG) reveal that humans prefer fair acts, which contradicts the prediction made in orthodox economics. Existing explanations, however, are mostly attributed to exogenous factors within the imitation learning framework. Here, we adopt the reinforcement learning paradigm, where individuals make their moves aiming to maximize their accumulated rewards. Specifically, we apply Q-learning to UG, where each player is assigned two Q-tables to guide decisions for the roles of proposer and responder. In a two-player scenario, fairness emerges prominently when both experiences and future rewards are appreciated. In particular, the probability of successful deals increases with higher offers, which aligns with observations in behavioral experiments. Our mechanism analysis reveals that the system undergoes two phases, eventually stabilizing into fair or rational strategies. These results are robust when the rotating role assignment is replaced by random or fixed assignment, or when the scenario is extended to a latticed population. Our findings thus conclude that the endogenous factor is sufficient to explain the emergence of fairness; exogenous factors are not needed.
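
The idea of giving each player two Q-tables, one per role, can be sketched as a small data structure. Everything specific here is an assumption rather than the paper's design: the discretised offer levels, the stateless proposer table, the responder table keyed by (offer, response), the epsilon-greedy rule, and the class name `Player`. Learning updates are omitted.

```python
import random

OFFERS = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical discretised offer levels
RESPONSES = ["accept", "reject"]


class Player:
    """Holds one Q-table per role and picks actions epsilon-greedily."""

    def __init__(self, epsilon=0.1, seed=None):
        self.rng = random.Random(seed)
        self.epsilon = epsilon
        # Proposer table: one value per offer level.
        self.q_proposer = {offer: 0.0 for offer in OFFERS}
        # Responder table: one value per (received offer, response) pair.
        self.q_responder = {(o, r): 0.0 for o in OFFERS for r in RESPONSES}

    def propose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(OFFERS)  # explore
        return max(OFFERS, key=self.q_proposer.get)  # exploit

    def respond(self, offer):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(RESPONSES)  # explore
        return max(RESPONSES, key=lambda r: self.q_responder[(offer, r)])


# With exploration switched off, the player follows its tables exactly.
player = Player(epsilon=0.0)
player.q_proposer[0.5] = 1.0
player.q_responder[(0.5, "accept")] = 0.5
print(player.propose(), player.respond(0.5))
```

Keeping the two tables separate means that what a player learns as proposer does not overwrite what it learns as responder, whichever role assignment scheme is used.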

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2412.16249

Country:

North America > United States > New Jersey > Mercer County > Princeton (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > China > Ningxia Hui Autonomous Region > Yinchuan (0.04)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Towards Measuring Goal-Directedness in AI Systems

Xu, Dylan, Rivera, Juan-Pablo

arXiv.org Artificial Intelligence, Nov-21-2024

Recent advances in deep learning have brought attention to the possibility of creating advanced, general AI systems that outperform humans across many tasks. However, if these systems pursue unintended goals, there could be catastrophic consequences. A key prerequisite for AI systems pursuing unintended goals is that they behave in a coherent and goal-directed manner in the first place, optimizing for some unknown goal; there exists significant research trying to evaluate systems for said behaviors. However, the most rigorous definitions of goal-directedness we currently have are difficult to compute in real-world settings. Drawing upon this previous literature, we explore policy goal-directedness within reinforcement learning (RL) environments. In our findings, we propose a different family of definitions of the goal-directedness of a policy that analyze whether it is well-modeled as near-optimal for many (sparse) reward functions. We operationalize this preliminary definition of goal-directedness and test it in toy Markov decision process (MDP) environments. Furthermore, we explore how goal-directedness could be measured in frontier large language models (LLMs). Our contribution is a definition of goal-directedness that is simpler and more easily computable in order to approach the question of whether AI systems could pursue dangerous goals. We recommend further exploration of measuring coherence and goal-directedness, based on our findings.

large language model, machine learning, reinforcement learning, (22 more...)

2410.04683

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

Online waveform selection for cognitive radar

Tholeti, Thulasi, Rangarajan, Avinash, Kalyani, Sheetal

arXiv.org Artificial Intelligence, Oct-14-2024

Designing a cognitive radar system capable of adapting its parameters is challenging, particularly when tasked with tracking a ballistic missile throughout its entire flight. In this work, we focus on proposing adaptive algorithms that select waveform parameters in an online fashion. Our novelty lies in formulating the learning problem using domain knowledge derived from the characteristics of ballistic trajectories. We propose three reinforcement learning algorithms: bandwidth scaling, Q-learning, and Q-learning lookahead. These algorithms dynamically choose the bandwidth for each transmission based on received feedback. Through experiments on synthetically generated ballistic trajectories, we demonstrate that our proposed algorithms achieve the dual objectives of minimizing range error and maintaining continuous tracking without losing the target.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

2410.10591

Country:

Asia > India > Tamil Nadu > Chennai (0.05)
North America > United States (0.04)

Genre: Research Report (0.83)

Industry:

Government > Military (0.35)
Education (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
