AITopics

2412.18426

Genre:

Workflow (0.95)
Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Su, Thet Htar, Shresthamali, Shaswot, Kondo, Masaaki

Quantum framework for Reinforcement Learning: integrating Markov Decision Process, quantum arithmetic, and trajectory search

arXiv.org Artificial Intelligence Dec-24-2024

This paper introduces a quantum framework for addressing reinforcement learning (RL) tasks, grounded in quantum principles and leveraging a fully quantum model of the classical Markov Decision Process (MDP). By employing quantum concepts and a quantum search algorithm, this work presents the implementation and optimization of the agent-environment interactions entirely within the quantum domain, eliminating reliance on classical computations. Key contributions include the quantum-based state transitions, return calculation, and trajectory search mechanism that utilize quantum principles to demonstrate the realization of RL processes through quantum phenomena. The implementation emphasizes the fundamental role of quantum superposition in enhancing computational efficiency for RL tasks. Experimental results demonstrate the capacity of a quantum model to achieve quantum advantage in RL, highlighting the potential of fully quantum implementations in decision-making tasks. This work not only underscores the applicability of quantum computing in machine learning but also contributes to the field of quantum reinforcement learning (QRL) by offering a robust framework for understanding and exploiting quantum computing in RL systems.

machine learning, reinforcement learning, trajectory, (19 more...)

2412.18208

Country:

Asia > Japan (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.61)

Han, Yinbin, Razaviyayn, Meisam, Xu, Renyuan

Stochastic Control for Fine-tuning Diffusion Models: Optimality, Regularity, and Convergence

arXiv.org Artificial Intelligence Dec-23-2024

Diffusion models have emerged as powerful tools for generative modeling, demonstrating exceptional capability in capturing target data distributions from large datasets. However, fine-tuning these massive models for specific downstream tasks, constraints, and human preferences remains a critical challenge. While recent advances have leveraged reinforcement learning algorithms to tackle this problem, much of the progress has been empirical, with limited theoretical understanding. To bridge this gap, we propose a stochastic control framework for fine-tuning diffusion models. Building on denoising diffusion probabilistic models as the pre-trained reference dynamics, our approach integrates linear dynamics control with Kullback-Leibler regularization. We establish the well-posedness and regularity of the stochastic control problem and develop a policy iteration algorithm (PI-FT) for numerical solution. We show that PI-FT achieves global convergence at a linear rate. Unlike existing work that assumes regularities throughout training, we prove that the control and value sequences generated by the algorithm maintain the regularity. Additionally, we explore extensions of our framework to parametric settings and continuous-time formulations.

arxiv preprint arxiv, machine learning, reinforcement learning, (17 more...)

2412.18164

Country: North America > United States (0.28)

Genre: Research Report (0.81)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Nabergoj, David, Štrumbelj, Erik

Empirical evaluation of normalizing flows in Markov Chain Monte Carlo

arXiv.org Machine Learning Dec-22-2024

Recent advances in MCMC use normalizing flows to precondition target distributions and enable jumps to distant regions. However, there is currently no systematic comparison of different normalizing flow architectures for MCMC. As such, many works choose simple flow architectures that are readily available and do not consider other models. Guidelines for choosing an appropriate architecture would reduce analysis time for practitioners and motivate researchers to take the recommended models as foundations to be improved. We provide the first such guideline by extensively evaluating many normalizing flow architectures on various flow-based MCMC methods and target distributions. When the target density gradient is available, we show that flow-based MCMC outperforms classic MCMC for suitable NF architecture choices with minor hyperparameter tuning. When the gradient is unavailable, flow-based MCMC wins with off-the-shelf architectures. We find contractive residual flows to be the best general-purpose models with relatively low sensitivity to hyperparameter choice. We also provide various insights into normalizing flow behavior within MCMC when varying their hyperparameters, properties of target distributions, and the overall computational budget.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Machine Learning

2412.17136

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.40)

Rojas, Juan Sebastian, Lee, Chi-Guhn

Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes

Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty. However, average-reward MDPs have remained largely unexplored in reinforcement learning (RL) settings, with the majority of RL-based efforts having been allocated to episodic and discounted MDPs. In this work, we study a unique structural property of average-reward MDPs and utilize it to introduce Reward-Extended Differential (or RED) reinforcement learning: a novel RL framework that can be used to effectively and efficiently solve various learning objectives, or subtasks, simultaneously in the average-reward setting. We introduce a family of RED learning algorithms for prediction and control, including proven-convergent algorithms for the tabular case. We then showcase the power of these algorithms by demonstrating how they can be used to learn a policy that optimizes, for the first time, the well-known conditional value-at-risk (CVaR) risk measure in a fully-online manner, without the use of an explicit bi-level optimization scheme or an augmented state-space.

algorithm, CVaR, equation, (14 more...)

2410.10578

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

Learning an Adaptive Fall Recovery Controller for Quadrupeds on Complex Terrains

Lu, Yidan, Dong, Yinzhao, Ma, Ji, Zhang, Jiahui, Lu, Peng

Legged robots have made significant strides in locomotion capabilities, demonstrating impressive performance in tasks such as dynamic walking, running, and even complex maneuvers like backflips [8], [2]. However, the ability to recover from falls, especially on challenging and unpredictable terrains, remains a critical challenge in the field of legged robotics. While substantial progress has been made in recovery strategies for flat or moderately uneven surfaces [7], [13], the problem of robust recovery on highly irregular terrains - such as rocky landscapes, steep inclines, or complex gaps - has received limited attention.

However, in extreme or complex natural environments, robots still face the inevitability of falling. A major challenge in current research lies in developing adaptive controllers for robots to effectively recover from falls, allowing them to resume movement or efficiently complete tasks. However, model-based methods are often inadequate for these dynamic tasks. For example, Mordatch et al. [12] proposed a framework that optimizes automatic recovery through contact invariance, but the reliance on predefined potential contact points limits the exploration of flexible behaviors. In addition, classical ...

artificial intelligence, machine learning, robot, (10 more...)

2412.16924

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Preventing Non-intrusive Load Monitoring Privacy Invasion: A Precise Adversarial Attack Scheme for Networked Smart Meters

He, Jialing, Wang, Jiacheng, Wang, Ning, Guo, Shangwei, Zhu, Liehuang, Niyato, Dusit, Xiang, Tao

Smart grid, through networked smart meters employing the non-intrusive load monitoring (NILM) technique, can considerably discern the usage patterns of residential appliances. However, this technique also incurs privacy leakage. To address this issue, we propose an innovative scheme based on adversarial attack in this paper. The scheme effectively prevents NILM models from violating appliance-level privacy, while also ensuring accurate billing calculation for users. To achieve this objective, we overcome two primary challenges. First, as NILM models fall under the category of time-series regression models, direct application of traditional adversarial attacks designed for classification tasks is not feasible. To tackle this issue, we formulate a novel adversarial attack problem tailored specifically for NILM and provide a theoretical foundation for utilizing the Jacobian of the NILM model to generate imperceptible perturbations. Leveraging the Jacobian, our scheme can produce perturbations that effectively mislead the signal prediction of NILM models to safeguard users' appliance-level privacy. The second challenge pertains to fundamental utility requirements, where existing adversarial attack schemes struggle to achieve accurate billing calculation for users. To handle this problem, we introduce an additional constraint, mandating that the sum of added perturbations within a billing period must be precisely zero. Experimental validation on real-world power datasets REDD and UK-DALE demonstrates the efficacy of our proposed solutions, which can significantly amplify the discrepancy between the output of the targeted NILM model and the actual power signal of appliances, and enable accurate billing at the same time. Additionally, our solutions exhibit transferability, making the generated perturbation signal from one target model applicable to other diverse NILM models.
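The zero-sum billing constraint mentioned in the abstract can be illustrated with a small sketch. This is not the paper's Jacobian-based attack; it only shows, under simplifying assumptions, why a perturbation whose entries sum to zero over a billing period leaves the billed energy unchanged. The function names, the sample readings, and the 15-minute sampling interval are all hypothetical. Subtracting the mean is the Euclidean projection of an arbitrary perturbation onto the zero-sum constraint.

```python
import math


def project_zero_sum(perturbation):
    """Shift a perturbation so that its entries sum to zero.

    Subtracting the mean is the closest (Euclidean) zero-sum vector
    to the input. Hypothetical helper, not taken from the paper.
    """
    mean = math.fsum(perturbation) / len(perturbation)
    return [p - mean for p in perturbation]


def billed_energy_kwh(power_watts, interval_hours):
    """Energy billed for a period: sum of power samples times sample length."""
    return math.fsum(power_watts) * interval_hours / 1000.0


# Hypothetical aggregate meter readings (watts) over one short billing period.
readings = [120.0, 135.0, 900.0, 880.0, 140.0, 125.0]

# An arbitrary raw perturbation (assumed to come from some attack procedure).
raw = [30.0, -10.0, -200.0, 150.0, 45.0, 5.0]

# Enforce the billing constraint, then apply the perturbation.
delta = project_zero_sum(raw)
perturbed = [r + d for r, d in zip(readings, delta)]

interval_hours = 0.25  # assumed 15-minute sampling interval
original_bill = billed_energy_kwh(readings, interval_hours)
perturbed_bill = billed_energy_kwh(perturbed, interval_hours)
```

In this sketch the individual samples change, which is what would disturb a load-disaggregation model, while the two billed totals agree up to floating-point rounding. A practical scheme would also need to keep the perturbed readings physically plausible (for example non-negative), which this sketch does not attempt.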

artificial intelligence, data mining, machine learning, (18 more...)

2412.16893

Country: Asia > China (0.47)

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)
Energy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Understanding Individual Agent Importance in Multi-Agent System via Counterfactual Reasoning

Chen, Jianming, Wang, Yawen, Wang, Junjie, Xie, Xiaofei, Hu, Jun, Wang, Qing, Xu, Fanjiang

Explaining multi-agent systems (MAS) is urgent as these systems become increasingly prevalent in various applications. Previous work has provided explanations for the actions or states of agents, yet falls short in understanding the black-boxed agent's importance within a MAS and the overall team strategy. To bridge this gap, we propose EMAI, a novel agent-level explanation approach that evaluates the individual agent's importance. Inspired by counterfactual reasoning, a larger change in reward caused by the randomized action of an agent indicates its higher importance. We model it as a MARL problem to capture interactions across agents. Utilizing counterfactual reasoning, EMAI learns the masking agents to identify important agents. Specifically, we define the optimization function to minimize the reward difference before and after action randomization and introduce sparsity constraints to encourage the exploration of more action randomization of agents during training. The experimental results in seven multi-agent tasks demonstrate that EMAI achieves higher fidelity in explanations than baselines and provides more effective guidance in practical applications concerning understanding policies, launching attacks, and patching policies.

agent, artificial intelligence, machine learning, (14 more...)

2412.15619

Country:

Asia > China (0.15)
Asia > Singapore (0.14)

Genre: Research Report (0.64)

Industry:

Transportation (0.69)
Information Technology > Security & Privacy (0.68)
Government > Military (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Novitsky, Michael, Barenboim, Moran, Indelman, Vadim

Previous Knowledge Utilization In Online Anytime Belief Space Planning

arXiv.org Artificial Intelligence Dec-21-2024

Online planning under uncertainty remains a critical challenge in robotics and autonomous systems. While tree search techniques are commonly employed to construct partial future trajectories within computational constraints, most existing methods discard information from previous planning sessions considering continuous spaces. This study presents a novel, computationally efficient approach that leverages historical planning data in current decision-making processes. We provide theoretical foundations for our information reuse strategy and introduce an algorithm based on Monte Carlo Tree Search (MCTS) that implements this approach. Experimental results demonstrate that our method significantly reduces computation time while maintaining high performance levels. Our findings suggest that integrating historical planning information can substantially improve the efficiency of online decision-making in uncertain environments, paving the way for more responsive and adaptive autonomous systems.

artificial intelligence, machine learning, trajectory, (20 more...)

2412.13128

Country:

Europe > Austria > Vienna (0.14)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

Zhou, Yanying, Li, Shijie, Garcke, Jochen

Foresight Social-aware Reinforcement Learning for Robot Navigation

arXiv.org Artificial Intelligence Dec-20-2024

When robots handle navigation tasks while avoiding collisions, they perform worse in crowded and complex environments than in stable and homogeneous ones. This often results in a low success rate and poor efficiency. Therefore, we propose a novel Foresight Social-aware Reinforcement Learning (FSRL) framework for mobile robots to achieve collision-free navigation. Compared to previous learning-based methods, our approach is foresighted. It not only considers the current human-robot interaction to avoid an immediate collision, but also estimates upcoming social interactions so that it keeps its distance in the future. Furthermore, an efficiency constraint is introduced in our approach that significantly reduces navigation time. Comparative experiments are performed to verify the effectiveness and efficiency of our proposed method under more realistic and challenging simulated environments.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

doi: 10.1109/CCDC58219.2023.10327485

2105.13409

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.05)
Asia > China (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)