
Collaborating Authors: He, Niao


Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity and Last-Iterate Convergence

arXiv.org Artificial Intelligence

While policy optimization has a long history in control for unknown and parameterized system models (see, e.g., [2]), recent successes in reinforcement learning and continuous control tasks have renewed interest in direct policy search thanks to its flexibility and scalability to high-dimensional problems. Despite these desirable features, theoretical guarantees for policy gradient methods remained elusive until very recently because of the nonconvexity of the induced optimization landscape. In particular, in contrast to control-theoretic approaches, which are often model-based and estimate the system dynamics before designing optimal controllers, the computational and sample complexities of model-free policy gradient methods were only recently analyzed. We refer the interested reader to a nice recent survey on policy optimization methods for learning control policies [3]. For instance, while the classic Linear Quadratic Regulator (LQR) problem induces a nonconvex optimization problem over the set of stabilizing control gain matrices, the gradient domination property [4] and the coercivity of the cost function respectively allow one to derive global convergence of policy gradient methods to optimal policies and to ensure stable feedback policies at each iteration [5]. As exact gradients are often unavailable when the system dynamics are unknown, derivative-free optimization techniques using cost values have been employed to design model-free policy gradient methods for solving LQR problems [5]. Alternative approaches to solving LQR include system identification [6, 7], iterative solution of the Algebraic Riccati Equation [8, 9], and convex semidefinite program formulations [10]. However, such methods are not easily adaptable to the simulation-based, model-free setting.
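
To make the derivative-free idea concrete, here is a minimal sketch (not the authors' implementation) of a two-point random-perturbation gradient estimate for the LQR cost that uses only cost rollouts; the finite-horizon rollout, perturbation radius, and sample count are illustrative assumptions.

```python
import numpy as np

def lqr_cost(K, A, B, Q, R, x0, horizon=200):
    """Finite-horizon rollout cost of the linear state-feedback law u = -K x."""
    x, cost = x0.astype(float).copy(), 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost

def zeroth_order_grad(K, cost_fn, radius=0.05, num_samples=50,
                      rng=np.random.default_rng(0)):
    """Two-point random-perturbation estimate of the policy gradient from cost values only."""
    d = K.size
    grad = np.zeros_like(K, dtype=float)
    for _ in range(num_samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)  # random direction on the unit sphere
        delta = cost_fn(K + radius * U) - cost_fn(K - radius * U)
        grad += (d / (2.0 * radius * num_samples)) * delta * U
    return grad
```

A model-free policy gradient step would then be `K -= step * zeroth_order_grad(K, lambda M: lqr_cost(M, A, B, Q, R, x0))` for some (hypothetical) step size `step`.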


DPZero: Dimension-Independent and Differentially Private Zeroth-Order Optimization

arXiv.org Machine Learning

The widespread practice of fine-tuning pretrained large language models (LLMs) on domain-specific data faces two major challenges in memory and privacy. First, as the size of LLMs continues to grow, encompassing billions of parameters, the memory demands of gradient-based training methods via backpropagation become prohibitively high. Second, given the tendency of LLMs to memorize and disclose sensitive training data, the privacy of the fine-tuning data must be respected. To this end, we explore the potential of zeroth-order methods in differentially private optimization for fine-tuning LLMs. Zeroth-order methods, which rely solely on forward passes, substantially reduce memory consumption during training. However, directly combining them with standard differential privacy mechanisms incurs dimension-dependent complexity. To bridge the gap, we introduce DPZero, a novel differentially private zeroth-order algorithm with nearly dimension-independent rates. Our theoretical analysis reveals that its complexity hinges primarily on the problem's intrinsic dimension and exhibits only a logarithmic dependence on the ambient dimension. This renders DPZero a highly practical option for real-world LLM deployments.
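
The sketch below is our own rough simplification of how a zeroth-order update can be privatized, not the DPZero algorithm as specified in the paper: all examples share a single random direction, so only the scalar per-example finite differences need to be clipped and noised, and the injected noise does not grow with the number of parameters. `loss_fn`, the clipping threshold, and the noise multiplier are hypothetical.

```python
import numpy as np

def dp_zeroth_order_step(theta, loss_fn, data, lr=1e-3, mu=1e-3,
                         clip=1.0, noise_mult=1.0,
                         rng=np.random.default_rng(0)):
    """One differentially private zeroth-order step (illustrative sketch)."""
    z = rng.standard_normal(theta.shape)
    z /= np.linalg.norm(z)  # shared random direction
    # Per-example scalar directional finite differences (two forward passes each).
    diffs = np.array([(loss_fn(theta + mu * z, x) - loss_fn(theta - mu * z, x)) / (2 * mu)
                      for x in data])
    diffs = np.clip(diffs, -clip, clip)  # bound per-example sensitivity of the scalar
    noisy_sum = diffs.sum() + noise_mult * clip * rng.standard_normal()  # 1-D Gaussian noise
    return theta - lr * (noisy_sum / len(data)) * z
```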


On the Statistical Efficiency of Mean Field Reinforcement Learning with General Function Approximation

arXiv.org Machine Learning

In this paper, we study the fundamental statistical efficiency of Reinforcement Learning in Mean-Field Control (MFC) and Mean-Field Game (MFG) settings with general model-based function approximation. We introduce a new concept called the Mean-Field Model-Based Eluder Dimension (MF-MBED), which characterizes the inherent complexity of mean-field model classes. We show that the class of problems with low MF-MBED subsumes a rich family of Mean-Field RL problems. Additionally, we propose algorithms based on maximum likelihood estimation, which can return an $\epsilon$-optimal policy for MFC or an $\epsilon$-Nash Equilibrium policy for MFG, with sample complexity polynomial in the relevant parameters and independent of the number of states, actions, and agents. Compared with previous works, our results require only minimal assumptions, namely realizability and Lipschitz continuity.


Robust Knowledge Transfer in Tiered Reinforcement Learning

arXiv.org Machine Learning

In this paper, we study the Tiered Reinforcement Learning setting, a parallel transfer learning framework in which the goal is to transfer knowledge from the low-tier (source) task to the high-tier (target) task in order to reduce the exploration risk of the latter while solving the two tasks in parallel. Unlike previous work, we do not assume that the low-tier and high-tier tasks share the same dynamics or reward functions, and we focus on robust knowledge transfer without prior knowledge of the task similarity. We identify a natural and necessary condition for our objective, called ``Optimal Value Dominance''. Under this condition, we propose novel online learning algorithms such that, for the high-tier task, they achieve constant regret on a subset of states depending on the task similarity and retain near-optimal regret when the two tasks are dissimilar, while for the low-tier task they remain near-optimal without any sacrifice. Moreover, we further study the setting with multiple low-tier tasks and propose a novel transfer-source selection mechanism, which can ensemble the information from all low-tier tasks and allows provable benefits on a much larger state-action space.


Kernel Conditional Moment Constraints for Confounding Robust Inference

arXiv.org Machine Learning

We study policy evaluation for offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However, existing work often resorts to a coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimates of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound on the policy value. It can be shown that our estimator contains the recently proposed sharp estimator of Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence. To construct our estimator, we leverage the kernel method to obtain a tractable approximation to the conditional moment constraints, which traditional non-sharp estimators fail to take into account. In the theoretical analysis, we provide a condition on the choice of kernel that guarantees no specification error biasing the lower-bound estimation. Furthermore, we provide consistency guarantees for policy evaluation and learning. In experiments with synthetic and real-world data, we demonstrate the effectiveness of the proposed method.
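
For concreteness, the classical marginal sensitivity model and the kind of conditional moment constraint discussed above can be sketched as follows (our notation, binary-treatment form; not the paper's exact formulation):

\[
\Lambda^{-1} \;\le\; \frac{e(X)\,\bigl(1 - e(X, U)\bigr)}{\bigl(1 - e(X)\bigr)\, e(X, U)} \;\le\; \Lambda,
\qquad
\mathbb{E}\bigl[\, w \mid X, A \,\bigr] \;=\; \frac{1}{P(A \mid X)},
\]

where $e(X) = P(A = 1 \mid X)$ is the nominal propensity, $e(X, U)$ is the propensity given the unobserved confounder $U$, and $w = 1/P(A \mid X, U)$ is the true inverse propensity weight. The second equality illustrates the type of conditional moment constraint that non-sharp estimators drop and that the kernel method is used to approximate.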


Cancellation-Free Regret Bounds for Lagrangian Approaches in Constrained Markov Decision Processes

arXiv.org Machine Learning

Constrained Markov Decision Processes (CMDPs) are one of the common ways to model safe reinforcement learning problems, where constraint functions model the safety objectives. Lagrangian-based dual or primal-dual algorithms provide efficient methods for learning in CMDPs. For these algorithms, the currently known regret bounds in the finite-horizon setting allow for a "cancellation of errors": one can compensate for a constraint violation in one episode with strict constraint satisfaction in another. However, such behavior is not safe in practical applications. In this paper, we overcome this weakness by proposing a novel model-based dual algorithm, OptAug-CMDP, for tabular finite-horizon CMDPs. Our algorithm is motivated by the augmented Lagrangian method and can be implemented efficiently. We show that during $K$ episodes of exploring the CMDP, our algorithm obtains a regret of $\tilde{O}(\sqrt{K})$ for both the objective and the constraint violation. Unlike existing Lagrangian approaches, our algorithm achieves this regret without relying on the cancellation of errors.
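
To illustrate what "cancellation of errors" means, one can contrast two standard notions of constraint-violation regret (notation ours, for a constraint of the form $V_c^{\pi} \ge b$ over $K$ episodes):

\[
\mathrm{Reg}^{\mathrm{weak}}_c(K) \;=\; \Bigl[\,\sum_{k=1}^{K} \bigl(b - V_c^{\pi_k}\bigr)\Bigr]_{+},
\qquad
\mathrm{Reg}^{\mathrm{strong}}_c(K) \;=\; \sum_{k=1}^{K} \bigl[\,b - V_c^{\pi_k}\bigr]_{+}.
\]

The weak notion lets a violation in one episode be offset by strict satisfaction in another; a cancellation-free guarantee of the kind described above bounds the strong notion.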


Finite-Time Analysis of Natural Actor-Critic for POMDPs

arXiv.org Artificial Intelligence

We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces, where the controller has access only to noisy observations of the underlying controlled Markov chain. We consider a natural actor-critic method that employs a finite internal memory for policy parameterization and a multi-step temporal difference learning algorithm for policy evaluation. We establish, to the best of our knowledge, the first non-asymptotic global convergence guarantee for actor-critic methods applied to partially observed systems under function approximation. In particular, in addition to the function approximation and statistical errors that also arise in MDPs, we explicitly characterize the error due to the use of finite-state controllers. This additional error is stated in terms of the total variation distance between the traditional belief state in POMDPs and the posterior distribution of the hidden state when using a finite-state controller. Further, we show that this error can be made small in the case of sliding-block controllers by using larger block sizes.
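
As a hypothetical illustration of a finite-memory (sliding-block) policy parameterization, the following sketch conditions a softmax policy on only the most recent observations; the linear-softmax form, block size, and dimensions are our own assumptions, not the paper's construction.

```python
from collections import deque
import numpy as np

class SlidingBlockPolicy:
    """Finite-memory controller: acts on the last `block_size` observations only (sketch)."""

    def __init__(self, block_size, obs_dim, num_actions, rng=np.random.default_rng(0)):
        self.block = deque(maxlen=block_size)
        self.obs_dim = obs_dim
        # Softmax policy over a linear function of the stacked observation block.
        self.theta = 0.01 * rng.standard_normal((num_actions, block_size * obs_dim))

    def act(self, obs, rng=np.random.default_rng(1)):
        self.block.append(np.asarray(obs, dtype=float))
        # Zero-pad until the window is full, then stack the last block_size observations.
        pad = [np.zeros(self.obs_dim)] * (self.block.maxlen - len(self.block))
        z = np.concatenate(pad + list(self.block))
        logits = self.theta @ z
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)
```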


On Imitation in Mean-field Games

arXiv.org Artificial Intelligence

Imitation learning (IL) is a popular framework in which an apprentice agent learns to imitate the behavior of an expert agent by observing its actions and transitions. In the context of mean-field games (MFGs), IL is used to learn a policy that imitates the behavior of a population of infinitely many expert agents following a Nash equilibrium policy with respect to some unknown payoff function. Mean-field games are an approximation introduced to simplify the analysis of games with a large (but finite) number of identical players, in which one studies the interaction between a representative infinitesimal player and a term capturing the population's behavior. The MFG framework makes it possible to scale to an infinite number of agents, where both the reward and the transition are population-dependent. The aim is to learn policies that effectively imitate the behavior of a large population of agents, which is a crucial problem in many real-world applications, such as traffic management [12, 30, 31], crowd control [11, 1], and financial markets [6, 5].


Provably Convergent Policy Optimization via Metric-aware Trust Region Methods

arXiv.org Artificial Intelligence

Trust-region methods based on the Kullback-Leibler divergence are pervasively used to stabilize policy optimization in reinforcement learning. In this paper, we exploit more flexible metrics and examine two natural extensions of policy optimization with Wasserstein and Sinkhorn trust regions, namely Wasserstein policy optimization (WPO) and Sinkhorn policy optimization (SPO). Instead of restricting the policy to a parametric distribution class, we directly optimize the policy distribution and derive closed-form policy updates based on Lagrangian duality. Theoretically, we show that WPO guarantees a monotonic performance improvement, and that SPO provably converges to WPO as the entropic regularizer diminishes. Moreover, we prove that with a decaying Lagrangian multiplier on the trust-region constraint, both methods converge to global optimality. Experiments across tabular domains, robotic locomotion, and continuous control tasks further demonstrate the performance improvement of both approaches, the greater robustness of WPO to sample insufficiency, and the faster convergence of SPO, compared to state-of-the-art policy gradient methods.
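
A schematic form of the metric-aware trust-region update (notation ours) is, per state $s$:

\[
\pi_{t+1}(\cdot \mid s) \;\in\; \arg\max_{\pi(\cdot \mid s)} \;
\mathbb{E}_{a \sim \pi(\cdot \mid s)}\bigl[A^{\pi_t}(s, a)\bigr]
\quad \text{s.t.} \quad
\mathcal{W}\bigl(\pi(\cdot \mid s),\, \pi_t(\cdot \mid s)\bigr) \;\le\; \delta,
\]

where $\mathcal{W}$ denotes the Wasserstein distance under a ground metric on actions (WPO); SPO replaces $\mathcal{W}$ with its entropy-regularized Sinkhorn counterpart, which recovers the WPO update as the regularizer vanishes.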


Provably Learning Nash Policies in Constrained Markov Potential Games

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents, where each agent optimizes its own objective. In many real-world instances, the agents may not only want to optimize their objectives but also ensure safe behavior. For example, in traffic routing, each car (agent) aims to reach its destination quickly (objective) while avoiding collisions (safety). Constrained Markov Games (CMGs) are a natural formalism for safe MARL problems, though they are generally intractable. In this work, we introduce and study Constrained Markov Potential Games (CMPGs), an important class of CMGs. We first show that a Nash policy for CMPGs can be found via constrained optimization. One tempting approach is to solve it with Lagrangian-based primal-dual methods. However, as we show, in contrast to the single-agent setting, CMPGs do not satisfy strong duality, rendering such approaches inapplicable and potentially unsafe. To solve the CMPG problem, we propose our algorithm Coordinate-Ascent for CMPGs (CA-CMPG), which provably converges to a Nash policy in tabular, finite-horizon CMPGs. Furthermore, we provide the first sample complexity bounds for learning Nash policies in unknown CMPGs, which, under additional assumptions, guarantee safe exploration.
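
Schematically (notation ours, assuming constraints of the form $V_c^{\pi} \ge b$), a coordinate-ascent step updates one agent at a time by solving a single-agent constrained problem with the other agents' policies held fixed:

\[
\pi_i^{t+1} \;\in\; \arg\max_{\pi_i} \; V_r^{\,\pi_i,\, \pi_{-i}^{t}}
\quad \text{s.t.} \quad V_c^{\,\pi_i,\, \pi_{-i}^{t}} \;\ge\; b,
\qquad
\pi_j^{t+1} = \pi_j^{t} \;\; \text{for } j \neq i.
\]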