Goto

Collaborating Authors

 mvi



General Munchausen Reinforcement Learning with Tsallis Kullback-Leibler Divergence

Neural Information Processing Systems

Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leilbler (KL) divergence to the previous policy, to prevent the policy from changing too quickly. This idea was initially proposed in a seminal paper on Conservative Policy Iteration, with approximations given by algorithms like TRPO and Munchausen Value Iteration (MVI). We continue this line of work by investigating a generalized KL divergence---called the Tsallis KL divergence. Tsallis KL defined by the $q$-logarithm is a strict generalization, as $q = 1$ corresponds to the standard KL divergence; $q > 1$ provides a range of new options. We characterize the types of policies learned under the Tsallis KL, and motivate when $q > 1$ could be beneficial. To obtain a practical algorithm that incorporates Tsallis KL regularization, we extend MVI, which is one of the simplest approaches to incorporate KL regularization. We show that this generalized MVI($q$) obtains significant improvements over the standard MVI($q = 1$) across 35 Atari games.



General Munchausen Reinforcement Learning with Tsallis Kullback-Leibler Divergence

Neural Information Processing Systems

Many policy optimization approaches in reinforcement learning incorporate a Kullback-Leilbler (KL) divergence to the previous policy, to prevent the policy from changing too quickly. This idea was initially proposed in a seminal paper on Conservative Policy Iteration, with approximations given by algorithms like TRPO and Munchausen Value Iteration (MVI). We continue this line of work by investigating a generalized KL divergence---called the Tsallis KL divergence. Tsallis KL defined by the q -logarithm is a strict generalization, as q 1 corresponds to the standard KL divergence; q 1 provides a range of new options. We characterize the types of policies learned under the Tsallis KL, and motivate when q 1 could be beneficial.


Automated Assessment of Residual Plots with Computer Vision Models

arXiv.org Machine Learning

Plotting the residuals is a recommended procedure to diagnose deviations from linear model assumptions, such as non-linearity, heteroscedasticity, and non-normality. The presence of structure in residual plots can be tested using the lineup protocol to do visual inference. There are a variety of conventional residual tests, but the lineup protocol, used as a statistical test, performs better for diagnostic purposes because it is less sensitive and applies more broadly to different types of departures. However, the lineup protocol relies on human judgment which limits its scalability. This work presents a solution by providing a computer vision model to automate the assessment of residual plots. It is trained to predict a distance measure that quantifies the disparity between the residual distribution of a fitted classical normal linear regression model and the reference distribution, based on Kullback-Leibler divergence. From extensive simulation studies, the computer vision model exhibits lower sensitivity than conventional tests but higher sensitivity than human visual tests. It is slightly less effective on non-linearity patterns. Several examples from classical papers and contemporary data illustrate the new procedures, highlighting its usefulness in automating the diagnostic process and supplementing existing methods.


How Many Data Samples is an Additional Instruction Worth?

arXiv.org Artificial Intelligence

Recently introduced instruction-paradigm empowers non-expert users to leverage NLP resources by defining a new task in natural language. Instruction-tuned models have significantly outperformed multitask learning models (without instruction); however they are far from state-of-the-art task-specific models. Conventional approaches to improve model performance via creating datasets with large number of task instances or architectural changes in the model may not be feasible for non-expert users. However, they can write alternate instructions to represent an instruction task. Is Instruction-augmentation helpful? We augment a subset of tasks in the expanded version of NATURAL INSTRUCTIONS with additional instructions and find that it significantly improves model performance (up to 35%), especially in the low-data regime. Our results indicate that an additional instruction can be equivalent to ~200 data samples on average across tasks.


Efficient Methods for Structured Nonconvex-Nonconcave Min-Max Optimization

arXiv.org Machine Learning

The use of min-max optimization in adversarial training of deep neural network classifiers and training of generative adversarial networks has motivated the study of nonconvex-nonconcave optimization objectives, which frequently arise in these applications. Unfortunately, recent results have established that even approximate first-order stationary points of such objectives are intractable, even under smoothness conditions, motivating the study of min-max objectives with additional structure. We introduce a new class of structured nonconvex-nonconcave min-max optimization problems, proposing a generalization of the extragradient algorithm which provably converges to a stationary point. The algorithm applies not only to Euclidean spaces, but also to general $\ell_p$-normed finite-dimensional real vector spaces. We also discuss its stability under stochastic oracles and provide bounds on its sample complexity. Our iteration complexity and sample complexity bounds either match or improve the best known bounds for the same or less general nonconvex-nonconcave settings, such as those that satisfy variational coherence or in which a weak solution to the associated variational inequality problem is assumed to exist.


Higher-order methods for convex-concave min-max optimization and monotone variational inequalities

arXiv.org Machine Learning

We provide improved convergence rates for constrained convex-concave min-max problems and monotone variational inequalities with higher-order smoothness. In min-max settings where the $p^{th}$-order derivatives are Lipschitz continuous, we give an algorithm HigherOrderMirrorProx that achieves an iteration complexity of $O(1/T^{\frac{p+1}{2}})$ when given access to an oracle for finding a fixed point of a $p^{th}$-order equation. We give analogous rates for the weak monotone variational inequality problem. For $p>2$, our results improve upon the iteration complexity of the first-order Mirror Prox method of Nemirovski [2004] and the second-order method of Monteiro and Svaiter [2012]. We further instantiate our entire algorithm in the unconstrained $p=2$ case.


On the modes of convergence of Stochastic Optimistic Mirror Descent (OMD) for saddle point problems

arXiv.org Machine Learning

In this article, we study the convergence of Mirror Descent (MD) and Optimistic Mirror Descent (OMD) for saddle point problems satisfying the notion of coherence as proposed in Mertikopoulos et al. We prove convergence of OMD with exact gradients for coherent saddle point problems, and show that monotone convergence only occurs after some sufficiently large number of iterations. This is in contrast to the claim in Mertikopoulos et al. of monotone convergence of OMD with exact gradients for coherent saddle point problems. Besides highlighting this important subtlety, we note that the almost sure convergence guarantees of MD and OMD with stochastic gradients for strictly coherent saddle point problems that are claimed in Mertikopoulos et al. are not fully justified by their proof. As such, we fill out the missing details in the proof and as a result have only been able to prove convergence with high probability. We would like to note that our analysis relies heavily on the core ideas and proof techniques introduced in Zhou et al. and Mertikopoulos et al., and we only aim to re-state and correct the results in light of what we were able to prove rigorously while filling in the much needed missing details in their proofs.


Commitment Tracking via the Reactive Event Calculus

AAAI Conferences

Runtime commitment verification is an important, open issue in multiagent research. To address it, we build on Yolum and Singh's formalization of commitment operations, on Chittaro and Montanari's cached event calculus, and on the SCIFF abductive logic programming proof-procedure. We propose a framework consisting of a declarative and compact language to express the domain knowledge, and a reactive and complete procedure to track the status of commitments effectively, producing provably sound and irrevocable answers.