Generalization Analysis on Learning with a Concurrent Verifier

Neural Information Processing Systems

Machine learning technologies are used in a wide range of practical systems. In practice, it is natural to expect the input-output pairs of a machine learning model to satisfy certain requirements, yet it is difficult to obtain a model that satisfies such requirements merely by learning from examples. A simple solution is to add a module that checks whether each input-output pair meets the requirements and, if not, modifies the model's output. Such a module, which we call a concurrent verifier (CV), can certify that the requirements are met, although it is unclear how using a CV changes the generalizability of the machine learning model. This paper gives a generalization analysis of learning with a CV. We analyze how the learnability of a machine learning model changes with a CV and give a condition under which we can obtain a guaranteed hypothesis by using the verifier only at inference time. We also show that typical error bounds based on Rademacher complexity are no larger than those of the original model when a CV is used in multi-class classification and structured prediction settings.
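
As a concrete illustration of the CV idea (a minimal sketch, not the paper's construction), a verifier can be realized as a wrapper that returns the model's highest-scoring output among those satisfying a requirement predicate; the names `verified_predict` and `requirement` below are illustrative.

```python
import numpy as np

def verified_predict(scores, requirement):
    """Concurrent-verifier sketch for multi-class classification.

    scores      : (n_labels,) array of model scores for one input x
    requirement : callable label -> bool, checks the (x, label) pair
    """
    # Walk labels from best score to worst and return the first one
    # the verifier accepts, so every emitted pair is certified.
    for label in np.argsort(scores)[::-1]:
        if requirement(label):
            return int(label)
    raise ValueError("no label satisfies the requirement")

# Toy usage: a 4-class model whose output must be an even label.
scores = np.array([0.10, 0.70, 0.15, 0.05])
print(verified_predict(scores, lambda y: y % 2 == 0))  # -> 2
```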


Invariant subspaces and PCA in nearly matrix multiplication time

Neural Information Processing Systems

Approximating invariant subspaces of generalized eigenvalue problems (GEPs) is a fundamental computational problem at the core of machine learning and scientific computing. It is, for example, at the root of Principal Component Analysis (PCA) for dimensionality reduction, data visualization, and noise filtering, and of Density Functional Theory (DFT), arguably the most popular method to calculate the electronic structure of materials.
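
For reference, a GEP asks for pairs (λ, v) with A v = λ B v, and PCA is the special case B = I applied to a covariance matrix. Below is a minimal dense-solver sketch of the problem being approximated (not the paper's fast algorithm); the matrix sizes and subspace dimension k are arbitrary toy choices.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)

# Symmetric A and symmetric positive-definite B define the GEP A v = lam * B v.
M = rng.standard_normal((50, 50))
A = (M + M.T) / 2
B = np.eye(50) + (M @ M.T) / 50          # SPD "mass" matrix

# eigh solves the generalized problem; eigenvalues come back ascending,
# so the last k columns span the invariant subspace of the top-k pairs.
k = 5
vals, vecs = eigh(A, B)
top_subspace = vecs[:, -k:]

# PCA is the B = I case on a covariance matrix: top-k principal directions.
X = rng.standard_normal((200, 50))
_, pcs = eigh(np.cov(X, rowvar=False))
components = pcs[:, -k:]
```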


Policy Optimization for Continuous Reinforcement Learning

Neural Information Processing Systems

We study reinforcement learning (RL) in the setting of continuous time and space, with an infinite horizon, a discounted objective, and underlying dynamics driven by a stochastic differential equation. Building on recent advances in the continuous-time approach to RL, we develop a notion of occupation time (specifically for a discounted objective) and show how it can be effectively used to derive performance-difference and local-approximation formulas. We further extend these results to illustrate their applications in the PG (policy gradient) and TRPO/PPO (trust-region policy optimization / proximal policy optimization) methods, which are familiar and powerful tools in the discrete RL setting but remain under-developed in continuous RL. Through numerical experiments, we demonstrate the effectiveness and advantages of our approach.
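
For orientation, the discrete-time analogues that such continuous-time formulas generalize are the discounted occupation (state-visitation) measure and the classical Kakade-Langford performance-difference lemma; the display below is the standard discrete-time statement, not the paper's continuous-time result.

```latex
% Discounted occupation measure of policy \pi from initial state s_0:
\[
  d^{\pi}(s) \;=\; (1-\gamma)\sum_{t \ge 0} \gamma^{t}\,
    \Pr\!\left(s_t = s \,\middle|\, s_0, \pi\right),
\]
% and the performance-difference lemma relating two policies:
\[
  J(\pi') - J(\pi) \;=\; \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^{\pi'}}\,
    \mathbb{E}_{a \sim \pi'(\cdot \mid s)}
    \bigl[ A^{\pi}(s, a) \bigr].
\]
```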


Image Reconstruction Via Autoencoding Sequential Deep Image Prior

Neural Information Processing Systems

Recently, Deep Image Prior (DIP) has emerged as an effective unsupervised one-shot learner, delivering competitive results across various image recovery problems. The method requires only the noisy measurements and a forward operator, relying solely on a deep network initialized with random noise to learn and restore the structure of the data. However, DIP is notorious for its vulnerability to overfitting due to the overparameterization of the network. Building upon insights into the impact of the DIP input and drawing inspiration from the gradual denoising process in cutting-edge diffusion models, we introduce Autoencoding Sequential DIP (aSeqDIP) for image reconstruction.
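
For context, here is the vanilla DIP objective that aSeqDIP builds on, as a minimal PyTorch sketch; the identity forward operator `A`, the tiny network, and the step count are illustrative placeholders, and the paper's autoencoding/sequential modifications are not shown.

```python
import torch
import torch.nn as nn

# Vanilla Deep Image Prior: fit network weights theta so that
# A(f_theta(z)) matches the noisy measurements y, with z fixed random noise.
A = lambda x: x                        # placeholder forward operator (denoising)
y = torch.randn(1, 1, 64, 64)          # toy stand-in for noisy measurements

net = nn.Sequential(                   # tiny stand-in for the usual U-Net
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1),
)
z = torch.randn(1, 1, 64, 64)          # fixed random network input
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):                # stopping early is what guards against
    opt.zero_grad()                    # the overfitting described above
    loss = ((A(net(z)) - y) ** 2).mean()
    loss.backward()
    opt.step()

restored = net(z).detach()
```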


Policy Learning from Tutorial Books via Understanding, Rehearsing and Introspecting

Neural Information Processing Systems

When humans need to learn a new skill, we can acquire knowledge from written materials such as textbooks and tutorials. However, current research on decision-making, such as reinforcement learning (RL), has primarily required numerous real interactions with the target environment to learn a skill, failing to utilize the knowledge already summarized in text.