
Hypothesis Testing in Unsupervised Domain Adaptation with Applications in Alzheimer's Disease

Neural Information Processing Systems

We only observe their transformed versions $h(\mathbf{x_s^i})$ and $g(\mathbf{x_t^i})$, for some known function classes $h(\cdot)$ and $g(\cdot)$. Our goal is to perform a statistical test checking if $P_{\rm source} = P_{\rm target}$ while removing the distortions induced by the transformations. This problem is closely related to concepts underlying numerous domain adaptation algorithms and, in our case, is motivated by the need to combine clinical and imaging-based biomarkers from multiple sites and/or batches, where this problem is common and impedes analyses with much larger sample sizes. We develop a framework that addresses this problem using ideas from hypothesis testing on the transformed measurements, wherein the distortions must be estimated {\it in tandem} with the testing. We derive a simple algorithm, study its convergence and consistency properties in detail, and also provide lower-bound strategies based on recent work in continuous optimization. On a dataset of individuals at risk for neurological disease, our results are competitive with alternative procedures that are twice as expensive and, in some cases, operationally infeasible to implement.



Efficient Algorithms for Smooth Minimax Optimization

Neural Information Processing Systems

This paper studies first-order methods for solving smooth minimax optimization problems $\min_x \max_y g(x,y)$, where $g(\cdot,\cdot)$ is smooth and $g(x,\cdot)$ is concave for each $x$. In terms of $g(\cdot,y)$, we consider two settings -- strongly convex and nonconvex -- and improve upon the best known rates in both. For strongly convex $g(\cdot, y),\ \forall y$, we propose a new direct optimal algorithm combining Mirror-Prox and Nesterov's AGD, and show that it finds the global optimum in $\widetilde{O}\left(1/k^2 \right)$ iterations, improving over the current state-of-the-art rate of $O(1/k)$. We use this result along with an inexact proximal point method to obtain an $\widetilde{O}\left(1/k^{1/3} \right)$ rate for finding stationary points in the nonconvex setting, where $g(\cdot, y)$ can be nonconvex. This improves over the current best-known rate of $O(1/k^{1/5})$.
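For intuition about the Mirror-Prox ingredient, here is a toy sketch of Euclidean extragradient (Mirror-Prox with Euclidean geometry) on a strongly-convex-strongly-concave quadratic; the lookahead half-step damps the rotation that plain descent-ascent exhibits near a saddle. This is only an illustration of the building block, not the paper's accelerated method, which additionally wraps Nesterov's AGD around such updates.

```python
def extragradient(gx, gy, x0, y0, eta=0.1, iters=500):
    """Extragradient for min_x max_y g(x, y): take a lookahead
    descent/ascent half-step, then update from the lookahead gradients."""
    x, y = x0, y0
    for _ in range(iters):
        xh = x - eta * gx(x, y)        # lookahead descent on x
        yh = y + eta * gy(x, y)        # lookahead ascent on y
        x = x - eta * gx(xh, yh)       # real step uses lookahead point
        y = y + eta * gy(xh, yh)
    return x, y

# Toy saddle problem g(x, y) = 0.5*x**2 + x*y - 0.5*y**2, saddle at (0, 0)
gx = lambda x, y: x + y                # partial derivative in x
gy = lambda x, y: x - y                # partial derivative in y
x, y = extragradient(gx, gy, 1.0, 1.0)
```

On this problem the iterates contract linearly toward the saddle point $(0,0)$, whereas simultaneous gradient descent-ascent with the same step size spirals much more slowly.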



A Single Recipe for Online Submodular Maximization with Adversarial or Stochastic Constraints

Neural Information Processing Systems

In this paper, we consider an online optimization problem in which the reward functions are DR-submodular, and in addition to maximizing the total reward, the sequence of decisions must satisfy some convex constraints on average. Specifically, at each round $t\in\{1,\dots,T\}$, upon committing to an action $x_t$, a DR-submodular utility function $f_t(\cdot)$ and a convex constraint function $g_t(\cdot)$ are revealed, and the goal is to maximize the overall utility while ensuring the average of the constraint functions $\frac{1}{T}\sum_{t=1}^T g_t(x_t)$ is non-positive. Such cumulative constraints arise naturally in applications where the average resource consumption is required to remain below a prespecified threshold. We study this problem under an adversarial model and a stochastic model for the convex constraints, where the functions $g_t$ can vary arbitrarily or according to an i.i.d.
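A standard way to enforce such average constraints online is a primal-dual scheme: ascend the Lagrangian in the decision, then ascend a dual variable on the observed violation. The sketch below is an illustrative stand-in for this idea, not the paper's DR-submodular algorithm, and assumes one-dimensional decisions in [0, 1] with constraints of the form g_t(x) = x - b (so the constraint gradient is 1).

```python
import numpy as np

def online_primal_dual(reward_grads, constraints, x0, eta=0.1, rho=0.5):
    """Toy online primal-dual loop. Each round: commit x_t, observe the
    revealed f_t gradient and g_t, ascend the dual variable on the
    violation g_t(x_t), then ascend f_t(x) - lam * g_t(x) in x.
    Returns the played decisions and the average constraint value."""
    x, lam = x0, 0.0
    xs, viol = [], []
    for gf, g in zip(reward_grads, constraints):
        xs.append(x)                       # commit action x_t
        viol.append(g(x))                  # revealed constraint value
        lam = max(0.0, lam + rho * g(x))   # dual ascent on violation
        # primal ascent; g_t'(x) = 1 for the assumed g_t(x) = x - b
        x = float(np.clip(x + eta * (gf(x) - lam), 0.0, 1.0))
    return xs, sum(viol) / len(viol)
```

With rewards f_t(x) = log(1 + x) (concave, hence DR-submodular in one dimension) and g_t(x) = x - 0.5, the decisions settle into an oscillation around the budget level 0.5 and the average constraint value is driven toward zero, i.e., non-positive up to a vanishing term.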


Class-Disentanglement and Applications in Adversarial Detection and Defense

Neural Information Processing Systems

What is the minimum necessary information required by a neural net $D(\cdot)$ from an image $x$ to accurately predict its class? Extracting such information in the input space from $x$ can localize the areas $D(\cdot)$ mainly attends to and shed novel insight on the detection and defense of adversarial attacks. In this paper, we propose ''class-disentanglement'', which trains a variational autoencoder $G(\cdot)$ to extract this class-dependent information as $x - G(x)$ via a trade-off between reconstructing $x$ by $G(x)$ and classifying $x$ by $D(x-G(x))$: the former competes with the latter in decomposing $x$, so the latter retains only the information necessary for classification in $x-G(x)$. We apply it to both clean images and their adversarial counterparts and discover that the perturbations generated by adversarial attacks lie mainly in the class-dependent part $x-G(x)$. The decomposition results also provide novel interpretations of classification and attack models. Inspired by these observations, we propose to conduct adversarial detection on $x - G(x)$ and adversarial defense on $G(x)$, which consistently outperforms operating on the original $x$. In experiments, this simple approach substantially improves the detection of and defense against different types of adversarial attacks.
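The trade-off objective described above can be written down directly. Below is a hedged sketch of the combined loss for caller-supplied $G$ and $D$; the weighting `alpha`, the toy linear autoencoder, and the softmax classifier are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def disentangle_loss(x, G, D, y, alpha=0.5):
    """Trade-off between reconstructing x by G(x) and classifying the
    residual x - G(x) by D: reconstruction error plus cross-entropy on
    the class-dependent part."""
    recon = ((x - G(x)) ** 2).mean()                     # ||x - G(x)||^2
    probs = D(x - G(x))                                  # classify residual
    ce = -np.log(probs[np.arange(len(y)), y] + 1e-12).mean()
    return alpha * recon + (1.0 - alpha) * ce

# Toy stand-ins: a shrinking "autoencoder" and a linear softmax classifier.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 10))
y = rng.integers(0, 3, size=8)
Wc = rng.normal(size=(10, 3))

def G(z):
    return 0.9 * z                                       # toy reconstruction

def D(z):
    logits = z @ Wc
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)              # softmax probs

loss = disentangle_loss(x, G, D, y)
```

In training, $G$ would be a variational autoencoder updated to minimize this loss while $D$ is trained to classify the residual, which is what makes the two terms compete over how $x$ is decomposed.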



Review for NeurIPS paper: Relative gradient optimization of the Jacobian term in unsupervised deep learning

Neural Information Processing Systems

The focus of the work is deep density estimation (also called normalizing flows). In particular, the authors focus on the generative model $x = f(s)$ as defined in (1), where the observation ($x$) is described as an invertible non-linear function ($f$) of a latent variable ($s$). They take a maximum-likelihood perspective (2) in which $g_{\theta}$, the approximation of the inverse of $f$, is the composition of invertible and differentiable component functions $g_1 = \sigma_1(W_1 \cdot), \ldots, g_L = \sigma_L(W_L \cdot)$. They propose to use the relative gradient method to optimize $\theta$ and thereby speed up computations. Deep density estimation is an important problem in machine learning.
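The computational point behind the relative gradient can be illustrated on the log-determinant term of the flow likelihood: the Euclidean gradient of $\log|\det W|$ is $W^{-\top}$, which costs a matrix inverse, while multiplying any Euclidean gradient on the right by $W^\top W$ (the relative-gradient update direction) collapses, for this term, to $W$ itself, so no inverse is needed. A small numerical check of this identity, assuming the update direction takes the form $\nabla L \, W^\top W$:

```python
import numpy as np

def relative_grad_logdet(W):
    """Relative gradient of log|det W|: the Euclidean gradient inv(W).T
    right-multiplied by W.T @ W simplifies to W itself, avoiding the
    O(d^3) matrix inverse."""
    return W

d = 4
rng = np.random.default_rng(1)
W = rng.normal(size=(d, d))

euclid = np.linalg.inv(W).T              # Euclidean gradient of log|det W|
assert np.allclose(euclid @ W.T @ W, relative_grad_logdet(W))
```

The same right-multiplication applies layer-wise to each $W_l$ in the composition, which is where the claimed speed-up comes from.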