
Sparse Learning with CART

Neural Information Processing Systems

Decision trees with binary splits are popularly constructed using the Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogeneous daughter nodes according to a split point that maximizes the reduction in sum-of-squares error (the impurity) along a particular variable. This paper aims to study the statistical properties of regression trees constructed with the CART methodology. In doing so, we find that the training error is governed by the Pearson correlation between the optimal decision stump and the response data in each node, which we bound by constructing a prior distribution on the split points and solving a nonlinear optimization problem. We leverage this connection between the training error and the Pearson correlation to show that CART with cost-complexity pruning achieves an optimal complexity/goodness-of-fit tradeoff when the depth scales with the logarithm of the sample size. Data-dependent quantities, which adapt to the dimensionality and latent structure of the regression model, are seen to govern the convergence rates of the prediction error.
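To make the split criterion concrete, the sketch below searches a single variable for the threshold that maximizes the reduction in sum-of-squares error, which is the impurity reduction the abstract refers to. This is a minimal illustration; the function and variable names are ours, not the paper's.

```python
import numpy as np

def best_split(x, y):
    """Exhaustive split search on one variable: return the threshold that
    maximizes the reduction in sum-of-squares error (impurity)."""
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    n = len(y_sorted)
    total_sse = np.sum((y_sorted - y_sorted.mean()) ** 2)

    best_gain, best_threshold = -np.inf, None
    for i in range(1, n):  # candidate split between positions i-1 and i
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no valid threshold between equal values
        left, right = y_sorted[:i], y_sorted[i:]
        sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        gain = total_sse - sse  # impurity reduction achieved by this split
        if gain > best_gain:
            best_gain = gain
            best_threshold = (x_sorted[i - 1] + x_sorted[i]) / 2
    return best_threshold, best_gain
```

CART applies this search to every variable in a node, picks the best (variable, threshold) pair, and recurses on the two daughter nodes.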


Supplementary Material A: Distances and Divergences for Quantifying Domain Shift (A.1: The Wasserstein Distance)

Neural Information Processing Systems

Besides analyzing the performance drop when evaluating a model with source statistics on a target dataset, we also consider the mismatch in model statistics directly. We first take an ImageNet-trained model and adapt it to each of the 95 conditions in IN-C.
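One common way to adapt a model's statistics to a shifted condition, assuming the mismatch lives in the batch-normalization buffers, is to re-estimate those buffers on (unlabeled) target data. The following is a hedged sketch of that procedure, not necessarily the paper's exact protocol; the function name and the loader interface are our assumptions.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def adapt_bn_statistics(model: nn.Module, target_loader, device="cpu"):
    """Re-estimate BatchNorm running statistics on target data, replacing
    the source-domain statistics stored in the model's BN buffers."""
    model.to(device).train()  # train mode so BN layers update running stats
    for module in model.modules():
        if isinstance(module, nn.modules.batchnorm._BatchNorm):
            module.reset_running_stats()
            module.momentum = None  # use a cumulative moving average
    for images, _ in target_loader:  # labels are unused
        model(images.to(device))  # forward passes update BN buffers only
    model.eval()
    return model
```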


ODRL: A Benchmark for Off-Dynamics Reinforcement Learning

Neural Information Processing Systems

We consider off-dynamics reinforcement learning (RL), where one needs to transfer policies across domains with mismatched dynamics. Despite the focus on developing dynamics-aware algorithms, this field has been hindered by the lack of a standard benchmark. To bridge this gap, we introduce ODRL, the first benchmark tailored for evaluating off-dynamics RL methods. ODRL contains four experimental settings, in which the source and target domains can each be either online or offline, and provides diverse tasks and a broad spectrum of dynamics shifts, making it a reliable platform for comprehensively evaluating an agent's ability to adapt to the target domain. Furthermore, ODRL includes recent off-dynamics RL algorithms in a unified framework and introduces extra baselines for the different settings, all implemented in a single-file manner. To unpack the true adaptation capability of existing methods, we conduct extensive benchmarking experiments, which show that no method has a universal advantage across varied dynamics shifts. We hope this benchmark can serve as a cornerstone for future research endeavors.


Fairness constraints can help exact inference in structured prediction

Neural Information Processing Systems

Many inference problems in structured prediction can be modeled as maximizing a score function over a space of labels, where graphs are a natural representation for decomposing the total score into a sum of unary (node) and pairwise (edge) scores. Given a generative model with an undirected connected graph G and a true vector of binary labels y, it has previously been shown that when G has good expansion properties, such as complete graphs or d-regular expanders, one can exactly recover y (with high probability and in polynomial time) from a single noisy observation of each edge and node. We analyze the generative model of Globerson et al. (2015) under a notion of statistical parity. That is, given a fair binary node labeling, we ask whether it is possible to recover the fair assignment, with high probability and in polynomial time, from single edge and node observations. We find that, in contrast to the known trade-offs between fairness and model performance, adding the fairness constraint improves the probability of exact recovery. We effectively explain this phenomenon and empirically show how graphs with poor expansion properties, such as grids, become capable of achieving exact recovery. Finally, as a byproduct of our analysis, we provide a minimum-eigenvalue bound tighter than the one derivable from Weyl's inequality.
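To illustrate the score decomposition, here is a toy brute-force MAP solver for binary labels under unary plus pairwise scores. It only demonstrates the objective (and is exponential in n); the paper's point is that under expansion conditions recovery is possible in polynomial time by other means. All names below are ours.

```python
import itertools
import numpy as np

def exact_map(unary, pairwise, edges):
    """Brute-force MAP inference for binary labels y in {-1, +1}^n:
    maximize sum_i unary[i]*y[i] + sum_{(i,j) in edges} pairwise[(i,j)]*y[i]*y[j].
    For illustration only: runtime is O(2^n)."""
    n = len(unary)
    best_score, best_y = -np.inf, None
    for y in itertools.product([-1, 1], repeat=n):
        score = sum(unary[i] * y[i] for i in range(n))
        score += sum(pairwise[(i, j)] * y[i] * y[j] for (i, j) in edges)
        if score > best_score:
            best_score, best_y = score, y
    return np.array(best_y), best_score

# Tiny example: a triangle graph with agreeing edge scores.
edges = [(0, 1), (1, 2), (0, 2)]
labels, score = exact_map([0.5, -0.2, 0.1], {e: 1.0 for e in edges}, edges)
```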


Optical Diffusion Models for Image Generation

Neural Information Processing Systems

Diffusion models generate new samples by progressively removing noise from an initial random distribution. This inference procedure typically invokes a trained neural network many times to obtain the final output, incurring significant latency and energy consumption on digital electronic hardware such as GPUs. In this study, we demonstrate that the propagation of a light beam through a semi-transparent medium can be programmed to implement a denoising diffusion model on image samples. This framework projects noisy image patterns through passive diffractive optical layers, which collectively transmit only the predicted noise term in the image. The transparent optical layers, trained with an online approach that backpropagates the error through an analytical model of the system, are passive and kept fixed across the different denoising steps. Hence, this method enables high-speed image generation with minimal power consumption, benefiting from the bandwidth and energy efficiency of optical information processing.
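For reference, the digital counterpart of one denoising step is the standard DDPM update below, where a predictor supplies the noise term that, in this paper, the optical layers produce. This is a generic sketch of ancestral sampling, not the optical system itself; the tensor names and schedule arrays are our assumptions.

```python
import torch

@torch.no_grad()
def ddpm_denoise_step(x_t, t, predicted_noise, alphas, alphas_cumprod, betas):
    """One reverse-diffusion step: remove the predicted noise component from
    x_t, then (for t > 0) add fresh noise, following the standard DDPM update."""
    alpha_t = alphas[t]
    alpha_bar_t = alphas_cumprod[t]
    # Posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
    x_prev = (x_t - betas[t] / torch.sqrt(1 - alpha_bar_t) * predicted_noise) / torch.sqrt(alpha_t)
    if t > 0:
        x_prev += torch.sqrt(betas[t]) * torch.randn_like(x_t)  # sigma_t^2 = beta_t
    return x_prev
```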


Learning De-Biased Representations for Remote-Sensing Imagery

Neural Information Processing Systems

Remote sensing (RS) imagery, which requires specialized satellites to collect and is difficult to annotate, suffers from data scarcity and class imbalance in certain spectra. Due to this data scarcity, training large-scale RS models from scratch is unrealistic; the alternative is to transfer pre-trained models by fine-tuning or by a more data-efficient method such as LoRA [22]. Due to the class imbalance, transferred models exhibit strong bias, where features of the major classes dominate over those of the minor classes. In this paper, we propose debLoRA, a generic training approach that works with any LoRA variant to yield debiased features. It is an unsupervised learning approach that diversifies minor-class features based on attributes shared with major classes, where the attributes are obtained by a simple clustering step. To evaluate it, we conduct extensive experiments in two transfer learning scenarios in the RS domain: from natural to optical RS images, and from optical RS to multi-spectrum RS images. We perform object classification and oriented object detection on the optical RS dataset DOTA and the SAR dataset FUSRS. Results show that debLoRA consistently surpasses prior art across these RS adaptation settings, yielding gains of up to 3.3 and 4.7 percentage points on the tail classes for the natural-to-optical RS and optical-RS-to-multi-spectrum RS adaptations, respectively, while preserving performance on the head classes, substantiating its efficacy and adaptability.
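The sketch below is our loose reading of the clustering idea: cluster all features to expose shared attributes, then pull minor-class features toward their cluster centroids to diversify and calibrate them. The function name, `alpha`, and the interpolation rule are our assumptions, not the debLoRA algorithm as specified in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def debias_features(features, labels, minor_classes, n_clusters=10, alpha=0.5):
    """Toy debiasing step: cluster features to find shared attributes, then
    interpolate minor-class features toward their assigned cluster centroid.
    `alpha` (our hypothetical parameter) controls the interpolation strength."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(features)
    centroids = km.cluster_centers_[km.labels_]  # per-sample assigned centroid
    debiased = features.copy()
    minor_mask = np.isin(labels, minor_classes)
    debiased[minor_mask] = ((1 - alpha) * features[minor_mask]
                            + alpha * centroids[minor_mask])
    return debiased
```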


Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D Representations

Neural Information Processing Systems

Implicit neural 3D representation has achieved impressive results in surface or scene reconstruction and novel view synthesis, typically using coordinate-based multi-layer perceptrons (MLPs) to learn a continuous scene representation. However, existing approaches, such as the Neural Radiance Field (NeRF) [15] and its variants [16, 26, 29], usually require dense input views (i.e.
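For readers unfamiliar with the coordinate-based MLPs mentioned above, here is a minimal NeRF-style sketch: a sinusoidal positional encoding of a 3D point followed by an MLP that outputs density and color. This is a generic illustration, not the codebook-prior model of the paper; the class names and layer sizes are our assumptions.

```python
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Map 3D coordinates to sinusoidal features, as in NeRF-style models."""
    def __init__(self, num_freqs=10):
        super().__init__()
        self.register_buffer("freqs", 2.0 ** torch.arange(num_freqs) * torch.pi)

    def forward(self, x):  # x: (..., 3)
        xb = x[..., None] * self.freqs  # (..., 3, num_freqs)
        return torch.cat([torch.sin(xb), torch.cos(xb)], dim=-1).flatten(-2)

class CoordinateMLP(nn.Module):
    """Minimal coordinate-based MLP: a continuous scene representation that
    maps an encoded 3D point to (density, RGB)."""
    def __init__(self, num_freqs=10, hidden=256):
        super().__init__()
        self.encode = PositionalEncoding(num_freqs)
        self.net = nn.Sequential(
            nn.Linear(3 * 2 * num_freqs, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # density (sigma) + RGB
        )

    def forward(self, xyz):
        return self.net(self.encode(xyz))
```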



Functionally Constrained Algorithm Solves Convex Simple Bilevel Problems

Neural Information Processing Systems

This paper studies simple bilevel problems, in which a convex upper-level function is minimized over the set of optimal solutions of a convex lower-level problem. We first show a fundamental difficulty of simple bilevel problems: their approximate optimal value is not obtainable by first-order zero-respecting algorithms. We then follow recent works in pursuing weak approximate solutions. To this end, we propose a novel method that reformulates such problems as functionally constrained problems. Our method achieves near-optimal rates for both smooth and nonsmooth problems. To the best of our knowledge, this is the first near-optimal algorithm that works under the standard assumptions of smoothness or Lipschitz continuity for the objective functions.
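One standard way to turn a simple bilevel problem into a functionally constrained one, consistent with (though not necessarily identical to) the reformulation used in the paper, is the value-function reformulation:

```latex
% Simple bilevel problem: minimize the upper-level objective f
% over the solution set of the lower-level problem \min_z g(z).
\min_{x}\; f(x) \quad \text{s.t.} \quad x \in \operatorname*{arg\,min}_{z}\, g(z)
% Value-function (functionally constrained) reformulation,
% where g^* denotes the optimal value of the lower-level problem:
\min_{x}\; f(x) \quad \text{s.t.} \quad g(x) - g^* \le 0,
\qquad g^* := \min_{z}\, g(z)
```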


SCOP: Scientific Control for Reliable Neural Network Pruning (Supplementary Material)

Neural Information Processing Systems

(Figure: from top to bottom, input images, features in shallow layers, and features in deep layers.)

Through a standard Schur complement calculation, the semi-definite condition on the joint covariance can be derived. We train a generative adversarial network [2] to construct knockoff data. The knockoff data are generated by the generator and then sent to the discriminator, which verifies whether the knockoff condition (Definition 1) holds. The generator and the discriminator are optimized alternately, and the loss function provided by [4] is adopted. Since images are high-dimensional data, we treat multiple pixels as a whole when exchanging elements between the real data and the knockoff data.
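A minimal sketch of the patch-wise swap described above, assuming image batches of shape (B, C, H, W): each aligned patch is exchanged between real and knockoff images with probability p, treating the patch as one "element" in the swap test. The function name and parameters are our assumptions.

```python
import torch

def swap_patches(real, knockoff, patch=8, p=0.5):
    """Randomly exchange aligned pixel patches between real and knockoff
    images; each patch is treated as a single unit in the swap test used
    to check the knockoff condition. real/knockoff: (B, C, H, W)."""
    B, C, H, W = real.shape
    swapped_real, swapped_ko = real.clone(), knockoff.clone()
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            mask = torch.rand(B) < p  # per-image decision for this patch
            r_blk = real[mask, :, i:i + patch, j:j + patch]
            k_blk = knockoff[mask, :, i:i + patch, j:j + patch]
            swapped_real[mask, :, i:i + patch, j:j + patch] = k_blk
            swapped_ko[mask, :, i:i + patch, j:j + patch] = r_blk
    return swapped_real, swapped_ko
```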