9bdb8b1faffa4b3d41779bb495d79fb9-AuthorFeedback.pdf

Neural Information Processing Systems

Thank you very much for the thorough and generally positive feedback. R1.1 However, I worry about the reproducibility since most of the results are run only once. A1.1 Upon acceptance we will publish the source code, implemented in TensorFlow, that was also submitted with this submission. R2.1 For the equation between lines 135 and 136 (why does it not have an equation number?): A2.1 We will add an equation number. The experiments stop at L = 20.



Deep ReLU Networks Have Surprisingly Few Activation Patterns

Neural Information Processing Systems

The success of deep networks has been attributed in part to their expressivity: per parameter, deep networks can approximate a richer class of functions than shallow networks. In ReLU networks, the number of activation patterns is one measure of expressivity, and the maximum number of patterns grows exponentially with depth. However, recent work has shown that the practical expressivity of deep networks - the functions they can learn rather than express - is often far from the theoretical maximum. In this paper, we show that the average number of activation patterns for ReLU networks at initialization is bounded by the total number of neurons raised to the input dimension. We show empirically that this bound, which is independent of the depth, is tight both at initialization and during training, even on memorization tasks that should maximize the number of activation patterns. Our work suggests that realizing the full expressivity of deep networks may not be possible in practice, at least with current methods.
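
As a hedged illustration (not from the paper's code), the sketch below counts the distinct activation patterns a randomly initialized ReLU network realizes over sampled inputs; the helper name relu_activation_patterns, the He-style initialization, and the input distribution are assumptions for the example, and sampling only lower-bounds the true count.

```python
import numpy as np

def relu_activation_patterns(weights, biases, inputs):
    """Collect the distinct ReLU activation patterns realized on `inputs`.

    A pattern records, for every neuron in the network, whether its
    pre-activation is positive on a given input.
    """
    patterns = set()
    for x in inputs:
        h, pattern = x, []
        for W, b in zip(weights, biases):
            pre = W @ h + b
            pattern.extend((pre > 0).astype(int).tolist())  # on/off per neuron
            h = np.maximum(pre, 0.0)                        # ReLU
        patterns.add(tuple(pattern))
    return patterns

# Depth-3 fully connected net with 2-D inputs and 48 neurons in total.
rng = np.random.default_rng(0)
dims = [2, 16, 16, 16]
weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(n, m)) for m, n in zip(dims, dims[1:])]
biases = [rng.normal(0.0, 0.1, size=n) for n in dims[1:]]
inputs = rng.uniform(-1.0, 1.0, size=(5000, 2))
# The bound discussed above is (total neurons)^(input dim), here roughly
# 48^2, far below the exponential-in-depth worst case of 2^48 patterns.
print(len(relu_activation_patterns(weights, biases, inputs)))
```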


Unlearnable 3D Point Clouds: Class-wise Transformation Is All You Need

Xianlong Wang

Neural Information Processing Systems

Traditional unlearnable strategies have been proposed to prevent unauthorized users from training on 2D image data. With more 3D point cloud data containing sensitive information, unauthorized usage of this new type of data has also become a serious concern. To address this, we propose the first integral unlearnable framework for 3D point clouds, comprising two processes: (i) we propose an unlearnable data protection scheme, involving a class-wise setting established by a category-adaptive allocation strategy and multi-transformations assigned to samples; (ii) we propose a data restoration scheme that utilizes class-wise inverse matrix transformation, thus enabling authorized-only training on unlearnable data. This restoration process addresses a practical issue overlooked in most existing unlearnable literature: even authorized users struggle to gain knowledge from 3D unlearnable data. Both theoretical and empirical results (covering 6 datasets, 16 models, and 2 tasks) demonstrate the effectiveness of our proposed unlearnable framework. Our code is available at https://github.com/CGCL-codes/UnlearnablePC.
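
A minimal sketch of the class-wise invertible-transformation idea, under the simplifying assumption that each class gets a single secret rotation-plus-scaling matrix (the paper's category-adaptive allocation and multi-transformation scheme is richer); protect and restore are hypothetical helper names, not the released API.

```python
import numpy as np

def random_invertible_transform(rng, scale_range=(0.5, 2.0)):
    """A hypothetical class-wise transform: random rotation plus per-axis scaling.

    Invertibility is what enables authorized restoration.
    """
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))    # random rotation-like matrix
    s = np.diag(rng.uniform(*scale_range, size=3))  # per-axis scaling
    return q @ s                                    # invertible 3x3 matrix

def protect(points_by_class, rng):
    """Apply one secret invertible matrix per class to its point clouds."""
    keys = {c: random_invertible_transform(rng) for c in points_by_class}
    protected = {c: [p @ keys[c].T for p in clouds]
                 for c, clouds in points_by_class.items()}
    return protected, keys

def restore(protected, keys):
    """Authorized users invert the class-wise matrices to recover the data."""
    return {c: [p @ np.linalg.inv(keys[c]).T for p in clouds]
            for c, clouds in protected.items()}

rng = np.random.default_rng(0)
data = {0: [rng.normal(size=(1024, 3))], 1: [rng.normal(size=(1024, 3))]}
protected, keys = protect(data, rng)
recovered = restore(protected, keys)
assert np.allclose(recovered[0][0], data[0][0])  # restoration is exact
```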


On the Efficiency of ERM in Feature Learning

Neural Information Processing Systems

Given a collection of feature maps indexed by a set T, we study the performance of empirical risk minimization (ERM) on regression problems with square loss over the union of the linear classes induced by these feature maps. This setup aims at capturing the simplest instance of feature learning, where the model is expected to jointly learn from the data an appropriate feature map and a linear predictor. We start by studying the asymptotic quantiles of the excess risk of sequences of empirical risk minimizers. Remarkably, we show that when the set T is not too large and when there is a unique optimal feature map, these quantiles coincide, up to a factor of two, with those of the excess risk of the oracle procedure, which knows a priori this optimal feature map and deterministically outputs an empirical risk minimizer from the associated optimal linear class. We complement this asymptotic result with a non-asymptotic analysis that quantifies the decaying effect of the global complexity of the set T on the excess risk of ERM, and relates it to the size of the sublevel sets of the suboptimality of the feature maps. As an application of our results, we obtain new guarantees on the performance of the best subset selection procedure in sparse linear regression under general assumptions.
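
For a finite index set T, the ERM procedure studied here can be sketched as follows: fit a least-squares predictor within each induced linear class and return the pair with the smallest empirical risk. The helper name erm_over_feature_maps and the toy feature maps are illustrative assumptions, and the paper's analysis also covers richer index sets.

```python
import numpy as np

def erm_over_feature_maps(feature_maps, X, y):
    """ERM with square loss over the union of linear classes induced by feature maps.

    feature_maps: dict mapping an index t in T to a callable phi_t(X) -> features.
    Returns the selected index and its linear coefficients.
    """
    best = None
    for t, phi in feature_maps.items():
        F = phi(X)                                 # n x d_t feature matrix
        w, *_ = np.linalg.lstsq(F, y, rcond=None)  # ERM within class t
        risk = np.mean((F @ w - y) ** 2)           # empirical square loss
        if best is None or risk < best[0]:
            best = (risk, t, w)
    return best[1], best[2]

# Toy example with two candidate feature maps; "square" is the unique optimum.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X ** 2) @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=200)
maps = {"linear": lambda X: X, "square": lambda X: X ** 2}
t_hat, w_hat = erm_over_feature_maps(maps, X, y)
print(t_hat)  # "square"
```

Best subset selection in sparse linear regression arises as the special case where each index t in T selects a subset of coordinates as the feature map.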


8c66bb19847dd8c21413c5c8c9d68306-AuthorFeedback.pdf

Neural Information Processing Systems

Please see below for our response. Reviewer 2: Regarding the baseline methods in the experiments: The original LiNGAM assumes that there are no confounders, so it is not clear how to compare its result with the ground-truth graph (which contains confounders). For clustering methods, different clustering assumptions will clearly lead to different clustering results. Reviewer 3: 1. Regarding "the setting is limited": We fully agree, and at the same time would like to note that Theorem 1 and Proposition 2 ensure the correctness of Phase 2 of our method (Algorithm 2).


Statistical-Computational Trade-offs for Density Estimation

Neural Information Processing Systems

Recently, [1] gave the first and only known result that achieves sublinear bounds in both the sampling complexity and the query time while preserving polynomial data structure space. However, their improvement over linear samples and time is only by subpolynomial factors. Our main result is a lower bound showing that, for a broad class of data structures, their bounds cannot be significantly improved.


Bayesian Batch Active Learning as Sparse Subset Approximation

Neural Information Processing Systems

Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the most informative data points to be labeled. However, for many large-scale problems standard greedy procedures become computationally infeasible and suffer from negligible model change. In this paper, we introduce a novel Bayesian batch active learning approach that mitigates these issues. Our approach is motivated by approximating the complete data posterior of the model parameters. While naive batch construction methods result in correlated queries, our algorithm produces diverse batches that enable efficient active learning at scale. We derive interpretable closed-form solutions akin to existing active learning procedures for linear models, and generalize to arbitrary models using random projections. We demonstrate the benefits of our approach on several large-scale regression and classification tasks.
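
A loose, non-authoritative sketch of the sparse-subset view: assume each candidate point n is summarized by a feature vector L_n (e.g., obtained via random projections, as the abstract suggests for arbitrary models), and build the batch with Frank-Wolfe steps toward the sum of all L_n, so each step activates at most one new point. The names and the exact geometry here are assumptions, not the paper's implementation.

```python
import numpy as np

def frank_wolfe_batch(L, batch_size):
    """Select a batch by sparsely approximating sum_n L_n with few rows of L.

    L: (n, d) array; row n summarizes candidate point n. Each Frank-Wolfe
    iteration activates at most one point, yielding at most `batch_size`
    selected points whose directions are non-redundant (diverse).
    """
    target = L.sum(axis=0)                        # "complete data" vector
    sigma_n = np.linalg.norm(L, axis=1) + 1e-12   # per-point norms
    sigma = sigma_n.sum()
    w = np.zeros(L.shape[0])
    for _ in range(batch_size):
        residual = target - L.T @ w
        i = int(np.argmax(L @ residual / sigma_n))     # best-aligned point
        d = (sigma / sigma_n[i]) * L[i] - L.T @ w      # Frank-Wolfe direction
        gamma = float(np.clip(d @ residual / (d @ d + 1e-12), 0.0, 1.0))
        w = (1.0 - gamma) * w                          # exact line search
        w[i] += gamma * sigma / sigma_n[i]
    return np.flatnonzero(w)                           # selected batch indices

rng = np.random.default_rng(0)
L = rng.normal(size=(1000, 64))   # 64-dim random-projection summaries
print(frank_wolfe_batch(L, batch_size=10))
```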


Appendix A: Image Classification

Neural Information Processing Systems

To verify the effectiveness of PABEE on computer vision tasks, we follow the experimental settings of Shallow-Deep [5] and conduct experiments on two image classification datasets, CIFAR-10 and CIFAR-100 [55]. We use ResNet-56 [10] as the backbone and compare PABEE with BranchyNet [26] and Shallow-Deep [5]. An internal classifier is added after every two convolutional layers. We set the batch size to 128 and use the SGD optimizer with a learning rate of 0.1. Table 6: Experimental results (median of 5 runs) of ResNet-based models on the CIFAR-10 and CIFAR-100 datasets.
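
As a hedged sketch (not the released PABEE code), the patience-based early-exit rule applied at inference can be illustrated as follows, assuming per-example logits from each internal classifier are already available; pabee_early_exit and its arguments are illustrative names.

```python
import torch

def pabee_early_exit(logits_per_layer, patience=3):
    """Patience-based early exit over a sequence of internal-classifier logits.

    logits_per_layer: iterable of (num_classes,) logit tensors, one per
    internal classifier (here, one after every two convolutional layers).
    Stops once `patience` consecutive classifiers agree on the predicted class.
    """
    prev_pred, streak = None, 0
    for depth, logits in enumerate(logits_per_layer, start=1):
        pred = int(torch.argmax(logits))
        streak = streak + 1 if pred == prev_pred else 1
        if streak >= patience:
            return pred, depth        # early exit: prediction has stabilized
        prev_pred = pred
    return pred, depth                # fell through to the final classifier

# Toy check: heads agree on class 2 from the second head onward.
heads = [torch.tensor(v) for v in ([1., 0., 0.], [0., 0., 1.], [0., 0., 2.],
                                   [0., 0., 3.], [0., 1., 4.])]
print(pabee_early_exit(heads, patience=3))  # (2, 4): exits at the fourth head
```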


Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates

Neural Information Processing Systems

We consider the core reinforcement-learning problem of on-policy value function approximation from a batch of trajectory data, and focus on various issues of Temporal Difference (TD) learning and Monte Carlo (MC) policy evaluation. The two methods are known to achieve complementary bias-variance trade-off properties, with TD tending to achieve lower variance but potentially higher bias. In this paper, we argue that the larger bias of TD can be a result of the amplification of local approximation errors. We address this by proposing an algorithm that adaptively switches between TD and MC in each state, thus mitigating the propagation of errors. Our method is based on learned confidence intervals that detect biases of TD estimates. We demonstrate in a variety of policy evaluation tasks that this simple adaptive algorithm performs competitively with the best approach in hindsight, suggesting that learned confidence intervals are a powerful technique for adapting policy evaluation to use TD or MC returns in a data-driven way.
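
A minimal sketch of the per-state switching rule, under the simplifying assumption that a confidence interval around each TD estimate is already available (the paper learns these intervals): when the Monte Carlo return falls outside the interval, the TD estimate is flagged as biased and the MC return is used instead. adaptive_td_mc_targets and the Gaussian-style interval are illustrative assumptions.

```python
import numpy as np

def adaptive_td_mc_targets(td_values, mc_returns, td_stderr, z=1.96):
    """Per-state switch between TD and MC value targets (illustrative sketch).

    td_values: TD value estimates per state; td_stderr: the half-widths of
    their (assumed given) confidence intervals. mc_returns: Monte Carlo
    return estimates for the same states.
    """
    lo = td_values - z * td_stderr
    hi = td_values + z * td_stderr
    biased = (mc_returns < lo) | (mc_returns > hi)   # TD bias detected
    return np.where(biased, mc_returns, td_values)   # fall back to MC there

td = np.array([1.0, 0.2, -0.5])
mc = np.array([1.1, 0.9, -0.4])
err = np.array([0.2, 0.1, 0.3])
print(adaptive_td_mc_targets(td, mc, err))  # state 1's TD estimate is replaced
```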