A Bayesian method for reducing bias in neural representational similarity analysis
In neuroscience, the similarity matrix of neural activity patterns in response to different sensory stimuli or under different cognitive states reflects the structure of the neural representational space. Existing methods derive point estimates of neural activity patterns from noisy neural imaging data, and similarity is then calculated from these point estimates. We show that this approach translates structured noise in the estimated patterns into spurious bias structure in the resulting similarity matrix, which is especially severe when the signal-to-noise ratio is low and experimental conditions cannot be fully randomized in a cognitive task. We propose an alternative Bayesian framework for computing representational similarity in which we treat the covariance structure of neural activity patterns as a hyper-parameter in a generative model of the neural data, and directly estimate this covariance structure from imaging data while marginalizing over the unknown activity patterns. Converting the estimated covariance structure into a correlation matrix offers a much less biased estimate of neural representational similarity.
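For concreteness, here is a minimal sketch of the kind of marginal-likelihood estimation described above, assuming a linear Gaussian generative model Y = Xβ + ε with each voxel's pattern drawn as β[:, v] ~ N(0, U), so that the unknown patterns integrate out analytically; the function names and the Cholesky parameterization are illustrative choices, not the paper's implementation:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(params, X, Y):
    """Negative log marginal likelihood with the activity patterns integrated out.

    Assumed generative model: Y = X @ beta + noise, with each voxel's pattern
    beta[:, v] ~ N(0, U) and i.i.d. Gaussian noise of variance sigma2, so that
    each column of Y is N(0, X U X^T + sigma2 * I).
    """
    n_t, n_c = X.shape
    n_v = Y.shape[1]
    L = np.zeros((n_c, n_c))
    L[np.tril_indices(n_c)] = params[:-1]   # Cholesky factor keeps U positive semi-definite
    U = L @ L.T
    sigma2 = np.exp(params[-1])             # log-parameterized noise variance
    S = X @ U @ X.T + sigma2 * np.eye(n_t)
    _, logdet = np.linalg.slogdet(S)
    quad = np.trace(Y.T @ np.linalg.solve(S, Y))
    return 0.5 * (n_v * logdet + quad)

def estimate_similarity(X, Y):
    """Estimate the condition-by-condition correlation matrix directly from the data Y."""
    n_c = X.shape[1]
    x0 = np.concatenate([np.eye(n_c)[np.tril_indices(n_c)], [0.0]])
    res = minimize(neg_log_marginal_likelihood, x0, args=(X, Y), method="L-BFGS-B")
    L = np.zeros((n_c, n_c))
    L[np.tril_indices(n_c)] = res.x[:-1]
    U = L @ L.T
    d = np.sqrt(np.diag(U))
    return U / np.outer(d, d)               # correlation matrix as the similarity estimate
```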
The image modalities proposed by Reviewer 1 are an interesting idea, which we will consider for future work.
We would like to thank all reviewers for their time and effort in writing these valuable reviews. Reviewer 3 mentioned that a performance comparison with other recent methods would be beneficial. The code for this paper will be released with the camera-ready version. In the following, we focus on the questions raised by Reviewer 2. The presented network does not contain fewer parameters than the classical B-spline method used for optimization. Furthermore, it is straightforward to extend to the 3D case.
Reverse Transition Kernel: A Flexible Framework to Accelerate Diffusion Inference
To generate data from trained diffusion models, most inference algorithms, such as DDPM [17], DDIM [31], and other variants, rely on discretizing the reverse SDEs or their equivalent ODEs. In this paper, we view such approaches as decomposing the entire denoising diffusion process into several segments, each corresponding to a reverse transition kernel (RTK) sampling subproblem. Specifically, DDPM uses a Gaussian approximation for the RTK, resulting in low per-subproblem complexity but requiring a large number of segments (i.e., subproblems), which is conjectured to be inefficient. To address this, we develop a general RTK framework that enables a more balanced subproblem decomposition, resulting in Õ(1) subproblems, each with strongly log-concave targets. We then propose leveraging two fast sampling algorithms, the Metropolis-Adjusted Langevin Algorithm (MALA) and Underdamped Langevin Dynamics (ULD), for solving these strongly log-concave subproblems. This gives rise to the RTK-MALA and RTK-ULD algorithms for diffusion inference.
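As a concrete illustration of the kind of inner sampler proposed for each strongly log-concave RTK subproblem, here is a generic MALA sketch; it is a minimal illustration rather than the paper's implementation, and `grad_log_p` and `log_p` stand in for the gradient and log-density of one RTK target:

```python
import numpy as np

def mala_sample(grad_log_p, log_p, x0, step, n_steps, rng=None):
    """Metropolis-Adjusted Langevin Algorithm for a (strongly) log-concave target.

    Proposes x' = x + step * grad_log_p(x) + sqrt(2*step) * N(0, I) and accepts
    with the Metropolis-Hastings ratio, so the chain targets p exactly.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        prop = x + step * grad_log_p(x) + np.sqrt(2 * step) * noise
        # Log proposal densities (up to a shared constant) for the MH correction
        fwd = -np.sum((prop - x - step * grad_log_p(x)) ** 2) / (4 * step)
        bwd = -np.sum((x - prop - step * grad_log_p(prop)) ** 2) / (4 * step)
        log_alpha = log_p(prop) - log_p(x) + bwd - fwd
        if np.log(rng.uniform()) < log_alpha:
            x = prop
    return x
```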
R1: "I am not entirely convinced that an amortized explanation model is a reasonable thing. R2: "I would appreciate some clarification about what is gained by learning  and not just reporting Ω directly." We thank R1, R2 and R3 for their insightful feedback. However, Ω can only be computed given ground truth labels. R1: "(The objective) does not attribute importance to features that change how the model goes wrong (...)" R1: "Why is the rank-based method necessary?" R2: "Additionally, can the authors clarify what is being averaged in the definition of the causal objective?" The causal objective is averaged over all N samples in the dataset. Every data point has an Ω. R2: "If the goal is to determine what might happen to our predictions if we change a particular feature slightly Our goal is not to estimate what would happen if a particular feature's value changed, but to provide a causal explanation R2: "Some additional clarity on why the authors are using a KL discrepancy is merited. R3: "Masking one by one; this is essentially equivalent to assuming that feature contributions are additive." We do not define a feature's importance as its additive contribution to the model output, but as it's marginal reduction This subtle change in definition allows us to efficiently compute feature importance one by one. R3: "Replacing a masked value by a point-wise estimation can be very bad, especially when the classifiers output Why would the average value (or, even worse, zero) be meaningful?" We will clarify this point in the next revision. R3: "It would also be interesting to compare the proposed method with causal inference technique for SEMs." Recent work [29] has explored the use of SEMs for model attribution in deep learning. CXPlain can explain any machine-learning model, and (ii) attribution time was considerably slower than CXPlain. R3: "It seems to me that the chosen performance measure may correlate much more with the Granger-causal loss
Forget About the LiDAR: Self-Supervised Depth Estimators with MED Probability Volumes
Self-supervised depth estimators have recently shown results comparable to the supervised methods on the challenging single image depth estimation (SIDE) task, by exploiting the geometrical relations between target and reference views in the training data. However, previous methods usually learn forward or backward image synthesis, but not depth estimation, as they cannot effectively neglect occlusions between the target and the reference images. Previous works rely on rigid photometric assumptions or on the SIDE network to infer depth and occlusions, resulting in limited performance. On the other hand, we propose a method to "Forget About the LiDAR" (FAL), with Mirrored Exponential Disparity (MED) probability volumes for the training of monocular depth estimators from stereo images. Our MED representation allows us to obtain geometrically inspired occlusion maps with our novel Mirrored Occlusion Module (MOM), which does not impose a learning burden on our FAL-net.
Automatic Outlier Rectification via Optimal Transport
In this paper, we propose a novel conceptual framework to detect outliers using optimal transport with a concave cost function. Conventional outlier detection approaches typically use a two-stage procedure: first, outliers are detected and removed, and then estimation is performed on the cleaned data. However, this approach does not inform outlier removal with the estimation task, leaving room for improvement. To address this limitation, we propose an automatic outlier rectification mechanism that integrates rectification and estimation within a joint optimization framework. We take the first step toward utilizing the optimal transport distance with a concave cost function to construct a rectification set in the space of probability distributions. Then, we select the best distribution within the rectification set to perform the estimation task. Notably, the concave cost function we introduce in this paper is the key to enabling our estimator to effectively identify outliers during the optimization process. We demonstrate the effectiveness of our approach over conventional approaches in simulations and empirical analyses for mean estimation, least absolute regression, and the fitting of option implied volatility surfaces.
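Schematically, and only as one plausible reading of the abstract rather than the paper's exact formulation, such a joint rectification-and-estimation problem could be written with P̂ₙ the empirical distribution, ℓ the estimation loss, OT_c the optimal transport distance with concave ground cost c, and ε an assumed budget:

```latex
\hat{\theta} \;\in\; \arg\min_{\theta}\ \min_{P:\ \mathrm{OT}_c(P,\hat{P}_n)\le\varepsilon}\ \mathbb{E}_{Z\sim P}\big[\ell(\theta;Z)\big]
```

Intuitively, the inner minimization "rectifies" the empirical distribution within the transport budget before the outer minimization performs estimation on the rectified distribution.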
Mutual Information Regularized Offline Reinforcement Learning
The major challenge of offline RL is the distribution shift that appears when out-of-distribution actions are queried, which makes the policy improvement direction biased by extrapolation errors. Most existing methods address this problem by penalizing the policy or value for deviating from the behavior policy during policy improvement or evaluation. In this work, we propose MISA, a novel framework that approaches offline RL from the perspective of Mutual Information between States and Actions in the dataset by directly constraining the policy improvement direction.
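As a rough schematic only (the abstract does not spell out the objective, so the form below, the regularization weight λ, and the estimator Î_D are assumptions for illustration), one way a mutual-information constraint could enter a policy-improvement step is as a regularizer:

```latex
\max_{\pi}\ \mathbb{E}_{s\sim\mathcal{D},\ a\sim\pi(\cdot\mid s)}\big[Q(s,a)\big]\;+\;\lambda\,\hat{I}_{\mathcal{D}}(S;A)
```

Here Î_D(S;A) denotes an estimate of the mutual information between states and actions computed from the offline dataset.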
Outlier-Robust Sparse Mean Estimation for Heavy-Tailed Distributions
We study the fundamental task of outlier-robust mean estimation for heavy-tailed distributions in the presence of sparsity. Specifically, given a small number of corrupted samples from a high-dimensional heavy-tailed distribution whose mean µ is guaranteed to be sparse, the goal is to efficiently compute a hypothesis that accurately approximates µ with high probability. Prior work had obtained efficient algorithms for robust sparse mean estimation of light-tailed distributions. In this work, we give the first sample-efficient and polynomial-time robust sparse mean estimator for heavy-tailed distributions under mild moment assumptions. Our algorithm achieves the optimal asymptotic error using a number of samples scaling logarithmically with the ambient dimension. Importantly, the sample complexity of our method is optimal as a function of the failure probability, having only an additive logarithmic dependence on its inverse. Our algorithm leverages the stability-based approach from the algorithmic robust statistics literature, with crucial (and necessary) adaptations required in our setting. Our analysis may be of independent interest, involving the delicate design of a (non-spectral) decomposition for positive semi-definite matrices satisfying certain sparsity properties.
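For context, a standard way to formalize this kind of setting is the strong contamination model sketched below; the specific moment assumption (bounded covariance) and the symbols ε, δ, σ are illustrative and may differ from the paper's exact statement:

```latex
% Strong contamination (illustrative): an adversary inspects n i.i.d. samples
% from D and arbitrarily replaces an \epsilon-fraction of them.
\text{Input: } \epsilon\text{-corrupted samples from } D,\quad
\mu=\mathbb{E}_{X\sim D}[X]\ \text{is } k\text{-sparse},\quad \mathrm{Cov}(D)\preceq\sigma^{2} I.\\
\text{Goal: output } \hat{\mu}\ \text{with}\ \|\hat{\mu}-\mu\|_{2}=O(\sigma\sqrt{\epsilon})
\ \text{with probability at least } 1-\delta.
```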
Supplementary Material for Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification

A Preliminaries of Conformal Prediction
Here, we provide the basic idea of conformal prediction to aid understanding. To this end, we introduce the following example from [1]. Computing residuals on the same samples used to fit the model underestimates the prediction error due to overfitting. To avoid this, Split Conformal Regression (SCR) is introduced, which separates the samples used for training from those used for computing the residuals. Since the samples are i.i.d., for a new sample with each potential outcome, the corresponding confidence interval satisfies, from Theorem A.1, P[Y(1) ∈ Ĉ(X)] ≥ 1 − α, where α is the miscoverage rate. Subgroup analysis methods with recursive partitioning have been widely studied based on regression trees (RT) [2-5]. In these methods, the subgroups (i.e., leaves in the tree structure) are constructed, and the treatment effects are estimated by the corresponding sample-mean estimator on the leaf containing the given covariates. To represent non-linearities such as interactions between treatment and covariates [6], a parametric model is integrated into regression trees for subgroup analysis [7].
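To make the SCR recipe concrete, here is a minimal sketch; the choice of regressor, the variable names, and the particular split are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def split_conformal_interval(X_train, y_train, X_cal, y_cal, X_new, alpha=0.1):
    """Split Conformal Regression: fit on one split, calibrate residuals on the other."""
    model = RandomForestRegressor().fit(X_train, y_train)
    # Absolute residuals on the held-out calibration split
    residuals = np.abs(y_cal - model.predict(X_cal))
    n = len(residuals)
    # Conformal quantile level: ceil((n + 1) * (1 - alpha)) / n
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(residuals, q_level, method="higher")
    pred = model.predict(X_new)
    # Interval covers the new outcome with probability >= 1 - alpha under i.i.d. sampling
    return pred - q, pred + q
```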
Supplementary Material: On the Epistemic Limits of Personalized Prediction
In the supplementary material we provide the following information. Appendix B provides mathematically rigorous proofs of the theorems, corollaries, and lemmas in the main paper. Appendix C further discusses the interpretation of the minimax bounds, the assumptions made in the proofs, and how the bounds in the main paper can be used. Appendix D gives more details on the experiments in the main paper and provides additional numerical results on two datasets (HSLS (Ingels et al., 2011) and COMPAS (Angwin et al., 2016)).

Code. The code used in all experiments of this paper is available at: https://github.com/

In this section we provide proofs for all the theorems and corollaries in the main paper. Note that the proofs use the assumptions stated in the main text.