Score-Repellent Monte Carlo: Toward Efficient Non-Markovian Sampler with Constant Memory in General State Spaces

arXiv.org Machine Learning

History-dependent sampling can reduce long-run Monte Carlo variance by discouraging redundant revisits, but existing schemes typically encode history through an empirical measure on finite state spaces, which is infeasible in high-dimensional discrete configuration spaces and ill-posed in continuous domains. We propose the Score-Repellent Monte Carlo (SRMC) framework, which summarizes trajectory history by a running average of score evaluations in $\mathbb{R}^d$, where $d$ is the dimension of the score and state representation. This history is converted into a surrogate target through an exponential score tilt, indexed by a parameter $α$ that controls the strength of the history-based repulsion. The surrogate family is normalization-free in the standard MCMC sense, yielding a generic wrapper: at each iteration, any base kernel targeting $π$ can instead be run on the current surrogate $π_{θ_n}$ while the history is updated online. We analyze the coupled evolution of the history recursion and Monte Carlo estimators using stochastic approximation with controlled Markovian noise, establishing almost sure convergence and a joint central limit theorem. We further identify regimes in which the asymptotic covariance decreases as $α$ increases, with scaling $O(1/α)$, extending the near-zero-variance effect of finite-state history-dependent samplers to general state spaces with constant memory. Experiments on continuous targets and discrete energy-based models demonstrate improved estimator variance and mode coverage, while retaining $O(d)$ memory usage and modest per-iteration overhead.
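The constant-memory claim rests on the history being a single running average in $\mathbb{R}^d$. The following is a minimal sketch of that bookkeeping only, assuming a plain $1/(n+1)$ step size and a standard normal target whose score is $-x$. Both choices are illustrative assumptions, as is the helper name `update_history`. The exponential tilt and the base kernel are not shown, since the abstract does not specify their form.

```python
import numpy as np

def update_history(theta, score_value, n):
    """Running-average recursion: theta_{n+1} = theta_n + (s_n - theta_n) / (n + 1).

    Only the current d-dimensional vector theta is stored, so memory is O(d).
    The 1/(n+1) step size is an assumption made for this sketch.
    """
    return theta + (score_value - theta) / (n + 1)

def gaussian_score(x):
    # Score (gradient of the log-density) of a standard normal target: -x.
    # Hypothetical stand-in for whatever score the sampler evaluates.
    return -x

rng = np.random.default_rng(0)
d = 3
theta = np.zeros(d)

# Stand-in trajectory: i.i.d. draws instead of an actual MCMC chain.
samples = rng.standard_normal((1000, d))
for n, x in enumerate(samples):
    theta = update_history(theta, gaussian_score(x), n)
```

After the loop, `theta` equals the batch mean of the scores, but no past sample was ever stored.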


00482b9bed15a272730fcb590ffebddd-Supplemental.pdf

Neural Information Processing Systems

A.1 Training dataset pre-processing

We used 40,000 publicly available videos from YouTube that were available at a spatial resolution of at least 1920 × 1080 pixels. In an attempt not to skew the distribution of content too far from what may inform biological representation learning, we excluded most artificial content such as screenshots and videos of computer games. To reduce video compression artifacts and prevent systematic downsampling artifacts, each segment was then spatially downsampled to a randomized height between 128 and 160 pixels. Each segment was then separated into 15 pairs of neighboring frames, and a randomly placed but spatially colocated patch of 64 × 64 pixels was cropped out of each frame pair. The order of the frame pairs was then randomized in a running buffer, and all RGB pixel values were normalized to the range between 0 and 1 before being fed into the model.
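The per-frame-pair steps described above (random downsampling height, colocated 64 × 64 crop, normalization to [0, 1]) can be sketched as follows. This is an illustrative reconstruction, not the authors' pipeline: the function names are hypothetical, the input is assumed to be 8-bit RGB, and the nearest-neighbour resize is a simple stand-in for whatever resampling filter was actually used.

```python
import numpy as np

def random_target_height(rng, low=128, high=160):
    # Randomized output height in [low, high], inclusive.
    return int(rng.integers(low, high + 1))

def downsample(frame, new_h):
    """Nearest-neighbour resize to height new_h, preserving aspect ratio.

    Assumption: a real pipeline would likely use an anti-aliased filter.
    """
    h, w = frame.shape[:2]
    new_w = max(1, round(w * new_h / h))
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return frame[rows][:, cols]

def crop_pair(frame_a, frame_b, rng, patch=64):
    # Same randomly placed window for both frames (spatially colocated).
    h, w = frame_a.shape[:2]
    top = int(rng.integers(0, h - patch + 1))
    left = int(rng.integers(0, w - patch + 1))
    window = (slice(top, top + patch), slice(left, left + patch))
    return frame_a[window], frame_b[window]

def normalize(patch):
    # Map 8-bit RGB values to floats in [0, 1].
    return patch.astype(np.float32) / 255.0

rng = np.random.default_rng(0)
# Synthetic stand-ins for two neighboring 1920 x 1080 video frames.
frame_a = rng.integers(0, 256, size=(1080, 1920, 3), dtype=np.uint8)
frame_b = rng.integers(0, 256, size=(1080, 1920, 3), dtype=np.uint8)

new_h = random_target_height(rng)
small_a, small_b = downsample(frame_a, new_h), downsample(frame_b, new_h)
patch_a, patch_b = crop_pair(small_a, small_b, rng)
patch_a, patch_b = normalize(patch_a), normalize(patch_b)
```

The same target height is applied to both frames of a pair so that the colocated crop refers to the same image region.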


Provable Benefit of Cutout and CutMix for Feature Learning

Neural Information Processing Systems

Patch-level data augmentation techniques such as Cutout and CutMix have demonstrated significant efficacy in enhancing the performance of vision tasks. However, a comprehensive theoretical understanding of these methods remains elusive. In this paper, we study two-layer neural networks trained using three distinct methods: vanilla training without augmentation, Cutout training, and CutMix training. Our analysis focuses on a feature-noise data model, which consists of several label-dependent features of varying rarity and label-independent noises of differing strengths. Our theorems demonstrate that Cutout training can learn low-frequency features that vanilla training cannot, while CutMix training can learn even rarer features that Cutout cannot capture. From this, we establish that CutMix yields the highest test accuracy among the three. Our novel analysis reveals that CutMix training makes the network learn all features and noise vectors evenly regardless of the rarity and strength, which provides an interesting insight into understanding patch-level augmentation.
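For readers unfamiliar with the two augmentations, the sketch below shows them in their commonly used image-level form: Cutout zeroes a square region, and CutMix pastes a region from a second image and mixes the labels in proportion to the pasted area. It is a simplified illustration under stated assumptions (boxes are kept fully inside the image, labels are one-hot vectors, the function names are hypothetical) and is not the patch-wise feature-noise data model analyzed in the paper.

```python
import numpy as np

def random_box(h, w, box_h, box_w, rng):
    # Top-left corner of a box that lies fully inside an h x w image.
    top = int(rng.integers(0, h - box_h + 1))
    left = int(rng.integers(0, w - box_w + 1))
    return top, left

def cutout(image, size, rng):
    """Zero out one random size x size square."""
    out = image.copy()
    h, w = image.shape[:2]
    top, left = random_box(h, w, size, size, rng)
    out[top:top + size, left:left + size] = 0
    return out

def cutmix(img_a, label_a, img_b, label_b, rng, alpha=1.0):
    """Paste a random box from img_b into img_a and mix labels by area."""
    h, w = img_a.shape[:2]
    lam = rng.beta(alpha, alpha)  # target share of img_a
    box_h = int(round(h * np.sqrt(1.0 - lam)))
    box_w = int(round(w * np.sqrt(1.0 - lam)))
    top, left = random_box(h, w, box_h, box_w, rng)
    out = img_a.copy()
    out[top:top + box_h, left:left + box_w] = img_b[top:top + box_h, left:left + box_w]
    # Recompute the mixing weight from the actual (rounded) box area.
    lam_adj = 1.0 - (box_h * box_w) / (h * w)
    return out, lam_adj * label_a + (1.0 - lam_adj) * label_b

rng = np.random.default_rng(0)
ones = np.ones((32, 32, 3))
zeros = np.zeros((32, 32, 3))

masked = cutout(ones, 8, rng)
mixed, mixed_label = cutmix(zeros, np.array([1.0, 0.0]), ones, np.array([0.0, 1.0]), rng)
```

In this toy call the mixed image's mean pixel value equals the weight assigned to the second label, because both are the fraction of the image covered by the pasted box.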





Appendix 1 Proof of Lemma 3.1

Neural Information Processing Systems

If h has the form (1), we have h = g f . Hence, we obtain h h = g2 f (1 f). Since 0 f <1 and f >0, we have h h <0. Now, from the other direction, suppose we know h h <0. To prove that h must have the form (1), we first show that h(0)h() >0 ( 0), (2) namely, h() always keeps the same sign as h(0).


Self-Adaptable Point Processes with Nonparametric Time Decays

Neural Information Processing Systems

Many applications involve multi-type event data. Understanding the complex influences of the events on each other is critical for discovering useful knowledge and predicting future events and their types. Existing methods either ignore these influences or only partially account for them. Recent works use recurrent neural networks to model the event rate. While highly expressive, these models couple all the temporal dependencies in a black box, from which meaningful knowledge can hardly be extracted. More importantly, most methods assume an exponential time decay of the influence strength, which is oversimplified and can miss many important strength-varying patterns.
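To make the exponential-decay assumption concrete, the sketch below evaluates the intensity of a generic multi-type Hawkes-style process, $λ_k(t) = μ_k + \sum_{t_i < t} A_{k, c_i} e^{-β (t - t_i)}$, where $c_i$ is the type of event $i$. This is the textbook form that the excerpt criticizes, not the model proposed in the paper. The symbols `mu`, `A`, and `beta` and the function name are assumptions introduced for illustration.

```python
import numpy as np

def exp_decay_intensity(t, event_times, event_types, mu, A, beta):
    """Intensity of each event type at time t under exponential time decay.

    mu:   (K,) background rates
    A:    (K, K) influence strengths, A[k, c] = effect of a type-c event on type k
    beta: decay rate shared by all pairs (an assumption of this sketch)
    """
    event_times = np.asarray(event_times, dtype=float)
    event_types = np.asarray(event_types, dtype=int)
    past = event_times < t
    # Each past event's influence shrinks exponentially with elapsed time.
    decay = np.exp(-beta * (t - event_times[past]))
    # A[:, event_types[past]] has shape (K, number of past events).
    return mu + A[:, event_types[past]] @ decay

mu = np.array([0.1, 0.2])
A = np.array([[0.5, 0.0],
              [0.3, 0.4]])
lam = exp_decay_intensity(3.0, [1.0, 2.0], [0, 1], mu, A, beta=1.0)
```

Under this form the influence of an event can only shrink monotonically at one fixed rate, which is the restriction that nonparametric time decays are meant to lift.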


A Unified Game-Theoretic Interpretation of Adversarial Robustness: Supplementary Material

Neural Information Processing Systems

In this section, to help readers understand the metric in the paper, we first revisit the definition of the Shapley value [14], which is widely considered an unbiased estimate of the numerical importance of each input variable. In game theory, a complex system is usually represented as a game, where each input variable is taken as a player, and the output of the system is regarded as the total reward of all players. Given a game with multiple players (input variables) $N = \{1, 2, \dots, n\}$, some players cooperate to pursue a high reward. Thus, the task is to divide the total reward and fairly assign the divided elementary reward to each individual player. In this way, the elementary reward can be considered the numerical importance of the corresponding variable to the complex system. Let $2^N \overset{\text{def}}{=} \{S \mid S \subseteq N\}$ denote all potential subsets of $N$. The game $v: 2^N \to \mathbb{R}$ is a function that estimates the overall reward $v(S)$ earned by each specific subset of players $S \subseteq N$. In this way, the Shapley value, denoted by $φ(i)$, represents the numerical importance of player $i$ to the game $v$. $φ(i) = \sum \cdots$

Using Shapley values to explain DNNs.
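The excerpt's own formula for $φ(i)$ is cut off, so the paper's exact notation is not recoverable here. For orientation, the standard textbook definition is $φ(i) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \left[ v(S \cup \{i\}) - v(S) \right]$. The sketch below evaluates that standard formula by brute force. The function name and the toy game are hypothetical, and players are indexed from 0 rather than 1 for convenience.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, v):
    """Exact Shapley values for an n-player game.

    v maps a frozenset of player indices (0..n-1) to a real-valued reward.
    Cost is exponential in n, so this is only practical for small games.
    """
    players = range(n)
    phi = [0.0] * n
    for i in players:
        others = [j for j in players if j != i]
        for size in range(n):
            # Weight of every coalition S of this size that excludes player i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                S = frozenset(subset)
                # Marginal contribution of player i when joining S.
                phi[i] += weight * (v(S | {i}) - v(S))
    return phi

def toy_game(S):
    # Hypothetical game: a reward of 1 is earned only if players 0 and 1 cooperate.
    return 1.0 if {0, 1} <= S else 0.0

phi = shapley_values(3, toy_game)
```

In the toy game, players 0 and 1 each receive 0.5 and player 2 receives 0, and the values sum to the reward of the full coalition, matching the "fair division of the total reward" described above.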