Injecting Domain Knowledge from Empirical Interatomic Potentials to Neural Networks for Predicting Material Properties
A.1 Periodic Boundary Conditions

Under periodic boundary conditions (PBCs), the positions of atoms outside the simulation cell are obtained by generating periodic images of those within the cell through translations commensurate with its periodicity. This methodology is capable of modeling infinite systems because the interactions between atoms separated by more than a modest cutoff distance are very small and thus ignored when defining empirical models. This limited range of interaction gives rise to the concept of an atomic environment. The environment of a given atom consists of itself and all other atoms, including periodic images, that fall within a prescribed cutoff distance of it. The consequence of this locality is that an infinite system can be modeled exactly using a finite periodic cell so long as a sufficient number of periodic images surrounding it are explicitly accounted for. An example of PBCs for a two-dimensional square cell and a local atomic environment is illustrated in Figure 1.
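As a concrete illustration of this locality, the sketch below builds a brute-force neighbor list for a 2-D square cell by enumerating periodic image shifts out to the cutoff; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def neighbor_list(positions, cell, cutoff):
    """Collect, for each atom, all neighbors within `cutoff`,
    including periodic images of atoms in a 2-D square cell of side `cell`."""
    n = len(positions)
    neighbors = [[] for _ in range(n)]
    # Enough image shells that every periodic image within `cutoff` is covered.
    shells = int(np.ceil(cutoff / cell))
    shifts = [np.array([i, j]) * cell
              for i in range(-shells, shells + 1)
              for j in range(-shells, shells + 1)]
    for a in range(n):
        for b in range(n):
            for s in shifts:
                if a == b and not s.any():
                    continue  # skip the atom itself, but keep its own images
                d = np.linalg.norm(positions[b] + s - positions[a])
                if d <= cutoff:
                    neighbors[a].append((b, tuple(s), d))
    return neighbors
```

For two atoms at x = 0 and x = 3 in a cell of side 4 with cutoff 1.5, the direct separation (3.0) exceeds the cutoff, but each atom sees the other's periodic image at distance 1.0, exactly the situation PBCs are designed to capture.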
Data-Efficient Pipeline for Offline Reinforcement Learning with Limited Data
Offline reinforcement learning (RL) can be used to improve future performance by leveraging historical data. There exist many different algorithms for offline RL, and it is well recognized that these algorithms, and their hyperparameter settings, can lead to decision policies with substantially differing performance. This prompts the need for pipelines that allow practitioners to systematically perform algorithm-hyperparameter selection for their setting. Critically, in most real-world settings, this pipeline must only involve the use of historical data. Inspired by statistical model selection methods for supervised learning, we introduce a task- and method-agnostic pipeline for automatically training, comparing, selecting, and deploying the best policy when the provided dataset is limited in size.
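The selection loop such a pipeline performs can be sketched on a toy logged-bandit dataset; the importance-sampling value estimate below is a stand-in for whatever off-policy evaluation the actual pipeline uses, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic logged data: states, actions from a uniform behavior policy
# (propensity 0.5), and rewards; action 1 pays off more.
states = rng.integers(0, 2, size=500)
actions = rng.integers(0, 2, size=500)
rewards = (actions == 1).astype(float) * 0.8 + rng.normal(0, 0.1, size=500)

# Candidate "algorithm + hyperparameter" outputs, reduced to policies.
# In a real pipeline each candidate would be fit on the first half of the
# data; these fixed policies stand in for fitted ones.
candidates = {
    "always-0": lambda s: 0,
    "always-1": lambda s: 1,
}

def is_estimate(policy, s, a, r, propensity=0.5):
    """Importance-sampling estimate of a deterministic policy's value."""
    match = np.array([policy(si) == ai for si, ai in zip(s, a)])
    return float(np.mean(match * r / propensity))

# The held-out half of the historical data is used purely for selection.
valid = slice(250, 500)
scores = {name: is_estimate(pi, states[valid], actions[valid], rewards[valid])
          for name, pi in candidates.items()}
best = max(scores, key=scores.get)
```

The key constraint from the abstract is respected: every step, including the final selection, touches only the logged dataset, never the environment.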
HHD-GP: Incorporating Helmholtz-Hodge Decomposition into Gaussian Processes for Learning Dynamical Systems
Hao Xu, The University of Hong Kong, Hong Kong, China
Machine learning models provide alternatives for efficiently recognizing complex patterns from data, but the main concern in applying them to modeling physical systems stems from their physics-agnostic design, leading to learning methods that lack interpretability, robustness, and data efficiency. This paper mitigates this concern by incorporating the Helmholtz-Hodge decomposition into a Gaussian process model, leading to a versatile framework that simultaneously learns the curl-free and divergence-free components of a dynamical system.
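One standard way to realize such a prior, sketched here under the assumption of a Gaussian base kernel and not necessarily the paper's exact construction, is to combine the classical matrix-valued curl-free and divergence-free kernels:

```python
import numpy as np

def curl_free_kernel(x, y, sigma=1.0):
    """Matrix-valued curl-free kernel derived from a Gaussian base kernel:
    K_cf(x, y) = (k / s^2) (I - r r^T / s^2), with r = x - y."""
    r = x - y
    k = np.exp(-(r @ r) / (2 * sigma**2))
    return (k / sigma**2) * (np.eye(len(r)) - np.outer(r, r) / sigma**2)

def div_free_kernel(x, y, sigma=1.0):
    """Matrix-valued divergence-free kernel from the same Gaussian:
    K_df(x, y) = (k / s^2) ((d - 1 - |r|^2 / s^2) I + r r^T / s^2)."""
    r = x - y
    d = len(r)
    k = np.exp(-(r @ r) / (2 * sigma**2))
    return (k / sigma**2) * ((d - 1 - (r @ r) / sigma**2) * np.eye(d)
                             + np.outer(r, r) / sigma**2)

def hhd_kernel(x, y, sigma=1.0):
    # Sum kernel: samples from the resulting GP decompose into a
    # curl-free plus a divergence-free vector field.
    return curl_free_kernel(x, y, sigma) + div_free_kernel(x, y, sigma)
```

A GP with the sum kernel models the full field while the two summands expose its Helmholtz-Hodge components separately, which is what gives the framework its interpretability.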
Jinliang Deng, Feiyang Ye, Du Yin, Xuan Song
Long-term time series forecasting (LTSF) represents a critical frontier in time series analysis, characterized by extensive input sequences, as opposed to the shorter spans typical of traditional approaches. While longer sequences inherently offer richer information for enhanced predictive precision, prevailing studies often respond by escalating model complexity. These intricate models can inflate into millions of parameters, resulting in prohibitive parameter scales. Our study demonstrates, through both analytical and empirical evidence, that decomposition is key to containing excessive model inflation while achieving uniformly superior and robust results across various datasets. Remarkably, by tailoring decomposition to the intrinsic dynamics of time series data, our proposed model outperforms existing benchmarks, using over 99% fewer parameters than the majority of competing methods. Through this work, we aim to unleash the power of a restricted set of parameters by capitalizing on domain characteristics--a timely reminder that in the realm of LTSF, bigger is not invariably better. The code is available at https://github.com/JLDeng/SSCNN.
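The decomposition principle can be illustrated with a deliberately tiny model: a moving-average trend/seasonal split followed by one small linear head per component. This is a generic sketch of decomposition-based forecasting, not the proposed SSCNN architecture.

```python
import numpy as np

def decompose(x, kernel=25):
    """Split a series into a smooth trend (moving average) and the
    seasonal remainder; edges are padded so the trend keeps full length."""
    pad = kernel // 2
    xp = np.concatenate([np.repeat(x[0], pad), x, np.repeat(x[-1], pad)])
    trend = np.convolve(xp, np.ones(kernel) / kernel, mode="valid")
    return trend, x - trend

class DecomposedLinear:
    """Each component gets its own linear forecaster, so the parameter
    count is just 2 * lookback * horizon rather than a deep stacked model."""
    def __init__(self, lookback, horizon, rng=None):
        rng = rng or np.random.default_rng(0)
        self.Wt = rng.normal(0, 0.01, (horizon, lookback))  # trend head
        self.Ws = rng.normal(0, 0.01, (horizon, lookback))  # seasonal head

    def forward(self, x):
        trend, seasonal = decompose(x)
        return self.Wt @ trend + self.Ws @ seasonal
```

Even this toy version makes the parameter-scale argument tangible: with a lookback of 96 and horizon of 24, the entire model has 2 × 96 × 24 weights, orders of magnitude below a multi-layer Transformer.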
Forget About the LiDAR: Self-Supervised Depth Estimators with MED Probability Volumes
Self-supervised depth estimators have recently shown results comparable to supervised methods on the challenging single image depth estimation (SIDE) task, by exploiting the geometrical relations between target and reference views in the training data. However, previous methods usually learn forward or backward image synthesis rather than depth estimation itself, as they cannot effectively account for occlusions between the target and the reference images; they rely on rigid photometric assumptions or on the SIDE network to infer depth and occlusions, resulting in limited performance. In contrast, we propose a method to "Forget About the LiDAR" (FAL), with Mirrored Exponential Disparity (MED) probability volumes for the training of monocular depth estimators from stereo images. Our MED representation allows us to obtain geometrically inspired occlusion maps with our novel Mirrored Occlusion Module (MOM), which does not impose a learning burden on our FAL-net.
Breaking the curse of dimensionality in structured density estimation
We consider the problem of estimating a structured multivariate density, subject to Markov conditions implied by an undirected graph. In the worst case, without Markovian assumptions, this problem suffers from the curse of dimensionality. Our main result shows how the curse of dimensionality can be avoided or greatly alleviated under the Markov property, and applies to arbitrary graphs. While existing results along these lines focus on sparsity or manifold assumptions, we introduce a new graphical quantity called "graph resilience" and show how it controls the sample complexity. Surprisingly, although one might expect the sample complexity of this problem to scale with local graph parameters such as the degree, this turns out not to be the case. Through explicit examples, we compute uniform deviation bounds and illustrate how the curse of dimensionality in density estimation can thus be circumvented. Notable examples where the rate improves substantially include sequential, hierarchical, and spatial data.
Learning Versatile Skills with Curriculum Masking
Yao Tang, Zichuan Lin, Deheng Ye
Masked prediction has emerged as a promising pretraining paradigm in offline reinforcement learning (RL) due to its versatile masking schemes, enabling flexible inference across various downstream tasks with a unified model. Despite this versatility, it remains unclear how to balance the learning of skills at different levels of complexity. To address this, we propose CurrMask, a curriculum masking pretraining paradigm for sequential decision making. Motivated by how humans learn by organizing knowledge in a curriculum, CurrMask adjusts its masking scheme during pretraining to learn versatile skills. Through extensive experiments, we show that CurrMask exhibits superior zero-shot performance on skill prompting and goal-conditioned planning tasks, as well as competitive finetuning performance on offline RL tasks. Additionally, our analysis of training dynamics reveals that CurrMask gradually acquires skills of varying complexity by dynamically adjusting its masking scheme. Code is available here.
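A minimal sketch of what a curriculum over masking schemes might look like (an illustrative schedule of our own, not CurrMask's actual one): early in training, scattered single-token masks; later, longer contiguous blocks and higher mask ratios.

```python
import numpy as np

def curriculum_mask(seq_len, progress, rng):
    """Sample a boolean mask whose structure follows a curriculum:
    at progress ~ 0, scattered single-token masks (local, easy skills);
    at progress ~ 1, long contiguous blocks (long-horizon, harder skills)."""
    ratio = 0.15 + 0.35 * progress                 # mask more over time
    block = max(1, int(round(1 + progress * 7)))   # block length 1 -> 8
    n_masked = int(round(seq_len * ratio))
    mask = np.zeros(seq_len, dtype=bool)
    while mask.sum() < n_masked:
        start = rng.integers(0, seq_len)
        mask[start:start + block] = True           # truncated at the end
    return mask
```

Reconstructing long masked blocks forces the model to infer extended behavior segments rather than interpolate single steps, which is the intuition behind scheduling block length with training progress.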
Faster Neighborhood Attention: Reducing the O(n²) Cost of Self Attention at the Threadblock Level
Ali Hassani, Wen-mei Hwu
Neighborhood attention reduces the cost of self attention by restricting each token's attention span to its nearest neighbors. This restriction, parameterized by a window size and dilation factor, draws a spectrum of possible attention patterns between linear projection and self attention. Neighborhood attention, and more generally sliding window attention patterns, have long been limited by available infrastructure, particularly in higher-rank spaces (2-D and 3-D), calling for the development of custom kernels, which have been limited in functionality or performance, if not both. In this work, we aim to massively improve upon existing infrastructure by providing two new methods for implementing neighborhood attention. We first show that neighborhood attention can be represented as a batched GEMM problem, similar to standard attention, and implement it for 1-D and 2-D neighborhood attention.
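For reference, 1-D neighborhood attention itself can be stated in a few lines of naive (non-kernel) code; `window` and `dilation` match the parameterization above, and with `window = n` it recovers full self attention:

```python
import numpy as np

def neighborhood_attention_1d(q, k, v, window=3, dilation=1):
    """Naive 1-D neighborhood attention: each token attends only to its
    `window` nearest neighbors (spaced by `dilation`), with the window
    clamped at sequence boundaries."""
    n, d = q.shape
    out = np.empty_like(v)
    half = window // 2
    span = dilation * (window - 1)
    for i in range(n):
        # Center the window on token i, then clamp it inside [0, n).
        start = min(max(i - dilation * half, 0), max(n - 1 - span, 0))
        idx = start + dilation * np.arange(window)
        logits = q[i] @ k[idx].T / np.sqrt(d)
        w = np.exp(logits - logits.max())
        w /= w.sum()
        out[i] = w @ v[idx]
    return out
```

Each query touches only `window` keys, so the cost is O(n · window) rather than O(n²); the custom kernels discussed above exist precisely because this loop structure maps poorly onto GPU hardware without a fused implementation.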
Supplementary Material: Estimation of Conditional Moment Models
The most prevalent approach for estimating endogenous regression models with instruments is to assume low-dimensional linear relationships, i.e. h(x) = ⟨θ, φ(x)⟩ for a growing set of sieve features φ(x). A 2SLS estimation method is then applied on these transformed feature spaces, and the coefficient in the final regression is taken to be the estimate of θ. The authors show asymptotic consistency of the resulting estimator, assuming that the approximation error goes to zero. Subsequently, they also estimate the function m(z) = E[y − h(x) | z] based on another growing sieve. Though it may seem at first that the approach in that paper and ours are quite distinct, the population limit of our objective function coincides with theirs. To see this, consider the simplified version of our estimator presented in (6), where the function classes are already norm-constrained and no norm-based regularization is imposed. Moreover, for a moment consider the population version of this estimator, i.e.

    min_h max_f E[(y − h(x)) f(z)] − ‖f‖²,   where ‖f‖² = E[f(z)²].

Maximizing over f in closed form shows that, in the population limit and without norm regularization on the test function f, our criterion is equivalent to the minimum distance criterion analyzed in Chen and Pouzo [2012]. Another point of similarity is that we prove convergence of the estimator in terms of the same pseudo-metric, the projected MSE defined in Section 4 of Chen and Pouzo [2012], and like that paper we require additional conditions to relate the pseudo-metric to the true MSE. The present paper differs in a number of ways: (i) the finite-sample criterion is different; (ii) we prove our results using localized Rademacher analysis, which allows for weaker assumptions; (iii) we consider a broader range of estimation approaches than linear sieves, necessitating more of a focus on optimization. Digging into the second point, Chen and Pouzo [2012] take a more traditional parameter-recovery approach, which requires several minimum-eigenvalue conditions and several regularity conditions to be satisfied for their estimation rate to hold.
This is analogous to a mean-squared-error proof in an exogenous linear regression setting, which requires the minimum eigenvalue of the feature covariance to be bounded away from zero. Moreover, such parameter-recovery methods seem limited to the growing-sieve approach, since only then does one have a clear finite-dimensional parameter vector to work with for each fixed n.
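The population-limit equivalence invoked above can be made explicit by completing the square (shown here with a unit penalty coefficient for simplicity):

```latex
\min_{h}\ \sup_{f}\ \mathbb{E}\big[(y - h(x))\,f(z)\big] \;-\; \mathbb{E}\big[f(z)^2\big]
```

Writing $\rho_h(z) = \mathbb{E}[y - h(x) \mid z]$ and conditioning on $z$, the inner objective equals $\mathbb{E}\big[\rho_h(z)\,f(z) - f(z)^2\big]$, which is maximized pointwise at $f^{*}(z) = \rho_h(z)/2$, so that

```latex
\sup_{f}\ \mathbb{E}\big[(y - h(x))\,f(z)\big] - \mathbb{E}\big[f(z)^2\big]
  \;=\; \tfrac{1}{4}\,\mathbb{E}\Big[\big(\mathbb{E}[y - h(x) \mid z]\big)^2\Big],
```

i.e. one quarter of the projected MSE, so minimizing over h matches the minimum distance criterion of Chen and Pouzo [2012] up to scaling.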