average training time
Beyond Unimodal: Generalising Neural Processes for Multimodal Uncertainty Estimation Appendix A Lemma and Proof
For completeness, we restate Lemma 3.1 here. If we use Lemma A.1 with diagonal covariance matrices for ... In this section, we outline additional details of the experimental settings, including the datasets (Appendix B.1), the hyperparameters of the models used (Appendix B.2), the metrics (Appendix B.3), and a brief analysis of the computational complexity of MGP and MNPs (Appendix B.4). We generated 1,000 synthetic training samples (i.e., ... Robustness to Noisy Samples Dataset: In Section 5.1, we evaluated the models' robustness to ... The details of each dataset are outlined in Table 1. These datasets lie within a feature space; the feature extraction method for each can be found in [5]. Table 1: Multimodal datasets used for evaluating robustness to noisy samples.
Visualizing the Local Atomic Environment Features of Machine Learning Interatomic Potential
Shao, Xuqiang, Zhang, Yuqi, Zhang, Di, Gao, Tianxiang, Liu, Xinyuan, Gan, Zhiran, Meng, Fanshun, Li, Hao, Yang, Weijie
This paper addresses the challenges of creating efficient and high-quality datasets for machine learning potential functions. We present a novel approach, termed DV-LAE (Difference Vectors based on Local Atomic Environments), which utilizes the properties of atomic local environments and employs histogram statistics to generate difference vectors. This technique facilitates dataset screening and optimization, effectively minimizing redundancy while maintaining data diversity. We have validated the optimized datasets in high-temperature and high-pressure hydrogen systems as well as the α-Fe/H binary system, demonstrating a significant reduction in computational resource usage without compromising prediction accuracy. Additionally, our method has revealed new structures that emerge during simulations but were underrepresented in the initial training datasets. The redundancy in the datasets and the distribution of these new structures can be visually analyzed through the visualization of difference vectors. This approach enhances our understanding of the characteristics of these newly formed structures and their impact on physical processes.
Efficient transformer with reinforced position embedding for language models
Hsiao, Yen-Che, Dutta, Abhishek
In this paper, we propose an efficient transformer architecture that uses reinforced positional embedding to obtain superior performance with half the number of encoder-decoder layers. We demonstrate that concatenating positional encoding with trainable token embeddings, normalizing columns in the token embedding matrix, and using the normalized token embedding matrix as the value of the attention layer improve the training loss, the validation loss, and the training time of an encoder-decoder Transformer model on a Portuguese-English translation task with 10 epochs or 12 hours of training across 10 trials. Our method, with roughly a threefold parameter reduction compared to the baseline model, yields a mean training loss of 1.21, a mean validation loss of 1.51, and an average training time of 1352.27 ... Additionally, we evaluated our proposed architecture and the baseline across 14 diverse translation datasets from TensorFlow. The results indicate that our method consistently achieves lower or comparable training and validation losses, suggesting enhanced learning efficiency.
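Two of the steps named in this abstract, column normalization of the token embedding matrix and concatenation with a positional encoding, can be illustrated with a minimal NumPy sketch. The abstract does not specify the details, so several choices here are assumptions: a sinusoidal positional encoding, unit L2 norm per column, and the hypothetical function names and dimensions. Using the normalized matrix as the attention value is not shown.

```python
import numpy as np


def sinusoidal_positions(seq_len: int, d_pos: int) -> np.ndarray:
    """Standard sinusoidal positional encoding (assumed; the abstract does not say which is used)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_pos)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / d_pos)
    enc = np.zeros((seq_len, d_pos))
    enc[:, 0::2] = np.sin(angle[:, 0::2])  # even feature indices
    enc[:, 1::2] = np.cos(angle[:, 1::2])  # odd feature indices
    return enc


def normalize_columns(E: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each column of the (vocab, d_tok) matrix to unit L2 norm (assumed normalization)."""
    norms = np.linalg.norm(E, axis=0, keepdims=True)
    return E / np.maximum(norms, eps)


def embed(token_ids: np.ndarray, E: np.ndarray, d_pos: int) -> np.ndarray:
    """Look up normalized token embeddings and concatenate (not add) the positional encoding."""
    tok = normalize_columns(E)[token_ids]                  # (seq_len, d_tok)
    pos = sinusoidal_positions(len(token_ids), d_pos)      # (seq_len, d_pos)
    return np.concatenate([tok, pos], axis=-1)             # (seq_len, d_tok + d_pos)


rng = np.random.default_rng(0)
E = rng.normal(size=(50, 8))          # hypothetical vocabulary of 50 tokens, 8-dim embeddings
out = embed(np.array([1, 2, 3, 4]), E, d_pos=4)
```

Concatenation keeps the positional features in their own coordinates rather than mixing them into the token features, at the cost of a wider input (here 8 + 4 = 12 dimensions per token).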
Better by Default: Strong Pre-Tuned MLPs and Boosted Trees on Tabular Data
Holzmüller, David, Grinsztajn, Léo, Steinwart, Ingo
For classification and regression on tabular data, the dominance of gradient-boosted decision trees (GBDTs) has recently been challenged by often much slower deep learning methods with extensive hyperparameter tuning. We address this discrepancy by introducing (a) RealMLP, an improved multilayer perceptron (MLP), and (b) improved default parameters for GBDTs and RealMLP. We tune RealMLP and the default parameters on a meta-train benchmark with 71 classification and 47 regression datasets and compare them to hyperparameter-optimized versions on a disjoint meta-test benchmark with 48 classification and 42 regression datasets, as well as the GBDT-friendly benchmark by Grinsztajn et al. (2022). Our benchmark results show that RealMLP offers a better time-accuracy tradeoff than other neural nets and is competitive with GBDTs. Moreover, a combination of RealMLP and GBDTs with improved default parameters can achieve excellent results on medium-sized tabular datasets (1K–500K samples) without hyperparameter tuning.
a9be4c2a4041cadbf9d61ae16dd1389e-Reviews.html
Recall that in the implementation of the proposed convex method, CVX2, each boosting step (which adds a single rank to the solution) is interleaved with local optimization. For the local optimization we use a standard LBFGS implementation with default termination conditions. For the outer boosting iterations we terminate when the relative objective improvement is less than 5e-5 or the absolute improvement is less than 1e-3. The average rank results in the above table correspond to the number of boosting rounds used by CVX2, which also determines the rank of its final solutions. From these results, one can see that the method uses significantly less than the full O(t^2) storage.
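The outer stopping rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the objective is assumed to be minimized, and the relative improvement is assumed to be measured against the previous objective value.

```python
from typing import Sequence


def should_stop(prev_obj: float, curr_obj: float,
                rel_tol: float = 5e-5, abs_tol: float = 1e-3) -> bool:
    """Stop when the relative OR the absolute objective improvement is below its threshold."""
    improvement = prev_obj - curr_obj            # positive when the objective decreased
    denom = max(abs(prev_obj), 1e-12)            # guard against division by zero (assumption)
    return (improvement / denom) < rel_tol or improvement < abs_tol


def count_rounds(objectives: Sequence[float]) -> int:
    """Count boosting rounds until termination.

    objectives[0] is the initial objective and objectives[k] the value after
    round k (rank-one step followed by local optimization).
    """
    rounds = 0
    for prev, curr in zip(objectives, objectives[1:]):
        rounds += 1
        if should_stop(prev, curr):
            break
    return rounds


# Hypothetical objective trace: two large improvements, then a negligible one.
rounds_used = count_rounds([10.0, 5.0, 4.0, 3.9995])
```

Because each round adds one rank, the count returned here would also be the rank of the final solution in this simplified picture.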