error predictor
Guided Star-Shaped Masked Diffusion
Meshchaninov, Viacheslav, Shibaev, Egor, Makoian, Artem, Klimov, Ivan, Sheshenya, Danil, Malinin, Andrei, Balagansky, Nikita, Gavrilov, Daniil, Alanov, Aibek, Vetrov, Dmitry
The performance of pre-trained masked diffusion models is often constrained by their sampling procedure, which makes decisions irreversible and struggles in low-step generation regimes. We introduce a novel sampling algorithm that works with pre-trained models and, after lightweight fine-tuning of a single layer, significantly improves sample quality and efficiency. Our method reformulates the generation process using a star-shaped paradigm, which inherently allows for error correction. To make this process effective, we augment it with a learnable re-masking scheduler that identifies and revises likely errors. This approach yields a substantial quality boost, particularly when using a small number of sampling steps. We extensively ablate key components of our approach and show its usability in different scenarios. In comprehensive experiments on text and code generation, our sampling algorithm outperforms or matches existing methods.
Diffusion probabilistic models have demonstrated remarkable success in generating high-fidelity data, particularly in continuous domains such as image and video synthesis (Sohl-Dickstein et al., 2015; Song & Ermon, 2019; Ho et al., 2020; Sahoo et al., 2024b). A key reason for their effectiveness is the principle of iterative refinement, which provides a robust error-correction mechanism: a mistake made early in the trajectory can be gradually amended in subsequent steps, leading to state-of-the-art results. This elegant property, however, is largely absent in the discrete domain. While discrete diffusion models are making significant strides in areas like natural language processing (Lou et al., 2024; Sahoo et al., 2024a; Schiff et al., 2024), the most successful variants, based on token masking, are built on a foundation that precludes iterative refinement.
In a masked diffusion setup, the generation of each token is a one-way street: once a [MASK] is replaced with a concrete token, the model commits to that decision. The token is then frozen and cannot be revisited or updated, even if later steps reveal it to be suboptimal in the broader context.
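The difference between committing to a token and being able to revise it can be illustrated with a toy sampler. The sketch below is one possible reading of a star-shaped step with guided re-masking, not the authors' implementation: each step predicts a full clean sequence, and the next state is formed by masking that prediction again, choosing positions by an error score rather than at random. `denoise` and `error_score` are hypothetical stand-ins for the pre-trained model and the fine-tuned error-predicting layer, the linear masking schedule is an assumption, and the usage at the bottom plugs in a deliberately flawed toy denoiser and an oracle error score purely for illustration.

```python
from typing import Callable, List

MASK = "[MASK]"

def star_shaped_sample(
    length: int,
    steps: int,
    denoise: Callable[[List[str]], List[str]],
    error_score: Callable[[List[str]], List[float]],
) -> List[str]:
    """Toy star-shaped sampler with error-guided re-masking (illustrative sketch).

    Each step predicts a complete clean sequence x0_hat and builds the next
    state from x0_hat alone, masking the positions with the highest error
    scores. Because the mask is re-drawn from x0_hat every step, a position
    that was visible earlier can be masked again and revised.
    """
    x = [MASK] * length
    for step in range(steps, 0, -1):
        x0_hat = denoise(x)                       # full clean prediction
        n_mask = (step - 1) * length // steps     # assumed linear schedule
        if n_mask == 0:
            return x0_hat                         # final step: keep everything
        scores = error_score(x0_hat)              # higher = more likely wrong
        ranked = sorted(range(length), key=lambda i: scores[i], reverse=True)
        to_mask = set(ranked[:n_mask])
        x = [MASK if i in to_mask else tok for i, tok in enumerate(x0_hat)]
    return x

# --- Toy usage (hypothetical model and oracle error predictor) ---
TARGET = list("hello")

def toy_denoise(x: List[str]) -> List[str]:
    """Keeps visible tokens; guesses 'p' for the last position while context is scarce."""
    n_masked = x.count(MASK)
    out = []
    for i, tok in enumerate(x):
        if tok != MASK:
            out.append(tok)
        elif i == len(x) - 1 and n_masked >= 3:
            out.append("p")                       # deliberate early mistake
        else:
            out.append(TARGET[i])
    return out

def oracle_error_score(x0_hat: List[str]) -> List[float]:
    """Stand-in for a learned error predictor: 1.0 where the token is wrong."""
    return [float(tok != TARGET[i]) for i, tok in enumerate(x0_hat)]

print("".join(star_shaped_sample(5, 5, toy_denoise, oracle_error_score)))  # hello
```

In this toy run the mistaken `p` is routed back to `[MASK]` by the error score until enough context is visible, so the sample ends as `hello`. With a single step, the same toy returns `hellp`, because the only prediction is kept as-is.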
Curiosity Driven Exploration to Optimize Structure-Property Learning in Microscopy
Vatsavai, Aditya, Narasimha, Ganesh, Liu, Yongtao, Chowdhury, Jawad, Yang, Jan-Chi, Funakubo, Hiroshi, Ziatdinov, Maxim, Vasudevan, Rama
Rapidly determining structure-property correlations in materials is an important challenge: it deepens the understanding of fundamental mechanisms and greatly assists materials design. In microscopy, imaging data provides a direct measurement of the local structure, while spectroscopic measurements provide relevant functional property information. Deep kernel active learning approaches have been used to rapidly map local structure to functional properties in microscopy experiments, but they are computationally expensive for multi-dimensional and correlated output spaces. Here, we present an alternative, lightweight curiosity algorithm that actively samples regions with unexplored structure-property relations, using a deep-learning-based surrogate model for error prediction. We show that the algorithm outperforms random sampling for predicting properties from structures and provides a convenient tool for efficiently mapping structure-property relationships in materials science.
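The curiosity loop can be sketched under stated assumptions: a surrogate predicts the property from a structure descriptor, a second model predicts where that surrogate errs, and the next measurement goes to the unmeasured location with the largest predicted error. The abstract describes a deep-learning surrogate for error prediction; this stdlib-only sketch swaps in k-nearest-neighbour regressors for both models and trains the error model on leave-one-out errors. The names `curiosity_loop`, `measure`, and `pool` and the 1-D toy grid are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]

def knn_predict(query: Point, xs: Sequence[Point], ys: Sequence[float], k: int = 1) -> float:
    """Mean target of the k training points nearest to `query` (Euclidean distance)."""
    order = sorted(range(len(xs)), key=lambda i: math.dist(query, xs[i]))
    nearest = order[:k]
    return sum(ys[i] for i in nearest) / len(nearest)

def loo_errors(xs: List[Point], ys: List[float], k: int = 1) -> List[float]:
    """Leave-one-out absolute error of the property surrogate at each measured point."""
    errors = []
    for i in range(len(xs)):
        rest_x, rest_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        errors.append(abs(knn_predict(xs[i], rest_x, rest_y, k) - ys[i]))
    return errors

def curiosity_loop(
    pool: List[Point],
    measure: Callable[[Point], float],
    seed_idx: List[int],
    budget: int,
    k: int = 1,
) -> List[int]:
    """Return indices of `pool` in the order they were measured.

    Needs at least two seed points so that leave-one-out errors are defined.
    """
    measured = list(seed_idx)
    ys = [measure(pool[i]) for i in measured]
    for _ in range(budget):
        candidates = [i for i in range(len(pool)) if i not in measured]
        if not candidates:
            break
        xs = [pool[i] for i in measured]
        errors = loo_errors(xs, ys, k)
        # Error predictor: regress surrogate error on structure, pick the largest.
        nxt = max(candidates, key=lambda i: knn_predict(pool[i], xs, errors, k))
        measured.append(nxt)
        ys.append(measure(pool[nxt]))
    return measured

# Toy 1-D "structure" grid whose property is flat below 5 and rises above it.
pool = [(float(i),) for i in range(11)]

def measure(p: Point) -> float:
    return 0.0 if p[0] < 5 else (p[0] - 5) ** 2

print(curiosity_loop(pool, measure, seed_idx=[0, 10], budget=2))  # [0, 10, 1, 6]
```

The first pick is a tie broken by index; by the second step the error model has registered that the surrogate fails near the high end, so the next measurement lands at 6, inside the unexplored rising region rather than next to the flat one.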
DEUP: Direct Epistemic Uncertainty Prediction
Jain, Moksh, Lahlou, Salem, Nekoei, Hadi, Butoi, Victor, Bertin, Paul, Rector-Brooks, Jarrid, Korablyov, Maksym, Bengio, Yoshua
Epistemic uncertainty is the part of out-of-sample prediction error due to the learner's lack of knowledge. Whereas previous work focused on model variance, we propose a principled approach for directly estimating epistemic uncertainty by learning to predict generalization error and subtracting an estimate of aleatoric uncertainty, i.e., intrinsic unpredictability. This estimator of epistemic uncertainty includes the effect of model bias and can be applied in non-stationary learning environments arising in active learning or reinforcement learning. In addition to demonstrating these properties of Direct Epistemic Uncertainty Prediction (DEUP), we illustrate its advantage over existing methods for uncertainty estimation on downstream tasks including sequential model optimization and reinforcement learning. We also evaluate the quality of uncertainty estimates from DEUP for probabilistic classification of images and for estimating uncertainty about synergistic drug combinations.
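The subtraction at the heart of DEUP can be sketched for squared loss, where the expected loss decomposes as E[(y − f(x))² | x] = Var(y | x) + (E[y | x] − f(x))². Subtracting an aleatoric estimate of Var(y | x) therefore leaves a term that includes the learner's bias. In this toy sketch the learned error predictor is a hypothetical piecewise-constant regressor fit to out-of-sample squared errors, the aleatoric variance is assumed known rather than estimated, and the data-generating function is invented for illustration.

```python
import random
from statistics import mean
from typing import Callable, Dict, List

def fit_binned_regressor(xs: List[float], targets: List[float], width: float) -> Callable[[float], float]:
    """Piecewise-constant regressor: mean target inside each bin of the given width."""
    bins: Dict[int, List[float]] = {}
    for x, t in zip(xs, targets):
        bins.setdefault(int(x // width), []).append(t)
    table = {b: mean(ts) for b, ts in bins.items()}
    fallback = max(table.values())  # pessimistic guess for bins with no data
    return lambda x: table.get(int(x // width), fallback)

def deup_epistemic(error_predictor: Callable[[float], float],
                   aleatoric: Callable[[float], float], x: float) -> float:
    """Predicted total error minus estimated aleatoric error, floored at zero."""
    return max(error_predictor(x) - aleatoric(x), 0.0)

# Toy data (assumed): y = f_true(x) + Gaussian noise; the learner ignores the jump at x = 1.
NOISE_SD = 0.5

def f_true(x: float) -> float:
    return 1.0 if x >= 1.0 else 0.0

def f_model(x: float) -> float:
    return 0.0  # deliberately biased learner

def aleatoric(x: float) -> float:
    return NOISE_SD ** 2  # known in this toy; DEUP would estimate it

rng = random.Random(0)
xs = [rng.uniform(0.0, 2.0) for _ in range(8000)]
ys = [f_true(x) + rng.gauss(0.0, NOISE_SD) for x in xs]
sq_errors = [(y - f_model(x)) ** 2 for x, y in zip(xs, ys)]  # out-of-sample losses
error_predictor = fit_binned_regressor(xs, sq_errors, width=0.5)

for x in (0.3, 1.7):
    print(x, round(deup_epistemic(error_predictor, aleatoric, x), 2))
```

Below the jump the estimate stays near zero, since the predicted error there is almost entirely noise; above it the estimate approaches the squared bias of 1, which a variance-only measure of this fixed learner would not capture.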
Space Expansion of Feature Selection for Designing more Accurate Error Predictors
Nikkhah, Shayan Tabatabaei, Kamal, Mehdi, Afzali-Kusha, Ali, Pedram, Massoud
Approximate computing is being considered as a promising design paradigm to overcome the energy and performance challenges of computationally demanding applications. When accuracy is configurable, quality can also be traded off against energy efficiency or delay. To use this technique, however, one must ensure a satisfactory user experience, which requires error predictors that detect unacceptable approximation errors. In this work, we propose a scheduling-aware feature selection method that leverages intermediate results of the hardware accelerator to improve prediction accuracy. Additionally, it configures the error predictors according to the energy consumption and latency of the system. The approach allows prediction time to be traded for higher accuracy. Results on various benchmarks demonstrate significant improvements in prediction accuracy compared to prior works, which used only the accelerator inputs for prediction.
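How an error predictor gates an approximate result can be illustrated with a toy truncating multiplier. This sketch is not the paper's scheduling-aware, learned predictor: it assumes non-negative integer operands, treats the low bits discarded by the truncation stage as a hypothetical intermediate result, and uses a closed-form predictor. With p_a = a_lo/a and p_b = b_lo/b, the truncated product equals ab(1 − p_a)(1 − p_b), so the relative error is 1 − (1 − p_a)(1 − p_b), which never exceeds p_a + p_b; that sum serves as the predicted error, and results above the quality threshold are recomputed exactly. All function names are illustrative.

```python
from typing import Tuple

def approx_mul(a: int, b: int, k: int) -> int:
    """Toy approximate multiplier: drop the k low bits of each operand, multiply, rescale."""
    return ((a >> k) * (b >> k)) << (2 * k)

def discarded_bits(a: int, b: int, k: int) -> Tuple[int, int]:
    """Low bits removed by truncation (treated here as an intermediate result)."""
    mask = (1 << k) - 1
    return a & mask, b & mask

def predicted_rel_error(a: int, b: int, k: int) -> float:
    """Upper bound a_lo/a + b_lo/b on the relative error of approx_mul (a, b > 0)."""
    a_lo, b_lo = discarded_bits(a, b, k)
    return a_lo / a + b_lo / b

def guarded_mul(a: int, b: int, k: int, max_rel_error: float) -> Tuple[int, bool]:
    """Return (product, used_exact); fall back to exact multiplication when flagged."""
    if a == 0 or b == 0:
        return 0, True
    if predicted_rel_error(a, b, k) > max_rel_error:
        return a * b, True           # predicted unacceptable: recompute exactly
    return approx_mul(a, b, k), False

print(guarded_mul(1000, 1000, k=4, max_rel_error=0.05))  # (984064, False)
print(guarded_mul(20, 1000, k=4, max_rel_error=0.05))    # (20000, True)
```

Tightening `max_rel_error` sends more operations down the exact path, spending energy and latency to protect quality; loosening it does the opposite.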