On the Importance of Gradient Norm in PAC-Bayesian Bounds

Neural Information Processing Systems

Generalization bounds, which assess the difference between the true risk and the empirical risk, have been studied extensively. However, current techniques obtain such bounds under strict assumptions, such as a uniformly bounded or Lipschitz loss function. To avoid these assumptions, in this paper we follow an alternative approach: we relax the uniform-boundedness assumptions to on-average bounded loss and on-average bounded gradient-norm assumptions. Following this relaxation, we propose a new generalization bound that exploits the contractivity of the log-Sobolev inequalities. These inequalities add an additional loss-gradient-norm term to the generalization bound, which is intuitively a surrogate for model complexity. We apply the proposed bound to Bayesian deep nets and empirically analyze the effect of this new loss-gradient-norm term on different neural architectures.


On the Importance of Exploration for Generalization in Reinforcement Learning

Neural Information Processing Systems

Existing approaches for improving generalization in deep reinforcement learning (RL) have mostly focused on representation learning, neglecting RL-specific aspects such as exploration. We hypothesize that the agent's exploration strategy plays a key role in its ability to generalize to new environments. Through a series of experiments in a tabular contextual MDP, we show that exploration is helpful not only for efficiently finding the optimal policy for the training environments but also for acquiring knowledge that helps decision making in unseen environments. Based on these observations, we propose EDE: Exploration via Distributional Ensemble, a method that encourages the exploration of states with high epistemic uncertainty through an ensemble of Q-value distributions. The proposed algorithm is the first value-based approach to achieve strong performance on both Procgen and Crafter, two benchmarks for generalization in RL with high-dimensional observations.
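A minimal sketch of the ensemble-disagreement idea behind EDE. The actual method works with distributional Q-networks and a more careful uncertainty decomposition; here, `select_action`, `beta`, and the tabular Q arrays are illustrative assumptions, using the standard deviation across ensemble members as a proxy for epistemic uncertainty:

```python
import numpy as np

def select_action(q_ensemble, state, beta=1.0):
    """Pick an action favoring high epistemic uncertainty.

    q_ensemble: array of shape (n_members, n_states, n_actions) holding each
    ensemble member's Q estimates. Disagreement (std) across members serves
    as a proxy for epistemic uncertainty, added UCB-style to the mean value.
    """
    q = q_ensemble[:, state, :]     # (n_members, n_actions)
    mean = q.mean(axis=0)           # aggregate value estimate
    epistemic = q.std(axis=0)       # disagreement across members
    return int(np.argmax(mean + beta * epistemic))
```

With `beta=0` this reduces to greedy action selection on the ensemble mean; larger `beta` steers the agent toward actions the ensemble disagrees about.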


On the Theory of Transfer Learning: The Importance of Task Diversity

Neural Information Processing Systems

We provide new statistical guarantees for transfer learning via representation learning--when transfer is achieved by learning a feature representation shared across different tasks. This enables learning on new tasks using far less data than is required to learn them in isolation. Formally, we consider $t+1$ tasks parameterized by functions of the form $f_j \circ h$ in a general function class $F \circ H$, where each $f_j$ is a task-specific function in $F$ and $h$ is the shared representation in $H$. Letting $C(\cdot)$ denote the complexity measure of the function class, we show that for diverse training tasks (1) the sample complexity needed to learn the shared representation across the first $t$ training tasks scales as $C(H) + t C(F)$, despite no explicit access to a signal from the feature representation and (2) with an accurate estimate of the representation, the sample complexity needed to learn a new task scales only with $C(F)$. Our results depend upon a new general notion of task diversity--applicable to models with general tasks, features, and losses--as well as a novel chain rule for Gaussian complexities.


On the Importance of Gradients for Detecting Distributional Shifts in the Wild

Neural Information Processing Systems

Detecting out-of-distribution (OOD) data has become a critical component in ensuring the safe deployment of machine learning models in the real world. Existing OOD detection approaches primarily rely on the output or feature space for deriving OOD scores, while largely overlooking information from the gradient space. In this paper, we present GradNorm, a simple and effective approach for detecting OOD inputs by utilizing information extracted from the gradient space. GradNorm directly employs the vector norm of gradients, backpropagated from the KL divergence between the softmax output and a uniform probability distribution. Our key idea is that the magnitude of gradients is higher for in-distribution (ID) data than for OOD data, making it informative for OOD detection. GradNorm demonstrates superior performance, reducing the average FPR95 by up to 16.33% compared to the previous best method.
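As a hedged sketch of the idea (not the paper's implementation, which operates on a deep network and includes details such as temperature scaling), the score for a simple linear head can be computed in closed form. For a cross-entropy against a uniform target, the gradient with respect to the logits is `p - u`, so the gradient with respect to the weight matrix is an outer product; `gradnorm_score` is an illustrative name:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerical stability
    e = np.exp(z)
    return e / e.sum()

def gradnorm_score(W, x):
    """OOD score sketch: L1 norm of the gradient of the KL divergence
    between softmax(W @ x) and a uniform distribution, taken w.r.t. W.

    For a linear head, d(loss)/d(logits) = p - u, hence the gradient
    w.r.t. W is the outer product (p - u) x^T.
    """
    logits = W @ x
    p = softmax(logits)
    u = np.full_like(p, 1.0 / p.size)
    grad = np.outer(p - u, x)   # shape (num_classes, num_features)
    return np.abs(grad).sum()
```

Higher scores would indicate ID inputs under the paper's key idea; a score of exactly zero occurs only when the softmax output is already uniform.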


The Importance of Being Scalable: Improving the Speed and Accuracy of Neural Network Interatomic Potentials Across Chemical Domains

Neural Information Processing Systems

Scaling has been a critical factor in improving model performance and generalization across various fields of machine learning. It concerns how a model's performance changes with increases in model size or input data, as well as how efficiently computational resources are utilized to support this growth. Despite successes in scaling other types of machine learning models, the study of scaling in Neural Network Interatomic Potentials (NNIPs) remains limited. NNIPs act as surrogate models for ab initio quantum mechanical calculations, predicting the energy and forces between atoms in molecules and materials based on atomic configurations. The dominant paradigm in this field is to incorporate numerous physical domain constraints into the model, such as symmetry constraints like rotational equivariance. We contend that these increasingly complex domain constraints inhibit the scaling ability of NNIPs, and such strategies are likely to cause model performance to plateau in the long run.


Towards Understanding the Importance of Shortcut Connections in Residual Networks

Neural Information Processing Systems

Residual Network (ResNet) is undoubtedly a milestone in deep learning. ResNet is equipped with shortcut connections between layers and exhibits efficient training using simple first-order algorithms. Despite this great empirical success, the reasons behind it are far from well understood. In this paper, we study a two-layer non-overlapping convolutional ResNet. Training such a network requires solving a non-convex optimization problem with a spurious local optimum.


Reviews: The Importance of Communities for Learning to Influence

Neural Information Processing Systems

This work marries influence maximization (IM) with recent work on submodular optimization from samples. It salvages some positive results from the wreckage of previous impossibility results on IM from samples by showing that, under an SBM model of community structure in graphs, positive results for IM under sampling are possible with a new algorithm (COPS), a new variation on other greedy algorithms for IM. It is surprising that the removal step in the COPS algorithm is sufficient for producing the improvement seen between Margl and COPS in Figure 2 (where Margl sometimes does worse than random). Overall this is a strong contribution to the IM literature.

Pros:
- Brings IM closer to practical contexts by studying IM under learned influence functions
- Gives a rigorous analysis of this problem for SBMs
- Despite the simplicity of SBMs, a solid evaluation shows good performance on real data

Cons:
- The paper is very well written, but sometimes oversimplifies the literature in the service of somewhat overstating its importance.


Reviews: The Importance of Sampling in Meta-Reinforcement Learning

Neural Information Processing Systems

The paper shows the importance of the training setup used for MAML and RL². A setup can include "exploratory episodes" and measure the loss only on the subsequent "reporting" episodes. The paper presents interesting results: the introduced E-MAML and E-RL² variants clearly help. The main problem with the paper is that it does not define the objective well. I only deduced from Appendix C that the setup is: after starting in a new environment, do 3 exploratory episodes and report the collected reward on the next 2 episodes.


Demystifying the Magic: The Importance of Machine Learning Explainability

#artificialintelligence

Machine learning explainability refers to the ability to understand and interpret the reasoning behind the predictions made by a machine learning model. It is important for ensuring transparency and accountability in the decision-making process. Explainable AI techniques, such as feature importance analysis and model interpretability, help provide insights into how a model arrives at its output. This can help detect and prevent bias, increase trust in AI systems, and facilitate regulatory compliance. Such model insights reveal not only what a model predicts but why it makes those predictions or decisions.
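One of the simplest model-agnostic forms of the feature importance analysis mentioned above is permutation importance. This sketch (with an assumed `model` callable that returns class labels) measures how much accuracy drops when each feature column is shuffled, breaking its relationship with the target:

```python
import numpy as np

def permutation_importance(model, X, y, rng=None):
    """Model-agnostic feature importance: accuracy drop when a column is shuffled.

    model: callable mapping a feature matrix to predicted labels.
    Returns one score per feature; larger means the model relies on it more.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    base = np.mean(model(X) == y)        # accuracy on intact data
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        Xp[:, j] = rng.permutation(Xp[:, j])   # destroy feature j's signal
        scores.append(base - np.mean(model(Xp) == y))
    return np.array(scores)
```

A feature the model ignores gets a score of zero; a feature the model depends on gets a positive score roughly proportional to the accuracy it contributes.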


The Importance of Machine Learning Pipelines – The Official Blog of BigML.com

#artificialintelligence

As Machine Learning solutions to real-world problems spread, people are beginning to acknowledge the glaring need for solutions that go beyond training a single model and deploying it. Even the simplest process should cover feature extraction, feature generation, modeling, and monitoring in a traceable and reproducible way. At BigML, we realized this some time ago, and the platform has steadily added features designed to help our users easily build both basic and complex solutions. Those solutions often need to be deployed in particular environments. Our white-box approach is fully compatible with that: users can download the models created in BigML and predict with them wherever needed, using bindings to Python or other popular programming languages.
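A toy sketch of the idea, assuming nothing about BigML's actual API: a pipeline as an ordered list of named steps, with every intermediate result logged so a prediction can be traced back through each stage:

```python
class Pipeline:
    """Minimal sketch of a traceable ML pipeline: named steps applied in order.

    Each step is a (name, function) pair; earlier steps play the role of
    feature extraction/generation, the final step plays the role of the model.
    """
    def __init__(self, steps):
        self.steps = steps
        self.log = []

    def run(self, x):
        for name, fn in self.steps:
            x = fn(x)
            self.log.append((name, x))  # trace every intermediate result
        return x
```

For example, `Pipeline([("scale", lambda v: v / 10), ("threshold", lambda v: int(v > 0.5))])` turns a raw value into a binary prediction while recording both intermediate outputs, which is the reproducibility property the paragraph above argues for.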