AITopics | Reservoir

Reservoir

Understanding Square Loss in Training Overparametrized Neural Network Classifiers

Neural Information Processing Systems, Oct-11-2024, 10:57:52 GMT

Deep learning has achieved many breakthroughs in modern classification tasks. Numerous architectures have been proposed for different data structures, but when it comes to the loss function, the cross-entropy loss is the predominant choice. Recently, several alternative losses have seen revived interest for deep classifiers. In particular, empirical evidence seems to promote square loss, but a theoretical justification is still lacking. In this work, we contribute to the theoretical understanding of square loss in classification by systematically investigating how it performs for overparametrized neural networks in the neural tangent kernel (NTK) regime.

generalization error, square loss, training overparametrized neural network classifier, (2 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Container > Reservoir (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.40)

PAC: Assisted Value Factorization with Counterfactual Predictions in Multi-Agent Reinforcement Learning

Neural Information Processing Systems, Oct-11-2024, 09:10:26 GMT

Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods. These methods allow optimizing a joint action-value function through the maximization of factorized per-agent utilities. In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions could impose concurrent constraints (across different states) on the representable function class, causing significant estimation errors during training. We tackle this limitation and propose PAC, a new framework leveraging Assistive information generated from Counterfactual Predictions of optimal joint action selection, which enables explicit assistance to value function factorization through a novel counterfactual loss. A variational inference-based information encoding method is developed to collect and encode the counterfactual predictions from an estimated baseline.

assisted value factorization, counterfactual prediction, multi-agent reinforcement learning, (2 more...)

Neural Information Processing Systems

Genre:

Play > Prospect > Charge (1.00)
Play > Prospect > Container > Reservoir (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Iron: Private Inference on Transformers

Neural Information Processing Systems, Oct-11-2024, 09:09:45 GMT

We initiate the study of private inference on Transformer-based models in the client-server setting, where clients have private inputs and servers hold proprietary models. Our main contribution is to provide several new secure protocols for matrix multiplication and complex non-linear functions like Softmax, GELU activations, and LayerNorm, which are critical components of Transformers. Specifically, we first propose a customized homomorphic encryption-based protocol for matrix multiplication that crucially relies on a novel compact packing technique. This design achieves $\sqrt{m}\times$ less communication ($m$ is the number of rows of the output matrix) than the most efficient prior work. Second, we design efficient protocols for three non-linear functions by integrating advanced underlying protocols and specialized optimizations.

private inference, protocol, transformer, (2 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Container > Reservoir (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.85)

VTC-LFC: Vision Transformer Compression with Low-Frequency Components

Neural Information Processing Systems, Oct-11-2024, 05:01:15 GMT

Although Vision transformers (ViTs) have recently dominated many vision tasks, deploying ViT models on resource-limited devices remains a challenging problem. To address this challenge, several methods have been proposed to compress ViTs. Most of them borrow experience from convolutional neural networks (CNNs) and mainly focus on the spatial domain. However, compression only in the spatial domain suffers from a dramatic performance drop without fine-tuning and is not robust to noise, as noise in the spatial domain can easily confuse the pruning criteria, leading to some parameters/channels being pruned incorrectly. Inspired by recent findings that self-attention is a low-pass filter and low-frequency signals/components are more informative to ViTs, this paper proposes compressing ViTs with low-frequency components. Two metrics named low-frequency sensitivity (LFS) and low-frequency energy (LFE) are proposed for better channel pruning and token pruning.

low-frequency component, vision transformer compression, vtc-lfc, (5 more...)

Neural Information Processing Systems

Genre:

Play > Prospect > Container > Reservoir (0.72)
Play > Prospect > Container > Trap (0.65)

Technology:

Information Technology > Artificial Intelligence > Vision (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.63)

Learning Energy Networks with Generalized Fenchel-Young Losses

Neural Information Processing Systems, Oct-11-2024, 01:22:25 GMT

This allows one to capture potentially complex relationships between inputs and outputs. To learn the parameters of the energy function, the solution to that optimization problem is typically fed into a loss function. The key challenge for training energy networks lies in computing loss gradients, as this typically requires argmin/argmax differentiation. In this paper, building upon a generalized notion of conjugate function, which replaces the usual bilinear pairing with a general energy function, we propose generalized Fenchel-Young losses, a natural loss construction for learning energy networks. Our losses enjoy many desirable properties and their gradients can be computed efficiently without argmin/argmax differentiation. We also prove the calibration of their excess risk in the case of linear-concave energies. We demonstrate our losses on multilabel classification and imitation learning tasks.

energy function, generalized fenchel-young loss, learning energy network, (1 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Container > Reservoir (1.00)

Industry: Energy > Power Industry (0.97)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Is $L^2$ Physics-Informed Loss Always Suitable for Training Physics-Informed Neural Network?

Neural Information Processing Systems, Oct-10-2024, 15:26:15 GMT

The Physics-Informed Neural Network (PINN) approach is a new and promising way to solve partial differential equations using deep learning. The $L^2$ Physics-Informed Loss is the de facto standard in training Physics-Informed Neural Networks. In this paper, we challenge this common practice by investigating the relationship between the loss function and the approximation quality of the learned solution. In particular, we leverage the concept of stability from the literature on partial differential equations to study the asymptotic behavior of the learned solution as the loss approaches zero. With this concept, we study an important class of high-dimensional non-linear PDEs in optimal control, the Hamilton-Jacobi-Bellman (HJB) equations, and prove that for the general $L^p$ Physics-Informed Loss, a wide class of HJB equations is stable only if $p$ is sufficiently large.

equation, neural network, training physics informed neural network, (8 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Container > Reservoir (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
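
As a rough illustration of why the exponent in an $L^p$ residual loss matters, the sketch below estimates the $L^p$ norm of a residual on a grid of collocation points. It does not reproduce the paper's HJB setting: the toy equation u''(x) = -sin(x), the perturbed candidate solution, and the helper name `lp_residual_loss` are all assumptions made for this example.

```python
import numpy as np

def lp_residual_loss(residual, p):
    """Grid estimate of the L^p norm of a residual: (mean |r|^p)^(1/p)."""
    r = np.abs(np.asarray(residual, dtype=float))
    return float(np.mean(r ** p) ** (1.0 / p))

# Toy problem (assumed for illustration): u''(x) = -sin(x) on [0, pi],
# whose exact solution is u(x) = sin(x).
x = np.linspace(0.0, np.pi, 2001)

# Candidate solution: sin(x) plus a small, narrow Gaussian bump,
# u_hat(x) = sin(x) + a * exp(-z^2) with z = (x - c) / w.
a, c, w = 1e-3, 1.5, 0.05
z = (x - c) / w

# Residual u_hat''(x) + sin(x). The sin terms cancel, leaving the bump's
# second derivative: a * exp(-z^2) * (4 z^2 - 2) / w^2.
residual = a * np.exp(-z ** 2) * (4.0 * z ** 2 - 2.0) / w ** 2

losses = {p: lp_residual_loss(residual, p) for p in (1, 2, 8, 32)}
for p, value in losses.items():
    print(f"p = {p:2d}: L^p residual loss = {value:.4f}")
print(f"max |residual| = {np.max(np.abs(residual)):.4f}")
```

The residual here is large only in a narrow region. Small exponents average it away, while larger exponents move the loss toward the maximum residual, which is one informal way to see why the choice of $p$ can change what a small loss guarantees.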

Spherical Channels for Modeling Atomic Interactions

Neural Information Processing Systems, Oct-10-2024, 14:58:44 GMT

Modeling the energy and forces of atomic systems is a fundamental problem in computational chemistry with the potential to help address many of the world's most pressing problems, including those related to energy scarcity and climate change. These calculations are traditionally performed using Density Functional Theory, which is computationally very expensive. Machine learning has the potential to dramatically improve the efficiency of these calculations from days or hours to seconds. We propose the Spherical Channel Network (SCN) to model atomic energies and forces. The SCN is a graph neural network where nodes represent atoms and edges their neighboring atoms. The atom embeddings are a set of spherical functions, called spherical channels, represented using spherical harmonics.

energy and force, modeling atomic interaction, spherical channel, (2 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Container > Reservoir (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.81)

Incorporating Bias-aware Margins into Contrastive Loss for Collaborative Filtering

Neural Information Processing Systems, Oct-10-2024, 14:36:26 GMT

Collaborative filtering (CF) models easily suffer from popularity bias, which makes recommendations deviate from users' actual preferences. However, most current debiasing strategies are prone to playing a trade-off game between head and tail performance, thus inevitably degrading the overall recommendation accuracy. To reduce the negative impact of popularity bias on CF models, we incorporate Bias-aware margins into Contrastive loss and propose a simple yet effective BC Loss, where the margin is tailored quantitatively to the bias degree of each user-item interaction. We investigate the geometric interpretation of BC loss, then further visualize and theoretically prove that it simultaneously learns better head and tail representations by encouraging the compactness of similar users/items and enlarging the dispersion of dissimilar users/items. Over six benchmark datasets, we use BC loss to optimize two high-performing CF models.

bc loss, contrastive loss, incorporating bias-aware margin, (5 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Container > Reservoir (1.00)

Technology:

Information Technology > Communications > Social Media (0.64)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.64)

Which Explanation Should I Choose? A Function Approximation Perspective to Characterizing Post Hoc Explanations

Neural Information Processing Systems, Oct-10-2024, 07:37:56 GMT

A critical problem in the field of post hoc explainability is the lack of a common foundational goal among methods. For example, some methods are motivated by function approximation, some by game theoretic notions, and some by obtaining clean visualizations. This fragmentation of goals causes not only an inconsistent conceptual understanding of explanations but also the practical challenge of not knowing which method to use when. In this work, we begin to address these challenges by unifying eight popular post hoc explanation methods (LIME, C-LIME, KernelSHAP, Occlusion, Vanilla Gradients, Gradients × Input, SmoothGrad, and Integrated Gradients). We show that these methods all perform local function approximation of the black-box model, differing only in the neighbourhood and loss function used to perform the approximation. This unification enables us to (1) state a no free lunch theorem for explanation methods, demonstrating that no method can perform optimally across all neighbourhoods, and (2) provide a guiding principle to choose among methods based on faithfulness to the black-box model.

characterizing post hoc explanation, explanation method, function approximation perspective, (2 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Container > Reservoir (0.85)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.91)
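
The idea of an explanation as a local function approximation can be sketched with a small example. This is not any of the eight methods as defined in the paper: the black-box function, the Gaussian neighbourhood, the squared loss, and the helper name `local_linear_explanation` are assumptions chosen to show how the neighbourhood alone can change the resulting explanation.

```python
import numpy as np

def black_box(X):
    """Hypothetical black-box model: f(x) = sin(x_0) + x_1^2."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def local_linear_explanation(f, x0, sigma, n_samples=5000, seed=0):
    """Fit a linear surrogate to f around x0 under squared loss.

    The neighbourhood is Gaussian noise of scale sigma around x0.
    Returns the surrogate's feature weights (the intercept is dropped).
    """
    rng = np.random.default_rng(seed)
    X = x0 + sigma * rng.standard_normal((n_samples, x0.size))
    # Design matrix: centred features plus a constant column for the intercept.
    A = np.hstack([X - x0, np.ones((n_samples, 1))])
    coef, *_ = np.linalg.lstsq(A, f(X), rcond=None)
    return coef[:-1]

x0 = np.array([0.5, 1.0])
true_gradient = np.array([np.cos(0.5), 2.0])  # analytic gradient of f at x0

narrow = local_linear_explanation(black_box, x0, sigma=0.01)
wide = local_linear_explanation(black_box, x0, sigma=1.0)

print("analytic gradient:   ", true_gradient)
print("narrow neighbourhood:", narrow)
print("wide neighbourhood:  ", wide)
```

With a very small neighbourhood the surrogate weights stay close to the gradient at the point, while a wide neighbourhood averages over the curvature of the model and yields a different attribution for the first feature.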

Theseus: A Library for Differentiable Nonlinear Optimization

Neural Information Processing Systems, Oct-10-2024, 00:45:45 GMT

We present Theseus, an efficient application-agnostic open source library for differentiable nonlinear least squares (DNLS) optimization built on PyTorch, providing a common framework for end-to-end structured learning in robotics and vision. Existing DNLS implementations are application specific and do not always incorporate many ingredients important for efficiency. Theseus is application-agnostic, as we illustrate with several example applications that are built using the same underlying differentiable components, such as second-order optimizers, standard cost functions, and Lie groups. For efficiency, Theseus incorporates support for sparse solvers, automatic vectorization, batching, GPU acceleration, and gradient computation with implicit differentiation and direct loss minimization. We perform an extensive performance evaluation in a set of applications, demonstrating significant efficiency gains and better scalability when these features are incorporated.

differentiable nonlinear optimization, library, theseus, (2 more...)

Neural Information Processing Systems

Genre: Play > Prospect > Container > Reservoir (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.84)
