

Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning

Neural Information Processing Systems

We study the asynchronous stochastic gradient descent algorithm for distributed training over n workers whose computation and communication speeds vary over time. In this algorithm, workers compute stochastic gradients in parallel at their own pace and return them to the server without any synchronization.
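A minimal sketch of this update pattern, assuming a single parameter server, Python threads as workers, and a toy least-squares objective (the class and function names below are illustrative, not from the paper):

import threading
import numpy as np

class ParameterServer:
    """Holds the model; applies incoming gradients immediately, with no barrier."""
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr
        self.lock = threading.Lock()

    def read(self):
        with self.lock:
            return self.w.copy()

    def apply_gradient(self, g):
        # The gradient may have been computed on stale parameters (a delayed read);
        # asynchronous SGD applies it anyway.
        with self.lock:
            self.w -= self.lr * g

def worker(server, seed, steps, target):
    # Each worker runs at its own pace: read parameters, compute a stochastic
    # gradient, and push it back without waiting for the other workers.
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        w = server.read()
        g = (w - target) + 0.01 * rng.normal(size=w.shape)  # noisy gradient of 0.5*||w - target||^2
        server.apply_gradient(g)

target = np.ones(10)
server = ParameterServer(dim=10)
threads = [threading.Thread(target=worker, args=(server, s, 200, target)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(np.linalg.norm(server.read() - target))  # should be close to zero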


Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

Neural Information Processing Systems

Recent works (e.g., Li and Arora, 2020) suggest that the use of popular normalization schemes (including Batch Normalization) in today's deep learning can move it far from a traditional optimization viewpoint, e.g., the use of exponentially increasing learning rates. The current paper highlights other ways in which the behavior of normalized nets departs from traditional viewpoints, and then initiates a formal framework for studying their mathematics via a suitable adaptation of the conventional framework, namely modeling the SGD-induced training trajectory via a stochastic differential equation (SDE) with a noise term that captures gradient noise. This yields: (a) a new "intrinsic learning rate" parameter that is the product of the normal learning rate η and the weight decay factor λ. Analysis of the SDE shows how the effective speed of learning varies and equilibrates over time under the control of the intrinsic LR.
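As a rough sketch of the modeling step described above (the precise scaling and the noise covariance are spelled out in the paper; the notation here is only indicative), SGD with learning rate η and weight decay λ on a loss L is approximated by an SDE of the form

$$ dW_t \;=\; -\eta\,\big(\nabla L(W_t) + \lambda W_t\big)\,dt \;+\; \eta\,\Sigma(W_t)^{1/2}\,dB_t, \qquad \lambda_e := \eta\lambda, $$

where Σ is the gradient-noise covariance and the intrinsic learning rate λ_e governs the timescale on which the effective speed of learning equilibrates.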


Feature-fortified Unrestricted Graph Alignment

Neural Information Processing Systems

The necessity to align two graphs, minimizing a structural distance metric, is prevalent in biology, chemistry, recommender systems, and social network analysis. Due to the problem's NP-hardness, prevailing graph alignment methods follow a modular and mediated approach, solving the problem restricted to the domain of intermediary graph representations or products such as embeddings, spectra, and graph signals. Restricting the problem to this intermediate space may distort the original problem, and such methods are hence predisposed to miss high-quality solutions.
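For concreteness, one common instantiation of the unrestricted problem (not necessarily the exact metric used in the paper) seeks a node correspondence, e.g. a permutation matrix P, minimizing the structural discrepancy between the adjacency matrices A and B of the two graphs:

$$ \min_{P \in \Pi_n} \; \big\| A - P B P^{\top} \big\|_F^2, $$

whereas modular methods instead optimize a surrogate of this objective over embeddings, spectra, or graph signals.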


Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness
Long Zhao, Ting Liu, Xi Peng

Neural Information Processing Systems

Adversarial data augmentation has shown promise for training robust deep neural networks against unforeseen data shifts or corruptions. However, it is difficult to define heuristics to generate effective fictitious target distributions containing "hard" adversarial perturbations that are largely different from the source distribution. In this paper, we propose a novel and effective regularization term for adversarial data augmentation. We theoretically derive it from the information bottleneck principle, which results in a maximum-entropy formulation. Intuitively, this regularization term encourages perturbing the underlying source distribution to enlarge predictive uncertainty of the current model, so that the generated "hard" adversarial perturbations can improve the model robustness during training. Experimental results on three standard benchmarks demonstrate that our method consistently outperforms the existing state of the art by a statistically significant margin.
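A hedged sketch of this scheme in PyTorch-style code (the inner-loop schedule, step size, and entropy weight gamma below are illustrative placeholders, not the paper's settings): fictitious "hard" samples are generated by ascending both the task loss and the predictive entropy of the current model.

import torch
import torch.nn.functional as F

def entropy(logits):
    # Predictive entropy H(p) = -sum_c p_c log p_c of the model's softmax output.
    p = F.softmax(logits, dim=1)
    return -(p * F.log_softmax(logits, dim=1)).sum(dim=1).mean()

def maxent_adversarial_examples(model, x, y, steps=5, step_size=0.1, gamma=1.0):
    """Generate 'hard' fictitious samples by maximizing task loss + an entropy regularizer."""
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        logits = model(x_adv)
        # Maximize classification loss AND predictive uncertainty (maximum-entropy term).
        obj = F.cross_entropy(logits, y) + gamma * entropy(logits)
        grad, = torch.autograd.grad(obj, x_adv)
        x_adv = (x_adv + step_size * grad.sign()).detach().requires_grad_(True)
    return x_adv.detach()

The perturbed batch returned by maxent_adversarial_examples would then be fed back into the standard training loss, which is what makes the augmentation "adversarial" with respect to the current model.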


A novel constraint optimization method to encode the generic knowledge into a BN without requiring any training data

Neural Information Processing Systems

Our proposed approach can be applied to other AUs as well. In Tab. 6, LP-SM also considers apex frames on CK+, and the comparison to LP-SM is consistent. In Tab. 8, we apply FMPN-FER and DeepEmotion to our pre-processed ... We will consider a pre-trained VGGFace model in our future work. R2 2.1 The novelty compared to prior work. Facial expression can be a group of AUs.


A Appendix

Neural Information Processing Systems

A.1 Speech Translation Evaluation
One hyperparameter in our speech translation evaluation is the threshold on the alignment scores. Mined speech-text pairs are included in the train set if their alignment scores are greater than or equal to the threshold. Speech translation models are trained on the combination of the CoVoST2 train set and mined data at different thresholds. We report the performance of each model on the dev set of Common Voice in Figure 5, and find the optimal value for the threshold.
Figure 5: BLEU on dev set achieved by S2T models trained on CoVoST train set + mined data at different thresholds.
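A minimal sketch of the thresholding step described above (the helper and its field names are hypothetical, for illustration only):

def filter_mined_pairs(mined_pairs, threshold):
    """mined_pairs: iterable of (speech, text, alignment_score) triples from mining.
    Keep a pair only if its alignment score reaches the threshold."""
    return [(speech, text) for speech, text, score in mined_pairs if score >= threshold]

# Larger thresholds keep fewer but better-aligned pairs; the train set is the
# CoVoST2 train set plus the filtered pairs, and the threshold is chosen by
# BLEU on the Common Voice dev set.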


Multimodal and Multilingual Embeddings for Large-Scale Speech Mining

Neural Information Processing Systems

We present an approach to encode a speech signal into a fixed-size representation which minimizes the cosine loss with the existing massively multilingual LASER text embedding space. Sentences are close in this embedding space, independently of their language and modality, either text or audio. Using a similarity metric in that multimodal embedding space, we perform mining of audio in German, French, Spanish and English from Librivox against billions of sentences from Common Crawl. This yielded more than twenty thousand hours of aligned speech translations. To evaluate the automatically mined speech/text corpora, we train neural speech translation systems for several language pairs.
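A rough sketch of the mining step with numpy (at the scale of billions of sentences the real pipeline would rely on approximate nearest-neighbor search rather than a dense similarity matrix; the function below is illustrative only):

import numpy as np

def cosine_mine(audio_emb, text_emb, threshold=0.8):
    """audio_emb: (n_a, d) speech embeddings; text_emb: (n_t, d) LASER text embeddings.
    Returns (audio_idx, text_idx, score) for each audio segment whose best text
    match clears the threshold."""
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    sims = a @ t.T                      # pairwise cosine similarities
    best = sims.argmax(axis=1)          # best text candidate per audio segment
    scores = sims[np.arange(len(a)), best]
    keep = scores >= threshold
    return [(i, int(best[i]), float(scores[i])) for i in np.where(keep)[0]]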


Calibration of Shared Equilibria in General Sum Partially Observable Markov Games - Supplementary
Nelson Vadori, Sumitra Ganesh, Prashant Reddy, Manuela Veloso (J.P. Morgan AI Research)

Neural Information Processing Systems

B.4 Complete set of experimental results associated with Section 4
In this section we display the complete set of results associated with the figures shown in Section 4. We display in Figure 2 the rewards of all agents during training (the calibrator, the merchant on supertype 1, and the n − 1 merchants on supertype 2) for experiments 1-5 previously described.


Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation

Neural Information Processing Systems

We investigate the robustness of vision transformers (ViTs) through the lens of their special patch-based architectural structure, i.e., they process an image as a sequence of image patches. We find that ViTs are surprisingly insensitive to patch-based transformations, even when the transformation largely destroys the original semantics and makes the image unrecognizable by humans. This indicates that ViTs heavily use features that survive such transformations but are generally not indicative of the semantic class to humans. Further investigations show that these features are useful but non-robust, as ViTs trained on them can achieve high in-distribution accuracy, but break down under distribution shifts. From this understanding, we ask: can training the model to rely less on these features improve ViT robustness and out-of-distribution performance? We use the images transformed with our patch-based operations as negatively augmented views and offer losses to regularize the training away from using non-robust features. This is a complementary view to existing research that mostly focuses on augmenting inputs with semantic-preserving transformations to enforce models' invariance. We show that patch-based negative augmentation consistently improves robustness of ViTs on ImageNet-based robustness benchmarks across 20+ different experimental settings. Furthermore, we find our patch-based negative augmentation is complementary to traditional (positive) data augmentation techniques and batch-based negative examples in contrastive learning.
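A hedged sketch of one such patch-based operation (random patch shuffling in PyTorch; the patch size and the way the resulting views enter the regularization loss are illustrative choices, not necessarily the paper's):

import torch

def shuffle_patches(images, patch_size=16):
    """Randomly permute the non-overlapping patches of each image in a batch.

    images: (B, C, H, W) tensor with H and W divisible by patch_size.
    The result keeps patch-level content but destroys the global semantics,
    so it can serve as a negatively augmented view."""
    b, c, h, w = images.shape
    ph, pw = h // patch_size, w // patch_size
    # Split into patches: (B, ph*pw, C, patch_size, patch_size).
    patches = images.reshape(b, c, ph, patch_size, pw, patch_size)
    patches = patches.permute(0, 2, 4, 1, 3, 5).reshape(b, ph * pw, c, patch_size, patch_size)
    # Independent random permutation of patch positions per image.
    perm = torch.stack([torch.randperm(ph * pw) for _ in range(b)])
    patches = torch.stack([patches[i, perm[i]] for i in range(b)])
    # Reassemble the shuffled patches into full images.
    patches = patches.reshape(b, ph, pw, c, patch_size, patch_size).permute(0, 3, 1, 4, 2, 5)
    return patches.reshape(b, c, h, w)

A training loss could then, for example, penalize the model for assigning the original label to shuffle_patches(x), discouraging reliance on features that survive the transformation.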


Supplementary material for the paper: Constant-Expansion Suffices for Compressed Sensing with Generative Priors

Neural Information Processing Systems

In this section we prove Theorem 3.2. The two arguments are essentially identical, and we will focus on the former. See [20] for a reference on the first bound. The second bound is by concentration of chi-squared with k degrees of freedom. We check that f and g satisfy the three conditions of Theorem 4.4 with appropriate parameters. Finally, since $\Pr[W \in \Theta] \ge 1/2$, it follows that conditioning on Θ at most doubles the failure probability.
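The last step uses only the following elementary bound (writing A for any failure event; a one-line sketch included here for completeness):

$$ \Pr[A \mid W \in \Theta] \;=\; \frac{\Pr[A,\, W \in \Theta]}{\Pr[W \in \Theta]} \;\le\; \frac{\Pr[A]}{\Pr[W \in \Theta]} \;\le\; 2\,\Pr[A]. $$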