Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate

Neural Information Processing Systems

Recent works (e.g., Li and Arora (2020)) suggest that the use of popular normalization schemes (including Batch Normalization) in today's deep learning can move it far from a traditional optimization viewpoint, e.g., the use of exponentially increasing learning rates. The current paper highlights other ways in which the behavior of normalized nets departs from traditional viewpoints, and then initiates a formal framework for studying their mathematics via a suitable adaptation of the conventional framework, namely modeling the SGD-induced training trajectory via a stochastic differential equation (SDE) with a noise term that captures gradient noise. This yields: (a) a new "intrinsic learning rate" parameter that is the product of the normal learning rate η and the weight decay factor λ. Analysis of the SDE shows how the effective speed of learning varies and equilibrates over time under the control of the intrinsic LR.
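To make the quantity concrete, here is a minimal sketch (ours, not the paper's code) of SGD with L2 weight decay. Rearranged, the per-step multiplicative shrinkage of the weights is governed entirely by the product ηλ, which is what the abstract calls the intrinsic learning rate; the demo isolates the radial (norm) dynamics by zeroing the gradients, mimicking a scale-invariant normalized layer whose loss ignores ‖w‖.

```python
import numpy as np

def sgd_wd(w, grads, eta, lam):
    """SGD with L2 weight decay: w <- w - eta * (g + lam * w).

    Rearranged: w <- (1 - eta*lam) * w - eta * g, so the multiplicative
    shrinkage of the weight norm per step depends only on the product
    eta * lam -- the "intrinsic learning rate" of the abstract.
    """
    for g in grads:
        w = (1.0 - eta * lam) * w - eta * g
    return w

rng = np.random.default_rng(0)
w0 = rng.standard_normal(8)

# A normalized (scale-invariant) layer's loss ignores ||w||, so set g = 0 to
# isolate the radial dynamics: two (eta, lam) pairs with the same product
# eta*lam trace identical norm trajectories.
zeros = [np.zeros(8)] * 100
wa = sgd_wd(w0.copy(), zeros, eta=0.10, lam=2e-3)
wb = sgd_wd(w0.copy(), zeros, eta=0.05, lam=4e-3)
print(np.linalg.norm(wa), np.linalg.norm(wb))  # equal: (1 - 2e-4)^100 * ||w0||
```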


Feature-fortified Unrestricted Graph Alignment

Neural Information Processing Systems

The necessity to align two graphs, minimizing a structural distance metric, is prevalent in biology, chemistry, recommender systems, and social network analysis. Due to the problem's NP-hardness, prevailing graph alignment methods follow a modular and mediated approach, solving the problem restricted to the domain of intermediary graph representations or products such as embeddings, spectra, and graph signals. Restricting the problem to this intermediate space may distort the original problem, and such methods are hence predisposed to miss high-quality solutions.
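For concreteness, a hedged sketch of one common structural distance (not necessarily the metric this paper optimizes): the number of edge disagreements between two adjacency matrices under a candidate node mapping. The function name and data layout are ours.

```python
import numpy as np

def edge_disagreement(A, B, perm):
    """Structural distance between graphs with adjacency matrices A and B
    under the node mapping perm (perm[i] = node of B matched to node i of A):
    the number of mismatched edges, 0.5 * ||A - P B P^T||_F^2 for 0/1 inputs.
    """
    P = np.eye(len(perm))[perm]            # permutation matrix from the mapping
    return np.sum((A - P @ B @ P.T) ** 2) / 2

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # path 0-1-2
B = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])  # star centered at node 0
print(edge_disagreement(A, B, [1, 0, 2]))        # 0.0: a perfect alignment
```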



Maximum-Entropy Adversarial Data Augmentation for Improved Generalization and Robustness (Long Zhao, Ting Liu, Xi Peng)

Neural Information Processing Systems

Adversarial data augmentation has shown promise for training robust deep neural networks against unforeseen data shifts or corruptions. However, it is difficult to define heuristics to generate effective fictitious target distributions containing "hard" adversarial perturbations that are largely different from the source distribution. In this paper, we propose a novel and effective regularization term for adversarial data augmentation. We theoretically derive it from the information bottleneck principle, which results in a maximum-entropy formulation. Intuitively, this regularization term encourages perturbing the underlying source distribution to enlarge predictive uncertainty of the current model, so that the generated "hard" adversarial perturbations can improve the model robustness during training. Experimental results on three standard benchmarks demonstrate that our method consistently outperforms the existing state of the art by a statistically significant margin.
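A minimal PyTorch sketch of the idea as the abstract describes it: fictitious "hard" samples are generated by gradient ascent on the classification loss plus the model's predictive entropy. The number of steps, step size, and entropy weight are illustrative assumptions, not the paper's settings, and `model` is any classifier returning logits.

```python
import torch
import torch.nn.functional as F

def maxent_adversarial_examples(model, x, y, steps=5, step_size=0.02, beta=1.0):
    """Generate 'hard' fictitious samples by gradient ascent on the
    classification loss plus the model's predictive entropy.

    The entropy term (weight beta) pushes perturbations toward regions where
    the model is uncertain, which is the maximum-entropy intuition of the
    abstract; all hyperparameters here are illustrative.
    """
    x_adv = x.clone().detach().requires_grad_(True)
    for _ in range(steps):
        logits = model(x_adv)
        log_p = F.log_softmax(logits, dim=1)
        entropy = -(log_p.exp() * log_p).sum(dim=1).mean()
        loss = F.cross_entropy(logits, y) + beta * entropy
        grad, = torch.autograd.grad(loss, x_adv)
        x_adv = (x_adv + step_size * grad.sign()).detach().requires_grad_(True)
    return x_adv.detach()
```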


Apple design legend Jony Ive joins OpenAI to work on AI hardware

The Japan Times

The legendary designer behind Apple's iPhone, Jony Ive, has joined OpenAI to create devices tailored for using generative artificial intelligence, according to a video posted Wednesday by the ChatGPT maker. Ive and his team will take over design at OpenAI as part of an acquisition of his startup, named "IO," valued at $6.5 billion. Sharing no details, OpenAI chief executive Sam Altman said in the video that a prototype Ive shared with him "is the coolest piece of technology that the world will have ever seen."


Adversarial Attacks on Linear Contextual Bandits (Baptiste Rozière, Laurent Meunier)

Neural Information Processing Systems

Contextual bandit algorithms are applied in a wide range of domains, from advertising to recommender systems, from clinical trials to education. In many of these domains, malicious agents may have incentives to force a bandit algorithm into a desired behavior. For instance, an unscrupulous ad publisher may try to increase their own revenue at the expense of the advertisers; a seller may want to increase the exposure of their products, or thwart a competitor's advertising campaign. In this paper, we study several attack scenarios and show that a malicious agent can force a linear contextual bandit algorithm to pull any desired arm T − o(T) times over a horizon of T steps, while applying adversarial modifications to either rewards or contexts with a cumulative cost that grows only logarithmically, as O(log T). We also investigate the case when a malicious agent is interested in affecting the behavior of the bandit algorithm in a single context (e.g., a specific user). We first provide sufficient conditions for the feasibility of the attack and an efficient algorithm to perform it. We empirically validate the proposed approaches on synthetic and real-world datasets.
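As a toy illustration of reward poisoning (ours, and deliberately naive: its cumulative cost grows linearly in the horizon, unlike the O(log T) guarantee of the paper's construction), the attacker below reports, for every non-target pull of an ε-greedy linear bandit, the reward the target arm would have earned minus a margin, so the target arm always looks best. Giving the attacker the target arm's true parameter is a simplification for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
d, K, T, target = 6, 4, 3000, 0
theta = rng.standard_normal((K, d)) * 0.3    # true per-arm parameters

A = [np.eye(d) for _ in range(K)]            # per-arm ridge statistics
b = [np.zeros(d) for _ in range(K)]
pulls, cost = np.zeros(K, dtype=int), 0.0

for t in range(T):
    x = np.r_[1.0, rng.standard_normal(d - 1) * 0.3]   # bias + features
    est = [np.linalg.solve(A[k], b[k]) @ x for k in range(K)]
    k = int(np.argmax(est)) if rng.random() > 0.1 else int(rng.integers(K))
    r = theta[k] @ x + 0.1 * rng.standard_normal()
    if k != target:
        # Poison: report the reward the target would have earned, minus a
        # margin, so every non-target arm is fitted to look strictly worse.
        r_fake = theta[target] @ x - 1.0
        cost += abs(r - r_fake)
        r = r_fake
    A[k] += np.outer(x, x); b[k] += r * x
    pulls[k] += 1

print("pulls per arm:", pulls, " attack cost:", round(cost, 1))
```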


A novel constraint optimization method to encode the generic knowledge into a BN without requiring any training data

Neural Information Processing Systems

Our proposed approach can be applied to other AUs as well. In Tab. 6, LP-SM also considers apex frames on CK+, and the comparison to LP-SM is consistent. In Tab. 8, we apply FMPN-FER and DeepEmotion to our pre-processed data. We will consider a pre-trained VGGFace model in our future work. R2 2.1 (the novelty compared to prior work): a facial expression can be represented as a group of AUs.


A Appendix

Neural Information Processing Systems

A.1 Speech Translation Evaluation

One hyperparameter in our speech translation evaluation is the threshold on the alignment scores. Mined speech-text pairs are included in the train set if their alignment scores are greater than or equal to the threshold. Speech translation models are trained on the combination of the CoVoST2 train set and mined data at different thresholds. We report the performance of each model on the dev set of Common Voice in Figure 5, and find the optimal value for the threshold.

Figure 5: BLEU on the dev set achieved by S2T models trained on the CoVoST train set + mined data at different thresholds.
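A minimal sketch of the thresholding rule above; the function name and tuple layout are our assumptions. A mined pair survives iff its alignment score meets the threshold, and a separate model is trained per threshold value.

```python
def filter_mined_pairs(pairs, threshold):
    """Keep a mined speech-text pair iff its alignment score >= threshold.

    `pairs` is an iterable of (speech_id, text, score) tuples (our assumed
    layout); training data is then CoVoST train data plus the survivors.
    """
    return [(s, t) for (s, t, score) in pairs if score >= threshold]

mined = [("utt1", "hello world", 1.12), ("utt2", "good day", 1.04)]
print(filter_mined_pairs(mined, threshold=1.06))  # only utt1 survives
```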


Multimodal and Multilingual Embeddings for Large-Scale Speech Mining

Neural Information Processing Systems

We present an approach to encode a speech signal into a fixed-size representation that minimizes the cosine loss with the existing massively multilingual LASER text embedding space. Sentences are close in this embedding space independently of their language and modality, either text or audio. Using a similarity metric in that multimodal embedding space, we perform mining of audio in German, French, Spanish and English from Librivox against billions of sentences from Common Crawl. This yielded more than twenty thousand hours of aligned speech translations. To evaluate the automatically mined speech/text corpora, we train neural speech translation systems for several language pairs.
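A hedged PyTorch sketch of the two ingredients named above, under our own naming: the training objective (cosine loss between the speech encoder's fixed-size output and a frozen LASER text embedding) and a toy mining step by cosine similarity. Large-scale mining typically uses margin-based scoring over nearest neighbors rather than the plain threshold used here.

```python
import torch
import torch.nn.functional as F

def cosine_loss(speech_emb, text_emb):
    """1 - cos(speech, text): drives the speech encoder's fixed-size output
    toward the (frozen) LASER embedding of the matching sentence."""
    return (1.0 - F.cosine_similarity(speech_emb, text_emb, dim=-1)).mean()

def mine(speech_embs, text_embs, threshold=0.75):
    """Toy mining step: score all speech/text pairs by cosine similarity in
    the shared space and keep a pair if its best match clears the threshold.
    The threshold value and plain-cosine scoring are illustrative only."""
    speech = F.normalize(speech_embs, dim=-1)
    text = F.normalize(text_embs, dim=-1)
    sims = speech @ text.T                       # [n_speech, n_text]
    best = sims.max(dim=1)
    return [(i, int(j))
            for i, (s, j) in enumerate(zip(best.values, best.indices))
            if s >= threshold]
```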