
Collaborating Authors

 Xu, Winnie


KTO: Model Alignment as Prospect Theoretic Optimization

arXiv.org Artificial Intelligence

Kahneman & Tversky's prospect theory tells us that humans perceive random variables in a biased but well-defined manner (1992); for example, humans are famously loss-averse. We show that objectives for aligning LLMs with human feedback implicitly incorporate many of these biases: the success of these objectives (e.g., DPO) over cross-entropy minimization can partly be ascribed to them being human-aware loss functions (HALOs). However, the utility functions these methods attribute to humans still differ from those in the prospect theory literature. Using a Kahneman-Tversky model of human utility, we propose a HALO that directly maximizes the utility of generations instead of maximizing the log-likelihood of preferences, as current methods do; we call this approach Kahneman-Tversky Optimization (KTO).

To understand why these alignment methods work so well, and whether feedback needs to be in the form of preferences, we frame them through the lens of prospect theory (Kahneman & Tversky, 1979; Tversky & Kahneman, 1992). Prospect theory explains why humans make decisions about uncertain events that do not maximize expected value. It formalizes how humans perceive random variables in a biased but well-defined manner; for example, relative to some reference point, humans are more sensitive to losses than gains, a property called loss aversion. We show that popular alignment methods such as PPO (Schulman et al., 2017), DPO (Rafailov et al., 2023), and SLiC (Zhao et al., 2023) implicitly model such biases, helping explain their success independently of the data used. For this reason, we call them human-aware loss functions (HALOs).
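
To make "directly maximizing the utility of generations" concrete, here is a minimal numerical sketch, assuming a logistic Kahneman-Tversky-style value function, an implied reward equal to the policy-vs-reference log-likelihood ratio, and illustrative weights for desirable and undesirable examples; the paper's exact KTO objective and its choice of reference point may differ in detail.

```python
import numpy as np

def kt_value(reward, ref_point=0.0, beta=1.0):
    """A bounded, reference-dependent value function in the Kahneman-Tversky
    spirit (here a simple logistic curve around a reference point)."""
    return 1.0 / (1.0 + np.exp(-beta * (reward - ref_point)))

def halo_loss(policy_logp, ref_logp, desirable, ref_point=0.0, beta=1.0,
              weight_desirable=1.0, weight_undesirable=1.0):
    """Sketch of a HALO over a single generation.

    policy_logp, ref_logp: log-likelihoods of the generation under the policy
    and under a frozen reference model; their difference is the implied reward.
    desirable: True if the generation was marked good, False if marked bad.
    """
    implied_reward = policy_logp - ref_logp
    if desirable:
        # push the utility of good generations above the reference point
        return weight_desirable * (1.0 - kt_value(implied_reward, ref_point, beta))
    # push the utility of bad generations below the reference point
    return weight_undesirable * (1.0 - kt_value(ref_point - implied_reward, 0.0, beta))

# toy usage: a desirable generation that the policy already prefers to the reference
print(halo_loss(policy_logp=-12.0, ref_logp=-14.0, desirable=True))
```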


Prioritized training on points that are learnable, worth learning, and not yet learned (workshop version)

arXiv.org Artificial Intelligence

We introduce Goldilocks Selection, a technique for faster model training that selects a sequence of training points which are "just right". We propose an information-theoretic acquisition function, the reducible validation loss, and compute it with a small proxy model, GoldiProx, to efficiently choose training points that maximize information about a validation set. We show that the "hard" (e.g. high loss) points usually selected in the optimization literature are typically noisy, while the "easy" (e.g. low noise) samples often prioritized for curriculum learning confer less information. Further, points with uncertain labels, typically targeted by active learning, tend to be less relevant to the task. In contrast, Goldilocks Selection chooses points that are "just right" and empirically outperforms the above approaches. Moreover, the selected sequence can transfer to other architectures; practitioners can share and reuse it without the need to recreate it.
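
The selection rule itself is compact. Below is a minimal sketch, assuming per-example cross-entropy losses and a small proxy model that stands in for the validation-trained GoldiProx model; the function names and toy data are illustrative, not the paper's code.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Per-example cross-entropy given predicted class probabilities."""
    return -np.log(probs[np.arange(len(labels)), labels] + 1e-12)

def select_goldilocks(probs_model, probs_proxy, labels, k):
    """Rank candidate points by reducible loss and keep the top k.

    reducible loss = loss under the current model (not yet learned)
                   - loss under a small proxy model (approximates irreducible/noise loss).
    High reducible loss means the point is learnable, worth learning, and not yet learned.
    """
    loss_model = cross_entropy(probs_model, labels)
    loss_proxy = cross_entropy(probs_proxy, labels)
    reducible = loss_model - loss_proxy
    return np.argsort(-reducible)[:k]  # indices of the k most useful points

# toy usage: 4 candidate points, 3 classes, keep the 2 with highest reducible loss
rng = np.random.default_rng(0)
probs_model = rng.dirichlet(np.ones(3), size=4)
probs_proxy = rng.dirichlet(np.ones(3), size=4)
labels = np.array([0, 2, 1, 0])
print(select_goldilocks(probs_model, probs_proxy, labels, k=2))
```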


Neural Functional Transformers

arXiv.org Artificial Intelligence

The recent success of neural networks as implicit representations of data has driven growing interest in neural functionals: models that can process other neural networks as input by operating directly over their weight spaces. Nevertheless, constructing expressive and efficient neural functional architectures that can handle high-dimensional weight-space objects remains challenging. This paper uses the attention mechanism to define a novel set of permutation-equivariant weight-space layers and composes them into deep equivariant models called neural functional Transformers (NFTs). NFTs respect weight-space permutation symmetries while incorporating the advantages of attention, which has exhibited remarkable success across multiple domains. In experiments processing the weights of feedforward MLPs and CNNs, we find that NFTs match or exceed the performance of prior weight-space methods. We also leverage NFTs to develop Inr2Array, a novel method for computing permutation-invariant latent representations from the weights of implicit neural representations (INRs). Our proposed method improves INR classification accuracy by up to $+17\%$ over existing methods. We provide an implementation of our layers at https://github.com/AllanYangZhou/nfn.
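
The weight-space symmetry that such layers are built to respect is easy to see directly. The sketch below (generic code, not taken from the paper or its repository) checks that permuting an MLP's hidden units, applied consistently to the rows of the first layer and the columns of the second, leaves the network's function unchanged, which is why a neural functional should treat such weight configurations identically.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)   # 4 -> 8 hidden units
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)   # 8 -> 3 outputs

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer ReLU MLP."""
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

perm = rng.permutation(8)          # relabel the hidden units
x = rng.normal(size=4)
y_original = mlp(x, W1, b1, W2, b2)
y_permuted = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)
print(np.allclose(y_original, y_permuted))  # True: different weights, same function
```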


Deep Latent State Space Models for Time-Series Generation

arXiv.org Artificial Intelligence

Methods based on ordinary differential equations (ODEs) are widely used to build generative models of time series. In addition to high computational overhead from explicitly computing the hidden-state recurrence, existing ODE-based models fall short in learning sequence data with sharp transitions, which are common in many real-world systems, due to numerical challenges during optimization. In this work, we propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE to increase modeling capacity. Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4 which bypasses the explicit evaluation of hidden states. We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets from the Monash Forecasting Repository, and is capable of modeling highly stochastic data with sharp temporal transitions. LS4 sets the state of the art for continuous-time latent generative models, with significant improvements in mean squared error and tighter variational lower bounds on irregularly sampled datasets, while also being 100x faster than other baselines on long sequences.
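
The speedup rests on the same observation S4 exploits: a linear state space recurrence can be unrolled into a convolution, so hidden states never need to be materialized. The sketch below is generic discrete-time SSM code rather than the LS4 architecture itself; it simply checks that the recurrent and convolutional views of a toy system agree.

```python
import numpy as np

def ssm_recurrent(A, B, C, u):
    """Run x_{k+1} = A x_k + B u_k, y_k = C x_k by explicit recurrence."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        ys.append(C @ x)
        x = A @ x + B * u_k
    return np.array(ys)

def ssm_convolutional(A, B, C, u):
    """Same output via a precomputed kernel K_k = C A^k B, so y = K * u (causal conv)."""
    L = len(u)
    kernel = np.array([C @ np.linalg.matrix_power(A, k) @ B for k in range(L)])
    return np.array([np.dot(kernel[:k][::-1], u[:k]) for k in range(L)])

rng = np.random.default_rng(0)
A = 0.5 * rng.normal(size=(3, 3))
B, C = rng.normal(size=3), rng.normal(size=3)
u = rng.normal(size=16)
print(np.allclose(ssm_recurrent(A, B, C, u), ssm_convolutional(A, B, C, u)))  # True
```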


Language Model Cascades

arXiv.org Artificial Intelligence

Prompted models have demonstrated impressive few-shot learning abilities. Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities. These compositions are probabilistic models, and may be expressed in the language of graphical models with random variables whose values are complex data types such as strings. Cases with control flow and dynamic structure require techniques from probabilistic programming, which allow implementing disparate model structures and inference strategies in a unified language. We formalize several existing techniques from this perspective and refer to the resulting programs as language model cascades.

In this position paper, we argue that a useful unifying framework for understanding and extending this disparate body of work is in terms of probabilistic programming languages (PPLs) extended to work with strings, instead of more atomic data types like integers and floats. That is, we use a PPL to define a joint probability model on string-valued random variables, parameterized using LMs, and then condition this model on string-valued observations in order to compute a posterior over string-valued unknowns, which we can then infer. We call such a probabilistic program a language model cascade. We show that this framework captures many recent approaches, and also allows us to tackle more complex multi-step reasoning problems.
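
A hedged sketch of the idea follows, with a stubbed lm_sample in place of a real language model: the cascade is just a probability model over string-valued variables, and conditioning on an observed string turns reasoning into inference (here by naive rejection sampling). The function names, prompts, and toy vocabulary are illustrative, not the paper's code.

```python
import random

def lm_sample(prompt, rng):
    """Stand-in for a prompted LM call; it ignores the prompt and samples
    from a tiny toy vocabulary so the example is self-contained."""
    vocab = ["let me think step by step", "so the answer is 4", "so the answer is 5"]
    return rng.choice(vocab)

def cascade(question, rng):
    """A two-variable probabilistic program over strings:
    thought ~ p(. | question), answer ~ p(. | question, thought)."""
    thought = lm_sample(f"Q: {question}\nThought:", rng)
    answer = lm_sample(f"Q: {question}\nThought: {thought}\nA:", rng)
    return thought, answer

def posterior_thoughts(question, observed_answer="so the answer is 4", n=1000, seed=0):
    """Crude rejection-style inference: keep only the intermediate 'thoughts'
    whose sampled answer matches the observed string."""
    rng = random.Random(seed)
    kept = [t for t, a in (cascade(question, rng) for _ in range(n)) if a == observed_answer]
    return kept[:5]

print(posterior_thoughts("What is 2 + 2?"))
```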


NoisyMix: Boosting Robustness by Combining Data Augmentations, Stability Training, and Noise Injections

arXiv.org Machine Learning

For many real-world applications, obtaining stable and robust statistical performance is more important than simply achieving state-of-the-art predictive test accuracy, and thus robustness of neural networks is an increasingly important topic. Relatedly, data augmentation schemes have been shown to improve robustness with respect to input perturbations and domain shifts. Motivated by this, we introduce NoisyMix, a training scheme that combines data augmentations with stability training and noise injections to improve both model robustness and in-domain accuracy. This combination promotes models that are consistently more robust and that provide well-calibrated estimates of class membership probabilities. We demonstrate the benefits of NoisyMix on a range of benchmark datasets, including ImageNet-C, ImageNet-R, and ImageNet-P. Moreover, we provide theory to understand implicit regularization and robustness of NoisyMix.
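
As a rough illustration of how the three ingredients combine, here is a minimal sketch of a NoisyMix-style objective: a supervised loss on augmented, noise-injected inputs plus a stability term that keeps predictions on the clean and perturbed views consistent. The weighting, noise model, and the omission of the feature-level mixing used by the full method are simplifying assumptions of this sketch.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q):
    return np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)

def noisymix_loss(model, x, y_onehot, augment, noise_std=0.1, consistency_weight=1.0,
                  rng=np.random.default_rng(0)):
    """Classification loss on an augmented + noise-injected view, plus a
    stability (consistency) penalty between clean and perturbed predictions."""
    x_aug = augment(x)                                        # data augmentation
    x_noisy = x_aug + noise_std * rng.normal(size=x.shape)    # noise injection
    p_clean = softmax(model(x))
    p_noisy = softmax(model(x_noisy))
    ce = -np.sum(y_onehot * np.log(p_noisy + 1e-12), axis=-1)  # supervised term
    stability = kl(p_clean, p_noisy)                           # stability training term
    return np.mean(ce + consistency_weight * stability)

# toy usage with a linear "model" and a trivial augmentation
rng = np.random.default_rng(1)
W = rng.normal(size=(5, 3))
model = lambda x: x @ W
x = rng.normal(size=(4, 5))
y = np.eye(3)[[0, 2, 1, 0]]
print(noisymix_loss(model, x, y, augment=lambda x: x + 0.05))
```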


Noisy Feature Mixup

arXiv.org Machine Learning

We introduce Noisy Feature Mixup (NFM), an inexpensive yet effective method for data augmentation that combines the best of interpolation-based training and noise injection schemes. Rather than training with convex combinations of pairs of examples and their labels, we use noise-perturbed convex combinations of pairs of data points in both input and feature space. This method includes mixup and manifold mixup as special cases, but it has additional advantages, including better smoothing of decision boundaries and improved model robustness. We provide theory to understand this, as well as the implicit regularization effects of NFM. Our theory is supported by empirical results demonstrating the advantage of NFM over mixup and manifold mixup. We show that residual networks and vision transformers trained with NFM have favorable trade-offs between predictive accuracy on clean data and robustness with respect to various types of data perturbation across a range of computer vision benchmark datasets.
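
The core augmentation at a single layer is short enough to sketch directly. The following is an illustrative implementation under stated assumptions (Beta-distributed mixing coefficient, Gaussian additive and multiplicative noise); the noise levels and where the layer is applied are choices left to the practitioner.

```python
import numpy as np

def noisy_feature_mixup(h1, h2, y1, y2, alpha=1.0, add_std=0.1, mult_std=0.1,
                        rng=np.random.default_rng(0)):
    """NFM at one (input or hidden) layer: mixup-style convex combination of two
    examples and their labels, then additive and multiplicative noise on the mix.
    Setting both noise levels to zero recovers (manifold) mixup at that layer."""
    lam = rng.beta(alpha, alpha)
    h_mix = lam * h1 + (1.0 - lam) * h2          # feature interpolation
    y_mix = lam * y1 + (1.0 - lam) * y2          # label interpolation
    additive = add_std * rng.normal(size=h_mix.shape)
    multiplicative = 1.0 + mult_std * rng.normal(size=h_mix.shape)
    return h_mix * multiplicative + additive, y_mix

# toy usage on 8-dimensional features with one-hot labels over 3 classes
rng = np.random.default_rng(2)
h1, h2 = rng.normal(size=8), rng.normal(size=8)
y1, y2 = np.eye(3)[0], np.eye(3)[2]
h_mix, y_mix = noisy_feature_mixup(h1, h2, y1, y2)
print(h_mix.shape, y_mix)
```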


Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations

arXiv.org Machine Learning

We perform scalable approximate inference in a recently proposed family of continuous-depth Bayesian neural networks. In this model class, uncertainty about separate weights in each layer produces dynamics that follow a stochastic differential equation (SDE). We demonstrate gradient-based stochastic variational inference in this infinite-parameter setting, producing arbitrarily flexible approximate posteriors. We also derive a novel gradient estimator that approaches zero variance as the approximate posterior approaches the true posterior. This approach further inherits the memory-efficient training and tunable precision of neural ODEs.
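
For readers unfamiliar with continuous-depth SDE dynamics, the basic forward simulation is an Euler-Maruyama loop over depth. The sketch below is a generic simulator with toy drift and diffusion functions, not the paper's variational inference scheme or its low-variance gradient estimator.

```python
import numpy as np

def euler_maruyama(h0, drift, diffusion, t1=1.0, steps=100, rng=np.random.default_rng(0)):
    """Simulate dh = drift(h, t) dt + diffusion(h, t) dW from t = 0 to t = t1.
    In the continuous-depth Bayesian setting, weight uncertainty makes the hidden
    state follow an SDE over depth; this loop is the basic forward pass."""
    dt = t1 / steps
    h = np.array(h0, dtype=float)
    for k in range(steps):
        t = k * dt
        dW = np.sqrt(dt) * rng.normal(size=h.shape)   # Brownian increment
        h = h + drift(h, t) * dt + diffusion(h, t) * dW
    return h

# toy dynamics: a tanh drift (a tiny "layer") with small state-independent noise
drift = lambda h, t: np.tanh(h)
diffusion = lambda h, t: 0.1
print(euler_maruyama(h0=[0.5, -1.0], drift=drift, diffusion=diffusion))
```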