AITopics | Baek, Christina

Collaborating Authors

Baek, Christina

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Context-Parametric Inversion: Why Instruction Finetuning May Not Actually Improve Context Reliance

Goyal, Sachin, Baek, Christina, Kolter, J. Zico, Raghunathan, Aditi

arXiv.org Artificial IntelligenceOct-22-2024

A standard practice when using large language models is for users to supplement their instruction with an input context containing new information for the model to process. However, models struggle to reliably follow the input context, especially when it conflicts with their parametric knowledge from pretraining. In-principle, one would expect models to adapt to the user context better after instruction finetuning, particularly when handling knowledge conflicts. However, we observe a surprising failure mode: during instruction tuning, the context reliance under knowledge conflicts initially increases as expected, but then gradually decreases as instruction finetuning progresses. This happens while the performance on standard benchmarks keeps on increasing far after this drop. We call this phenomenon context-parametric inversion and observe it across multiple general purpose instruction tuning datasets such as TULU, Alpaca and Ultrachat, across different model families like Llama, Mistral, and Pythia. We perform various controlled studies and theoretical analysis to show that context-parametric inversion occurs due to examples in the instruction finetuning data where the input context provides information that aligns with model's parametric knowledge. Our analysis suggests some natural mitigation strategies with limited but insightful gains, and serves as a useful starting point in addressing this deficiency in instruction finetuning.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2410.10796

Country:

Asia > Middle East (0.68)
North America > United States (0.67)
Asia > Afghanistan (0.46)

Genre:

Research Report > Strength High (0.34)
Research Report > Experimental Study (0.34)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Why is SAM Robust to Label Noise?

Baek, Christina, Kolter, Zico, Raghunathan, Aditi

arXiv.org Artificial IntelligenceMay-6-2024

Sharpness-Aware Minimization (SAM) is most known for achieving state-of the-art performances on natural image and language tasks. However, its most pronounced improvements (of tens of percent) is rather in the presence of label noise. Understanding SAM's label noise robustness requires a departure from characterizing the robustness of minimas lying in "flatter" regions of the loss landscape. In particular, the peak performance under label noise occurs with early stopping, far before the loss converges. We decompose SAM's robustness into two effects: one induced by changes to the logit term and the other induced by changes to the network Jacobian. The first can be observed in linear logistic regression where SAM provably up-weights the gradient contribution from clean examples. Although this explicit up-weighting is also observable in neural networks, when we intervene and modify SAM to remove this effect, surprisingly, we see no visible degradation in performance. We infer that SAM's effect in deeper networks is instead explained entirely by the effect SAM has on the network Jacobian. We theoretically derive the implicit regularization induced by this Jacobian effect in two layer linear networks. Motivated by our analysis, we see that cheaper alternatives to SAM that explicitly induce these regularization effects largely recover the benefits in deep networks trained on real-world datasets.

accuracy, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2405.03676

Genre: Research Report > New Finding (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Add feedback

Predicting the Performance of Foundation Models via Agreement-on-the-Line

Mehra, Aman, Saxena, Rahul, Kim, Taeyoun, Baek, Christina, Kolter, Zico, Raghunathan, Aditi

arXiv.org Artificial IntelligenceApr-1-2024

Estimating the out-of-distribution performance in regimes where labels are scarce is critical to safely deploy foundation models. Recently, it was shown that ensembles of neural networks observe the phenomena "agreement-on-the-line", which can be leveraged to reliably predict OOD performance without labels. However, in contrast to classical neural networks that are trained on in-distribution data from scratch for numerous epochs, foundation models undergo minimal finetuning from heavily pretrained weights, which may reduce the ensemble diversity needed to observe agreement-on-the-line. In our work, we demonstrate that when lightly finetuning multiple runs from a single foundation model, the choice of randomness during training (linear head initialization, data ordering, and data subsetting) can lead to drastically different levels of agreement-on-the-line in the resulting ensemble. Surprisingly, only random head initialization is able to reliably induce agreement-on-the-line in finetuned foundation models across vision and language benchmarks. Second, we demonstrate that ensembles of multiple foundation models pretrained on different datasets but finetuned on the same task can also show agreement-on-the-line. In total, by careful construction of a diverse ensemble, we can utilize agreement-on-the-line-based methods to predict the OOD performance of foundation models with high precision. Foundation models (FM), or large models first pretrained on open world data then finetuned or prompted for a specific downstream task, have proven to be powerful solutions for many common machine learning problems. A notable trait about FMs is that they are far more robust to distribution shift than other deep learning approaches -- across image and language benchmarks, they suffer a smaller performance degradation on out-of-distribution (OOD) data, that may vary substantially from the in-distribution (ID) finetuning data (Radford et al., 2021; 2019; Brown et al., 2020; Wortsman et al., 2022; Wang et al., 2023; Devlin et al., 2018). From clinical decision-making in different hospitals to navigating robots through unseen terrains, FMs are increasingly utilized for tasks prone to distribution shift. However, evaluating these models in OOD settings remains difficult: in many cases, acquiring labels for OOD data is costly and inefficient, while unlabled OOD data is much easier to collect.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2404.01542

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

On the Joint Interaction of Models, Data, and Features

Jiang, Yiding, Baek, Christina, Kolter, J. Zico

arXiv.org Artificial IntelligenceJun-7-2023

Learning features from data is one of the defining characteristics of deep learning, but our theoretical understanding of the role features play in deep learning is still rudimentary. To address this gap, we introduce a new tool, the interaction tensor, for empirically analyzing the interaction between data and model through features. With the interaction tensor, we make several key observations about how features are distributed in data and how models with different random seeds learn different features. Based on these observations, we propose a conceptual framework for feature learning. Under this framework, the expected accuracy for a single hypothesis and agreement for a pair of hypotheses can both be derived in closed-form. We demonstrate that the proposed framework can explain empirically observed phenomena, including the recently discovered Generalization Disagreement Equality (GDE) that allows for estimating the generalization error with only unlabeled data. Further, our theory also provides explicit construction of natural data distributions that break the GDE. Thus, we believe this work provides valuable new insight into our understanding of feature learning.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2306.04793

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Agreement-on-the-Line: Predicting the Performance of Neural Networks under Distribution Shift

Baek, Christina, Jiang, Yiding, Raghunathan, Aditi, Kolter, Zico

arXiv.org Artificial IntelligenceMay-10-2023

Recently, Miller et al. showed that a model's in-distribution (ID) accuracy has a strong linear correlation with its out-of-distribution (OOD) accuracy on several OOD benchmarks -- a phenomenon they dubbed ''accuracy-on-the-line''. While a useful tool for model selection (i.e., the model most likely to perform the best OOD is the one with highest ID accuracy), this fact does not help estimate the actual OOD performance of models without access to a labeled OOD validation set. In this paper, we show a similar but surprising phenomenon also holds for the agreement between pairs of neural network classifiers: whenever accuracy-on-the-line holds, we observe that the OOD agreement between the predictions of any two pairs of neural networks (with potentially different architectures) also observes a strong linear correlation with their ID agreement. Furthermore, we observe that the slope and bias of OOD vs ID agreement closely matches that of OOD vs ID accuracy. This phenomenon, which we call agreement-on-the-line, has important practical applications: without any labeled data, we can predict the OOD accuracy of classifiers}, since OOD agreement can be estimated with just unlabeled data. Our prediction algorithm outperforms previous methods both in shifts where agreement-on-the-line holds and, surprisingly, when accuracy is not on the line. This phenomenon also provides new insights into deep neural networks: unlike accuracy-on-the-line, agreement-on-the-line appears to only hold for neural network classifiers.

accuracy, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2206.13089

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Computational Benefits of Intermediate Rewards for Goal-Reaching Policy Learning

Zhai, Yuexiang, Baek, Christina, Zhou, Zhengyuan, Jiao, Jiantao, Ma, Yi

Journal of Artificial Intelligence ResearchMar-12-2022

Many goal-reaching reinforcement learning (RL) tasks have empirically verified that rewarding the agent on subgoals improves convergence speed and practical performance. We attempt to provide a theoretical framework to quantify the computational benefits of rewarding the completion of subgoals, in terms of the number of synchronous value iterations. In particular, we consider subgoals as one-way intermediate states, which can only be visited once per episode and propose two settings that consider these one-way intermediate states: the one-way single-path (OWSP) and the one-way multi-path (OWMP) settings. In both OWSP and OWMP settings, we demonstrate that adding intermediate rewards to subgoals is more computationally efficient than only rewarding the agent once it completes the goal of reaching a terminal state. We also reveal a trade-off between computational complexity and the pursuit of the shortest path in the OWMP setting: adding intermediate rewards significantly reduces the computational complexity of reaching the goal but the agent may not find the shortest path, whereas with sparse terminal rewards, the agent finds the shortest path at a significantly higher computational cost. We also corroborate our theoretical results with extensive experiments on the MiniGrid environments using Q-learning and some popular deep RL algorithms.

intermediate state, machine learning, reinforcement learning, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.1.13326

AI Access Foundation

13326

Journal of Artificial Intelligence Research

Country: North America > United States > California (0.28)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Computational Benefits of Intermediate Rewards for Hierarchical Planning

Zhai, Yuexiang, Baek, Christina, Zhou, Zhengyuan, Jiao, Jiantao, Ma, Yi

arXiv.org Artificial IntelligenceJul-8-2021

Many hierarchical reinforcement learning (RL) applications have empirically verified that incorporating prior knowledge in reward design improves convergence speed and practical performance. We attempt to quantify the computational benefits of hierarchical RL from a planning perspective under assumptions about the intermediate state and intermediate rewards frequently (but often implicitly) adopted in practice. Our approach reveals a trade-off between computational complexity and the pursuit of the shortest path in hierarchical planning: using intermediate rewards significantly reduces the computational complexity in finding a successful policy but does not guarantee to find the shortest path, whereas using sparse terminal rewards finds the shortest path at a significantly higher computational cost. We also corroborate our theoretical results with extensive experiments on the MiniGrid environments using Q-learning and other popular deep RL algorithms.

artificial intelligence, intermediate state, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2107.03961

Country: North America > United States > California > Alameda County > Berkeley (0.14)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Assessing Generalization of SGD via Disagreement

Jiang, Yiding, Nagarajan, Vaishnavh, Baek, Christina, Kolter, J. Zico

arXiv.org Artificial IntelligenceJun-25-2021

We empirically show that the test error of deep networks can be estimated by simply training the same architecture on the same training set but with a different run of Stochastic Gradient Descent (SGD), and measuring the disagreement rate between the two networks on unlabeled test data. This builds on -- and is a stronger version of -- the observation in Nakkiran & Bansal '20, which requires the second run to be on an altogether fresh training set. We further theoretically show that this peculiar phenomenon arises from the \emph{well-calibrated} nature of \emph{ensembles} of SGD-trained models. This finding not only provides a simple empirical measure to directly predict the test error using unlabeled test data, but also establishes a new conceptual connection between generalization and calibration.

artificial intelligence, calibration, neural network, (18 more...)

arXiv.org Artificial Intelligence

2106.13799

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Add feedback

Incremental Learning via Rate Reduction

Wu, Ziyang, Baek, Christina, You, Chong, Ma, Yi

arXiv.org Artificial IntelligenceNov-30-2020

Current deep learning architectures suffer from catastrophic forgetting, a failure to retain knowledge of previously learned classes when incrementally trained on new classes. The fundamental roadblock faced by deep learning methods is that deep learning models are optimized as "black boxes," making it difficult to properly adjust the model parameters to preserve knowledge about previously seen data. To overcome the problem of catastrophic forgetting, we propose utilizing an alternative "white box" architecture derived from the principle of rate reduction, where each layer of the network is explicitly computed without back propagation. Under this paradigm, we demonstrate that, given a pre-trained network and new data classes, our approach can provably construct a new network that emulates joint training with all past and new classes. Finally, our experiments show that our proposed learning algorithm observes significantly less decay in classification performance, outperforming state of the art methods on MNIST and CIFAR-10 by a large margin and justifying the use of "white box" algorithms for incremental learning even for sufficiently complex image data.

deep learning, neural network, redunet, (17 more...)

arXiv.org Artificial Intelligence

2011.14593

Genre: Research Report (0.70)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Perception-Action-Learning System for Mobile Social-Service Robots Using Deep Learning

Lee, Beom-Jin (Seoul National University) | Choi, Jinyoung (Seoul National University) | Lee, Chung-Yeon (Seoul National University) | Park, Kyung-Wha (Seoul National University) | Choi, Sungjun (Seoul National University) | Han, Cheolho (Seoul National University) | Han, Dong-Sig (Seoul National University) | Baek, Christina (Seoul National University) | Emaase, Patrick Mokodir (Seoul National University) | Zhang, Byoung-Tak (Seoul National University)

AAAI ConferencesFeb-8-2018

We introduce a novel perception-action-learning system for mobile social-service robots. The state-of-the-art deep learning techniques were incorporated into each module which significantly improves the performance in solving social service tasks. The system not only demonstrated fast and robust performance in a homelike environment but also achieved the highest score in the RoboCup2017@Home Social Standard Platform League (SSPL) held in Nagoya, Japan.

deep learning, neural network, robot, (18 more...)

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Country: Asia > Japan > Honshū > Chūbu > Aichi Prefecture > Nagoya (0.25)

Industry:

Health & Medicine > Health Care Providers & Services (0.89)
Government > Social Services (0.89)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Robots > Robots in the Home (0.76)

Add feedback