Looped Transformers are Better at Learning Learning Algorithms

Yang, Liu, Lee, Kangwook, Nowak, Robert, Papailiopoulos, Dimitris

arXiv.org Artificial Intelligence

Transformers have demonstrated effectiveness at solving data-fitting problems from various (latent) models in-context, as reported by Garg et al. (2022). However, the absence of an inherent iterative structure in the transformer architecture presents a challenge in emulating the iterative algorithms commonly employed in traditional machine learning methods. To address this, we propose a looped transformer architecture and an associated training methodology that incorporate iterative characteristics into the transformer architecture. Experimental results suggest that the looped transformer achieves performance comparable to the standard transformer in solving various data-fitting problems, while using less than 10% of the parameter count.

Transformers (Vaswani et al., 2017; Brown et al., 2020; Devlin et al., 2019) have emerged as the preferred model in natural language processing (NLP) and other domains requiring sequence-to-sequence modeling. Beyond their state-of-the-art performance on natural language processing tasks, large language models (LLMs) such as GPT-3 (Brown et al., 2020) and PaLM (Chowdhery et al., 2022) also exhibit the ability to learn in-context: they can adapt to various downstream tasks based on a brief prompt, bypassing the need for additional model fine-tuning. This intriguing in-context learning ability has sparked interest in the research community, leading to numerous studies (Min et al., 2022; Olsson et al., 2022; Li et al., 2023). However, the underlying mechanisms that enable transformers to perform in-context learning remain unclear. In an effort to understand the in-context learning behavior of LLMs, Garg et al. (2022) investigated the performance of transformers, trained from scratch, in solving specific function-class learning problems in-context. Notably, transformers exhibited strong performance across all tasks, matching or even surpassing traditional solvers.
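The looped idea described above can be sketched as a single weight-tied block applied repeatedly: the same parameters are reused at every iteration, so depth grows without growing the parameter count. The toy residual map below is a hypothetical stand-in for a real transformer block (all names are illustrative, not from the paper's code):

```python
import numpy as np

def block(h, W):
    # One shared "block" (a toy residual nonlinear map standing in
    # for a transformer layer); the SAME weights W are reused each step.
    return np.tanh(h @ W) + h

def looped_forward(h0, W, n_loops):
    # Apply the weight-tied block n_loops times, emulating an
    # iterative algorithm with a fixed per-iteration parameter count.
    h = h0
    for _ in range(n_loops):
        h = block(h, W)
    return h

rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(8, 8))   # shared parameters
h0 = rng.normal(size=(4, 8))             # toy hidden states
out = looped_forward(h0, W, n_loops=12)
```

Increasing `n_loops` deepens the effective computation at inference time while the parameter count stays that of a single block, which is the source of the "less than 10% of the parameters" comparison in the abstract.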
Building on this, Akyürek et al. (2022) explored the transformer-based model's capability to address the linear regression learning problem, interpreting it as an implicit form of established learning algorithms. Their study combined theoretical and empirical perspectives to understand how transformers learn these functions. Subsequently, von Oswald et al. (2022) demonstrated empirically that, when trained to predict the output of a linear function, a linear self-attention-only transformer inherently learns to perform a single step of gradient descent to solve the linear regression task in-context. While the approach and foundational theory presented by von Oswald et al. (2022) are promising, a significant gap remains between the simplified architecture they examined and the standard decoder transformer used in practice.
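The iterative algorithm that von Oswald et al. (2022) relate to linear self-attention is ordinary gradient descent on the squared regression loss, with update w ← w − η Xᵀ(Xw − y). The sketch below shows that plain update on a synthetic noiseless problem (it is not their attention construction, just the algorithm a single attention layer is argued to emulate one step of):

```python
import numpy as np

def gd_step(w, X, y, lr):
    # One gradient-descent step on the loss 0.5 * ||X w - y||^2,
    # the update a linear self-attention layer is argued to implement.
    grad = X.T @ (X @ w - y)
    return w - lr * grad

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 5))       # in-context inputs
w_true = rng.normal(size=5)        # latent linear function
y = X @ w_true                     # noiseless targets

w = np.zeros(5)
for _ in range(200):               # many steps = many stacked/looped layers
    w = gd_step(w, X, y, lr=0.01)
# w converges toward w_true on this noiseless problem
```

Under this view, stacking (or looping) L such layers corresponds to L gradient steps, which is one motivation for giving the architecture an explicit iterative structure.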


What Can Transformers Learn In-Context? A Case Study of Simple Function Classes

Garg, Shivam, Tsipras, Dimitris, Liang, Percy, Valiant, Gregory

arXiv.org Artificial Intelligence

In-context learning refers to the ability of a model to condition on a prompt sequence consisting of in-context examples (input-output pairs corresponding to some task) along with a new query input, and generate the corresponding output. Crucially, in-context learning happens only at inference time without any parameter updates to the model. While large language models such as GPT-3 exhibit some ability to perform in-context learning, it is unclear what the relationship is between tasks on which this succeeds and what is present in the training data. To make progress towards understanding in-context learning, we consider the well-defined problem of training a model to in-context learn a function class (e.g., linear functions): that is, given data derived from some functions in the class, can we train a model to in-context learn "most" functions from this class? We show empirically that standard Transformers can be trained from scratch to perform in-context learning of linear functions -- that is, the trained model is able to learn unseen linear functions from in-context examples with performance comparable to the optimal least squares estimator. In fact, in-context learning is possible even under two forms of distribution shift: (i) between the training data of the model and inference-time prompts, and (ii) between the in-context examples and the query input during inference. We also show that we can train Transformers to in-context learn more complex function classes -- namely sparse linear functions, two-layer neural networks, and decision trees -- with performance that matches or exceeds task-specific learning algorithms. Our code and models are available at https://github.com/dtsip/in-context-learning .
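The optimal least-squares baseline that the trained transformer is compared against can be reproduced directly: fit a linear map on the in-context (input, output) pairs, then predict at the query input. A minimal sketch with illustrative names and synthetic data (not the paper's evaluation code):

```python
import numpy as np

def least_squares_predict(xs, ys, x_query):
    # Fit w on the in-context examples by least squares,
    # then predict the output at the held-out query input.
    w_hat, *_ = np.linalg.lstsq(xs, ys, rcond=None)
    return x_query @ w_hat

rng = np.random.default_rng(2)
d, n = 5, 10                       # dimension, number of in-context examples
w = rng.normal(size=d)             # latent linear function for this prompt
xs = rng.normal(size=(n, d))       # in-context inputs
ys = xs @ w                        # in-context outputs
x_query = rng.normal(size=d)       # query input

pred = least_squares_predict(xs, ys, x_query)
# with n >= d noiseless examples, least squares recovers w exactly
```

Matching this estimator's error curve as a function of the number of in-context examples is the benchmark the trained transformers meet in the linear-function experiments.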