Learning Partitions from Context

Neural Information Processing Systems

In this paper, we study the problem of learning the structure of a discrete set of N tokens based on their interactions with other tokens. We focus on a setting where the tokens can be partitioned into a small number of classes, and there exists a real-valued function f defined on certain sets of tokens. This function, which captures the interactions between tokens, depends only on the class memberships of its arguments. The goal is to recover the class memberships of all tokens from a finite number of samples of f. We begin by analyzing this problem from both complexity-theoretic and information-theoretic viewpoints.
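A toy sketch of the problem setup may help: hidden classes, a pairwise interaction function f that depends only on class membership, and a naive recovery rule that groups tokens with identical interaction profiles. This assumes full access to f on all pairs, unlike the finite-sample regime the paper analyzes; all names and values here are illustrative.

```python
# Toy instance of the setup: tokens with hidden class labels, and a pairwise
# function f whose value depends only on the classes of its arguments.
hidden_class = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 2}
tokens = sorted(hidden_class)

def f(x, y):
    # Interaction value determined solely by class memberships.
    return (hidden_class[x] + 1) * (hidden_class[y] + 1)

# Naive recovery: two tokens belong to the same class exactly when their
# interaction profiles f(., t) agree on every partner token.
def profile(t):
    return tuple(f(t, u) for u in tokens)

groups = {}
for t in tokens:
    groups.setdefault(profile(t), []).append(t)

partition = sorted(groups.values())
# partition == [['a', 'b'], ['c', 'd'], ['e']]
```

The interesting question in the paper is how few samples of f suffice; the exhaustive comparison above is only the information-theoretic baseline.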


One-Layer Transformer Provably Learns One-Nearest Neighbor In Context

Neural Information Processing Systems

Transformers have achieved great success in recent years. Interestingly, transformers have shown particularly strong in-context learning capabilities -- even without fine-tuning, they are still able to solve unseen tasks well purely based on task-specific prompts. In this paper, we study the capability of one-layer transformers in learning the one-nearest neighbor prediction rule. Under a theoretical framework where the prompt contains a sequence of labeled training data and unlabeled test data, we show that, although the loss function is nonconvex, when trained with gradient descent, a single softmax attention layer can successfully learn to behave like a one-nearest neighbor classifier. Our result gives a concrete example of how transformers can be trained to implement nonparametric machine learning algorithms, and sheds light on the role of softmax attention in transformer models.
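The core intuition can be sketched numerically: softmax attention with a large inverse temperature over similarity scores concentrates its weights on the single most similar training example, so the attention output approaches the 1-NN label. This is an illustration of the mechanism, not the paper's trained construction; the scoring by negative squared distance and the parameter `beta` are choices made here for clarity.

```python
import numpy as np

def softmax_attention_predict(X_train, y_train, x_test, beta=50.0):
    # Attention scores: higher for training points closer to the test point.
    scores = -beta * np.sum((X_train - x_test) ** 2, axis=1)
    # Numerically stable softmax over the training examples.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Output is a weight-averaged label; for large beta the weights
    # concentrate on the nearest neighbor.
    return float(weights @ y_train)

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
y_train = np.array([-1.0, 1.0, -1.0])
x_test = np.array([0.9, 1.1])  # nearest training point is [1.0, 1.0]
pred = softmax_attention_predict(X_train, y_train, x_test)
# pred is close to 1.0, the label of the nearest neighbor
```

As `beta` grows, the softmax sharpens toward a hard argmax, which is exactly the one-nearest-neighbor rule.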


Route Sparse Autoencoder to Interpret Large Language Models

Shi, Wei, Li, Sihang, Liang, Tao, Wan, Mingyang, Ma, Gojun, Wang, Xiang, He, Xiangnan

arXiv.org Artificial Intelligence

Mechanistic interpretability of large language models (LLMs) aims to uncover the internal processes of information propagation and reasoning. Sparse autoencoders (SAEs) have demonstrated promise in this domain by extracting interpretable and monosemantic features. However, prior works primarily focus on feature extraction from a single layer, failing to effectively capture activations that span multiple layers. In this paper, we introduce Route Sparse Autoencoder (RouteSAE), a new framework that integrates a routing mechanism with a shared SAE to efficiently extract features from multiple layers. It dynamically assigns weights to activations from different layers, incurring minimal parameter overhead while achieving high interpretability and flexibility for targeted feature manipulation. We evaluate RouteSAE through extensive experiments on Llama-3.2-1B-Instruct. Specifically, under the same sparsity constraint of 64, RouteSAE extracts 22.5% more features than baseline SAEs while achieving a 22.3% higher interpretability score. These results underscore the potential of RouteSAE as a scalable and effective method for LLM interpretability, with applications in feature discovery and model intervention. Our codes are available at https://github.com/swei2001/RouteSAEs.
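The routing idea described above can be sketched as follows. This is a minimal, hypothetical-shape illustration of a router weighting multi-layer activations before a single shared sparse autoencoder with a top-k sparsity constraint; it is not the paper's exact architecture, and all dimensions and parameter forms are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: model width, number of routed layers,
# SAE dictionary size, and the top-k sparsity level.
d_model, n_layers, n_features, k = 16, 4, 64, 8

w_router = rng.normal(size=(d_model,)) * 0.1          # router (assumed linear form)
W_enc = rng.normal(size=(d_model, n_features)) * 0.1  # shared SAE encoder
W_dec = rng.normal(size=(n_features, d_model)) * 0.1  # shared SAE decoder

def route_sae(layer_acts):
    # layer_acts: (n_layers, d_model), one activation vector per routed layer.
    scores = layer_acts @ w_router                     # one routing score per layer
    w = np.exp(scores - scores.max())
    w /= w.sum()                                       # softmax routing weights
    x = w @ layer_acts                                 # weighted mix, (d_model,)
    z = x @ W_enc                                      # feature pre-activations
    mask = np.zeros_like(z)
    mask[np.argsort(z)[-k:]] = 1.0                     # keep only the top-k features
    z_sparse = z * mask
    return z_sparse, z_sparse @ W_dec                  # sparse code, reconstruction

z, x_hat = route_sae(rng.normal(size=(n_layers, d_model)))
# z has exactly k nonzero features; x_hat is the (d_model,) reconstruction
```

The routing weights make the feature dictionary shared across layers while still letting each input be explained mostly from the layer where its signal lives.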


Kosmos-G: Generating Images in Context with Multimodal Large Language Models

Pan, Xichen, Dong, Li, Huang, Shaohan, Peng, Zhiliang, Chen, Wenhu, Wei, Furu

arXiv.org Artificial Intelligence

Recent advancements in text-to-image (T2I) and vision-language-to-image (VL2I) generation have made significant strides. However, the generation from generalized vision-language inputs, especially involving multiple images, remains under-explored. This paper presents Kosmos-G, a model that leverages the advanced perception capabilities of Multimodal Large Language Models (MLLMs) to tackle the aforementioned challenge. Our approach aligns the output space of MLLM with CLIP using the textual modality as an anchor and performs compositional instruction tuning on curated data. Kosmos-G demonstrates a unique capability of zero-shot multi-entity subject-driven generation. Notably, the score distillation instruction tuning requires no modifications to the image decoder. This allows for a seamless substitution of CLIP and effortless integration with a myriad of U-Net techniques ranging from fine-grained controls to personalized image decoder variants. We posit Kosmos-G as an initial attempt towards the goal of "image as a foreign language in image generation."


Sharing Context Between Tasks in Databricks Workflows - The Databricks Blog

#artificialintelligence

Databricks Workflows is a fully-managed service on Databricks that makes it easy to build and manage complex data and ML pipelines in your lakehouse without the need to operate complex infrastructure. Sometimes, a task in an ETL or ML pipeline depends on the output of an upstream task. An example would be to evaluate the performance of a machine learning model and then have a task determine whether to retrain the model based on model metrics. Since these are two separate steps, it would be best to have separate tasks perform the work. Previously, accessing information from a previous task required storing this information outside of the job's context, such as in a Delta table.
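The evaluate-then-decide pattern described above can be simulated in plain Python (this is not the Databricks API; the dictionary and function names here are illustrative stand-ins for a job's cross-task key/value store):

```python
# Plain-Python simulation of sharing context between two pipeline tasks:
# an evaluation task publishes model metrics, and a downstream task reads
# them to decide whether retraining is needed.
job_context = {}  # stands in for the job's cross-task key/value store

def evaluate_model_task():
    metrics = {"auc": 0.71}                 # pretend evaluation result
    job_context["model_metrics"] = metrics  # "set" a value for downstream tasks

def decide_retrain_task(threshold=0.75):
    metrics = job_context.get("model_metrics", {})  # "get" the upstream value
    return metrics.get("auc", 0.0) < threshold      # retrain if quality dropped

evaluate_model_task()
should_retrain = decide_retrain_task()
# should_retrain == True, since 0.71 < 0.75
```

The point of the feature described in the post is that this hand-off happens inside the job's own context, with no external Delta table needed just to pass a few values between tasks.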


The AI Act: Three Things To Know About AI Regulation Worldwide

#artificialintelligence

As AI proliferates, countries and their legal systems are trying to catch up. AI regulation is emerging at industry level, at city and county level, and at country and region level. The European Union AI Act could well serve as a template for AI regulation around the world. In this post, we describe three key things you should know about AI regulation: the context (what is already around us), the AI Act (the key elements of the upcoming EU legislation), and what all this is likely to mean for businesses and individuals. The AI Act is not the first piece of AI regulation.


Meaning and Context in Computer Programs

Communications of the ACM

When you look at a function's source code, how do you know what it means--that is, what object or process is this function representing? Is the meaning found in the return values of the function, or is it located inside the function body? Answering these questions is important to understanding how to share domain knowledge among programmers using the source code as the medium. Whether debugging or adding new features to a program, programmers must read the code to understand what the program is doing. From this reading, the programmers must also know how the problem domain is represented in the code, so they can be certain that their changes to the source code won't make the program work in unexpected ways.



Why Creativity Is Now More Important Than Intelligence

#artificialintelligence

Machines can now do what you could call IQ-style thinking -- covering what 'multiple intelligences' theorist Howard Gardner would call visual-spatial, verbal-linguistic and logical-mathematical intelligence -- pretty darn well. Artificial Intelligence (AI) is here and it's getting more sophisticated every day. But AC -- Artificial Creativity -- barely exists. AI has been unsettling the human world for quite some time. Can you believe it was 1997 when IBM's Deep Blue computer triumphed over chess colossus Garry Kasparov?


There's More to Life Than Making Plans

AI Magazine

For many years, research in AI plan generation was governed by a number of strong, simplifying assumptions: The planning agent is omniscient, its actions are deterministic and instantaneous, its goals are fixed and categorical, and its environment is static. More recently, researchers have developed expanded planning algorithms that are not predicated on such assumptions, but changing the way in which plans are formed is only part of what is required when the classical assumptions are abandoned. The demands of dynamic, uncertain environments mean that in addition to being able to form plans--even probabilistic, uncertain plans--agents must be able to effectively manage their plans. In this article, which is based on a talk given at the 1998 AAAI Fall Symposium on Distributed, Continual Planning, we first identify reasoning tasks that are involved in plan management, including commitment management, environment monitoring, alternative assessment, plan elaboration, metalevel control, and coordination with other agents. We next survey approaches we have developed to many of these tasks and discuss a plan-management system we are building to ground our theoretical work, by providing us with a platform for integrating our techniques and exploring their value in a realistic problem.