Goto

Collaborating Authors

 krl



A Adaptations of Algorithm 1 for different problems

Neural Information Processing Systems

We extend Algorithm 1 to stochastic gradient descent (SGD). Algorithm 3 here modifies Algorithm 1 to allow transformations on both parameters and data. In this section, we derive the group actions for the test functions and multi-layer neural networks. More details about group theory can be found in textbooks such as Lang (2002). B.1 Continuous symmetry in test functions B.1.1 Ellipse Consider the following loss function with a 2 R However, we will only use the 2 variable version in the experiments.


Chameleon: A Flexible Data-mixing Framework for Language Model Pretraining and Finetuning

arXiv.org Artificial Intelligence

Training data mixtures greatly impact the generalization performance of large language models. Existing domain reweighting methods often rely on costly weight computations and require retraining when new data is introduced. To this end, we introduce a flexible and efficient data mixing framework, Chameleon, that employs leverage scores to quantify domain importance within a learned embedding space. We first construct a domain affinity matrix over domain embeddings. The induced leverage scores determine a mixture that upweights domains sharing common representations in embedding space. This formulation allows direct transfer to new data by computing the new domain embeddings. In experiments, we demonstrate improvements over three key scenarios: (i) our computed weights improve performance on pretraining domains with a fraction of the compute of existing methods; (ii) Chameleon can adapt to data changes without proxy retraining, boosting few-shot reasoning accuracies when transferred to new data; (iii) our method enables efficient domain reweighting in finetuning, consistently improving test perplexity on all finetuning domains over uniform mixture. Our code is available at https://github.com/LIONS-EPFL/Chameleon.


KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning

arXiv.org Artificial Intelligence

In task-oriented dialogs (TOD), reinforcement learning (RL) algorithms train a model to directly optimize response for task-related metrics. However, RL needs to perform exploration, which can be time-consuming due to the slow auto-regressive sequence generation process. We investigate an approach to create a more efficient RL-based algorithm to improve TOD performance in an offline setting. First, we use a faster generation procedure that samples from independent next-word distributions after training the language model (LM) with supervised learning. We then introduce a fine-grained reward function to help the model focus on learning key information in a dialog, by measuring the importance and semantic closeness of each generated token. Experiments on the MultiWoZ dataset show our new training algorithm, Keywords Reinforcement Learning with Next-word Sampling (KRLS), achieves state-of-the-art performance on the end-to-end response generation task, with a 15% training time reduction compared to a standard RL algorithm using auto-regressive generation.


NYTRO: When Subsampling Meets Early Stopping

arXiv.org Machine Learning

Early stopping is a well known approach to reduce the time complexity for performing training and model selection of large scale learning machines. On the other hand, memory/space (rather than time) complexity is the main constraint in many applications, and randomized subsampling techniques have been proposed to tackle this issue. In this paper we ask whether early stopping and subsampling ideas can be combined in a fruitful way. We consider the question in a least squares regression setting and propose a form of randomized iterative regularization based on early stopping and subsampling. In this context, we analyze the statistical and computational properties of the proposed method. Theoretical results are complemented and validated by a thorough experimental analysis.


Graphical Nonconvex Optimization for Optimal Estimation in Gaussian Graphical Models

arXiv.org Machine Learning

We consider the problem of learning high-dimensional Gaussian graphical models. The graphical lasso is one of the most popular methods for estimating Gaussian graphical models. However, it does not achieve the oracle rate of convergence. In this paper, we propose the graphical nonconvex optimization for optimal estimation in Gaussian graphical models, which is then approximated by a sequence of convex programs. Our proposal is computationally tractable and produces an estimator that achieves the oracle rate of convergence. The statistical error introduced by the sequential approximation using the convex programs are clearly demonstrated via a contraction property. The rate of convergence can be further improved using the notion of sparsity pattern. The proposed methodology is then extended to semiparametric graphical models. We show through numerical studies that the proposed estimator outperforms other popular methods for estimating Gaussian graphical models.


An overview of KRL, a knowledge representation language

Classics

This paper describes krl, a Knowledge Representation Language designed for use in understander systems. It outlines both the general concepts which underlie our research and the details of KRL-0, an experimental implementation of some of these concepts. These forms provide a variety of ways to express the logical structure of the knowledge, in order to give flexibility in associating procedures (for memory and reasoning) with specific pieces of knowledge, and to control the relative accessibility of different facts and descriptions. The formalism for declarative knowledge is based on structured conceptual objects with associated descriptions. These objects form a network of memory units with several different sorts of linkages, each having well-specified implications for the retrieval process.