instantiate
New Bounds for Hyperparameter Tuning of Regression Problems Across Instances
The task of tuning regularization coefficients in regularized regression models with provable guarantees across problem instances still poses a significant challenge in the literature. This paper investigates the sample complexity of tuning regularization parameters in linear and logistic regressions under $\ell_1$ and $\ell_2$-constraints in the data-driven setting. For the linear regression problem, by more carefully exploiting the structure of the dual function class, we provide a new upper bound for the pseudo-dimension of the validation loss function class, which significantly improves the best-known results on the problem. Remarkably, we also instantiate the first matching lower bound, proving our results are tight. For tuning the regularization parameters of logistic regression, we introduce a new approach to studying the learning guarantee via an approximation of the validation loss function class. We examine the pseudo-dimension of the approximation class and construct a uniform error bound between the validation loss function class and its approximation, which allows us to instantiate the first learning guarantee for the problem of tuning logistic regression regularization coefficients.
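The setting described above treats the validation loss of a regularized regression as a function of the regularization coefficient, evaluated across problem instances drawn from a distribution. A minimal sketch of that data-driven tuning setup for ridge ($\ell_2$-regularized) linear regression is shown below; the instance generator, the grid over $\lambda$, and all names are illustrative assumptions, not the paper's construction or bounds.

```python
# Minimal sketch (not the paper's algorithm): tune one regularization
# coefficient lambda against the validation loss averaged over sampled
# problem instances. Ridge regression has a closed-form solution, so each
# instance's validation loss is an explicit function of lambda.
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def validation_loss(instance, lam):
    X_tr, y_tr, X_val, y_val = instance
    w = ridge_fit(X_tr, y_tr, lam)
    return np.mean((X_val @ w - y_val) ** 2)

# Hypothetical problem instances drawn from an (unknown) distribution.
def sample_instance(n=50, d=5):
    w_true = rng.normal(size=d)
    X = rng.normal(size=(2 * n, d))
    y = X @ w_true + 0.1 * rng.normal(size=2 * n)
    return X[:n], y[:n], X[n:], y[n:]

instances = [sample_instance() for _ in range(20)]
lambdas = np.logspace(-3, 2, 30)
avg_loss = [np.mean([validation_loss(inst, lam) for inst in instances])
            for lam in lambdas]
best_lam = lambdas[int(np.argmin(avg_loss))]
print(f"lambda minimizing average validation loss: {best_lam:.4f}")
```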
MASA: LLM-Driven Multi-Agent Systems for Autoformalization
Zhang, Lan, Valentino, Marco, Freitas, André
Autoformalization serves a crucial role in connecting natural language and formal reasoning. This paper presents MASA, a novel framework for building multi-agent systems for autoformalization driven by Large Language Models (LLMs). MASA leverages collaborative agents to convert natural language statements into their formal representations. The architecture of MASA is designed with a strong emphasis on modularity, flexibility, and extensibility, allowing seamless integration of new agents and tools to adapt to a fast-evolving field. We showcase the effectiveness of MASA through use cases on real-world mathematical definitions and experiments on formal mathematics datasets. This work highlights the potential of multi-agent systems powered by the interaction of LLMs and theorem provers in enhancing the efficiency and reliability of autoformalization, providing valuable insights and support for researchers and practitioners in the field.
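The architecture described above is a modular pipeline of collaborating agents backed by LLMs and theorem provers. The sketch below illustrates one plausible shape of such a pipeline; the `Agent` protocol, the agent names, and the stubbed `call_llm`/`check` functions are hypothetical and do not reflect MASA's actual API.

```python
# Hypothetical sketch of a modular multi-agent autoformalization pipeline in
# the spirit of the abstract; none of these classes reflect MASA's real API.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Statement:
    informal: str
    formal: str = ""
    feedback: List[str] = field(default_factory=list)

class Agent:
    def run(self, s: Statement) -> Statement:
        raise NotImplementedError

class FormalizerAgent(Agent):
    """Asks an LLM (stubbed here) to draft a formal version of the statement."""
    def __init__(self, call_llm: Callable[[str], str]):
        self.call_llm = call_llm
    def run(self, s: Statement) -> Statement:
        s.formal = self.call_llm(f"Formalize: {s.informal}")
        return s

class ProverCheckAgent(Agent):
    """Sends the draft to a theorem prover / type checker (stubbed) and records feedback."""
    def __init__(self, check: Callable[[str], bool]):
        self.check = check
    def run(self, s: Statement) -> Statement:
        if not self.check(s.formal):
            s.feedback.append("type check failed")
        return s

def pipeline(agents: List[Agent], s: Statement) -> Statement:
    # Modularity: new agents are integrated by appending them to the list.
    for agent in agents:
        s = agent.run(s)
    return s

result = pipeline(
    [FormalizerAgent(lambda p: "theorem two_add_two : 2 + 2 = 4 := rfl"),
     ProverCheckAgent(lambda code: code.startswith("theorem"))],
    Statement(informal="Two plus two equals four."),
)
print(result.formal, result.feedback)
```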
From domain-landmark graph learning to problem-landmark graph generation
Pérez-Corral, Cristian, Garrido, Antonio, Sebastia, Laura
Landmarks have long played a pivotal role in automated planning, serving as crucial elements for improving planning algorithms. The main limitation of classical landmark extraction methods is their sensitivity to specific planning tasks. This results in landmarks fully tailored to individual instances, thereby limiting their applicability across other instances of the same planning domain. We propose a novel approach that learns landmark relationships from multiple planning tasks of a planning domain. This leads to the creation of a \textit{probabilistic lifted ordering graph}, a structure that captures weighted abstractions of relationships between parameterized landmarks. Although these orderings do not hold with certainty (they are probabilistic), they can still be very useful in planning. Next, given a new planning task for that domain, we instantiate the relationships from that graph to this particular instance. This instantiation operates in two phases. First, it generates two graphs: one instantiating information from the initial state and the other from the goal state. Second, it combines these two graphs into one unified graph by searching for equivalences to extract landmark orderings. We evaluate the precision and recall of the information found by our approach over well-known planning domains.
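A rough sketch of that two-phase instantiation is given below: the lifted ordering graph is grounded once from the initial state and once from the goal state, and the two grounded graphs are then merged by matching equivalent landmark nodes. The graph representation, the `ground`/`merge` helpers, and the equivalence test are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of the two-phase instantiation:
# ground the lifted ordering graph once from the initial state and once from
# the goal state, then merge the two graphs by identifying equivalent nodes.
from itertools import product

def ground(lifted_edges, bindings):
    """Instantiate parameterized landmark orderings with concrete objects.

    lifted_edges: iterable of (src_template, dst_template, probability)
    bindings: dict mapping parameter names (e.g. '?x') to concrete objects.
    """
    grounded = []
    for src, dst, prob in lifted_edges:
        g_src = tuple(bindings.get(t, t) for t in src)
        g_dst = tuple(bindings.get(t, t) for t in dst)
        grounded.append((g_src, g_dst, prob))
    return grounded

def merge(init_graph, goal_graph, equivalent):
    """Combine the initial-state and goal-state graphs, keeping orderings
    whose endpoints match (up to the equivalence test) in both graphs."""
    merged = []
    for (s1, d1, p1), (s2, d2, p2) in product(init_graph, goal_graph):
        if equivalent(s1, s2) and equivalent(d1, d2):
            merged.append((s1, d1, min(p1, p2)))
    return merged

lifted = [(("at", "?r", "?from"), ("at", "?r", "?to"), 0.8)]
init_g = ground(lifted, {"?r": "robot1", "?from": "roomA", "?to": "roomB"})
goal_g = ground(lifted, {"?r": "robot1", "?from": "roomA", "?to": "roomB"})
print(merge(init_g, goal_g, equivalent=lambda a, b: a == b))
```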
Soup-of-Experts: Pretraining Specialist Models via Parameters Averaging
Ablin, Pierre, Katharopoulos, Angelos, Seto, Skyler, Grangier, David
Machine learning models are routinely trained on a mixture of different data domains. Different domain weights yield very different downstream performances. We propose the Soup-of-Experts, a novel architecture that can instantiate a model at test time for any domain weights with minimal computational cost and without re-training the model. Our architecture consists of a bank of expert parameters, which are linearly combined to instantiate one model. We learn the linear combination coefficients as a function of the input domain weights. To train this architecture, we sample random domain weights, instantiate the corresponding model, and backprop through one batch of data sampled with these domain weights. We demonstrate that our approach quickly obtains small specialized models on several language modeling tasks. Soup-of-Experts are particularly appealing when one needs to ship many different specialist models quickly under a model size constraint.
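As described, a model is instantiated at test time by linearly combining a bank of expert parameters, with the combination coefficients computed from the input domain weights. The NumPy sketch below illustrates that instantiation step under assumed names and shapes (`expert_bank`, `coeff_matrix`); the linear coefficient map stands in for whatever function the paper actually learns, and the training loop is only outlined.

```python
# Minimal sketch (shapes and names are hypothetical) of instantiating one model
# from a bank of expert parameters, with mixing coefficients computed from the
# input domain weights, as described in the abstract.
import numpy as np

rng = np.random.default_rng(0)

n_experts, n_domains, n_params = 8, 3, 1000
expert_bank = rng.normal(size=(n_experts, n_params))    # bank of expert parameters
coeff_matrix = rng.normal(size=(n_experts, n_domains))  # maps domain weights -> coefficients

def instantiate_model(domain_weights):
    """Linearly combine the expert parameters into one parameter vector."""
    coeffs = coeff_matrix @ domain_weights   # coefficients as a function of domain weights
    return coeffs @ expert_bank              # instantiated model, shape (n_params,)

# Training outline: sample random domain weights, instantiate the model, and
# backprop through one batch drawn with those weights (gradient step omitted).
domain_weights = rng.dirichlet(np.ones(n_domains))
params = instantiate_model(domain_weights)
print(params.shape)
```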
Using a neural net to instantiate a deformable model
Deformable models are an attractive approach to recognizing non-rigid objects which have considerable within-class variability. However, there are severe search problems associated with fitting the models to data. We show that by using neural networks to provide better starting points, the search time can be significantly reduced. The method is demonstrated on a character recognition task. In previous work we have developed an approach to handwritten character recognition based on the use of deformable models (Hinton, Williams and Revow, 1992a; Revow, Williams and Hinton, 1993).
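The pattern described here, where a neural network proposes a good starting configuration so the subsequent iterative fit converges in fewer steps, can be illustrated with a toy objective. The sketch below uses a stand-in quadratic energy and a fake "predicted" initialization; it is not the authors' character-recognition system.

```python
# Illustrative initialize-then-refine pattern (not the authors' system): a
# "network" proposes starting parameters for the deformable model, and a simple
# gradient-descent fit refines them; a better start means fewer iterations.
import numpy as np

rng = np.random.default_rng(0)

def fit_energy(params, data):
    """Stand-in fitting objective: squared distance of model points to data."""
    return np.sum((params - data) ** 2)

def refine(params, data, lr=0.1, steps=100, tol=1e-6):
    """Plain gradient descent on the stand-in energy."""
    for i in range(steps):
        grad = 2 * (params - data)
        params = params - lr * grad
        if fit_energy(params, data) < tol:
            return params, i + 1
    return params, steps

data = rng.normal(size=8)                      # observed "character" points
cold_start = np.zeros(8)                       # generic initialization
warm_start = data + 0.05 * rng.normal(size=8)  # pretend a neural net predicted this

_, cold_iters = refine(cold_start, data)
_, warm_iters = refine(warm_start, data)
print(f"iterations from generic start: {cold_iters}, from predicted start: {warm_iters}")
```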