
Collaborating Authors

Friede, David


AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents

arXiv.org Artificial Intelligence

The advances made by Large Language Models (LLMs) have led to the pursuit of LLM agents that can solve intricate, multi-step reasoning tasks. As with any research pursuit, benchmarking and evaluation are key cornerstones of efficient and reliable progress. However, existing benchmarks are often narrow and simply compute overall task success. To address these issues, we propose AgentQuest -- a framework where (i) both benchmarks and metrics are modular and easily extensible through well-documented and easy-to-use APIs, and (ii) we offer two new evaluation metrics that can reliably track LLM agent progress while solving a task. We exemplify the utility of the metrics on two use cases wherein we identify common failure points and refine the agent architecture to obtain a significant performance increase. Together with the research community, we hope to extend AgentQuest further, and therefore we make it available at https://github.com/nec-research/agentquest.
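To illustrate what a modular, extensible benchmark-and-metric setup of this kind can look like, here is a minimal sketch of a driver that steps an agent through a task while per-step metrics track progress. The class and method names (`Benchmark`, `Metric`, `step`, `run_episode`) are hypothetical and are not taken from the AgentQuest codebase; consult the repository for the actual interfaces.

```python
from abc import ABC, abstractmethod

class Benchmark(ABC):
    """Hypothetical benchmark interface: exposes observations and accepts agent actions."""
    @abstractmethod
    def reset(self) -> str: ...                            # initial observation
    @abstractmethod
    def step(self, action: str) -> tuple[str, bool]: ...   # (observation, done)

class Metric(ABC):
    """Hypothetical metric interface: updated once per agent step, reported at the end."""
    @abstractmethod
    def update(self, action: str, observation: str) -> None: ...
    @abstractmethod
    def report(self) -> float: ...

def run_episode(benchmark: Benchmark, agent, metrics: list[Metric], max_steps: int = 50):
    """Drive an agent through a benchmark while metrics track intermediate progress."""
    obs, done = benchmark.reset(), False
    for _ in range(max_steps):
        if done:
            break
        action = agent(obs)                     # any callable LLM agent
        obs, done = benchmark.step(action)
        for m in metrics:
            m.update(action, obs)
    return {type(m).__name__: m.report() for m in metrics}
```

The point of the sketch is the separation of concerns: a new benchmark or a new metric only needs to implement one small interface, while the driver loop stays unchanged.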


Efficient Learning of Discrete-Continuous Computation Graphs

arXiv.org Artificial Intelligence

Numerous models for supervised and reinforcement learning benefit from combinations of discrete and continuous model components. End-to-end learnable discrete-continuous models are compositional, tend to generalize better, and are more interpretable. A popular approach to building discrete-continuous computation graphs is to integrate discrete probability distributions into neural networks using stochastic softmax tricks. Prior work has mainly focused on computation graphs with a single discrete component on each of the graph's execution paths. We analyze the behavior of more complex stochastic computation graphs with multiple sequential discrete components. We show that it is challenging to optimize the parameters of these models, mainly due to small gradients and local minima. We then propose two new strategies to overcome these challenges. First, we show that increasing the scale parameter of the Gumbel noise perturbations during training improves the learning behavior. Second, we propose dropout residual connections specifically tailored to stochastic, discrete-continuous computation graphs. With an extensive set of experiments, we show that we can train complex discrete-continuous models which one cannot train with standard stochastic softmax tricks. We also show that complex discrete-stochastic models generalize better than their continuous counterparts on several benchmark datasets.
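The stochastic softmax trick at the core of this abstract can be illustrated with a small sketch: a Gumbel-softmax sample whose noise perturbations carry an explicit scale factor, chained through two sequential discrete components. The parameter names (`noise_scale`, `temperature`) and the toy computation graph are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, temperature=1.0, noise_scale=1.0):
    """Relaxed sample from a categorical distribution via the Gumbel-softmax trick.

    `noise_scale` multiplies the Gumbel perturbations; increasing it during training
    is the first strategy described in the abstract (names are illustrative).
    """
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-20) + 1e-20)
    return F.softmax((logits + noise_scale * gumbel) / temperature, dim=-1)

# Toy usage: two sequential discrete components on one execution path.
logits_a = torch.randn(8, 4, requires_grad=True)   # first discrete choice
logits_b = torch.randn(4, 4, requires_grad=True)   # second choice, conditioned on the first

z_a = gumbel_softmax_sample(logits_a, temperature=0.5, noise_scale=2.0)
z_b = gumbel_softmax_sample(z_a @ logits_b, temperature=0.5, noise_scale=2.0)
z_b.sum().backward()   # gradients flow through both relaxed discrete components
```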


Learning Disentangled Discrete Representations

arXiv.org Artificial Intelligence

Recent successes in image generation, model-based reinforcement learning, and text-to-image generation have demonstrated the empirical advantages of discrete latent representations, although the reasons behind their benefits remain unclear. We explore the relationship between discrete latent spaces and disentangled representations by replacing the standard Gaussian variational autoencoder (VAE) with a tailored categorical variational autoencoder. We show that the underlying grid structure of categorical distributions mitigates the problem of rotational invariance associated with multivariate Gaussian distributions, acting as an efficient inductive prior for disentangled representations. We provide both analytical and empirical findings that demonstrate the advantages of discrete VAEs for learning disentangled representations. Furthermore, we introduce the first unsupervised model selection strategy that favors disentangled representations.
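A minimal sketch of the idea, assuming a Gumbel-softmax relaxation for the categorical posterior (a common way to train categorical VAEs; the paper's exact parameterization may differ): the latent consists of several categorical variables decoded through a fixed one-dimensional grid of values, rather than a multivariate Gaussian. The class and argument names below are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CategoricalLatent(nn.Module):
    """Discrete latent layer: `n_vars` categorical variables with `n_cats` categories each.

    Each variable is mapped onto a fixed 1-D grid of codebook values, giving the latent
    space the grid structure discussed in the abstract (sketch only).
    """
    def __init__(self, in_dim, n_vars=10, n_cats=32):
        super().__init__()
        self.n_vars, self.n_cats = n_vars, n_cats
        self.to_logits = nn.Linear(in_dim, n_vars * n_cats)
        # Fixed grid of codebook values in [-1, 1], one scalar per category.
        self.register_buffer("grid", torch.linspace(-1.0, 1.0, n_cats))

    def forward(self, h, temperature=0.5):
        logits = self.to_logits(h).view(-1, self.n_vars, self.n_cats)
        probs = F.gumbel_softmax(logits, tau=temperature, dim=-1)  # relaxed one-hot samples
        z = probs @ self.grid        # project each variable onto the scalar grid
        return z, logits             # z: (batch, n_vars); logits feed the KL term
```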


Smooth Variational Graph Embeddings for Efficient Neural Architecture Search

arXiv.org Artificial Intelligence

This leads to the desire for an accurate search space encoding that enables performance prediction via surrogates and black-box optimization to find high-performing architectures in a continuous search space [67]. Zhang et al. [67] propose D-VAE, a graph neural network (GNN) [14, 23, 56] based variational neural architecture embedding with an emphasis on the information flow, and thereby achieve good results in architecture performance prediction and BO on the ENAS search space [39] and on a dataset of Bayesian Networks. In this paper, we propose an approach to neural architecture search (NAS) based on graph embeddings. NAS has been addressed previously using discrete, sampling-based methods, which are computationally expensive, as well as differentiable approaches, which come at lower costs but enforce stronger constraints on the search space. The proposed approach leverages advantages from both sides by building a smooth variational neural architecture embedding.
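As a rough picture of what a variational graph embedding for architectures involves, the sketch below encodes an architecture graph with plain adjacency-matrix message passing into a graph-level mean and variance, from which a smooth latent code is sampled. The two-layer design, pooling choice, and all names are illustrative assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class GraphVariationalEncoder(nn.Module):
    """Sketch: encode an architecture graph into a smooth variational latent embedding."""
    def __init__(self, node_dim, hidden_dim=64, latent_dim=16):
        super().__init__()
        self.msg1 = nn.Linear(node_dim, hidden_dim)
        self.msg2 = nn.Linear(hidden_dim, hidden_dim)
        self.to_mu = nn.Linear(hidden_dim, latent_dim)
        self.to_logvar = nn.Linear(hidden_dim, latent_dim)

    def forward(self, x, adj):
        # Two rounds of neighborhood aggregation (A @ H), then a graph-level readout.
        h = torch.relu(adj @ self.msg1(x))
        h = torch.relu(adj @ self.msg2(h))
        g = h.mean(dim=0)                      # mean-pool over nodes
        mu, logvar = self.to_mu(g), self.to_logvar(g)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        return z, mu, logvar

# Toy usage: a 5-node architecture graph with 8-dimensional node (operation) features.
x = torch.randn(5, 8)
adj = (torch.rand(5, 5) > 0.5).float()
z, mu, logvar = GraphVariationalEncoder(node_dim=8)(x, adj)
```

A smooth latent space of this form is what makes surrogate-based performance prediction and black-box optimization over architectures practical, since nearby codes correspond to similar graphs.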