Goto

Collaborating Authors

 coef std


Scaling LLM Planning: NL2FLOW for Parametric Problem Generation and Rigorous Evaluation

Kang, Jungkoo

arXiv.org Artificial Intelligence

Robust workflow composition is critical for effective agent performance, yet progress in Large Language Model (LLM) planning and reasoning is hindered by a scarcity of scalable evaluation data. This work introduces NL2Flow, a fully automated pipeline for generating and evaluating workflow planning problems. NL2Flow generates problems parametrically in a structured intermediate representation, translating them into both natural language and formal PDDL. I evaluate several open-source, instruct-tuned LLMs on a dataset of 2296 low-difficulty problems generated by NL2Flow. Results demonstrate that the best-performing model achieved 86% success in generating valid plans and 69% in generating optimal plans (for solvable problems). Regression analysis shows that the influence of problem characteristics on plan generation is contingent on both model and prompt design. Importantly, translating natural language problems into a structured JSON representation prior to symbolic planning significantly improved success rates, suggesting a benefit from neuro-symbolic integration. These findings underscore the importance of understanding error sources within LLM reasoning as systems scale to more complex tasks. As LLM reasoning scales to increasingly complex problems, understanding the shifting bottlenecks and sources of error within these systems will be crucial.


Deep State Space Recurrent Neural Networks for Time Series Forecasting

Inzirillo, Hugo

arXiv.org Artificial Intelligence

We explore various neural network architectures for modeling the dynamics of the cryptocurrency market. Traditional linear models often fall short in accurately capturing the unique and complex dynamics of this market. In contrast, Deep Neural Networks (DNNs) have demonstrated considerable proficiency in time series forecasting. This papers introduces novel neural network framework that blend the principles of econometric state space models with the dynamic capabilities of Recurrent Neural Networks (RNNs). We propose state space models using Long Short Term Memory (LSTM), Gated Residual Units (GRU) and Temporal Kolmogorov-Arnold Networks (TKANs). According to the results, TKANs, inspired by Kolmogorov-Arnold Networks (KANs) and LSTM, demonstrate promising outcomes.


An Introduction to Bayesian Reasoning

#artificialintelligence

The coefficients are constrained by the prior and end up smaller in the second example. Although the model is not fit here with Bayesian techniques, it has a Bayesian interpretation. The output here does not quite give a distribution over the coefficient (though other packages can), but does give something related: a 95% confidence interval around the coefficient, in addition to its point estimate. By now you may have a taste for Bayesian techniques and what they can do for you, from a few simple examples. Things get more interesting, however, when we see what priors and posteriors can do for a real-world use case. For part 2, please click here.