LEXICON: a Benchmark for Planning under Temporal Constraints in Natural Language
Neural Information Processing Systems
Owing to their reasoning capabilities, large language models (LLMs) have been evaluated on planning tasks described in natural language. However, LLMs have largely been tested on planning domains without constraints. To deploy them in real-world settings where adherence to constraints, in particular safety constraints, is critical, we need to evaluate their performance on constrained planning tasks. We introduce LEXICON, a natural language-based (LEXI) constrained (CON) planning benchmark consisting of a suite of environments that can be used to evaluate the planning capabilities of LLMs in a principled fashion. The core idea behind LEXICON is to take existing planning environments and impose temporal constraints on the states.
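To make the idea of temporal constraints on states concrete, the following is a minimal sketch of how such constraints could be checked against the sequence of states a plan visits. The abstract does not list the constraint types LEXICON uses, so the three forms here (`always`, `sometime`, `sometime_before`) are assumptions borrowed from PDDL3-style trajectory constraints, and the state encoding and fact names are hypothetical.

```python
from typing import Callable, FrozenSet, Sequence

# Hypothetical encoding: a state is the set of facts that are true in it.
State = FrozenSet[str]
Predicate = Callable[[State], bool]


def always(trace: Sequence[State], p: Predicate) -> bool:
    """p must hold in every state the plan visits."""
    return all(p(s) for s in trace)


def sometime(trace: Sequence[State], p: Predicate) -> bool:
    """p must hold in at least one state the plan visits."""
    return any(p(s) for s in trace)


def sometime_before(trace: Sequence[State], p: Predicate, q: Predicate) -> bool:
    """Whenever p holds in a state, q must have held in a strictly earlier state."""
    seen_q = False
    for s in trace:
        if p(s) and not seen_q:
            return False
        if q(s):
            seen_q = True
    return True


# Illustrative state trace from a blocks-style domain (assumed, not from the paper).
trace = [
    frozenset({"on_table_a", "clear_a"}),
    frozenset({"holding_a"}),
    frozenset({"on_a_b"}),
]


def holding_a(s: State) -> bool:
    return "holding_a" in s


def on_a_b(s: State) -> bool:
    return "on_a_b" in s


def not_broken(s: State) -> bool:
    return "broken" not in s


safe = always(trace, not_broken)                      # safety-style constraint
reached = sometime(trace, on_a_b)                     # goal-like constraint
ordered = sometime_before(trace, on_a_b, holding_a)   # ordering constraint
violated = sometime_before(trace, holding_a, on_a_b)  # reversed ordering fails
```

Under these assumptions, a plan counts as valid only if it reaches the goal and its state trace satisfies every imposed constraint, which is what separates constrained planning from the unconstrained setting.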
Jun-19-2026, 17:38:10 GMT