Goto

Collaborating Authors

 shiq


ShiQ: Bringing back Bellman to LLMs

arXiv.org Artificial Intelligence

The fine-tuning of pre-trained large language models (LLMs) using reinforcement learning (RL) is generally formulated as direct policy optimization. This approach was naturally favored as it efficiently improves a pretrained LLM, seen as an initial policy. Another RL paradigm, Q-learning methods, has received far less attention in the LLM community while demonstrating major success in various non-LLM RL tasks. In particular, Q-learning effectiveness comes from its sample efficiency and ability to learn offline, which is particularly valuable given the high computational cost of sampling with LLMs. However, naively applying a Q-learning-style update to the model's logits is ineffective due to the specificity of LLMs. Our core contribution is to derive theoretically grounded loss functions from Bellman equations to adapt Q-learning methods to LLMs. To do so, we carefully adapt insights from the RL literature to account for LLM-specific characteristics, ensuring that the logits become reliable Q-value estimates. We then use this loss to build a practical algorithm, ShiQ for Shifted-Q, that supports off-policy, token-wise learning while remaining simple to implement. Finally, we evaluate ShiQ on both synthetic data and real-world benchmarks, e.g., UltraFeedback and BFCL-V3, demonstrating its effectiveness in both single-turn and multi-turn LLM settings


Higher-Order Description Logics for Domain Metamodeling

AAAI Conferences

We investigate an extension of Description Logics (DL) with higher-order capabilities, based on Henkin-style semantics. Our study starts from the observation that the various possibilities of adding higher-order con- structs to a DL form a spectrum of increasing expres- sive power, including domain metamodeling, i.e., using concepts and roles as predicate arguments. We argue that higher-order features of this type are sufficiently rich and powerful for the modeling requirements aris- ing in many relevant situations, and therefore we carry out an investigation of the computational complexity of satisfiability and conjunctive query answering in DLs extended with such higher-order features. In particular, we show that adding domain metamodeling capabilities to SHIQ (the core of OWL 2) has no impact on the complexity of the various reasoning tasks. This is also true for DL-LiteR (the core of OWL 2 QL) under suit- able restrictions on the queries.


On the Complexity of Dealing with Inconsistency in Description Logic Ontologies

AAAI Conferences

We study the problem of dealing with inconsistency in Description Logic (DL) ontologies. We consider inconsistency-tolerant semantics recently proposed in the literature, called AR-semantics and CAR-semantics, which are based on repairing (i.e., modifying) in a minimal way the extensional knowledge (ABox) while keeping the intensional knowledge (TBox) untouched. We study instance checking and conjunctive query entailment under the above inconsistency-tolerant semantics for a wide spectrum of DLs, ranging from tractable ones (EL) to very expressive ones (SHIQ), showing that reasoning under the above semantics is inherently intractable, even for very simple DLs. To the aim of overcoming such a high computational complexity of reasoning, we study sound approximations of the above semantics. Surprisingly, our computational analysis shows that reasoning under the approximated semantics is intractable even for tractable DLs. Finally, we identify suitable language restrictions of such DLs allowing for tractable reasoning under inconsistency-tolerant semantics.