Evaluating LLMs with Multiple Problems at once: A New Paradigm for Probing LLM Capabilities
Wang, Zhengxiang, Kodner, Jordan, Rambow, Owen
Current LLM evaluation predominantly relies on prompts that each contain a single problem. We propose multi-problem evaluation as an additional approach to study how well LLMs handle multiple problems at once. We present a systematic study in this regard by comprehensively examining 7 LLMs on 4 related types of tasks constructed from 6 classification benchmarks. The 4 task types include traditional single-problem tasks, homogeneous multi-problem tasks, and two index selection tasks that embed the multi-problem tasks. We find that LLMs are competent multi-problem solvers: they generally perform (nearly) as well on multi-problem tasks as on single-problem tasks. Furthermore, contrary to common expectation, they often do not suffer from a positional bias with long inputs. This makes multi-problem prompting a simple and cost-efficient prompting method of practical significance. However, our results also strongly indicate that LLMs lack true understanding: they perform significantly worse on the two index selection tasks than on the multi-problem tasks under various evaluation settings, even though they can do index selection in general.
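The multi-problem prompting described above amounts to packing several benchmark instances into one indexed prompt. A minimal sketch of how such a homogeneous multi-problem prompt could be assembled (the function name, instruction text, and formatting are illustrative assumptions, not the paper's exact templates):

```python
def build_multi_problem_prompt(instruction, problems):
    """Pack several classification problems into one indexed prompt,
    in the spirit of a homogeneous multi-problem task.

    Note: the layout here is a hypothetical sketch, not the paper's template.
    """
    lines = [instruction, ""]
    for i, problem in enumerate(problems, start=1):
        lines.append(f"Problem {i}: {problem}")
    lines.append("")
    lines.append("Answer each problem in order, one label per line.")
    return "\n".join(lines)

prompt = build_multi_problem_prompt(
    "Classify the sentiment of each sentence as positive or negative.",
    ["I loved this movie.", "The plot was a mess.", "Great acting throughout."],
)
print(prompt)
```

A single such prompt replaces several single-problem calls, which is where the cost efficiency noted in the abstract comes from: the shared instruction is paid for once instead of once per problem.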
Improving Complex Knowledge Base Question Answering via Question-to-Action and Question-to-Question Alignment
Tang, Yechun, Cheng, Xiaoxia, Lu, Weiming
Complex knowledge base question answering can be achieved by converting questions into sequences of predefined actions. However, there is a significant semantic and structural gap between natural language and action sequences, which makes this conversion difficult. In this paper, we introduce an alignment-enhanced complex question answering framework, called ALCQA, which mitigates this gap through question-to-action alignment and question-to-question alignment. We train a question rewriting model to align the question and each action, and utilize a pretrained language model to implicitly align the question and KG artifacts. Moreover, considering that similar questions correspond to similar action sequences, we retrieve top-k similar question-answer pairs at the inference stage through question-to-question alignment and propose a novel reward-guided action sequence selection strategy to select from candidate action sequences. We conduct experiments on the CQA and WQSP datasets, and the results show that our approach outperforms state-of-the-art methods, obtaining a 9.88% improvement in the F1 metric on the CQA dataset. Our source code is available at https://github.com/TTTTTTTTy/ALCQA.
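The reward-guided selection step described above can be sketched as scoring each candidate action sequence against the retrieved top-k question-answer pairs, weighted by retrieval similarity, and keeping the highest-scoring candidate. This is a hypothetical illustration of the general idea only; the function names and the toy overlap reward below are assumptions, not ALCQA's actual reward:

```python
def select_action_sequence(candidates, retrieved, reward_fn):
    # Score each candidate by a similarity-weighted reward against the
    # retrieved (reference_sequence, similarity) pairs; return the best one.
    def score(candidate):
        return sum(sim * reward_fn(candidate, ref) for ref, sim in retrieved)
    return max(candidates, key=score)

def overlap_reward(candidate, reference):
    # Toy reward for illustration: fraction of the reference's actions
    # that also appear in the candidate sequence.
    shared = set(candidate) & set(reference)
    return len(shared) / max(len(set(reference)), 1)

# Hypothetical candidates and retrieved similar questions' action sequences.
candidates = [
    ["find", "filter", "count"],
    ["find", "count"],
]
retrieved = [
    (["find", "filter", "count"], 0.9),  # most similar past question
    (["find", "count"], 0.3),
]
best = select_action_sequence(candidates, retrieved, overlap_reward)
```

Here the first candidate wins because it fully covers the action sequence of the most similar retrieved question, reflecting the abstract's premise that similar questions correspond to similar action sequences.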