
Collaborating Authors

 Beger, Claas


CoCoNUT: Structural Code Understanding does not fall out of a tree

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown impressive performance across a wide array of tasks involving both structured and unstructured textual data. Recent results on various benchmarks for code generation, repair, or completion suggest that certain models have programming abilities comparable to, or even surpassing, those of humans. In this work, we demonstrate that high performance on such benchmarks does not correlate with the innate human ability to understand structural control flow in code. To this end, we extract solutions from the HumanEval benchmark, on which the relevant models perform strongly, and trace their execution paths using function calls sampled from the respective test sets. Using this dataset, we investigate the ability of seven state-of-the-art LLMs to match the execution trace and find that, despite their ability to generate semantically identical code, they possess limited ability to trace execution paths, especially for longer traces and specific control structures. We find that even the top-performing model, Gemini, can fully and correctly generate only 47% of HumanEval task traces. Additionally, we introduce a subset covering three key structures not contained in HumanEval: Recursion, Parallel Processing, and Object-Oriented Programming (OOP), including concepts such as Inheritance and Polymorphism. We show that, with the exception of OOP, none of the investigated models achieves an accuracy above 5% on the relevant traces. Aggregating these specialized parts with the HumanEval tasks, we present CoCoNUT: Code Control Flow for Navigation Understanding and Testing, which measures a model's ability to trace the execution of code for relevant calls, including advanced structural components. We conclude that current LLMs need significant improvement to enhance their code reasoning abilities. We hope our dataset helps researchers bridge this gap.
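
As a rough illustration of the kind of execution tracing the abstract describes (not the paper's actual setup), the following Python sketch uses sys.settrace to record which lines a HumanEval-style solution executes for a single sampled test call; the helper name trace_execution is hypothetical.

    import sys

    def trace_execution(func, *args):
        # Record (function name, line number) for every line event executed
        # while running func(*args); a stand-in for the traces described above.
        events = []

        def tracer(frame, event, arg):
            if event == "line":
                events.append((frame.f_code.co_name, frame.f_lineno))
            return tracer

        sys.settrace(tracer)
        try:
            result = func(*args)
        finally:
            sys.settrace(None)
        return result, events

    # HumanEval/0-style solution traced on one sampled test call.
    def has_close_elements(numbers, threshold):
        for i, a in enumerate(numbers):
            for j, b in enumerate(numbers):
                if i != j and abs(a - b) < threshold:
                    return True
        return False

    result, events = trace_execution(has_close_elements, [1.0, 2.8, 3.0, 4.0], 0.3)
    print(result)       # True
    print(len(events))  # number of executed-line events along the path

A model that "understands" control flow would be expected to reproduce the ordered sequence of executed lines, not merely the final return value.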


Unlocking Transparent Alignment Through Enhanced Inverse Constitutional AI for Principle Extraction

arXiv.org Artificial Intelligence

Multiple options exist to align pre-trained Large Language Models (LLMs) to better adhere to human preferences. Popular methods include Reinforcement Learning from Human Feedback (RLHF), which trains a reward model to act as a proxy for human feedback when rating model outputs, and Direct Preference Optimization (DPO), which eliminates the explicit reward model and instead encodes human preferences implicitly in its fine-tuning loss. Both approaches rely heavily on pairwise human-annotated preference data that ranks model outputs. As an alternative, Anthropic introduced Constitutional AI (CAI) [1], a rule-based approach to alignment built on a core set of principles and values called a constitution. This set contains key ethical, moral, and safety standards that guide outputs and promote desired behavior through repeated critiquing of model outputs. Having an explicitly defined set of core values aids the interpretability of the changes induced by the alignment procedure, whereas typical approaches like DPO or RLHF rely on an implicitly defined set of principles embedded in the pairwise preference data. Building on the idea of CAI, [2] proposed an Inverse Constitutional AI (ICAI) algorithm.
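
For context on how DPO folds preferences directly into the loss rather than into a separate reward model, here is a minimal Python sketch of the standard DPO objective (illustrative only, not code from either paper); the tensor names and the beta value are assumptions.

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Each argument: summed token log-probabilities of the chosen/rejected
        # response under the trainable policy or the frozen reference model.
        chosen_margin = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_margin = beta * (policy_rejected_logps - ref_rejected_logps)
        # Preferences are encoded directly in the loss: the policy is pushed to
        # assign a larger reference-relative margin to the chosen response.
        return -F.logsigmoid(chosen_margin - rejected_margin).mean()

    # Toy usage with made-up log-probabilities for a batch of two pairs.
    loss = dpo_loss(torch.tensor([-12.0, -9.5]), torch.tensor([-14.0, -9.0]),
                    torch.tensor([-12.5, -9.8]), torch.tensor([-13.5, -9.4]))
    print(loss.item())

The contrast with CAI is that here the preference signal lives entirely in the ranked pairs, whereas a constitution makes the guiding principles explicit and inspectable.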