Goto

Collaborating Authors

 Li, Vincent


Automating Mathematical Proof Generation Using Large Language Model Agents and Knowledge Graphs

arXiv.org Artificial Intelligence

Large Language Models have demonstrated remarkable capabilities in natural language processing tasks, including mathematical problem-solving that requires multi-step logical reasoning. However, challenges persist in automating the identification of key mathematical concepts, understanding their interrelations, and formalizing proofs within a rigorous framework. We present a novel framework that leverages knowledge graphs to augment LLMs to construct and formalize mathematical proofs. Our results demonstrate significant performance improvements across multiple datasets, with using knowledge graphs, achieving up to a 34% success rate on the MUSTARDSAUCE dataset on o1-mini and consistently outperforming baseline approaches by 2-11% across different models. We show how this approach bridges the gap between natural language understanding and formal logic proof systems and achieve elevated results for foundation models over baseline.


Stackelberg POMDP: A Reinforcement Learning Approach for Economic Design

arXiv.org Artificial Intelligence

We introduce a reinforcement learning framework for economic design where the interaction between the environment designer and the participants is modeled as a Stackelberg game. In this game, the designer (leader) sets up the rules of the economic system, while the participants (followers) respond strategically. We integrate algorithms for determining followers' response strategies into the leader's learning environment, providing a formulation of the leader's learning problem as a POMDP that we call the Stackelberg POMDP. We prove that the optimal leader's strategy in the Stackelberg game is the optimal policy in our Stackelberg POMDP under a limited set of possible policies, establishing a connection between solving POMDPs and Stackelberg games. We solve our POMDP under a limited set of policy options via the centralized training with decentralized execution framework. For the specific case of followers that are modeled as no-regret learners, we solve an array of increasingly complex settings, including problems of indirect mechanism design where there is turn-taking and limited communication by agents. We demonstrate the effectiveness of our training framework through ablation studies. We also give convergence results for no-regret learners to a Bayesian version of a coarse-correlated equilibrium, extending known results to the case of correlated types.


Prompting Code Interpreter to Write Better Unit Tests on Quixbugs Functions

arXiv.org Artificial Intelligence

Unit testing is a commonly-used approach in software engineering to test the correctness and robustness of written code. Unit tests are tests designed to test small components of a codebase in isolation, such as an individual function or method. Although unit tests have historically been written by human programmers, recent advancements in AI, particularly LLMs, have shown corresponding advances in automatic unit test generation. In this study, we explore the effect of different prompts on the quality of unit tests generated by Code Interpreter, a GPT-4-based LLM, on Python functions provided by the Quixbugs dataset, and we focus on prompting due to the ease with which users can make use of our findings and observations. We find that the quality of the generated unit tests is not sensitive to changes in minor details in the prompts provided. However, we observe that Code Interpreter is often able to effectively identify and correct mistakes in code that it writes, suggesting that providing it runnable code to check the correctness of its outputs would be beneficial, even though we find that it is already often able to generate correctly-formatted unit tests. Our findings suggest that, when prompting models similar to Code Interpreter, it is important to include the basic information necessary to generate unit tests, but minor details are not as important.