Opening the AI black box: program synthesis via mechanistic interpretability
Michaud, Eric J., Liao, Isaac, Lad, Vedang, Liu, Ziming, Mudide, Anish, Loughridge, Chloe, Guo, Zifan Carl, Kheirkhah, Tara Rezaei, Vukelić, Mateja, Tegmark, Max
arXiv.org Artificial Intelligence
We present MIPS (Mechanistic-Interpretability-based Program Synthesis), a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by GPT-4 (which also solves 30). MIPS uses an integer autoencoder to convert the RNN into a finite state machine, then applies Boolean or integer symbolic regression to capture the learned algorithm.

The goal of the present paper is to take a modest first step in this direction by presenting and testing MIPS, a fully automated method that can distill simple learned algorithms from neural networks into Python code. The rest of this paper is organized as follows. After reviewing prior work in Section II, we present our method in Section III, test it on a benchmark in Section IV, and summarize our conclusions in Section V.
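To make the RNN-to-finite-state-machine step concrete, here is a minimal illustrative sketch (our own construction, not the authors' code): if an RNN's hidden states settle near integer values, one can discretize them (here by simple rounding, standing in for the paper's integer autoencoder) and tabulate the observed (state, input) → next-state transitions. The `toy_rnn_step` cell is a hypothetical hand-made "trained" RNN that computes running parity.

```python
# Sketch of extracting a finite state machine from an RNN whose hidden
# states cluster near integers. Illustrative only; the real MIPS pipeline
# uses a learned integer autoencoder rather than rounding.

def toy_rnn_step(h, x):
    """Hypothetical 'trained' RNN cell computing running parity of bits.
    The hidden state is a float that stays near 0.0 or 1.0."""
    return abs(round(h) - x) + 0.01 * (h - round(h))

def extract_fsm(step_fn, inputs, h0=0.0):
    """Round hidden states to integers and record the transition table."""
    transitions = {}
    h = h0
    for x in inputs:
        s = round(h)              # discretize current state (stand-in for autoencoder)
        h = step_fn(h, x)
        s_next = round(h)         # discretize next state
        key = (s, x)
        if key in transitions and transitions[key] != s_next:
            raise ValueError("non-deterministic: discretization too coarse")
        transitions[key] = s_next
    return transitions

# Drive the cell with a bit sequence long enough to visit all transitions.
bits = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
fsm = extract_fsm(toy_rnn_step, bits)
print(sorted(fsm.items()))
# The recovered table matches XOR: (s, x) -> s ^ x
```

A symbolic-regression stage, as described in the abstract, would then search for a closed-form expression (here `s ^ x`) reproducing this transition table.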
Feb-7-2024
- Country:
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Genre:
- Research Report > New Finding (0.34)
- Industry:
- Transportation > Air (0.40)