Opening the AI black box: program synthesis via mechanistic interpretability
Michaud, Eric J., Liao, Isaac, Lad, Vedang, Liu, Ziming, Mudide, Anish, Loughridge, Chloe, Guo, Zifan Carl, Kheirkhah, Tara Rezaei, Vukelić, Mateja, Tegmark, Max
arXiv.org Artificial Intelligence
We present MIPS (Mechanistic-Interpretability-based Program Synthesis), a novel method for program synthesis based on automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by GPT-4 (which also solves 30). MIPS uses an integer autoencoder to convert the RNN into a finite state machine, then applies Boolean or integer symbolic regression to capture the learned algorithm.

The goal of the present paper is to take a modest first step in this direction by presenting and testing MIPS, a fully automated method that can distill simple learned algorithms from neural networks into Python code. The rest of this paper is organized as follows. After reviewing prior work in Section II, we present our method in Section III, test it on a benchmark in Section IV, and summarize our conclusions in Section V.
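To make the RNN-to-finite-state-machine step concrete, here is a minimal illustrative sketch (our own construction, not the authors' code): if an RNN's hidden states settle near integer values, one can discretize them (here by simple rounding, standing in for the paper's integer autoencoder) and tabulate the observed (state, input) → next-state transitions. The `toy_rnn_step` cell is a hypothetical hand-made "trained" RNN that computes running parity.

```python
# Sketch of extracting a finite state machine from an RNN whose hidden
# states cluster near integers. Illustrative only; the real MIPS pipeline
# uses a learned integer autoencoder rather than rounding.

def toy_rnn_step(h, x):
    """Hypothetical 'trained' RNN cell computing running parity of bits.
    The hidden state is a float that stays near 0.0 or 1.0."""
    return abs(round(h) - x) + 0.01 * (h - round(h))

def extract_fsm(step_fn, inputs, h0=0.0):
    """Round hidden states to integers and record the transition table."""
    transitions = {}
    h = h0
    for x in inputs:
        s = round(h)              # discretize current state (stand-in for autoencoder)
        h = step_fn(h, x)
        s_next = round(h)         # discretize next state
        key = (s, x)
        if key in transitions and transitions[key] != s_next:
            raise ValueError("non-deterministic: discretization too coarse")
        transitions[key] = s_next
    return transitions

# Drive the cell with a bit sequence long enough to visit all transitions.
bits = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
fsm = extract_fsm(toy_rnn_step, bits)
print(sorted(fsm.items()))
# The recovered table matches XOR: (s, x) -> s ^ x
```

A symbolic-regression stage, as described in the abstract, would then search for a closed-form expression (here `s ^ x`) reproducing this transition table.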
Feb-7-2024
- Country:
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- Genre:
- Research Report > New Finding (0.34)
- Industry:
- Transportation > Air (0.40)