Automatic Derivation of Statistical Algorithms: The EM Family and Beyond

Neural Information Processing Systems 

Machine learning has reached a point where many probabilistic meth- ods can be understood as variations, extensions and combinations of a much smaller set of abstract themes, e.g., as different instances of the EM algorithm. This enables the systematic derivation of algorithms cus- tomized for different models. Here, we describe the AUTO BAYES sys- tem which takes a high-level statistical model specification, uses power- ful symbolic techniques based on schema-based program synthesis and computer algebra to derive an efficient specialized algorithm for learning that model, and generates executable code implementing that algorithm. This capability is far beyond that of code collections such as Matlab tool- boxes or even tools for model-independent optimization such as BUGS for Gibbs sampling: complex new algorithms can be generated with- out new programming, algorithms can be highly specialized and tightly crafted for the exact structure of the model and data, and efficient and commented code can be generated for different languages or systems. We describe a symbolic program synthesis system which works as a "statistical algorithm compiler:" it compiles a statistical model specification into a custom algorithm design and from that further down into a working program implementing the algorithm design.