Latent Variable Models for Predicting File Dependencies in Large-Scale Software Development
Hu, Diane; Maaten, Laurens; Cho, Youngmin; Lerner, Sorin; Saul, Lawrence K.
Neural Information Processing Systems
When software developers modify one or more files in a large code base, they must also identify and update other related files. Many file dependencies can be detected by mining the development history of the code base: in essence, groups of related files are revealed by the logs of previous workflows. From data of this form, we show how to detect dependent files by solving a problem in binary matrix completion. We explore different latent variable models (LVMs) for this problem, including Bernoulli mixture models, exponential family PCA, restricted Boltzmann machines, and fully Bayesian approaches. We evaluate these models on the development histories of three large, open-source software systems: Mozilla Firefox, Eclipse Subversive, and Gimp. In all of these applications, we find that LVMs improve the performance of related file prediction over current leading methods.
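The abstract frames related-file prediction as binary matrix completion. Each past commit is a row of a binary matrix over files, and a new, partially known commit is a row whose remaining entries are missing. The sketch below illustrates one of the model families named above: it fits a mixture of Bernoulli distributions with EM, then scores the unobserved files given the files a developer is already editing. The function names, the Laplace smoothing, the seeding heuristic, and the choice to condition only on the observed files are illustrative assumptions. They are not details taken from the paper.

```python
import math


def fit_bernoulli_mixture(rows, n_files, k, n_iter=50):
    """Fit a k-component mixture of multivariate Bernoulli distributions by EM.

    rows: list of sets; each set holds the (hypothetical) integer indices of
    the files touched by one past commit, i.e. the 1-entries of one matrix row.
    Returns (weights, probs) with probs[c][j] = P(file j changes | component c).
    """
    # Deterministic seeding heuristic (an assumption): component c starts from
    # an evenly spaced training row, softened to probabilities 0.75 / 0.25.
    seeds = [rows[c * len(rows) // k] for c in range(k)]
    probs = [[0.75 if j in seed else 0.25 for j in range(n_files)] for seed in seeds]
    weights = [1.0 / k] * k

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each commit,
        # computed in log space and normalised with the max-subtraction trick.
        resp = []
        for row in rows:
            logs = [
                math.log(weights[c])
                + sum(math.log(probs[c][j]) if j in row else math.log(1.0 - probs[c][j])
                      for j in range(n_files))
                for c in range(k)
            ]
            top = max(logs)
            unnorm = [math.exp(v - top) for v in logs]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])

        # M-step: re-estimate mixing weights and per-file probabilities.
        # Laplace smoothing (+1 / +2) keeps every probability strictly in (0, 1).
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            weights[c] = (n_c + 1e-3) / (len(rows) + k * 1e-3)
            for j in range(n_files):
                hits = sum(r[c] for r, row in zip(resp, rows) if j in row)
                probs[c][j] = (hits + 1.0) / (n_c + 2.0)
    return weights, probs


def predict_related(observed, weights, probs):
    """Rank the files not in `observed` by predicted probability of change.

    Assumption: only the files known to be edited (the observed 1-entries)
    enter the posterior; every other entry of the new row is treated as missing.
    """
    k, n_files = len(weights), len(probs[0])
    logs = [math.log(weights[c]) + sum(math.log(probs[c][j]) for j in observed)
            for c in range(k)]
    top = max(logs)
    post = [math.exp(v - top) for v in logs]
    total = sum(post)
    post = [p / total for p in post]
    # Marginalise over components: P(x_j = 1 | observed) = sum_c P(c | observed) * p_cj
    scores = {j: sum(post[c] * probs[c][j] for c in range(k))
              for j in range(n_files) if j not in observed}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy history (made up): files 0-2 tend to change together, as do files 3-4.
history = [{0, 1, 2}, {0, 1}, {1, 2}, {0, 2}, {3, 4}, {3, 4}, {3}, {4}]
weights, probs = fit_bernoulli_mixture(history, n_files=5, k=2)
print(predict_related({0}, weights, probs))
```

On this toy history, conditioning on file 0 ranks files 1 and 2 ahead of files 3 and 4. EM only reaches a local optimum, so a practical system would likely use several restarts and choose `k` on held-out commits.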
Dec-31-2010
- Country:
  - North America > United States
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - California > San Diego County
      - San Diego (0.04)
  - Europe > Netherlands
    - South Holland > Delft (0.04)
- Industry:
  - Information Technology > Software (0.54)