Multiple-Implementation Testing of Supervised Learning Software
Srisakaokul, Siwakorn (University of Illinois at Urbana-Champaign) | Wu, Zhengkai (University of Illinois at Urbana-Champaign) | Astorga, Angello (University of Illinois at Urbana-Champaign) | Alebiosu, Oreoluwa (University of Illinois at Urbana-Champaign) | Xie, Tao (University of Illinois at Urbana-Champaign)
Machine Learning (ML) algorithms are now used in a wide range of application domains in society. Naturally, software implementations of these algorithms have become ubiquitous. Faults in ML software can cause substantial losses in these application domains. Thus, it is very critical to conduct effective testing of ML software to detect and eliminate its faults. However, testing ML software is difficult, partly because producing test oracles used for checking behavior correctness (such as using expected properties or expected test outputs) is challenging. In this paper, we propose an approach of multiple-implementation testing to test supervised learning software, a major type of ML software. In particular, our approach derives a test input's proxy oracle from the majority-voted output running the test input of multiple implementations of the same algorithm (based on a pre-defined percentage threshold). Our approach reports likely those test inputs whose outputs (produced by an implementation under test) are different from the majority-voted outputs as failing tests. We evaluate our approach on two highly popular supervised learning algorithms: k-Nearest Neighbor (kNN) and Naive Bayes (NB). Our results show that our approach is highly effective in detecting faults in real-world supervised learning software. In particular, our approach detects 13 real faults and 1 potential fault from 19 kNN implementations and 16 real faults from 7 NB implementations. Our approach can even detect 7 real faults and 1 potential fault among the three popularly used open-source ML projects (Weka, RapidMiner, and KNIME).
Apr-6-2018