mlpack aims to implement a wide array of machine learning methods and to function as a "swiss army knife" for machine learning researchers. This README serves as a guide for what mlpack is, how to install it, how to run it, and where to find more documentation. Citations are beneficial for the growth and improvement of mlpack. mlpack's dependencies should be available in your distribution's package manager; if not, you will have to compile each of them by hand.
Popular libraries make up the backbone of data science: scikit-learn, TensorFlow, Caffe, and Keras are the standard Python choices. But these libraries don't tend to implement niche techniques (scikit-learn's inclusion policy actually states that it doesn't consider algorithms less than three years old or with fewer than 200 citations!). Enter mlpack: a flexible, fast machine learning library. It's written in C++, with bindings to Python and command-line programs that can be used for simpler data science tasks. Because it uses templates for configurability, it is easy to customize the specific behavior of algorithms without any runtime penalty.
Our previous roundup of machine learning resources touched on mlpack, a C++-based machine learning library originally rolled out in 2011 and designed for "scalability, speed, and ease-of-use," according to the library's creators. mlpack can be used through a set of command-line executables for quick-and-dirty, "black box" operations, or with a C++ API for more sophisticated work. The 2.0 version brings many refactorings and new features, including new algorithms and changes to existing ones to speed them up or slim them down. For example, it ditches the Boost library's random number generator for C++11's native random functions. One long-standing disadvantage is a lack of bindings for any language other than C++, meaning users of everything from R to Python can't make use of mlpack unless someone rolls their own wrappers for those languages.
This academic background has led to mlpack being used in many scientific publications, both inside the machine learning community and in adjacent fields.
The Euclidean Minimum Spanning Tree (EMST) problem is widely used in machine learning and data mining applications. Given a set of points in d-dimensional Euclidean space, our task is to compute the lowest-weight spanning tree of the complete graph on those points, with edge weights given by the Euclidean distance between points. Among other applications, the EMST can be used to compute hierarchical clusterings of data. A single-linkage clustering can be obtained from the EMST by deleting all edges longer than a given cluster length. This technique is also referred to as a Friends-of-Friends clustering in the astronomy literature.