Glossary of Machine Learning Terms

#artificialintelligence

ROC curves are widely used because they are relatively simple to understand and capture more than one aspect of the classification.
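
As a rough illustration of how a ROC curve captures more than one aspect of a classifier, the sketch below sweeps a decision threshold over classifier scores and records the false positive rate and true positive rate at each step. The function names `roc_curve_points` and `auc` and the toy labels and scores are assumptions made for this example, not part of the glossary entry.

```python
def roc_curve_points(labels, scores):
    """Return (fpr, tpr) points obtained by sweeping a threshold over scores.

    labels: sequence of 0/1 ground-truth values (1 = positive).
    scores: sequence of classifier scores, higher = more likely positive.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Examples that share a score are processed together (ties).
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Toy data (hypothetical): six examples, three positives and three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_curve_points(labels, scores)
print(points)
print(auc(points))
```

Each point on the curve is one possible trade-off between false alarms and detections, and the area under the curve summarizes all of them in a single number.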


Feature Selection Approach with Missing Values Conducted for Statistical Learning: A Case Study of Entrepreneurship Survival Dataset

arXiv.org Machine Learning

In this article, we investigate which features best discriminate survival among micro and small enterprises (MSE), using a data mining approach with feature selection. Given the complexity of the data set, we compare three data imputation methods: mean imputation (MI), k-nearest neighbor (KNN) and expectation maximization (EM), each combined with variable selection by t-test. The data mining step then applies logistic regression, the naive Bayes algorithm, linear discriminant analysis and a support vector machine, and compares their respective performances. The experimental results will be used to develop a model for predicting MSE survival, providing a better understanding of the topic, since MSEs are a significant part of Brazil's GDP and macroeconomy.
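
The pipeline described in this abstract can be sketched in a few lines for the simplest of the three imputation methods. The example below fills missing values with the column mean and then ranks features by the absolute value of Welch's t statistic between the two classes. It is only a minimal sketch: the helper names, the toy rows, the use of `None` for missing values, and the choice of Welch's variant of the t-test are assumptions, and KNN and EM imputation are not shown.

```python
import math
from statistics import mean, variance


def mean_impute(column):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in column]


def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances).

    Assumes each group has at least two values and that the two
    sample variances are not both zero.
    """
    va, vb = variance(a), variance(b)
    return (mean(a) - mean(b)) / math.sqrt(va / len(a) + vb / len(b))


def rank_features(rows, labels):
    """Impute each feature, then rank feature indices by |t| (largest first)."""
    n_features = len(rows[0])
    scored = []
    for j in range(n_features):
        col = mean_impute([row[j] for row in rows])
        group1 = [v for v, y in zip(col, labels) if y == 1]
        group0 = [v for v, y in zip(col, labels) if y == 0]
        scored.append((abs(welch_t(group1, group0)), j))
    return [j for _, j in sorted(scored, reverse=True)]


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
rows = [
    [5.0, 1.0],
    [6.0, None],
    [5.5, 2.0],
    [1.0, 1.5],
    [None, 2.5],
    [1.5, 1.0],
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = survived, 0 = did not survive
print(rank_features(rows, labels))
```

The top-ranked features would then be passed to the classifiers being compared.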


Restricted Bayes Optimal Classifiers

AAAI Conferences

We introduce the notion of restricted Bayes optimal classifiers. These classifiers attempt to combine the flexibility of the generative approach to classification with the high accuracy associated with discriminative learning. They first create a model of the joint distribution over class labels and features. Instead of choosing the decision boundary induced directly from the model, they restrict the allowable types of decision boundaries and learn the one that minimizes the probability of misclassification relative to the estimated joint distribution. In this paper, we investigate two particular instantiations of this approach. The first uses a nonparametric density estimator -- Parzen Windows with Gaussian kernels -- and hyperplane decision boundaries. We show that the resulting classifier is asymptotically equivalent to a maximal margin hyperplane classifier, a highly successful discriminative classifier. We therefore provide an alternative justification for maximal margin hyperplane classifiers. The second instantiation uses a mixture of Gaussians as the estimated density; in experiments on real-world data, we show that this approach allows data with missing values to be handled in a principled manner, leading to improved performance over regular discriminative approaches.
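
To make the first instantiation more concrete, here is a one-dimensional sketch in which a hyperplane reduces to a single threshold. Each class density is estimated with a Parzen window using Gaussian kernels, and the threshold is chosen to minimize the misclassification probability measured against that estimated distribution. The bandwidth `sigma`, the grid search, the function names, and the toy samples are all assumptions for illustration; the paper's own optimization procedure is not reproduced here.

```python
import math


def gaussian_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def parzen_cdf(x, samples, sigma):
    """CDF of a Parzen-window density: an equal-weight mixture of
    Gaussians centred on the training samples."""
    return sum(gaussian_cdf(x, s, sigma) for s in samples) / len(samples)


def estimated_error(threshold, neg, pos, sigma):
    """Misclassification probability of the rule 'predict positive if
    x > threshold', relative to the estimated joint distribution."""
    n = len(neg) + len(pos)
    prior_neg, prior_pos = len(neg) / n, len(pos) / n
    false_pos = 1.0 - parzen_cdf(threshold, neg, sigma)  # negative mass above
    false_neg = parzen_cdf(threshold, pos, sigma)        # positive mass below
    return prior_neg * false_pos + prior_pos * false_neg


def best_threshold(neg, pos, sigma, steps=1000):
    """Grid search over thresholds between the smallest and largest sample."""
    lo, hi = min(neg + pos), max(neg + pos)
    candidates = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return min(candidates, key=lambda t: estimated_error(t, neg, pos, sigma))


# Hypothetical toy samples for the two classes.
neg = [0.0, 0.5, 1.0]
pos = [3.0, 3.5, 4.0]
threshold = best_threshold(neg, pos, sigma=0.5)
print(threshold, estimated_error(threshold, neg, pos, sigma=0.5))
```

The restriction lies in searching only over thresholds (hyperplanes) instead of using the boundary the density estimate induces directly.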


Classifying textual data: shallow, deep and ensemble methods

arXiv.org Machine Learning

This paper focuses on a comparative evaluation of the most common and modern methods for text classification, including recent deep learning strategies and ensemble methods. The study is motivated by a challenging real data problem, characterized by high-dimensional and extremely sparse data, deriving from incoming calls to the customer care of an Italian phone company. We show that deep learning outperforms many classical (shallow) strategies, but that combining shallow and deep learning methods in a single ensemble classifier may improve on the robustness and accuracy of "single" classification methods.
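
One common way to combine shallow and deep models is soft voting, where the per-class probabilities of each model are averaged before picking a label. The abstract does not say which combination rule the authors use, so the sketch below is an assumption, as are the function name `soft_vote` and the made-up probability vectors.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-class probabilities from several models.

    prob_lists: one probability vector per model, all over the same classes.
    weights: optional per-model weights (defaults to equal weighting).
    Returns (predicted_class_index, averaged_probabilities).
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_classes = len(prob_lists[0])
    avg = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists)) / total
        for c in range(n_classes)
    ]
    return max(range(n_classes), key=lambda c: avg[c]), avg


# Hypothetical outputs for one document over three classes.
shallow_probs = [0.5, 0.4, 0.1]  # e.g. a linear model on bag-of-words features
deep_probs = [0.2, 0.7, 0.1]     # e.g. a neural network
label, avg = soft_vote([shallow_probs, deep_probs])
print(label, avg)
```

Here the shallow model alone would pick class 0, while the averaged vote follows the more confident deep model and picks class 1.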


A Review of Statistical Learning Machines from ATR to DNA Microarrays: design, assessment, and advice for practitioners

arXiv.org Machine Learning

Statistical Learning is the process of estimating an unknown probabilistic input-output relationship of a system using a limited number of observations, and a statistical learning machine (SLM) is the machine that has learned such a relationship. While their roots grow deep in Probability Theory, SLMs are ubiquitous in the modern world. Automatic Target Recognition (ATR) in military applications, Computer Aided Diagnosis (CAD) in medical imaging, DNA microarrays in Genomics, Optical Character Recognition (OCR), Speech Recognition (SR), spam email filtering, stock market prediction, etc., are a few examples of SLM applications: diverse fields but one theory. The field of Statistical Learning can be decomposed into two basic subfields, Design and Assessment. Three main groups of specialists, namely statisticians, engineers, and computer scientists (ordered ascendingly by programming capabilities and descendingly by mathematical rigor), work in this field, and each takes its own bite of the elephant. The excessively rigorous analysis of statisticians sometimes keeps them from considering new ML techniques and methods that do not yet have a "complete" mathematical theory. On the other hand, the immoderate ad hoc simulations of computer scientists sometimes drive them towards unjustified and immature results. A prudent approach is needed, one flexible enough to use simulations and trial and error without sacrificing any rigor. If this prudent attitude is necessary for this field, it is necessary in other fields of Engineering as well.