Accuracy
Cost-Sensitive Support Vector Machines
Masnadi-Shirazi, Hamed, Vasconcelos, Nuno, Iranmehr, Arya
A new procedure for learning cost-sensitive SVM(CS-SVM) classifiers is proposed. The SVM hinge loss is extended to the cost sensitive setting, and the CS-SVM is derived as the minimizer of the associated risk. The extension of the hinge loss draws on recent connections between risk minimization and probability elicitation. These connections are generalized to cost-sensitive classification, in a manner that guarantees consistency with the cost-sensitive Bayes risk, and associated Bayes decision rule. This ensures that optimal decision rules, under the new hinge loss, implement the Bayes-optimal cost-sensitive classification boundary. Minimization of the new hinge loss is shown to be a generalization of the classic SVM optimization problem, and can be solved by identical procedures. The dual problem of CS-SVM is carefully scrutinized by means of regularization theory and sensitivity analysis and the CS-SVM algorithm is substantiated. The proposed algorithm is also extended to cost-sensitive learning with example dependent costs. The minimum cost sensitive risk is proposed as the performance measure and is connected to ROC analysis through vector optimization. The resulting algorithm avoids the shortcomings of previous approaches to cost-sensitive SVM design, and is shown to have superior experimental performance on a large number of cost sensitive and imbalanced datasets.
ggRandomForests: Visually Exploring a Random Forest for Regression
Random Forests [Breiman:2001] (RF) are a fully non-parametric statistical method requiring no distributional assumptions on covariate relation to the response. RF are a robust, nonlinear technique that optimizes predictive accuracy by fitting an ensemble of trees to stabilize model estimates. The randomForestSRC package (http://cran.r-project.org/package=randomForestSRC) is a unified treatment of Breiman's random forests for survival, regression and classification problems. Predictive accuracy make RF an attractive alternative to parametric models, though complexity and interpretability of the forest hinder wider application of the method. We introduce the ggRandomForests package (http://cran.r-project.org/package=ggRandomForests), for visually understand random forest models grown in R with the randomForestSRC package. The vignette is a tutorial for using the ggRandomForests package with the randomForestSRC package for building and post-processing a regression random forest. In this tutorial, we explore a random forest model for the Boston Housing Data, available in the MASS package. We grow a random forest for regression and demonstrate how ggRandomForests can be used when determining variable associations, interactions and how the response depends on predictive variables within the model. The tutorial demonstrates the design and usage of many of ggRandomForests functions and features how to modify and customize the resulting ggplot2 graphic objects along the way. A development version of the ggRandomForests package is available on Github. We invite comments, feature requests and bug reports for this package at (https://github.com/ehrlinger/ggRandomForests).
Massively Multitask Networks for Drug Discovery
Ramsundar, Bharath, Kearnes, Steven, Riley, Patrick, Webster, Dale, Konerding, David, Pande, Vijay
Massively multitask neural architectures provide a learning framework for drug discovery that synthesizes information from many distinct biological sources. To train these architectures at scale, we gather large amounts of data from public sources to create a dataset of nearly 40 million measurements across more than 200 biological targets. We investigate several aspects of the multitask framework by performing a series of empirical studies and obtain some interesting results: (1) massively multitask networks obtain predictive accuracies significantly better than single-task methods, (2) the predictive power of multitask networks improves as additional tasks and data are added, (3) the total amount of data and the total number of tasks both contribute significantly to multitask improvement, and (4) multitask networks afford limited transferability to tasks not in the training set. Our results underscore the need for greater data sharing and further algorithmic innovation to accelerate the drug discovery process.
Generalized Dantzig Selector: Application to the k-support norm
Chatterjee, Soumyadeep, Chen, Sheng, Banerjee, Arindam
We propose a Generalized Dantzig Selector (GDS) for linear models, in which any norm encoding the parameter structure can be leveraged for estimation. We investigate both computational and statistical aspects of the GDS. Based on conjugate proximal operator, a flexible inexact ADMM framework is designed for solving GDS, and non-asymptotic high-probability bounds are established on the estimation error, which rely on Gaussian width of unit norm ball and suitable set encompassing estimation error. Further, we consider a non-trivial example of the GDS using $k$-support norm. We derive an efficient method to compute the proximal operator for $k$-support norm since existing methods are inapplicable in this setting. For statistical analysis, we provide upper bounds for the Gaussian widths needed in the GDS analysis, yielding the first statistical recovery guarantee for estimation with the $k$-support norm. The experimental results confirm our theoretical analysis.
Cascading Randomized Weighted Majority: A New Online Ensemble Learning Algorithm
Zamani, Mohammadzaman, Beigy, Hamid, Shaban, Amirreza
With the increasing volume of data in the world, the best approach for learning from this data is to exploit an online learning algorithm. Online ensemble methods are online algorithms which take advantage of an ensemble of classifiers to predict labels of data. Prediction with expert advice is a well-studied problem in the online ensemble learning literature. The Weighted Majority algorithm and the randomized weighted majority (RWM) are the most well-known solutions to this problem, aiming to converge to the best expert. Since among some expert, The best one does not necessarily have the minimum error in all regions of data space, defining specific regions and converging to the best expert in each of these regions will lead to a better result. In this paper, we aim to resolve this defect of RWM algorithms by proposing a novel online ensemble algorithm to the problem of prediction with expert advice. We propose a cascading version of RWM to achieve not only better experimental results but also a better error bound for sufficiently large datasets.
Feature Selection with Redundancy-complementariness Dispersion
Chen, Zhijun, Wu, Chaozhong, Zhang, Yishi, Huang, Zhen, Ran, Bin, Zhong, Ming, Lyu, Nengchao
Feature selection has attracted significant attention in data mining and machine learning in the past decades. Many existing feature selection methods eliminate redundancy by measuring pairwise inter-correlation of features, whereas the complementariness of features and higher inter-correlation among more than two features are ignored. In this study, a modification item concerning the complementariness of features is introduced in the evaluation criterion of features. Additionally, in order to identify the interference effect of already-selected False Positives (FPs), the redundancy-complementariness dispersion is also taken into account to adjust the measurement of pairwise inter-correlation of features. To illustrate the effectiveness of proposed method, classification experiments are applied with four frequently used classifiers on ten datasets. Classification results verify the superiority of proposed method compared with five representative feature selection methods. Keywords: Classification, Feature selection, Relevance, Redundancy, Complementariness, Redundancy-complementariness dispersion 1. Introduction With the fast development of the world, the dimensional and size of data is fast-growing in most kinds of fields which challenge the data mining and machine learning techniques. Feature selection is an important and useful method that can effectively reduce the dimensionality of feature space while retaining a relatively high accuracy in representing the original data. The effects of feature selection [9] have been widely recognized for its abilities in facilitating data interpretation, reducing acquisition and storage requirements, increasing learning speeds, improving generalization performance, etc.
A New Intelligence Based Approach for Computer-Aided Diagnosis of Dengue Fever
Rao, Vadrevu Sree Hari, Kumar, Mallenahalli Naresh
Identification of the influential clinical symptoms and laboratory features that help in the diagnosis of dengue fever in early phase of the illness would aid in designing effective public health management and virological surveillance strategies. Keeping this as our main objective we develop in this paper, a new computational intelligence based methodology that predicts the diagnosis in real time, minimizing the number of false positives and false negatives. Our methodology consists of three major components (i) a novel missing value imputation procedure that can be applied on any data set consisting of categorical (nominal) and/or numeric (real or integer) (ii) a wrapper based features selection method with genetic search for extracting a subset of most influential symptoms that can diagnose the illness and (iii) an alternating decision tree method that employs boosting for generating highly accurate decision rules. The predictive models developed using our methodology are found to be more accurate than the state-of-the-art methodologies used in the diagnosis of the dengue fever.
High-Dimensional Longitudinal Classification with the Multinomial Fused Lasso
Adhikari, Samrachana, Lecci, Fabrizio, Becker, James T., Junker, Brian W., Kuller, Lewis H., Lopez, Oscar L., Tibshirani, Ryan J.
We study regularized estimation in high-dimensional longitudinal classification problems, using the lasso and fused lasso regularizers. The constructed coefficient estimates are piecewise constant across the time dimension in the longitudinal problem, with adaptively selected change points (break points). We present an efficient algorithm for computing such estimates, based on proximal gradient descent. We apply our proposed technique to a longitudinal data set on Alzheimer's disease from the Cardiovascular Health Study Cognition Study, and use this data set to motivate and demonstrate several practical considerations such as the selection of tuning parameters, and the assessment of model stability.
Understanding Kernel Ridge Regression: Common behaviors from simple functions to density functionals
Vu, Kevin, Snyder, John, Li, Li, Rupp, Matthias, Chen, Brandon F., Khelif, Tarek, Mรผller, Klaus-Robert, Burke, Kieron
Machine learning (ML) is a powerful data-driven method for learning patterns in high-dimensional spaces via induction, and has had widespread success in many fields including more recent applications in quantum chemistry and materials science [1-9]. Here we are interested in applications of ML to construction of density functionals [10-14], which have focused so far on approximating the kinetic energy (KE) of non-interacting electrons. An accurate, general approximation to this could make orbital-free DFT a practical reality. However, ML methods have been developed within the areas of statistics and computer science, and have been applied to a huge variety of data, including neuroscience, image and text processing, and robotics [15]. Thus, they are quite general and have not been tailored to account for specific details of the quantum problem.
MACHINE INTELLIGENCE 13
The two outstanding figures in the history of computer science are Alan Turing and John von Neumann, and they shared the view that logic was the key to understanding and automating computation. In particular, it was Turing who gave us in the mid-1930s the fundamental analysis, and the logical definition, of the concept of'computability by machine' and who discovered the surprising and beautiful basic fact that there exist universal machines which by suitable programming can be made to t This essay is an expanded and revised version of one entitled The Role of Logic in Computer Science and Artificial Intelligence, which was completed in January 1992 (and was later published in the Proceedings of the Fifth Generation computer Systems 1992 Conference). Since completing that essay I have had the benefit of extremely helpful discussions on many of the details with Professor Donald Michie and Professor I. J. Good, both of whom knew Turing well during the war years at Bletchley Park. Professor J. A. N. Lee, whose knowledge of the literature and archives of the history of computing is encyclopedic, also provided additional information, some of which is still unpublished. Further light has very recently been shed on the von Neumann side of the story by Norman Macrae's excellent biography John von Neumann (Macrae 1992). Accordingly, it seemed appropriate to undertake a more complete and thorough version of the FGCS'92 essay, focussing somewhat more on the interesting historical and biographical issues. I am grateful to Donald Michie and Stephen Muggleton for inviting me to contribute such a'second edition' to the present volume, and I would also like to thank the Institute for New Computer Technology (ICOT) for kind permission to make use of the FGCS'92 essay in this way. 1 LOGIC, COMPUTERS, TURING, AND VON NEUMANN