 Accuracy


Scalable Bayesian Rule Lists

arXiv.org Artificial Intelligence

We present an algorithm for building probabilistic rule lists that is two orders of magnitude faster than previous work. Rule list algorithms are competitors for decision tree algorithms. They are associative classifiers, in that they are built from pre-mined association rules. They have a logical structure that is a sequence of IF-THEN rules, identical to a decision list or one-sided decision tree. Instead of using greedy splitting and pruning like decision tree algorithms, we fully optimize over rule lists, striking a practical balance between accuracy, interpretability, and computational speed. The algorithm presented here uses a mixture of theoretical bounds (tight enough to have practical implications as a screening or bounding procedure), computational reuse, and highly tuned language libraries to achieve computational efficiency. Currently, for many practical problems, this method achieves better accuracy and sparsity than decision trees; further, in many cases, the computational time is practical and often less than that of decision trees. The result is a probabilistic classifier (which estimates P(y = 1|x) for each x) that optimizes the posterior of a Bayesian hierarchical model over rule lists.
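The IF-THEN structure described above can be illustrated with a minimal sketch. It shows only how an already-built rule list produces an estimate of P(y = 1|x); it does not implement the paper's learning algorithm. The feature names, conditions, and probabilities are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A rule pairs a condition with the estimated P(y = 1 | x) used when
# that condition is the first one in the list to fire.
Rule = Tuple[Callable[[Dict[str, float]], bool], float]


def predict_proba(rules: List[Rule], default: float, x: Dict[str, float]) -> float:
    """Return the estimate attached to the first rule whose condition holds."""
    for condition, prob in rules:
        if condition(x):
            return prob
    # No rule fired: fall through to the default (final ELSE) estimate.
    return default


# Hypothetical rule list; features and probabilities are invented for illustration.
rules: List[Rule] = [
    (lambda x: x["age"] < 25 and x["priors"] > 3, 0.80),
    (lambda x: x["priors"] > 5, 0.65),
]
DEFAULT = 0.10

print(predict_proba(rules, DEFAULT, {"age": 22, "priors": 4}))
print(predict_proba(rules, DEFAULT, {"age": 40, "priors": 0}))
```

Because evaluation stops at the first matching rule, the order of the rules is part of the model, which is what distinguishes a rule list from an unordered rule set.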


Finding Common Characteristics Among NBA Playoff and Championship Teams: A Machine Learning Approach

arXiv.org Machine Learning

In this paper, we employ machine learning techniques to analyze seventeen seasons (1999-2000 to 2015-2016) of NBA regular season data from every team to determine the common characteristics among NBA playoff teams. Each team was characterized by 26 predictor variables and one binary response variable taking on a value of "TRUE" if a team had made the playoffs, and a value of "FALSE" if a team had missed the playoffs. After an initial classification tree was fit to this problem, the tree was pruned, which decreased the test error rate. A random forest of classification trees was then grown, which provided a very accurate model from which a variable importance plot was generated to determine which predictor variables had the greatest influence on the response variable. The result of this work was the conclusion that the most important factors in characterizing a team's playoff eligibility are a team's opponent number of assists per game, a team's opponent number of made two-point shots per game, and a team's number of steals per game. This seems to suggest that defensive factors, as opposed to offensive factors, are the most important characteristics shared among NBA playoff teams. We then use neural networks to classify championship teams based on regular season data. From this, we show that the most important factor in a team not winning a championship is that team's opponent number of made three-point shots per game. This once again implies that defensive characteristics are of great importance in determining a team's playoff eligibility, and that a lack of perimeter defense negatively impacts a team's championship chances in a given season. Further, it is shown that made two-point shots and defensive rebounding are by far the most important factors in a team's chances at winning a championship in a given season.


Multilabel Classification with R Package mlr

arXiv.org Machine Learning

Multilabel classification is a classification problem where multiple target labels can be assigned to each observation instead of only one, like in multiclass classification. It can be regarded as a special case of multivariate classification or multi-target prediction problems, for which the scale of each response variable can be of any kind, for example nominal, ordinal or interval. Originally, multilabel classification was used for text classification (McCallum, 1999; Schapire and Singer, 2000) and is now used in several applications in different research fields. For example, in image classification, a photo can belong to the classes mountain and sunset simultaneously. Zhang and Zhou (2008) and others (Boutell et al., 2004) used multilabel algorithms to classify scenes on images of natural environments.
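The difference from multiclass classification is easiest to see in the target encoding: each observation carries a set of labels rather than exactly one. The sketch below is a plain Python illustration, not the mlr API; the label names and the helper `to_indicator` are hypothetical.

```python
from typing import List, Set

# Hypothetical label vocabulary for an image classification task.
LABELS = ["mountain", "sunset", "beach"]


def to_indicator(label_sets: List[Set[str]], labels: List[str]) -> List[List[int]]:
    """Encode each observation's label set as one 0/1 column per label."""
    return [[1 if label in s else 0 for label in labels] for s in label_sets]


# One photo shows a mountain at sunset, one shows a beach, one has no label.
photos = [{"mountain", "sunset"}, {"beach"}, set()]
targets = to_indicator(photos, LABELS)
print(targets)  # [[1, 1, 0], [0, 0, 1], [0, 0, 0]]
```

A row may contain several ones or none at all, which a single multiclass target column cannot express.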


WWE WrestleMania 33: Live Stream Info, Free Kickoff Show, PPV Details For 2017 Show

International Business Times

The biggest WWE event of 2017 is finally here. WrestleMania 33 gets underway Sunday afternoon at Camping World Stadium in Orlando, marking the unofficial end of the wrestling season. Fans can watch WrestleMania 33 with a live stream on WWE Network, for which a subscription costs $9.99 per month. New subscribers, however, will get their first month for free, including WrestleMania 33. Buying WrestleMania 33 on pay-per-view costs $64.99, and it has a 7 p.m. EDT start time. Eleven matches are scheduled for the PPV, but the other two WrestleMania 33 matches can be seen for free by anyone.


Naive Bayes Example using Golf Dataset

#artificialintelligence

The following notebook works through a really simple example of a Naive Bayes implementation. The aim of this machine learning application is to predict whether or not to play golf based on weather conditions. Here we read in golf.csv, which loads our CSV file into a pandas data frame. As with any data science application, data cleansing and feature selection play a vital role.
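As a rough illustration of the idea, here is a minimal categorical Naive Bayes sketch in plain Python. It is not the notebook's code: the rows below are invented rather than taken from golf.csv, the column names are assumptions, and add-one (Laplace) smoothing is a choice made here for the sketch.

```python
from collections import Counter, defaultdict

# Hypothetical training rows (features -> play). NOT the contents of golf.csv.
ROWS = [
    ({"outlook": "sunny", "windy": "no"}, "no"),
    ({"outlook": "sunny", "windy": "yes"}, "no"),
    ({"outlook": "overcast", "windy": "no"}, "yes"),
    ({"outlook": "rainy", "windy": "no"}, "yes"),
    ({"outlook": "rainy", "windy": "yes"}, "no"),
    ({"outlook": "overcast", "windy": "yes"}, "yes"),
]


def fit(rows):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # (feature, label) -> value counts
    feature_values = defaultdict(set)    # feature -> distinct values seen
    for features, label in rows:
        for name, value in features.items():
            value_counts[(name, label)][value] += 1
            feature_values[name].add(value)
    return class_counts, value_counts, feature_values


def predict_proba(model, features):
    """Return normalized class probabilities under the naive independence assumption."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # class prior
        for name, value in features.items():
            k = len(feature_values[name])
            # Add-one smoothing keeps unseen (value, class) pairs from zeroing the product.
            score *= (value_counts[(name, label)][value] + 1) / (n + k)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


model = fit(ROWS)
probs = predict_proba(model, {"outlook": "overcast", "windy": "no"})
print(probs)
```

With a real CSV, the same rows could be built from a pandas data frame once the actual column names are known.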


Automated Volumetric Intravascular Plaque Classification Using Optical Coherence Tomography

AI Magazine

An estimated 17.5 million people died from a cardiovascular disease in 2012, representing 31 percent of all global deaths. Most acute coronary events result from rupture of the protective fibrous cap overlying an atherosclerotic plaque. The task of early identification of plaque types that can potentially rupture is, therefore, of great importance. The state-of-the-art approach to imaging blood vessels is intravascular optical coherence tomography (IVOCT). However, currently, this is an offline approach where the images are first collected and then manually analyzed one image at a time to identify regions at risk of thrombosis. This process is extremely laborious, time-consuming and prone to human error. We are building a system that, when complete, will provide interactive 3D visualization of a blood vessel as an IVOCT is in progress. The visualization will highlight different plaque types and enable quick identification of regions at risk for thrombosis. In this paper, we describe our approach, focusing on machine learning methods that are a key enabling technology. Our empirical results using real OCT data show that our approach can identify different plaque types efficiently with high accuracy across multiple patients.


On the Reliable Detection of Concept Drift from Streaming Unlabeled Data

arXiv.org Machine Learning

Classifiers deployed in the real world operate in a dynamic environment, where the data distribution can change over time. These changes, referred to as concept drift, can cause the predictive performance of the classifier to drop over time, thereby making it obsolete. To be of any real use, these classifiers need to detect drifts and be able to adapt to them over time. Detecting drifts has traditionally been approached as a supervised task, with labeled data constantly being used for validating the learned model. Although effective in detecting drifts, these techniques are impractical, as labeling is a difficult, costly and time-consuming activity. On the other hand, unsupervised change detection techniques are unreliable, as they produce a large number of false alarms. The inefficacy of the unsupervised techniques stems from the exclusion of the characteristics of the learned classifier from the detection process. In this paper, we propose the Margin Density Drift Detection (MD3) algorithm, which tracks the number of samples in the uncertainty region of a classifier as a metric to detect drift. The MD3 algorithm is a distribution independent, application independent, model independent, unsupervised and incremental algorithm for reliably detecting drifts from data streams. Experimental evaluation on 6 drift induced datasets and 4 additional datasets from the cybersecurity domain demonstrates that the MD3 approach can reliably detect drifts, with significantly fewer false alarms compared to unsupervised feature based drift detectors. The reduction in false alarms enables the signaling of drifts only when they are most likely to affect classification performance. As such, the MD3 approach leads to a detection scheme which is credible, label efficient and general in its applicability.
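The core quantity, the share of unlabeled samples that fall in a classifier's uncertainty region, can be sketched as follows. This is a simplified illustration and not the MD3 algorithm as published: it assumes a linear classifier with decision value w·x + b, treats |w·x + b| <= 1 as the uncertainty region, and uses a fixed, hypothetical threshold to flag a change.

```python
from typing import Sequence


def margin_density(w: Sequence[float], b: float,
                   samples: Sequence[Sequence[float]], margin: float = 1.0) -> float:
    """Fraction of samples whose decision value lies inside the margin."""
    inside = sum(
        1 for x in samples
        if abs(sum(wi * xi for wi, xi in zip(w, x)) + b) <= margin
    )
    return inside / len(samples)


def drift_suspected(reference: float, current: float, threshold: float) -> bool:
    """Flag drift when margin density moves by more than a fixed threshold (an assumption)."""
    return abs(current - reference) > threshold


# Hypothetical linear classifier and two windows of unlabeled data.
w, b = [1.0, 0.0], 0.0
reference_window = [[3.0, 0.0], [-3.0, 1.0], [2.5, 0.0], [-2.0, 0.0]]
current_window = [[0.5, 0.0], [-0.2, 1.0], [2.0, 0.0], [0.9, 3.0]]

ref = margin_density(w, b, reference_window)
cur = margin_density(w, b, current_window)
print(ref, cur, drift_suspected(ref, cur, threshold=0.3))
```

No labels are needed to compute the density, which is what makes this kind of signal usable on an unlabeled stream.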


Intraoperative margin assessment of human breast tissue in optical coherence tomography images using deep neural networks

arXiv.org Machine Learning

Objective: In this work, we perform margin assessment of human breast tissue from optical coherence tomography (OCT) images using deep neural networks (DNNs). This work simulates an intraoperative setting for breast cancer lumpectomy. Methods: To train the DNNs, we use both the state-of-the-art methods (Weight Decay and DropOut) and a newly introduced regularization method based on function norms. Commonly used methods can fail when only a small database is available. The use of a function norm introduces a direct control over the complexity of the function with the aim of diminishing the risk of overfitting. Results: As neither the code nor the data of previous results are publicly available, the obtained results are compared with reported results in the literature for a conservative comparison. Moreover, our method is applied to locally collected data on several data configurations. The reported results are the average over the different trials. Conclusion: The experimental results show that the use of DNNs yields significantly better results than other techniques when evaluated in terms of sensitivity, specificity, F1 score, G-mean and Matthews correlation coefficient. Function norm regularization yielded higher and more robust results than competing methods. Significance: We have demonstrated a system that shows high promise for (partially) automated margin assessment of human breast tissue. The equal error rate (EER) is reduced from approximately 12% (the lowest reported in the literature) to 5%, a 58% reduction. The method is computationally feasible for intraoperative application (less than 2 seconds per image).
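The evaluation metrics named in the abstract all derive from a binary confusion matrix. The sketch below uses their standard definitions; the counts are hypothetical and are not results from the paper.

```python
import math


def binary_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    f1 = 2 * tp / (2 * tp + fp + fn)
    g_mean = math.sqrt(sensitivity * specificity)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "g_mean": g_mean,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to exercise the formulas.
metrics = binary_metrics(tp=40, fn=10, tn=45, fp=5)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

G-mean and the Matthews correlation coefficient are often reported alongside F1 because they also account for true negatives, which matters when classes are imbalanced.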


Will The Undertaker Retire After WrestleMania 33 Match? WWE Future Uncertain Before 2017 PPV

International Business Times

Will The Undertaker retire after WrestleMania 33? It's a question many WWE fans are asking ahead of the Deadman's match with Roman Reigns at the biggest wrestling event of 2017. The Undertaker, 52, made his WWE debut in 1990, and he's been the one constant in the company ever since. He has a 23-1 record at WrestleMania, and he's had a match at the PPV in every year since 2001. But questions surrounding the Deadman's age and health have a lot of people wondering if Sunday night could be the last time fans get to see The Undertaker compete in a WWE ring. "He just had hip surgery. He's beaten down real bad. Put it this way--last year when he went to the dressing room after his match, he told everyone he was done," Dave Meltzer of The Wrestling Observer told Richard Deitsch on last week's "Sports Illustrated Media Podcast."


Near Perfect Protein Multi-Label Classification with Deep Neural Networks

arXiv.org Machine Learning

Artificial neural networks (ANNs) have gained well-deserved popularity among machine learning tools following their recent successful applications in image and sound processing and classification problems. ANNs have also been applied to predicting the family or function of a protein from its residue sequence. Here we present two new ANNs with multi-label classification ability, showing impressive accuracy when classifying protein sequences into 698 UniProt families (AUC=99.99%) and 983 Gene Ontology classes (AUC=99.45%).