 Performance Analysis


Exploring Recommendation Systems

@machinelearnbot

While we commonly associate recommendation systems with e-commerce, their application extends to any decision-making problem which requires pairing two types of things together. To understand why recommenders don't always work as well as we'd like them to, we set out to build some basic recommendation systems using publicly available data. The first ingredient for building a recommendation system is user interaction data. We experimented with two different datasets, one from Flickr and one from Amazon. The Flickr dataset contains interactions between users and photos that they liked; the Amazon dataset contains user ratings on books.


Time Series Segmentation through Automatic Feature Learning

arXiv.org Machine Learning

Internet of things (IoT) applications have become increasingly popular in recent years, with applications ranging from building energy monitoring to personal health tracking and activity recognition. In order to leverage these data, automatic knowledge extraction - whereby we map from observations to interpretable states and transitions - must be done at scale. As such, we have seen many recent IoT data sets include annotations with a human expert specifying states, recorded as a set of boundaries and associated labels in a data sequence. These data can be used to build automatic labeling algorithms that produce labels as an expert would. Here, we refer to human-specified boundaries as breakpoints. Traditional changepoint detection methods only look for statistically-detectable boundaries that are defined as abrupt variations in the generative parameters of a data sequence. However, we observe that breakpoints occur on more subtle boundaries that are non-trivial to detect with these statistical methods. In this work, we propose a new unsupervised approach, based on deep learning, that outperforms existing techniques and learns the more subtle, breakpoint boundaries with a high accuracy. Through extensive experiments on various real-world data sets - including human-activity sensing data, speech signals, and electroencephalogram (EEG) activity traces - we demonstrate the effectiveness of our algorithm for practical applications. Furthermore, we show that our approach achieves significantly better performance than previous methods.
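For contrast with the learned approach described above, here is a minimal sketch of the traditional statistical idea: a changepoint is an abrupt shift in a generative parameter, in this case the mean. This is not the paper's method. The function name `best_mean_shift` and the single-changepoint, squared-error formulation are assumptions made for illustration.

```python
def best_mean_shift(xs):
    """Return the split index k (1 <= k < len(xs)) that minimizes the total
    squared error when xs[:k] and xs[k:] are each modeled by their own mean.

    Hypothetical helper for illustration: it assumes exactly one changepoint
    and a shift in the mean only.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    best_k, best_cost = 1, float("inf")
    for k in range(1, n):
        left, right = xs[:k], xs[k:]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        cost = (sum((x - mean_left) ** 2 for x in left)
                + sum((x - mean_right) ** 2 for x in right))
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


signal = [0.1, -0.1, 0.0, 0.1, 5.0, 5.1, 4.9, 5.0]
split = best_mean_shift(signal)  # index where the second segment starts
```

A detector like this only finds boundaries where the mean changes sharply, which is the limitation the abstract attributes to statistical methods when they are applied to human-specified breakpoints.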


Loan Prediction – Using PCA and Naive Bayes Classification with R

@machinelearnbot

Bank loans carry numerous risks for both the banks and the borrowers. Analyzing loan risk requires understanding both the kind of risk and its level. Banks need to analyze their customers for loan eligibility so that they can specifically target those customers. Banks want to automate the loan eligibility process in real time, based on customer details such as gender, marital status, age, occupation, income, debts, and others provided in the online application form. As the number of transactions in the banking sector grows rapidly and huge volumes of data become available, customers' behavior can be analyzed more easily and the risks around loans can be reduced.


Information gain ratio correction: Improving prediction with more balanced decision tree splits

arXiv.org Machine Learning

Decision tree algorithms use a gain function to select the best split during the tree's induction. This function is crucial to obtaining trees with high predictive accuracy. Some gain functions suffer from a bias when they compare splits of different arities. Quinlan proposed a gain ratio in C4.5's information gain function to fix this bias. In this paper, we present an updated version of the gain ratio that performs better, as it tries to fix the gain ratio's bias for unbalanced trees and some splits with low predictive interest.
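The standard C4.5 gain ratio mentioned above can be sketched as follows. This illustrates Quinlan's original ratio (information gain divided by split information) for a categorical attribute, not the corrected version the paper proposes. The function names `entropy` and `gain_ratio` are chosen for illustration.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the empirical distribution of items."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of splitting `labels` on the attribute `values`.

    values: attribute value of each example (one branch per distinct value)
    labels: class label of each example
    """
    n = len(labels)
    branches = {}
    for v, y in zip(values, labels):
        branches.setdefault(v, []).append(y)
    # Expected entropy of the labels after the split
    remainder = sum(len(b) / n * entropy(b) for b in branches.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the branch sizes, which penalizes high arity
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "no", "no"]
binary_attr = ["a", "a", "b", "b"]   # 2 branches, perfectly separates the classes
id_attr = [1, 2, 3, 4]               # 4 branches, one example each

print(gain_ratio(binary_attr, labels))  # 1.0
print(gain_ratio(id_attr, labels))      # 0.5
```

Both attributes have the same information gain (1 bit), but the identifier-like attribute is penalized for its higher arity. This is the bias correction attributed to Quinlan above.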


Machine Learning Model Metrics

#artificialintelligence

Kangaroo Kapital is the largest credit card company in Australia. Animals across the continent use Kangaroo Kapital credit cards to make all of their daily purchases, racking up points in the company's reward system. Since Australian animals have traditionally not worn much clothing, the challenges of carrying around cash are substantial. Only having to keep track of a single credit card is a big help for your average working wallaby. But since Australian animals have typically not worn much clothing, they still have a problem keeping track of even a single credit card.


Training Set Debugging Using Trusted Items

arXiv.org Machine Learning

Training set bugs are flaws in the data that adversely affect machine learning. The training set is usually too large for manual inspection, but one may have the resources to verify a few trusted items. The set of trusted items may not by itself be adequate for learning, so we propose an algorithm that uses these items to identify bugs in the training set and thus improves learning. Specifically, our approach seeks the smallest set of changes to the training set labels such that the model learned from this corrected training set predicts labels of the trusted items correctly. We flag the items whose labels are changed as potential bugs, whose labels can be checked for veracity by human experts. To find the bugs in this way is a challenging combinatorial bilevel optimization problem, but it can be relaxed into a continuous optimization problem. Experiments on toy and real data demonstrate that our approach can identify training set bugs effectively and suggest appropriate changes to the labels. Our algorithm is a step toward trustworthy machine learning.


Detecting and counting tiny faces

arXiv.org Machine Learning

Finding Tiny Faces (by Hu and Ramanan) proposes a novel approach to finding small objects in an image. Our contribution consists of a close study of the paper's design choices, together with applying and extending a similar method to a real-world task: counting people in a public demonstration.


Understanding Naïve Bayes Classifier Using R – R-posts.com

#artificialintelligence

Chaitanya Sagar is the Founder and CEO of Perceptive Analytics. Perceptive Analytics has been chosen as one of the top 10 analytics companies to watch out for by Analytics India Magazine.


Drug Selection via Joint Push and Learning to Rank

arXiv.org Machine Learning

Selecting the right drugs for the right patients is a primary goal of precision medicine. In this manuscript, we consider the problem of cancer drug selection in a learning-to-rank framework. We formulate the cancer drug selection problem as accurately predicting (1) the ranking positions of sensitive drugs and (2) the ranking orders among sensitive drugs in cancer cell lines, based on their responses to cancer drugs. We have developed a new learning-to-rank method, denoted pLETORg, that predicts drug ranking structures in each cell line using drug latent vectors and cell line latent vectors. The pLETORg method learns such latent vectors by explicitly enforcing that, in the drug ranking list of each cell line, the sensitive drugs are pushed above insensitive drugs, while the ranking orders among sensitive drugs are correct. Genomics information on cell lines is leveraged in learning the latent vectors. Our experimental results on a benchmark cell line-drug response dataset demonstrate that the new pLETORg significantly outperforms the state-of-the-art method in prioritizing new sensitive drugs.


The best metric to measure accuracy of classification models – CleverTap

#artificialintelligence

As an analyst, if you are looking for a metric to measure and maximize the overall accuracy of a classification model, the Matthews correlation coefficient (MCC) seems to be the best bet, since it is not only easily interpretable but also robust to changes in the prediction goal.
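As a rough illustration of the claim above, the sketch below computes MCC from the four cells of a binary confusion matrix using the standard formula. The function name `mcc`, the convention of returning 0 when the denominator is zero, and the example counts are assumptions made for illustration and are not taken from the article.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns a value in [-1, 1]. When any row or column of the matrix sums
    to zero the coefficient is undefined; this sketch returns 0.0 there.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for an imbalanced problem.
tp, fp, fn, tn = 90, 4, 5, 1

accuracy = (tp + tn) / (tp + fp + fn + tn)
score = mcc(tp, fp, fn, tn)

# Swapping which class counts as "positive" exchanges tp with tn and
# fp with fn; the MCC value is unchanged.
swapped = mcc(tn, fn, fp, tp)
```

With these counts the accuracy is 0.91 while the MCC stays close to 0.13, because the classifier gets almost none of the minority class right. The swap shows one sense in which the metric is robust to a change in the prediction goal: it does not depend on which class is labeled positive.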