 Accuracy


Superensemble classifier for learning from imbalanced business school data set

arXiv.org Machine Learning

Private business schools in India face a common problem: selecting quality students for their MBA programs to achieve the desired placement percentage. The business school data set is biased toward one class, i.e., it is imbalanced, and learning from an imbalanced data set is difficult. Most existing classification methods tend not to perform well on minority class examples when the data set is extremely imbalanced, because they aim to optimize the overall accuracy without considering the relative distribution of each class. The aim of the paper is twofold. We first propose an integrated sampling technique with an ensemble of classification tree (CT) and artificial neural network (ANN) models as one methodology that works better than other similar methods. We then propose a superensemble imbalanced classifier that works better on the original business school data set. Our proposed superensemble classifier not only handles the imbalanced data set but also achieves higher accuracy in feature selection cum classification problems. The proposal has been compared with other state-of-the-art classifiers and found to be very competitive.
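
The point about overall accuracy hiding poor minority-class performance can be illustrated with a minimal sketch. The data below are made up for illustration (95 majority-class and 5 minority-class examples) and are not the business school data set; the helper names `accuracy` and `recall` are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of all examples that are predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of examples of class `positive` that are predicted as `positive`."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in predictions_for_positives) / len(predictions_for_positives)


# Illustrative (made-up) imbalanced labels: class 1 is the minority class.
y_true = [0] * 95 + [1] * 5
# A trivial classifier that always predicts the majority class.
y_pred = [0] * 100

overall_accuracy = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, positive=1)
majority_recall = recall(y_true, y_pred, positive=0)
# Balanced accuracy averages the per-class recalls, so it exposes the failure.
balanced_accuracy = (minority_recall + majority_recall) / 2
```

Here the trivial classifier reaches 95% overall accuracy while finding none of the minority-class examples, which is why a per-class measure such as recall or balanced accuracy is usually reported alongside accuracy on imbalanced data.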


MBA: Mini-Batch AUC Optimization

arXiv.org Machine Learning

Area under the receiver operating characteristic curve (AUC) is an important metric for a wide range of signal processing and machine learning problems, and scalable methods for optimizing AUC have recently been proposed. However, handling very large datasets remains an open challenge for this problem. This paper proposes a novel approach to AUC maximization, based on sampling mini-batches of positive/negative instance pairs and computing U-statistics to approximate a global risk minimization problem. The resulting algorithm is simple, fast, and learning-rate free. We show that the number of samples required for good performance is independent of the number of pairs available, which grows quadratically (as the product of the numbers of positive and negative instances). Extensive experiments show the practical utility of the proposed method.
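
The relationship between AUC and positive/negative pairs can be sketched as follows. This is not the paper's optimization algorithm: it only shows AUC written as a U-statistic over pairs, and a mini-batch estimate of that statistic. The function names, batch sizes, and sampling with replacement are assumptions made for illustration.

```python
import random


def pairwise_auc(pos_scores, neg_scores):
    """Exact AUC as a U-statistic: the fraction of (positive, negative)
    pairs in which the positive is scored higher; ties count as 0.5."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def minibatch_auc(pos_scores, neg_scores, batch_size, n_batches, seed=0):
    """Average the pairwise AUC over small random batches instead of
    enumerating all len(pos_scores) * len(neg_scores) pairs."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_batches):
        # Sample with replacement (an assumption of this sketch).
        pos_batch = [rng.choice(pos_scores) for _ in range(batch_size)]
        neg_batch = [rng.choice(neg_scores) for _ in range(batch_size)]
        estimates.append(pairwise_auc(pos_batch, neg_batch))
    return sum(estimates) / len(estimates)
```

For example, `pairwise_auc([0.9, 0.8], [0.1, 0.85])` ranks three of the four pairs correctly. The cost of `minibatch_auc` depends on `batch_size` and `n_batches` only, not on the total number of available pairs.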


Trump Crony Proves Widespread Voter Fraud Doesn't Exist

Slate

Did voter fraud swing New Hampshire away from Donald Trump in the 2016 election? Absolutely not, according to an exhaustive investigation conducted by the state's attorney general and secretary of state, which, counter to Trump's persistent allegations, turned up no evidence of "serious voter fraud." Instead, the inquiry provided further evidence that the tools Republicans use to detect voter fraud are fatally flawed, churning out a huge number of false positives. And while the New Hampshire investigation ultimately debunked Trump's paranoia, it came perilously close to disenfranchising thousands of lawful voters. Republicans have seized upon New Hampshire as the putative epicenter of American voter fraud for two reasons.


Use cases of AI-based FinTech solutions: from fraud detection to big data mining - Techfoliance

#artificialintelligence

Artificial intelligence has caught the attention of the world, including financial institutions. While AI-based FinTech solutions may not get the same attention as autonomous cars or robot dogs, their impact will undoubtedly be felt. AI-based FinTech solutions will both save financial institutions billions in cost and create billions in additional revenues, potentially creating more than a trillion in additional profits in the financial services industry. A study done by Accenture showed that the implementation of AI in the financial sector could lead to a 31% increase in profitability rates by 2035.[1] Moreover, AI will allow financial services to be customized for clients, leading to an enhanced customer experience. Fraud detection and management are imperative for financial institutions now more than ever, as they face new and more sophisticated security breaches and threats to client data.


Amazon needs to come clean about racial bias in its algorithms

#artificialintelligence

Yesterday, Amazon's quiet Rekognition program became very public, as new documents obtained by the ACLU of Northern California showed the system partnering with the city of Orlando and police camera vendors like Motorola Solutions for an aggressive new real-time facial recognition service. Amazon insists that the service is a simple object-recognition tool and will only be used for legal purposes. But even if we take the company at its word, the project raises serious concerns, particularly around racial bias. Facial recognition systems have long struggled with higher error rates for women and people of color -- error rates that can translate directly into more stops and arrests for marginalized groups. And while some companies have responded with public bias testing, Amazon hasn't shared any data on the issue, if it's collected data at all. At the same time, it's already deploying its software in cities across the US, its growth driven by one of the largest cloud infrastructures in the world.


Dynamic Advisor-Based Ensemble (dynABE): Case Study in Stock Trend Prediction of a Major Critical Metal Producer

arXiv.org Machine Learning

The demand for metals from modern technology has been shifting from common base metals to a variety of minor metals, such as cobalt or indium. The industrial importance and limited geological availability of some minor metals have led to them being considered more "critical," and there is a growing interest in such critical metals and their producing companies. In this research, we create a novel framework, Dynamic Advisor-Based Ensemble (dynABE), to predict the stock trend of major critical metal producers. Specifically, dynABE first utilizes domain knowledge to group the features into different "advisors," each advisor dealing with a particular economic sector. Then, through ensembles of weak classifiers, each advisor produces a prediction result, and all the advisors are combined again in a biased online update fashion to dynamically make the final prediction. Based on a misclassification error of 32% for Jinchuan Group's stock (HKG: 2362), we further test a simple stock trading strategy, which leads to a back-tested return of 296%, or an excess return of 130% within one year. In addition, the feature set selected by dynABE also suggests factors that potentially influence metal criticality, because stock prices of major producers influence metal production. Therefore, this research not only proposes a novel framework for specialized stock trend prediction, but also provides domain insights into dynamic features that potentially influence metal criticality.
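
The idea of combining several advisors with weights that are updated online can be sketched with a generic weighted-majority scheme. This is a swapped-in textbook technique, not dynABE's "biased online update", whose details the abstract does not give. The function names, the penalty factor `beta`, and the +1/-1 encoding of up/down trends are assumptions.

```python
def weighted_majority_predict(weights, advisor_votes):
    """Combine advisor votes (+1 for an up trend, -1 for a down trend)
    by the sign of their weighted sum; a zero sum is treated as +1."""
    score = sum(w * v for w, v in zip(weights, advisor_votes))
    return 1 if score >= 0 else -1


def update_weights(weights, advisor_votes, outcome, beta=0.5):
    """Multiply the weight of every advisor that was wrong by `beta`
    (0 < beta < 1), then renormalize so the weights sum to 1."""
    penalized = [
        w * (1.0 if v == outcome else beta)
        for w, v in zip(weights, advisor_votes)
    ]
    total = sum(penalized)
    return [w / total for w in penalized]


# Hypothetical usage with three advisors that start with equal weight.
weights = [1 / 3, 1 / 3, 1 / 3]
votes = [1, -1, 1]          # advisor predictions for one period
prediction = weighted_majority_predict(weights, votes)
weights = update_weights(weights, votes, outcome=1)  # the trend was up
```

After the update, the advisor that voted against the observed trend carries half the weight of each of the other two, so later predictions lean on the advisors that have been right more often.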


Root-cause Analysis for Time-series Anomalies via Spatiotemporal Graphical Modeling in Distributed Complex Systems

arXiv.org Machine Learning

Performance monitoring, anomaly detection, and root-cause analysis in complex cyber-physical systems (CPSs) are often highly intractable due to widely diverse operational modes, disparate data types, and complex fault propagation mechanisms. This paper presents a new data-driven framework for root-cause analysis, based on a spatiotemporal graphical modeling approach built on the concept of symbolic dynamics for discovering and representing causal interactions among sub-systems of complex CPSs. We formulate the root-cause analysis problem as a minimization problem via the proposed inference-based metric and present two approximate approaches for root-cause analysis, namely sequential state switching ($S^3$, based on the free-energy concept of a restricted Boltzmann machine, RBM) and artificial anomaly association ($A^3$, a classification framework using deep neural networks, DNN). Synthetic data from cases with failed pattern(s) and anomalous node(s) are simulated to validate the proposed approaches. A real data set based on the Tennessee Eastman process (TEP) is also used for comparison with other approaches. The results show that: (1) the $S^3$ and $A^3$ approaches can obtain high accuracy in root-cause analysis under both pattern-based and node-based fault scenarios, in addition to successfully handling multiple nominal operating modes, (2) the proposed tool-chain is scalable while maintaining high accuracy, and (3) the proposed framework is robust and adaptive in different fault conditions and performs better than the state-of-the-art methods.


Why Is My Classifier Discriminatory?

arXiv.org Machine Learning

Recent attempts to achieve fairness in predictive models focus on the balance between fairness and accuracy. In sensitive applications such as healthcare or criminal justice, this trade-off is often undesirable as any increase in prediction error could have devastating consequences. In this work, we argue that the fairness of predictions should be evaluated in the context of the data, and that unfairness induced by inadequate sample sizes or unmeasured predictive variables should be addressed through data collection, rather than by constraining the model. We decompose cost-based metrics of discrimination into bias, variance, and noise, and propose actions aimed at estimating and reducing each term. Finally, we perform case studies on prediction of income, mortality, and review ratings, confirming the value of this analysis. We find that data collection is often a means to reduce discrimination without sacrificing accuracy.


The proof of equivalent formulas of ridge regression

#artificialintelligence

Let's define $ \hat{x} $ as the optimal solution of the first problem and $ \tilde{x} $ as the optimal solution of the second problem. Namely, you can always find a pair of $ t $ and $ \lambda \geq 0 $ such that the solutions of the two problems are the same. How could we find such a pair? Both problems are convex and smooth, which should make things simpler. Note that the two base formulations are equivalent.
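
The excerpt does not show the two problems themselves. Assuming they are the standard penalized and constrained forms of ridge regression, with a design matrix $ A $ and observations $ y $ (both symbols are assumptions, as is which form the excerpt calls "first"), the pairing of $ t $ and $ \lambda $ can be sketched as:

```latex
% Assumed standard forms (not shown in the excerpt)
\begin{align*}
\text{(penalized)}   &\quad \min_{x} \; \|Ax - y\|_2^2 + \lambda \|x\|_2^2, \qquad \lambda \geq 0 \\
\text{(constrained)} &\quad \min_{x} \; \|Ax - y\|_2^2 \quad \text{subject to} \quad \|x\|_2^2 \leq t, \qquad t > 0
\end{align*}

% Closed-form solution of the penalized problem for \lambda > 0
\[
x(\lambda) = \left(A^\top A + \lambda I\right)^{-1} A^\top y
\]

% From \lambda to t: the penalized solution also solves the constrained problem with
\[
t = \|x(\lambda)\|_2^2
\]

% From t to \lambda: if some least-squares solution x_{LS} satisfies \|x_{LS}\|_2^2 \leq t,
% the constraint is inactive and \lambda = 0. Otherwise the constraint is active and
% \lambda > 0 is the KKT multiplier, i.e. the value for which
\[
\|x(\lambda)\|_2^2 = t
\]
```

Such a $ \lambda $ exists in the active case because $ \|x(\lambda)\|_2^2 $ is continuous and decreasing in $ \lambda $ and tends to zero as $ \lambda \to \infty $.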


Self-Driving Cars and the Agony of Knowing What Matters

WIRED

In medicine, false positives are expensive, scary, and even painful. Yes, the doctor eventually tells you that the follow-up biopsy after that bloop on the mammogram puts you in the clear. But the intervening weeks are excruciating. A false negative is no better: "Go home, you're fine, those headaches are nothing to worry about." Anyone who builds detection systems--medical tests, security screening equipment, or the software that makes self-driving cars perceive and evaluate their surroundings--is aware of (and afraid of) both types of scenarios. The problem with avoiding both false positives and negatives, though, is that the more you do to get away from one, the closer you get to the other.
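
The trade-off described here can be made concrete with a detector that flags anything scoring at or above a threshold. The scores, labels, and function name below are made up for illustration and do not come from any real detection system.

```python
def error_counts(scores, labels, threshold):
    """Return (false_positives, false_negatives) when every score at or
    above `threshold` is flagged as a detection. Labels: 1 = real, 0 = not."""
    false_positives = sum(
        1 for s, y in zip(scores, labels) if s >= threshold and y == 0
    )
    false_negatives = sum(
        1 for s, y in zip(scores, labels) if s < threshold and y == 1
    )
    return false_positives, false_negatives


# Illustrative (made-up) detector scores and ground-truth labels.
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

low = error_counts(scores, labels, threshold=0.2)   # eager detector
mid = error_counts(scores, labels, threshold=0.5)
high = error_counts(scores, labels, threshold=0.8)  # cautious detector
```

With these numbers, the low threshold gives two false positives and no false negatives, the high threshold gives the reverse, and the middle threshold gives one of each: moving the threshold trades one kind of error for the other.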