AITopics | Performance Analysis


Performance Analysis


Interpretable Predictions of Tree-based Ensembles via Actionable Feature Tweaking

Tolomei, Gabriele, Silvestri, Fabrizio, Haines, Andrew, Lalmas, Mounia

arXiv.org Machine Learning | Jun-20-2017

Machine-learned models are often described as "black boxes". In many real-world applications, however, models may have to sacrifice predictive power in favour of human-interpretability. When this is the case, feature engineering becomes a crucial task, which requires significant and time-consuming human effort. Whilst some features are inherently static, representing properties that cannot be influenced (e.g., the age of an individual), others capture characteristics that could be adjusted (e.g., the daily amount of carbohydrates taken). Nonetheless, once a model is learned from the data, each prediction it makes on new instances is irreversible, assuming every instance to be a static point located in the chosen feature space. There are many circumstances, however, where it is important to understand (i) why a model outputs a certain prediction on a given instance, (ii) which adjustable features of that instance should be modified, and finally (iii) how to alter such a prediction when the mutated instance is input back to the model. In this paper, we present a technique that exploits the internals of a tree-based ensemble classifier to offer recommendations for transforming true negative instances into positively predicted ones. We demonstrate the validity of our approach using an online advertising application. First, we design a Random Forest classifier that effectively separates two types of ads: low (negative) and high (positive) quality ads (instances). Then, we introduce an algorithm that provides recommendations that aim to transform a low quality ad (negative instance) into a high quality one (positive instance). Finally, we evaluate our approach on a subset of the active inventory of a large ad network, Yahoo Gemini.
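The idea of tweaking adjustable features until a tree ensemble changes its vote can be illustrated with a small sketch. This is not the authors' algorithm or code: the tuple-based tree encoding, the three hand-written trees, the feature order (age, carbohydrates, exercise), the margin `eps`, and the Euclidean cost are all assumptions made for illustration.

```python
import math

# Hypothetical encoding: a tree is ("leaf", label) or
# ("split", feature, threshold, left, right); go left when x[feature] <= threshold.

def predict_tree(tree, x):
    while tree[0] == "split":
        _, f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree[1]

def predict_forest(forest, x):
    # Majority vote over the trees.
    votes = sum(predict_tree(t, x) for t in forest)
    return 1 if 2 * votes > len(forest) else 0

def positive_paths(tree, path=()):
    # Yield every root-to-leaf path that ends in a positive leaf.
    if tree[0] == "leaf":
        if tree[1] == 1:
            yield path
        return
    _, f, thr, left, right = tree
    yield from positive_paths(left, path + ((f, "<=", thr),))
    yield from positive_paths(right, path + ((f, ">", thr),))

def satisfies(x, cond):
    f, op, thr = cond
    return x[f] <= thr if op == "<=" else x[f] > thr

def tweak(x, path, eps, adjustable):
    # Move only adjustable features just past each violated threshold.
    x2 = list(x)
    for cond in path:
        if satisfies(x2, cond):
            continue
        f, op, thr = cond
        if f not in adjustable:
            return None
        x2[f] = thr - eps if op == "<=" else thr + eps
    return x2 if all(satisfies(x2, c) for c in path) else None

def recommend(forest, x, eps, adjustable):
    # Keep the cheapest tweak that makes the whole forest vote positive.
    best, best_cost = None, math.inf
    for tree in forest:
        if predict_tree(tree, x) == 1:
            continue
        for path in positive_paths(tree):
            cand = tweak(x, path, eps, adjustable)
            if cand is None or predict_forest(forest, cand) != 1:
                continue
            cost = math.dist(x, cand)
            if cost < best_cost:
                best, best_cost = cand, cost
    return best

# Features: [age (static), daily carbohydrates, weekly exercise hours].
forest = [
    ("split", 1, 50.0, ("leaf", 1), ("leaf", 0)),
    ("split", 2, 3.0, ("leaf", 0), ("leaf", 1)),
    ("split", 0, 40.0, ("leaf", 1), ("leaf", 0)),
]
x = [30.0, 80.0, 1.0]
recommendation = recommend(forest, x, eps=0.5, adjustable={1, 2})
print(recommendation)
```

Restricting `adjustable` to the non-static features is what keeps the recommendation actionable: the age feature is never modified, even when changing it would be cheaper.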

artificial intelligence, machine learning, recommendation, (18 more...)

arXiv.org Machine Learning

doi: 10.1145/3097983.3098039

1706.06691

Country: North America > Canada (0.16)

Genre: Research Report (1.00)

Industry:

Marketing (1.00)
Information Technology > Services (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)


Reviving Threshold-Moving: a Simple Plug-in Bagging Ensemble for Binary and Multiclass Imbalanced Data

Collell, Guillem, Prelec, Drazen, Patil, Kaustubh

arXiv.org Machine Learning | Jun-20-2017

Class imbalance presents a major hurdle in the application of data mining methods. A common practice to deal with it is to create ensembles of classifiers that learn from resampled balanced data, for example, bagged decision trees combined with random undersampling (RUS) or the synthetic minority oversampling technique (SMOTE). However, most of the resampling methods entail asymmetric changes to the examples of different classes, which in turn can introduce their own biases in the model. Furthermore, those methods require a performance measure to be specified a priori. An alternative is to use a so-called threshold-moving method that a posteriori changes the decision threshold of a model to counteract the imbalance, and thus has the potential to adapt to the performance measure of interest. Surprisingly, little attention has been paid to the potential of combining bagging ensembles with threshold-moving. In this paper, we present probability thresholding bagging (PT-bagging), a versatile plug-in method that fills this gap. Contrary to usual rebalancing practice, our method preserves the natural class distribution of the data, resulting in well-calibrated posterior probabilities. We also extend the proposed method to handle multiclass data. The method is validated on binary and multiclass benchmark data sets. We perform analyses that provide insights into the proposed method.
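The plug-in step described in this abstract can be sketched in a few lines. The sketch assumes that each bag member has already been trained on a bootstrap sample of the unbalanced data and returns class probabilities, and it assumes one simple thresholding rule (dividing the averaged posterior by the training class prior); the function name and the numbers are hypothetical, and the paper may use different thresholds depending on the performance measure.

```python
def threshold_moving_predict(member_probs, priors):
    """Average the members' posteriors, then rescale by the class priors."""
    n_classes = len(priors)
    avg = [
        sum(p[c] for p in member_probs) / len(member_probs)
        for c in range(n_classes)
    ]
    # Assumed rule: pick the class whose averaged posterior is largest
    # relative to its prior, instead of the plain argmax.
    scores = [avg[c] / priors[c] for c in range(n_classes)]
    label = max(range(n_classes), key=scores.__getitem__)
    return label, avg

# Three hypothetical bag members scoring one instance; class 1 is the minority.
member_probs = [[0.7, 0.3], [0.8, 0.2], [0.6, 0.4]]
priors = [0.9, 0.1]

label, avg = threshold_moving_predict(member_probs, priors)
plain_argmax = max(range(2), key=avg.__getitem__)
print(label, plain_argmax)
```

Because the members are trained on the natural class distribution, the averaged posterior stays interpretable as a probability; only the final decision rule is shifted.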

artificial intelligence, machine learning, threshold, (18 more...)

arXiv.org Machine Learning

1606.08698

Country: North America > United States > Massachusetts (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)


Deep Session Learning for Cyber Security – Gab41

@machinelearnbot | Jun-19-2017, 15:45:28 GMT

Recently, Lab41 teamed up with Cyber Reboot (a sister lab) to explore the intersection of deep learning (DL) and cyber security in a software-defined network (SDN) environment. We called it Poseidon, based heavily on it being a cool word with the letters s, d, and n in order. The goal was to use predictions about network traffic to automatically update a network's posture. This entailed three main objectives: performing deep learning on packet data, setting up an SDN environment, and scheduling a microservice to connect the two (for more information and code visit our GitHub page). Since I belong to the cult of deep learning, I was tasked with the first objective.

artificial intelligence, header, machine learning, (17 more...)

@machinelearnbot

Country: North America > Canada > New Brunswick > Fredericton (0.04)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)


An a Priori Exponential Tail Bound for k-Folds Cross-Validation

Abou-Moustafa, Karim, Szepesvari, Csaba

arXiv.org Machine Learning | Jun-19-2017

We consider a priori generalization bounds developed in terms of cross-validation estimates and the stability of learners. In particular, we first derive an exponential Efron-Stein type tail inequality for the concentration of a general function of n independent random variables. Next, under some reasonable notion of stability, we use this exponential tail bound to analyze the concentration of the k-fold cross-validation (KFCV) estimate around the true risk of a hypothesis generated by a general learning rule. While the accumulated literature has often attributed this concentration to the bias and variance of the estimator, our bound attributes this concentration to the stability of the learning rule and the number of folds k. This insight raises valid concerns related to the practical use of KFCV, and suggests research directions to obtain reliable empirical estimates of the actual risk.
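For readers unfamiliar with the quantity being bounded, the KFCV estimate itself can be computed as in the sketch below. The interleaved fold assignment, the constant-mean learner, the squared loss, and the toy data are assumptions chosen to keep the example short; the paper's analysis concerns general learning rules and is not reproduced here.

```python
from statistics import mean

def kfold_estimate(data, k, fit, loss):
    """Average held-out loss over k folds (requires k <= len(data))."""
    n = len(data)
    fold_risks = []
    for i in range(k):
        # Fold i holds out every k-th example starting at index i.
        test_idx = set(range(i, n, k))
        train = [data[j] for j in range(n) if j not in test_idx]
        test = [data[j] for j in sorted(test_idx)]
        hypothesis = fit(train)
        fold_risks.append(mean(loss(hypothesis(x), y) for x, y in test))
    return mean(fold_risks)

def fit_mean(train):
    # A deliberately simple learning rule: always predict the training mean.
    m = mean(y for _, y in train)
    return lambda x: m

def squared_loss(prediction, target):
    return (prediction - target) ** 2

data = [(i, float(i + 1)) for i in range(6)]  # targets 1.0 .. 6.0
estimate = kfold_estimate(data, k=3, fit=fit_mean, loss=squared_loss)
print(estimate)
```

The learner is retrained once per fold, so how much the fitted hypothesis changes when a fold is removed (its stability) directly affects how much the per-fold risks vary.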

artificial intelligence, inequality, machine learning, (18 more...)

arXiv.org Machine Learning

1706.05801

Country:

North America > Canada > Alberta (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.61)


WWE Money In The Bank 2017: Live Stream Info, Start Time, Match Card For 'SmackDown Live' PPV

International Business Times | Jun-17-2017, 15:15:17 GMT

At the very least, a new No.1 contender for each of the top two titles on "SmackDown Live" will be named Sunday night at WWE Money in the Bank 2017 in St. Louis. The pay-per-view could even see three different superstars hold the same belt in the span of just a few minutes, as was the case a year ago. Money in the Bank 2017 is scheduled to start at 8 p.m. EDT. Ordering the event on PPV costs $54.99, but fans can also watch MITB with a live stream on the WWE Network. A subscription to the network costs $9.99 per month, though new subscribers get the first month free. The match card is highlighted by the two main events, which could last for more than half of the PPV.

artificial intelligence, bank 2017, machine learning, (14 more...)

International Business Times

Country: North America > United States > New York (0.06)

Industry: Leisure & Entertainment > Sports > Martial Arts (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.85)


[P] Low loss but large amount of false positives? • r/MachineLearning

#artificialintelligence | Jun-17-2017, 05:50:24 GMT

I'm trying to classify data into two classes and my loss is less than 0.01 under both MSE and BCE. It seems contradictory to me that my performance on the training set is still so low: the ratio of true positives to false positives is at least 1:5 even when sweeping the threshold. Does this behavior mean my net is still not learning?
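One possible explanation, assuming heavy class imbalance (the post does not say whether the data is imbalanced), is that the mean loss is dominated by easy negatives. The sketch below uses made-up predictions: 10,000 examples with only 10 positives, where a small group of negatives receives the same score as the positives.

```python
import math

def bce(p, y):
    # Binary cross-entropy for one prediction p in (0, 1) and label y in {0, 1}.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Hypothetical (predicted probability, true label) pairs.
examples = (
    [(0.001, 0)] * 9940  # easy negatives, scored near zero
    + [(0.6, 0)] * 50    # hard negatives, scored like the positives
    + [(0.6, 1)] * 10    # the few positives
)

mean_bce = sum(bce(p, y) for p, y in examples) / len(examples)
mean_mse = sum((p - y) ** 2 for p, y in examples) / len(examples)

threshold = 0.5
tp = sum(1 for p, y in examples if p >= threshold and y == 1)
fp = sum(1 for p, y in examples if p >= threshold and y == 0)

print(mean_bce, mean_mse, tp, fp)
```

In this constructed case both mean losses fall below 0.01 while the ratio of true positives to false positives is 1:5, so a low average loss alone does not show that the minority class is well separated.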

artificial intelligence, false positive, machine learning, (2 more...)

#artificialintelligence

Industry: Media > News (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)


Accelerating Innovation Through Analogy Mining

Hope, Tom, Chan, Joel, Kittur, Aniket, Shahaf, Dafna

arXiv.org Machine Learning | Jun-17-2017

The availability of large idea repositories (e.g., the U.S. patent database) could significantly accelerate innovation and discovery by providing people with inspiration from solutions to analogous problems. However, finding useful analogies in these large, messy, real-world repositories remains a persistent challenge for either human or automated methods. Previous approaches include costly hand-created databases that have high relational structure (e.g., predicate calculus representations) but are very sparse. Simpler machine-learning/information-retrieval similarity metrics can scale to large, natural-language datasets, but struggle to account for structural similarity, which is central to analogy. In this paper we explore the viability and value of learning simpler structural representations, specifically, "problem schemas", which specify the purpose of a product and the mechanisms by which it achieves that purpose. Our approach combines crowdsourcing and recurrent neural networks to extract purpose and mechanism vector representations from product descriptions. We demonstrate that these learned vectors allow us to find analogies with higher precision and recall than traditional information-retrieval methods. In an ideation experiment, analogies retrieved by our models significantly increased people's likelihood of generating creative ideas compared to analogies retrieved by traditional methods. Our results suggest a promising approach to enabling computational analogy at scale is to learn and leverage weaker structural representations.

analogy, information retrieval, machine learning, (22 more...)

arXiv.org Machine Learning

1706.05585

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Promising Solution (0.86)

Industry: Law > Intellectual Property & Technology Law (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)


WWE Money In The Bank 2017: Predictions, Match Card For 'SmackDown Live' PPV

International Business Times | Jun-14-2017, 14:25:07 GMT

Money in the Bank 2017 isn't considered to be among WWE's "Big 4" pay-per-views, though it probably should be. It's leaped ahead of Survivor Series as one of the most important events each year, and it's set for Sunday night in St. Louis. The PPV will feature members of the "SmackDown Live" roster, and there are only five matches scheduled because of the two big co-main events. Below are predictions for the entire Money in the Bank card. The argument can be made for a few wrestlers to win this match.

artificial intelligence, machine learning, prediction, (17 more...)

International Business Times

Country: North America > United States (0.06)

Industry: Leisure & Entertainment > Sports > Martial Arts (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.65)


A Practical Method for Solving Contextual Bandit Problems Using Decision Trees

Elmachtoub, Adam N., McNellis, Ryan, Oh, Sechan, Petrik, Marek

arXiv.org Machine Learning | Jun-14-2017

Many efficient algorithms with strong theoretical guarantees have been proposed for the contextual multi-armed bandit problem. However, applying these algorithms in practice can be difficult because they require domain expertise to build appropriate features and to tune their parameters. We propose a new method for the contextual bandit problem that is simple, practical, and can be applied with little or no domain expertise. Our algorithm relies on decision trees to model the context-reward relationship. Decision trees are non-parametric, interpretable, and work well without hand-crafted features. To guide the exploration-exploitation trade-off, we use a bootstrapping approach which abstracts Thompson sampling to non-Bayesian settings. We also discuss several computational heuristics and demonstrate the performance of our method on several datasets.
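A rough sketch of how bootstrapping can stand in for posterior sampling in a contextual bandit is given below. It is not the authors' method: a depth-1 regression stump replaces a full decision tree, the context is a single number, and the two-arm environment in `true_reward_prob` is entirely hypothetical.

```python
import random

def fit_stump(sample):
    """Fit a depth-1 regression tree on (context, reward) pairs."""
    overall = sum(r for _, r in sample) / len(sample)
    xs = sorted({x for x, _ in sample})
    best = (float("inf"), None, overall, overall)
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [r for x, r in sample if x <= thr]
        right = [r for x, r in sample if x > thr]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - ml) ** 2 for r in left) + sum((r - mr) ** 2 for r in right)
        if sse < best[0]:
            best = (sse, thr, ml, mr)
    _, thr, ml, mr = best
    if thr is None:
        return lambda x: overall  # no usable split: predict the mean
    return lambda x: ml if x <= thr else mr

def choose_arm(histories, context, rng):
    # Play every arm once before relying on bootstrapped models.
    for arm, hist in enumerate(histories):
        if not hist:
            return arm
    scores = []
    for hist in histories:
        # Resample the arm's history with replacement; the randomness of the
        # resample drives exploration in place of sampling from a posterior.
        boot = [rng.choice(hist) for _ in hist]
        scores.append(fit_stump(boot)(context))
    return max(range(len(histories)), key=scores.__getitem__)

def true_reward_prob(arm, x):
    # Hypothetical environment: arm 0 is better for x < 0.5, arm 1 otherwise.
    return 0.8 if (x < 0.5) == (arm == 0) else 0.2

rng = random.Random(0)
histories = [[], []]
total_reward = 0
for _ in range(200):
    x = rng.random()
    arm = choose_arm(histories, x, rng)
    reward = 1 if rng.random() < true_reward_prob(arm, x) else 0
    histories[arm].append((x, reward))
    total_reward += reward

print(total_reward, [len(h) for h in histories])
```

Refitting a model on a fresh resample every round is simple but costly; the abstract mentions computational heuristics, which this sketch does not attempt to reproduce.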

algorithm, decision tree learning, upstream oil & gas, (20 more...)

arXiv.org Machine Learning

1706.04687

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Sports (0.67)
Health & Medicine (0.47)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.36)


Performance Modelling of Planners from Homogeneous Problem Sets

Rosa, Tomás de la (Universidad Carlos III de Madrid) | Cenamor, Isabel (Universidad Carlos III de Madrid) | Fernández, Fernando (Universidad Carlos III de Madrid)

AAAI Conferences | Jun-14-2017

Empirical performance models play an important role in the development of planning portfolios that make a per-domain or per-problem configuration of their search components. Even though such portfolios have shown their power when compared to other systems in current benchmarks, there is no clear evidence that they are capable of differentiating problems (instances) having similar input properties (in terms of objects, goals, etc.) but fairly different runtimes for a given planner. In this paper we present a study of empirical performance models that are trained using problems having the same configuration, with the objective of guiding the models to recognize the underlying differences existing among homogeneous problems. In addition, we propose a set of new features that boost the prediction capabilities under such scenarios. The results show that the learned models clearly outperformed random classifiers, which reinforces the hypothesis that the selection of planners can be done on a per-instance basis when configuring a portfolio.

classifier, discrimination, planning task, (15 more...)

AAAI Conferences

Twenty-Seventh International Conference on Automated Planning and Scheduling

Country: Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
