AITopics

1808.0967

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre:

Research Report (0.64)
Instructional Material (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

arXiv.org Machine Learning Aug-26-2018

Ensemble Learning Applied to Classify GPS Trajectories of Birds into Male or Female

Fayzur, Dewan

We describe our first-place solution to the Animal Behavior Challenge (ABC 2018) on predicting the gender of a bird from its GPS trajectory. The task consisted of predicting the gender of shearwaters based on how they navigate across the ocean. The trajectories were collected from GPS loggers attached to the shearwaters' bodies and are represented as variable-length sequences of GPS points (latitude and longitude) with associated meta-information: the sun azimuth, the sun elevation, the daytime, the elapsed time at each GPS location since the start of the trip, the local time (with the date trimmed), and an indicator of the day, counted from the start of the trip. After extensive feature engineering, we used an ensemble of several gradient boosting classifier variants together with a Gaussian process classifier and a support vector classifier, and we ranked first out of 74 registered teams. The gradient boosting variants we tried are CatBoost (developed by Yandex), LightGBM (developed by Microsoft), and XGBoost (developed by the Distributed Machine Learning Community). Our approach could easily be adapted to other applications in which the goal is to predict a class label from a variable-length sequence.
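The abstract does not say how the individual classifiers were combined. One common option is soft voting, which averages the class probabilities produced by each model. The sketch below illustrates that idea only; the function name `soft_vote`, the probability values, and the 0.5 threshold are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def soft_vote(model_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-model probabilities for class 1.

    model_probs[m][i] is model m's predicted probability for sample i.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_samples = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_samples)
    ]


# Hypothetical outputs of three already-fitted classifiers on four trajectories.
boosting_probs = [0.9, 0.2, 0.6, 0.4]
gaussian_process_probs = [0.8, 0.3, 0.5, 0.5]
svc_probs = [0.7, 0.1, 0.7, 0.3]

ensemble_probs = soft_vote([boosting_probs, gaussian_process_probs, svc_probs])
predicted_labels = [int(p >= 0.5) for p in ensemble_probs]
```

Passing a `weights` list lets stronger models count for more in the average; with no weights, every model contributes equally.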

artificial intelligence, dataset, machine learning, (15 more...)

1808.08613

Country:

North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Portugal > Porto > Porto (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.36)

#artificialintelligence Aug-20-2018, 19:19:01 GMT

Random Forests · UC Business Analytics R Programming Guide

Bagging (bootstrap aggregating) regression trees is a technique that can turn a single tree model with high variance and poor predictive power into a fairly accurate prediction function. Unfortunately, bagged regression trees typically suffer from tree correlation, which reduces the overall performance of the model. Random forests are a modification of bagging that builds a large collection of de-correlated trees, and they have become a very popular "out-of-the-box" learning algorithm that enjoys good predictive performance. This tutorial introduces random forests and covers their fundamentals.
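The two sources of randomness described above can be sketched in a few lines. This is a minimal illustration, not the guide's own code (the guide uses R): the function names are hypothetical, `mtry` borrows the parameter name used by R's randomForest package, and the sizes are arbitrary.

```python
import random
from typing import List


def bootstrap_indices(n_rows: int, rng: random.Random) -> List[int]:
    """Bagging: draw n_rows row indices with replacement for one tree."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def candidate_features(n_features: int, mtry: int, rng: random.Random) -> List[int]:
    """Random forest: pick mtry distinct features to consider at one split."""
    return rng.sample(range(n_features), mtry)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
rows = bootstrap_indices(n_rows=10, rng=rng)
features = candidate_features(n_features=9, mtry=3, rng=rng)
```

Bagging alone uses only the first step, so every tree may split on the same dominant predictors. Restricting each split to a random feature subset is what de-correlates the trees.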

artificial intelligence, decision tree learning, machine learning, (17 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

arXiv.org Machine Learning Aug-15-2018

Shedding Light on Black Box Machine Learning Algorithms: Development of an Axiomatic Framework to Assess the Quality of Methods that Explain Individual Predictions

Honegger, Milo

From self-driving vehicles and back-flipping robots to virtual assistants that book our next appointment at the hair salon or at that restaurant for dinner, machine learning systems are becoming increasingly ubiquitous. The main reason for this is that these methods boast remarkable predictive capabilities. However, most of these models remain black boxes, meaning that it is very challenging for humans to follow and understand their intricate inner workings. Consequently, interpretability has suffered under this ever-increasing complexity of machine learning models. Especially in light of new regulations such as the General Data Protection Regulation (GDPR), plausibility and verifiability of the predictions made by these black boxes are indispensable. Driven by the needs of industry and practice, the research community has recognised this interpretability problem and focussed on developing a growing number of so-called explanation methods over the past few years. These methods explain individual predictions made by black box machine learning models and help to recover some of the lost interpretability. With the proliferation of these explanation methods, it is, however, often unclear which explanation method offers a higher explanation quality or is generally better suited to the situation at hand. In this thesis, we thus propose an axiomatic framework that allows the quality of different explanation methods to be compared. Through experimental validation, we find that the developed framework is useful for assessing the explanation quality of different explanation methods and reaches conclusions that are consistent with independent research.

explanation, machine learning, natural language, (22 more...)

1808.05054

Country:

North America > United States > Wisconsin > Price County (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)

Genre:

Research Report > New Finding (1.00)
Overview (0.92)

Industry:

Transportation (1.00)
Law (1.00)
Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(6 more...)

#artificialintelligence Aug-14-2018, 22:50:42 GMT

Ensemble Machine Learning in Python: Random Forest, AdaBoost

In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game Go using deep reinforcement learning. Machine learning is even being used to program self-driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.

artificial intelligence, decision tree learning, reinforcement learning, (5 more...)

Genre: Instructional Material (0.36)

Industry:

Information Technology (0.93)
Automobiles & Trucks (0.79)
Leisure & Entertainment > Games (0.57)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.76)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)
(2 more...)

#artificialintelligence Aug-10-2018, 20:09:46 GMT

Fine-tuning XGBoost in Python like a boss – Towards Data Science

XGBoost (eXtreme Gradient Boosting) needs no introduction: it has proved its worth in many data science competitions, yet it remains a tricky model to fine-tune if you have only just started working with it. With a big dataset, a naive grid search over 5 parameters with 5 possible values each means 5⁵ = 3,125 iterations. If one iteration takes 10 minutes to run, you will wait more than 21 days for your parameters (not to mention Python crashing silently and you taking too long to notice). I assume here that you have already done your feature engineering properly, especially for categorical features, since XGBoost does not accept categorical features as input.
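The grid-search arithmetic above can be checked directly. This is only a sketch of the counting argument: the parameter names are placeholders, and the 10 minutes per fit is the figure assumed in the text.

```python
from itertools import product

# Hypothetical grid: 5 hyperparameters with 5 candidate values each.
grid = {f"param_{i}": range(5) for i in range(5)}

# Enumerate every combination a naive grid search would have to fit.
n_combinations = sum(1 for _ in product(*grid.values()))

minutes_per_fit = 10  # assumption taken from the example in the text
total_days = n_combinations * minutes_per_fit / (60 * 24)

print(n_combinations)        # 3125
print(round(total_days, 1))  # 21.7
```

The count grows exponentially with the number of tuned parameters, which is why an exhaustive grid quickly becomes impractical.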

artificial intelligence, machine learning, xgboost, (8 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.90)

arXiv.org Machine Learning Aug-10-2018

Gradient and Newton Boosting for Classification and Regression

Sigrist, Fabio

Boosting refers to a class of classification and regression algorithms that enjoy great popularity due to their excellent predictive accuracy on a wide range of datasets. The first boosting algorithms for classification, including the well-known AdaBoost algorithm, were introduced by Schapire [1990], Freund and Schapire [1995], and Freund et al. [1996]. Later, several authors [Breiman, 1998, 1999, Friedman et al., 2000, Mason et al., 2000, Friedman, 2001] introduced the statistical view of boosting as a stagewise optimization approach. In particular, Friedman et al. [2000] first introduced boosting algorithms that iteratively optimize Bernoulli and multinomial likelihoods for binary and multiclass classification using Newton updates. Further, Friedman [2001] presented gradient descent based boosting algorithms for both regression and classification tasks with general loss functions.
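The difference between a gradient update and a Newton update can be seen on a toy example with the Bernoulli (log loss) likelihood. This is a minimal sketch and not the paper's algorithm: it assumes all samples fall into a single leaf of the next tree, ignores learning rates and line searches, and uses made-up labels.

```python
import math
from typing import List, Tuple


def bernoulli_grad_hess(y: List[int],
                        raw_scores: List[float]) -> Tuple[List[float], List[float]]:
    """First and second derivatives of the log loss w.r.t. the raw score F(x)."""
    grads, hess = [], []
    for label, score in zip(y, raw_scores):
        p = 1.0 / (1.0 + math.exp(-score))  # predicted probability of class 1
        grads.append(p - label)
        hess.append(p * (1.0 - p))
    return grads, hess


y = [1, 1, 1, 0]
raw_scores = [0.0, 0.0, 0.0, 0.0]  # current ensemble output for each sample
g, h = bernoulli_grad_hess(y, raw_scores)

# Treat all four samples as belonging to one leaf of the next tree.
gradient_leaf = -sum(g) / len(g)  # least-squares fit to the negative gradients
newton_leaf = -sum(g) / sum(h)    # second-order (Newton) step

print(gradient_leaf, newton_leaf)  # 0.25 1.0
```

The Newton step rescales the gradient by the curvature of the loss, so here it moves the raw score four times as far as the plain gradient step.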

artificial intelligence, dataset, machine learning, (16 more...)

1808.03064

Country:

Europe > Switzerland > Zug > Zug (0.04)
Europe > Italy > Apulia > Bari (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.88)

#artificialintelligence Aug-1-2018, 18:18:38 GMT

Machine Learning Kaggle Competition Part Two: Improving

I recommend against the "lone genius" path, not only because it's exceedingly lonely, but also because you will miss out on the most important part of a Kaggle competition: learning from other data scientists. If you work by yourself, you end up relying on the same old methods while the rest of the world adopts more efficient and accurate techniques. As a concrete example, I have recently been dependent on the random forest model, automatically applying it to any supervised machine learning task. This competition finally made me realize that although the random forest is a decent starting model, everyone else has moved on to the superior gradient boosting machine. I also don't recommend the "copy and paste" approach, not because I'm against using others' code (with proper attribution), but because you are still limiting your chances to learn.

artificial intelligence, kaggle competition, machine learning

Genre: Contests & Prizes (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.86)

Pakrashi, Arjun, Mac Namee, Brian

Kalman Filter-based Heuristic Ensemble: A New Perspective on Ensemble Classification Using Kalman Filters

arXiv.org Artificial Intelligence Jul-30-2018

A classifier ensemble is a combination of multiple diverse classifier models whose outputs are aggregated into a single prediction. Ensembles have repeatedly been shown to perform better than single classifier models and have therefore long been a subject of research. The objective of this paper is to introduce a new perspective on ensemble classification by considering the training of the ensemble as a state estimation problem. The state is estimated using noisy measurements, and these measurements are then combined using a Kalman filter, within which heuristics are used. An implementation of this perspective, Kalman Filter based Heuristic Ensemble (KFHE), is also presented in this paper. Experiments performed on several datasets indicate the effectiveness and potential of KFHE when compared with boosting and bagging. Moreover, KFHE was found to perform comparatively better than bagging and boosting on datasets with noisy class label assignments.
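For readers unfamiliar with how a Kalman filter combines noisy measurements, the standard scalar measurement update is sketched below. This is not the KFHE algorithm, whose state, measurements, and heuristics are defined in the paper; the function name and all numeric values are assumptions for illustration.

```python
from typing import Tuple


def kalman_update(estimate: float, variance: float,
                  measurement: float, measurement_variance: float) -> Tuple[float, float]:
    """One scalar Kalman measurement update."""
    # The gain weighs the measurement by how uncertain the current estimate is.
    gain = variance / (variance + measurement_variance)
    new_estimate = estimate + gain * (measurement - estimate)
    new_variance = (1.0 - gain) * variance
    return new_estimate, new_variance


# Start from an uncertain estimate and fold in two noisy measurements.
estimate, variance = 0.0, 1.0
for z in [1.0, 1.0]:
    estimate, variance = kalman_update(estimate, variance, z, measurement_variance=1.0)
```

Each update pulls the estimate toward the new measurement and shrinks its variance, so later measurements move it less than earlier ones.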

artificial intelligence, dataset, machine learning, (19 more...)

1807.11429

Country: North America > United States (0.93)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(2 more...)

#artificialintelligence Jul-29-2018, 23:01:17 GMT

Tuning xgboost in R: Part I

Tuning a boosting algorithm for the first time can be very confusing. There are many parameters to choose, and they all affect the results differently. Also, the best choice may depend on the data. Every time I get a new dataset I learn something new. A good understanding of classification and regression trees (CART) is also helpful because we will be boosting trees; you can start here if you have no idea what a CART is.

artificial intelligence, machine learning, out-of-sample performance, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.52)