AITopics

1810.11197

Country: Asia > Middle East > Israel (0.28)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Sahin, Durmus Ozkan, Kural, Oguz Emre, Kilic, Erdal, Karabina, Armagan

A Text Classification Application: Poet Detection from Poetry

arXiv.org Machine LearningOct-24-2018

With the widespread use of the internet, the size of the text data increases day by day. Poems can be given as an example of the growing text. In this study, we aim to classify poetry according to poet. Firstly, data set consisting of three different poetry of poets written in English have been constructed. Then, text categorization techniques are implemented on it. Chi-Square technique are used for feature selection. In addition, five different classification algorithms are tried. These algorithms are Sequential minimal optimization, Naive Bayes, C4.5 decision tree, Random Forest and k-nearest neighbors. Although each classifier showed very different results, over the 70% classification success rate was taken by sequential minimal optimization technique.

machine learning, natural language, processing and communication application conference, (16 more...)

1810.11414

Country: Asia > Middle East > Republic of Türkiye (1.00)

Genre: Research Report (0.71)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

#artificialintelligenceOct-23-2018, 11:43:18 GMT

Machine Learning and Credit Risk Analytics

In the last few years, new statistical algorithms have become very popular. Traditional scorecards were based on one decision tree, or "logistic regression." The newer algorithms represent a combination of hundreds of decision trees instead of one single tree. These algorithms also provide much more accurate predictions compared to traditional methods. The current hype around machine learning methods typically revolves around these algorithms in particular: random forests, XgBoost, and deep learning.

artificial intelligence, learning and credit risk analytic, machine learning, (3 more...)

Industry:

Banking & Finance > Credit (0.53)
Banking & Finance > Risk Management (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.61)

#artificialintelligenceOct-23-2018, 00:12:22 GMT

What is a Decision Tree in Machine Learning? – Hacker Noon

Decision trees, as the name implies, are trees of decisions. You have a question, usually a yes or no (binary; 2 options) question with two branches (yes and no) leading out of the tree. You can get more options than 2, but for this article, we're only using 2 options. Trees are weird in computer science. Instead of growing from a root upwards, they grow downwards.

artificial intelligence, machine learning, mushroom, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Chakrabarty, Navoneel, Biswas, Sanket

A Statistical Approach to Adult Census Income Level Prediction

arXiv.org Machine LearningOct-23-2018

The prominent inequality of wealth and income is a huge concern especially in the United States. The likelihood of diminishing poverty is one valid reason to reduce the world's surging level of economic inequality. The principle of universal moral equality ensures sustainable development and improve the economic stability of a nation. Governments in different countries have been trying their best to address this problem and provide an optimal solution. This study aims to show the usage of machine learning and data mining techniques in providing a solution to the income equality problem. The UCI Adult Dataset has been used for the purpose. Classification has been done to predict whether a person's yearly income in US falls in the income category of either greater than 50K Dollars or less equal to 50K Dollars category based on a certain set of attributes. The Gradient Boosting Classifier Model was deployed which clocked the highest accuracy of 88.16%, eventually breaking the benchmark accuracy of existing works.

artificial intelligence, category, machine learning, (14 more...)

1810.10076

Country: North America > United States > California (0.14)

Genre: Research Report (0.52)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.98)

Lorenzen, Stephan Sloth, Igel, Christian, Seldin, Yevgeny

On PAC-Bayesian Bounds for Random Forests

arXiv.org Machine LearningOct-23-2018

Existing guarantees in terms of rigorous upper bounds on the generalization error for the original random forest algorithm, one of the most frequently used machine learning methods, are unsatisfying. We discuss and evaluate various PAC-Bayesian approaches to derive such bounds. The bounds do not require additional hold-out data, because the out-of-bag samples from the bagging in the training process can be exploited. A random forest predicts by taking a majority vote of an ensemble of decision trees. The first approach is to bound the error of the vote by twice the error of the corresponding Gibbs classifier (classifying with a single member of the ensemble selected at random). However, this approach does not take into account the effect of averaging out of errors of individual classifiers when taking the majority vote. This effect provides a significant boost in performance when the errors are independent or negatively correlated, but when the correlations are strong the advantage from taking the majority vote is small. The second approach based on PAC-Bayesian C-bounds takes dependencies between ensemble members into account, but it requires estimating correlations between the errors of the individual classifiers. When the correlations are high or the estimation is poor, the bounds degrade. In our experiments, we compute generalization bounds for random forests on various benchmark data sets. Because the individual decision trees already perform well, their predictions are highly correlated and the C-bounds do not lead to satisfactory results. For the same reason, the bounds based on the analysis of Gibbs classifiers are typically superior and often reasonably tight. Bounds based on a validation set coming at the cost of a smaller training set gave better performance guarantees, but worse performance in most experiments.

artificial intelligence, classifier, machine learning, (17 more...)

1810.09746

Country: Europe (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Mousa, Saleh, Ishak, Sherif

Comparative Evaluation of Tree-Based Ensemble Algorithms for Short-Term Travel Time Prediction

arXiv.org Artificial IntelligenceOct-23-2018

Disseminating accurate travel time information to road users helps achieve traffic equilibrium and reduce traffic congestion. The deployment of Connected Vehicles technology will provide unique opportunities for the implementation of travel time prediction models. The aim of this study is twofold: (1) estimate travel times in the freeway network at five-minute intervals using Basic Safety Messages (BSM); (2) develop an eXtreme Gradient Boosting (XGB) model for short-term travel time prediction on freeways. The XGB tree-based ensemble prediction model is evaluated against common tree-based ensemble algorithms and the evaluations are performed at five-minute intervals over a 30-minute horizon. BSMs generated by the Safety Pilot Model Deployment conducted in Ann Arbor, Michigan, were used. Nearly two billion messages were processed for providing travel time estimates for the entire freeway network. A Combination of grid search and five-fold cross-validation techniques using the travel time estimates were used for developing the prediction models and tuning their parameters. About 9.6 km freeway stretch was used for evaluating the XGB together with the most common tree-based ensemble algorithms. The results show that XGB is superior to all other algorithms, followed by the Gradient Boosting. XGB travel time predictions were accurate and consistent with variations during peak periods, with mean absolute percentage error in prediction about 5.9% and 7.8% for 5-minute and 30-minute horizons, respectively. Additionally, through applying the developed models to another 4.7 km stretch along the eastbound segment of M-14, the XGB demonstrated its considerable advantages in travel time prediction during congested and uncongested conditions.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

1810.10102

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.24)

Genre:

Research Report > New Finding (0.49)
Research Report > Experimental Study (0.48)

Industry: Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Guidotti, Riccardo, Ruggieri, Salvatore

Assessing the Stability of Interpretable Models

arXiv.org Artificial IntelligenceOct-22-2018

Interpretable classification models are built with the purpose of providing a comprehensible description of the decision logic to an external oversight agent. When considered in isolation, a decision tree, a set of classification rules, or a linear model, are widely recognized as human-interpretable. However, such models are generated as part of a larger analytical process, which, in particular, comprises data collection and filtering. Selection bias in data collection or in data pre-processing may affect the model learned. Although model induction algorithms are designed to learn to generalize, they pursue optimization of predictive accuracy. It remains unclear how interpretability is instead impacted. We conduct an experimental analysis to investigate whether interpretable models are able to cope with data selection bias as far as interpretability is concerned.

artificial intelligence, machine learning, stability, (18 more...)

arXiv.org Artificial Intelligence

1810.09352

Genre:

Research Report > Experimental Study (0.47)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

#artificialintelligenceOct-19-2018, 19:11:35 GMT

Machine learning predicts World Cup winner

The random-forest technique has emerged in recent years as a powerful way to analyze large data sets while avoiding some of the pitfalls of other data-mining methods. It is based on the idea that some future event can be determined by a decision tree in which an outcome is calculated at each branch by reference to a set of training data. However, decision trees suffer from a well-known problem. In the latter stages of the branching process, decisions can become severely distorted by training data that is sparse and prone to huge variation at this kind of resolution, a problem known as overfitting. The random-forest approach is different.

artificial intelligence, groll and co, machine learning, (16 more...)

Country:

Europe (0.41)
North America > United States > California > San Francisco County > San Francisco (0.15)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligenceOct-18-2018, 00:09:57 GMT

Gradient Boosting Decision trees: XGBoost vs LightGBM

Gradient boosting decision trees is the state of the art for structured data problems. Two modern algorithms that make gradient boosted tree models are XGBoost and LightGBM. In this article I'll summarize their introductory papers for each algorithm's approach. Gradient Boosting Decision Trees (GBDT) are currently the best techniques for building predictive models from structured data. Unlike models for analyzing images (for that you want to use a deep learning model), structured data problems can be solved very well with a lot of decision trees.

algorithm, artificial intelligence, machine learning, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.56)