AITopics

1810.00974

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)

#artificialintelligence Sep-30-2018, 14:36:55 GMT

Example of Random Forest application in Finance : Option Pricing

Let's assume we know how much a Tesla share costs in 2W (two weeks). Our 'only' unknown is the future option value (Y_T), given all the information we have at t = 2W. In other words, if you are two weeks in the future, what is the expected value of your portfolio, which consists of this one American option? You have information at 2W and you want to predict the option value at 1M. Beforehand, we need to simulate multiple scenarios for the Tesla share price. For model simplicity, we suppose the Tesla share price follows a Geometric Brownian motion with drift r (the risk-free rate) and volatility sigma = 20% (we refer interested readers to the theory of stochastic processes).
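The scenario-generation step can be sketched as follows. This is a minimal illustration, not the article's code: the starting price `S0`, the rate `R`, the number of paths, and the function name `simulate_gbm_paths` are assumptions made for the example; only the 20% volatility and the 2W and 1M horizons come from the text. It uses the exact log-normal update for Geometric Brownian motion.

```python
import math
import random


def simulate_gbm_paths(s0, r, sigma, times, n_paths, seed=0):
    """Simulate GBM prices observed at increasing times (in years)."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        price, prev_t, path = s0, 0.0, []
        for t in times:
            dt = t - prev_t
            z = rng.gauss(0.0, 1.0)  # standard normal shock
            # Exact GBM step: drift corrected by -sigma^2 / 2.
            price *= math.exp((r - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
            path.append(price)
            prev_t = t
        paths.append(path)
    return paths


S0 = 300.0             # hypothetical share price today
R = 0.02               # hypothetical risk-free rate
SIGMA = 0.20           # volatility stated in the text
TIMES = [2 / 52, 1 / 12]  # 2W and 1M, expressed in years

paths = simulate_gbm_paths(S0, R, SIGMA, TIMES, n_paths=10_000, seed=42)
mean_1m = sum(p[1] for p in paths) / len(paths)
print(f"mean simulated price at 1M: {mean_1m:.2f}")
```

Each path holds the simulated price at 2W and at 1M, so the 2W values can serve as model inputs and the 1M values as the basis for the option value to be predicted.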

artificial intelligence, machine learning, random forest application, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

#artificialintelligence Sep-29-2018, 17:32:36 GMT

How to visualize decision tree

The scikit tree does a good job of representing the tree structure, but we have a few quibbles. The colors aren't the best and it's not immediately obvious why some of the nodes are colored and some aren't. If the colors represent predicted class for this classifier, one would think just the leaves would be colored because only leaves have predictions. The count of samples of the various target classes in each node is somewhat useful, but a histogram would be even better. A target class color legend would be nice.

artificial intelligence, machine learning, visualize decision tree, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

#artificialintelligence Sep-27-2018, 22:42:22 GMT

Introduction to Machine Learning for Coders: Launch · fast.ai

The course, recorded at the University of San Francisco as part of the Master of Science in Data Science curriculum, covers the most important practical foundations for modern machine learning. There are 12 lessons, each of which is around two hours long; a list of all the lessons along with a screenshot from each is at the end of this post. There are some excellent machine learning courses already, most notably the wonderful Coursera course from Andrew Ng. But that course is showing its age now, particularly since it uses Matlab for coursework. This new course uses modern tools and libraries, including Python, pandas, scikit-learn, and PyTorch.

artificial intelligence, decision tree learning, machine learning, (15 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.25)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

#artificialintelligence Sep-26-2018, 16:40:10 GMT

An overview of feature selection strategies

Feature selection and engineering are the most important factors that affect the success of predictive modeling. This remains true even today despite the success of deep learning, which comes with automatic feature engineering. Parsimonious and interpretable models provide simple insights into business problems and are therefore deemed very valuable. Furthermore, on many occasions the underlying size and structure of the data being analyzed may not allow the use of complex models that have many parameters to tune. For example, in clinical settings where the number of samples is usually much lower than the number of features one could extract (e.g.

artificial intelligence, decision tree learning, machine learning, (17 more...)

Genre: Research Report > Experimental Study (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.30)

#artificialintelligence Sep-18-2018, 14:52:54 GMT

The Mathematics of Decision Trees, Random Forest and Feature Importance in Scikit-learn and Spark

This post attempts to consolidate information on tree algorithms and their implementations in Scikit-learn and Spark. In particular, it was written to clarify how feature importance is calculated. There are many great resources online discussing how decision trees and random forests are created, and this post is not intended to be one of them. Although it includes short definitions for context, it assumes the reader has a grasp of these concepts and wishes to know how the algorithms are implemented in Scikit-learn and Spark. Decision trees learn how to best split the dataset into smaller and smaller subsets to predict the target value.
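To make the idea of a "best split" concrete, here is a minimal sketch of the weighted impurity decrease for one candidate split, using Gini impurity. Gini is only one common criterion, and the post's own formulas are not reproduced in this excerpt, so the function names and the toy labels below are assumptions made for illustration. Impurity-based feature importance is typically built by accumulating quantities of this kind over the nodes that split on a feature.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent, left, right):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted_children = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted_children


parent = ["a", "a", "b", "b"]
perfect = impurity_decrease(parent, ["a", "a"], ["b", "b"])  # classes fully separated
useless = impurity_decrease(parent, ["a", "b"], ["a", "b"])  # children as mixed as parent
print(perfect, useless)
```

A split that separates the classes completely removes all of the parent's impurity, while a split whose children are as mixed as the parent removes none, so a tree learner would prefer the first.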

artificial intelligence, machine learning, scikit-learn and spark, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Kalatian, Arash, Farooq, Bilal

Mobility Mode Detection Using WiFi Signals

arXiv.org Machine Learning Sep-15-2018

We utilize Wi-Fi communications from smartphones to predict their mobility mode, i.e. walking, biking and driving. Wi-Fi sensors were deployed at four strategic locations in a closed loop on streets in downtown Toronto. A deep neural network (Multilayer Perceptron) and three decision-tree-based classifiers (Decision Tree, Bagged Decision Tree and Random Forest) are developed. Results show that the best prediction accuracy is achieved by the Multilayer Perceptron, with 86.52% correct predictions of mobility modes.

algorithm, artificial intelligence, machine learning, (16 more...)

1809.05788

Country:

North America > Canada > Ontario > Toronto (0.25)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Transportation (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

#artificialintelligence Sep-11-2018, 12:42:13 GMT

Decision Trees for Classification: A Machine Learning Algorithm

It does not require any statistical knowledge to read and interpret them. Their graphical representation is very intuitive, and users can easily relate it to their hypotheses. Useful in data exploration: A decision tree is one of the fastest ways to identify the most significant variables and the relations between two or more variables. With the help of decision trees, we can identify the features that have the most power to predict the target variable. For example, when we are working on a problem where information is spread across hundreds of variables, a decision tree will help to identify the most significant ones. Less data cleaning required: It requires less data cleaning compared to some other modeling techniques. To a fair degree, it is not influenced by outliers and missing values. Data type is not a constraint: It can handle both numerical and categorical variables.

artificial intelligence, machine learning, node, (15 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Tixier, Antoine J.-P., Rossi, Maria-Evgenia G., Malliaros, Fragkiskos D., Read, Jesse, Vazirgiannis, Michalis

Perturb and Combine to Identify Influential Spreaders in Real-World Networks

arXiv.org Machine Learning Sep-4-2018

Recent research has shown that graph degeneracy algorithms, which decompose a network into a hierarchy of nested subgraphs of decreasing size and increasing density, are very effective at detecting the good spreaders in a network. However, it is also known that degeneracy-based decompositions of a graph are unstable to small perturbations of the network structure. In Machine Learning, the performance of unstable classification and regression methods, such as fully-grown decision trees, can be greatly improved by using Perturb and Combine (P&C) strategies such as bagging (bootstrap aggregating). Therefore, we propose a P&C procedure for networks that (1) creates many perturbed versions of a given graph, (2) applies a node scoring function separately to each graph (such as a degeneracy-based one), and (3) combines the results. We conduct real-world experiments on the tasks of identifying influential spreaders in large social networks, and influential words (keywords) in small word co-occurrence networks. We use the k-core, generalized k-core, and PageRank algorithms as our vertex scoring functions. In each case, using the aggregated scores brings significant improvements compared to using the scores computed on the original graphs. Finally, a bias-variance analysis suggests that our P&C procedure works mainly by reducing bias, and that therefore, it should be capable of improving the performance of all vertex scoring functions, not only unstable ones.
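The three-step procedure in the abstract (perturb, score, combine) can be sketched as below. This is an illustration under stated assumptions, not the authors' method: the abstract does not say how graphs are perturbed or how scores are combined, so random edge deletion with probability `drop_prob`, averaging as the combination rule, and node degree as a stand-in scoring function are all choices made for this example, as are the function names.

```python
import random


def degree_scores(edges, nodes):
    """Stand-in vertex scoring function: node degree."""
    scores = {node: 0 for node in nodes}
    for u, v in edges:
        scores[u] += 1
        scores[v] += 1
    return scores


def perturb_and_combine(edges, nodes, score_fn, n_rounds=50, drop_prob=0.1, seed=0):
    """Average a node score over randomly perturbed copies of the graph."""
    rng = random.Random(seed)
    totals = {node: 0.0 for node in nodes}
    for _ in range(n_rounds):
        # (1) Perturb: keep each edge with probability 1 - drop_prob (assumed scheme).
        perturbed = [edge for edge in edges if rng.random() >= drop_prob]
        # (2) Score each node on the perturbed graph.
        scores = score_fn(perturbed, nodes)
        # (3) Combine: accumulate, then average over rounds.
        for node in nodes:
            totals[node] += scores[node]
    return {node: total / n_rounds for node, total in totals.items()}


nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
aggregated = perturb_and_combine(edges, nodes, degree_scores)
print(aggregated)
```

Any vertex scoring function with the same signature, such as a k-core or PageRank implementation, could be passed in place of `degree_scores`.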

information retrieval, machine learning, node, (22 more...)

1807.09586

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.50)

Industry:

Information Technology (0.49)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.48)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)

Lucas, Benjamin, Shifaz, Ahmed, Pelletier, Charlotte, O'Neill, Lachlan, Zaidi, Nayyar, Goethals, Bart, Petitjean, Francois, Webb, Geoffrey I.

Proximity Forest: An effective and scalable distance-based classifier for time series

arXiv.org Machine Learning Aug-31-2018

Research into the classification of time series has made enormous progress in the last decade. The UCR time series archive has played a significant role in challenging and guiding the development of new learners for time series classification. The largest dataset in the UCR archive holds only 10 thousand time series, which may explain why the primary research focus has been on creating algorithms that have high accuracy on relatively small datasets. This paper introduces Proximity Forest, an algorithm that learns accurate models from datasets with millions of time series, and classifies a time series in milliseconds. The models are ensembles of highly randomized Proximity Trees. Whereas conventional decision trees branch on attribute values (and usually perform poorly on time series), Proximity Trees branch on the proximity of time series to one exemplar time series or another, allowing us to leverage the decades of work on developing relevant measures for time series. Proximity Forest gains both efficiency and accuracy by stochastic selection of both exemplars and similarity measures. Our work is motivated by recent time series applications that provide orders of magnitude more time series than the UCR benchmarks. Our experiments demonstrate that Proximity Forest is highly competitive on the UCR archive: it ranks among the most accurate classifiers while being significantly faster. We demonstrate on a 1M time series Earth observation dataset that Proximity Forest retains this accuracy on datasets that are many orders of magnitude greater than those in the UCR repository, while learning its models at least 100,000 times faster than the current state-of-the-art models Elastic Ensemble and COTE.
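The branching rule described in the abstract, routing a series toward whichever exemplar it is closest to, can be sketched as follows. This is a simplified illustration rather than the paper's algorithm: drawing one exemplar per class, using plain Euclidean distance in place of a randomly selected time series measure, and the names `pick_exemplars` and `branch` are all assumptions made for the example.

```python
import math
import random


def euclidean(a, b):
    """Stand-in similarity measure between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pick_exemplars(series_by_class, rng):
    """Randomly draw one exemplar series per class (assumed scheme)."""
    return {label: rng.choice(items) for label, items in series_by_class.items()}


def branch(series, exemplars, distance=euclidean):
    """Return the label of the exemplar closest to the given series."""
    return min(exemplars, key=lambda label: distance(series, exemplars[label]))


train = {
    "flat": [[0, 0, 0, 0], [0, 1, 0, 0]],
    "rising": [[0, 1, 2, 3], [1, 2, 3, 4]],
}
exemplars = pick_exemplars(train, random.Random(0))
print(branch([0, 1, 2, 4], exemplars))  # closest to a "rising" exemplar
print(branch([0, 0, 1, 0], exemplars))  # closest to a "flat" exemplar
```

In this toy data the two queries land on the same branch whichever exemplars are drawn; on real data the random draw changes the split, which is where the ensemble's diversity comes from.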

artificial intelligence, data mining, machine learning, (18 more...)

1808.10594

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Hungary > Budapest > Budapest (0.04)

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.66)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.67)