AITopics

1605.04262

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.48)

Sutera, Antonio, Louppe, Gilles, Huynh-Thu, Vân Anh, Wehenkel, Louis, Geurts, Pierre

Context-dependent feature analysis with random forests

arXiv.org Machine Learning, May-12-2016

Feature selection is often more complicated than identifying a single subset of input variables that together explain the output. There may be interactions that depend on contextual information, i.e., variables that turn out to be relevant only in specific circumstances. In this setting, the contribution of this paper is to extend the random forest variable importance framework in order (i) to identify variables whose relevance is context-dependent and (ii) to characterize as precisely as possible the effect of contextual information on these variables. The usage and relevance of the framework for highlighting context-dependent variables are illustrated on both artificial and real datasets.
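To make "relevant only in some specific circumstances" concrete, here is a toy illustration. It does not implement the paper's random forest importances. As a stand-in it uses exact empirical mutual information (an assumption chosen for transparency), and the names `mutual_information`, `context_profile`, `x1`, `xc` and `y` are hypothetical. The target copies `x1` only when the context variable `xc` equals 1, so `x1` carries no information in one context and one full bit in the other.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y) in bits between two discrete sequences."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in pxy.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        mi += (count / n) * log2(count * n / (px[x] * py[y]))
    return mi


def context_profile(rows, feature, context, target):
    """Relevance of `feature` for `target`, overall and within each value of `context`.

    Hypothetical helper: mutual information stands in for RF importances here.
    """
    overall = mutual_information([r[feature] for r in rows], [r[target] for r in rows])
    per_context = {}
    for value in sorted({r[context] for r in rows}):
        subset = [r for r in rows if r[context] == value]
        per_context[value] = mutual_information(
            [r[feature] for r in subset], [r[target] for r in subset]
        )
    return overall, per_context


# Toy data: y equals x1 when xc == 1, and is always 0 when xc == 0.
rows = [{"x1": a, "xc": c, "y": a if c == 1 else 0} for a in (0, 1) for c in (0, 1)]
overall, per_context = context_profile(rows, "x1", "xc", "y")
print(overall, per_context)
```

The overall score lies strictly between the two per-context scores, which is exactly the kind of averaging that hides context dependence when only a single, unconditioned importance is reported.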

artificial intelligence, information, machine learning

1605.03848

Country:

Europe (0.93)
North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.61)

Wright, Marvin N., Dankowski, Theresa, Ziegler, Andreas

Random forests for survival analysis using maximally selected rank statistics

arXiv.org Machine Learning, May-11-2016

The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption is not always fulfilled. An alternative approach is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split point selection bias. However, current software for conditional inference forests uses linear rank statistics to select the optimal splitting variable, and these cannot detect non-linear effects in the independent variables. We therefore use maximally selected rank statistics for split point selection in random forests for survival analysis. As in conditional inference forests, p-values for association between split points and survival time are minimized. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split point selection is possible. However, there is a trade-off between unbiased split point selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories, and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves computationally faster than both alternatives if a simple p-value approximation is used.
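The abstract contrasts a fixed log-rank split criterion with maximally selected rank statistics. The sketch below illustrates only the selection step. For one covariate, it computes the standardized two-sample log-rank statistic at every admissible cutpoint and keeps the cutpoint with the largest absolute value. The helper names and the `min_node` parameter are hypothetical. The p-value approximations the paper relies on, which correct for having searched over many cutpoints, are deliberately omitted, so the raw maximum here should not be used on its own to compare variables with different numbers of cutpoints.

```python
from math import sqrt


def logrank_z(times, events, in_left):
    """Standardized two-sample log-rank statistic (left group vs. right group)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    num, var = 0.0, 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if in_left[i])
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and in_left[i])
        # Observed minus expected events in the left group at time t
        num += d1 - d * n1 / n
        # Hypergeometric variance contribution (undefined when only one subject is at risk)
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return num / sqrt(var) if var > 0 else 0.0


def max_selected_split(x, times, events, min_node=2):
    """Return (cutpoint, |Z|) maximizing the absolute log-rank statistic over cutpoints of x.

    Left node: x <= cutpoint. Cutpoints leaving fewer than `min_node`
    observations on either side are skipped (hypothetical parameter).
    """
    best = (None, 0.0)
    for c in sorted(set(x))[:-1]:
        left = [xi <= c for xi in x]
        if min_node <= sum(left) <= len(x) - min_node:
            z = abs(logrank_z(times, events, left))
            if z > best[1]:
                best = (c, z)
    return best


x = [1, 2, 3, 4, 5, 6]
times = [1, 2, 3, 10, 11, 12]
events = [1, 1, 1, 1, 1, 1]
cut, z = max_selected_split(x, times, events)
print(cut, z)
```

Taking the maximum over all cutpoints is what makes the statistic "maximally selected". It is also why a multiplicity-adjusted p-value, rather than the raw maximum, is needed before comparing candidate splitting variables.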

approximation, artificial intelligence, machine learning

1605.03391

Country:

Europe > Germany (0.28)
Europe > Austria (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

#artificialintelligence, May-10-2016, 19:56:37 GMT

A Complete Tutorial to learn Data Science in R from Scratch

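Adjusted R² measures the goodness of fit of a regression model while penalizing R² for the number of predictors. Other things being equal, the higher the adjusted R², the better the model fits the data.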
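For reference, the usual definitions are R² = 1 - SS_res/SS_tot and adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of predictors, not counting the intercept. The following is a minimal Python sketch of both. The function names are illustrative and are not taken from the tutorial, which works in R.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((obs - pred) ** 2 for obs, pred in zip(y, y_hat))
    ss_tot = sum((obs - mean_y) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    # Rescale the unexplained share by the ratio of degrees of freedom
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


print(adjusted_r_squared(0.8, 101, 10))
```

For example, R² = 0.8 with 101 observations and 10 predictors gives an adjusted R² of about 0.778. Adding a useless predictor raises plain R² slightly but can lower the adjusted value, which is why the adjusted form is preferred when comparing models of different sizes.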

artificial intelligence, machine learning, vector

Genre: Instructional Material > Course Syllabus & Notes (0.68)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

#artificialintelligence, May-9-2016, 02:25:27 GMT

Improving performance of random forests for a particular value of outcome by adding chosen features

Choosing features to improve the performance of a particular algorithm is a difficult problem. Currently there is PCA, which is difficult to understand (although it can be used out of the box), requires centering and scaling of features, and is not easy to interpret. In addition, it does not allow prediction performance to be improved for a particular outcome (for example, one whose accuracy is lower than for the others or that is of particular importance). My method allows features to be used without preprocessing, so the resulting prediction is easy to explain.

decision tree learning, machine learning, particular value

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

arXiv.org Machine Learning, May-9-2016

A Selection of Giant Radio Sources from NVSS

Proctor, D. D.

Results of applying pattern recognition techniques to the problem of identifying Giant Radio Sources (GRS) in the NVSS catalog are presented, and issues affecting the process are explored. Decision-tree pattern recognition software was applied to training set source pairs developed from known NVSS large angular size radio galaxies. The full training set consisted of 51,195 source pairs, 48 of which were known GRS for which each lobe was primarily represented by a single catalog component. The source pairs had a maximum separation of 20 arc minutes and a minimum component area of 1.87 square arc minutes at the 1.4 mJy level. The importance of comparing the resulting probability distributions of the training and application sets for cases of unknown class ratio is demonstrated. The probability of correctly ranking a randomly selected (GRS, non-GRS) pair with the best of the tested classifiers was determined to be 97.8 +/- 1.5%. The best classifiers were applied to the more than 870,000 candidate pairs from the entire catalog. Images of higher-ranked sources were visually screened, and a table of over sixteen hundred candidates, including morphological annotation, is presented. These systems include doubles and triples, Wide-Angle Tail (WAT) and Narrow-Angle Tail (NAT), S- or Z-shaped systems, and core-jets and resolved cores. While some resolved lobe systems are recovered with this technique, such systems are generally expected to require a different approach.
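The quoted "probability of correctly ranking a randomly selected (GRS, non-GRS) pair" is the quantity commonly reported as the area under the ROC curve. The following minimal sketch implements that definition directly. It assumes classifier scores where a higher value means "more GRS-like" and counts ties as half a correct ranking. The function name and scores are hypothetical, and this is not the paper's evaluation code, which the abstract does not describe.

```python
def ranking_probability(pos_scores, neg_scores):
    """P(score of a random positive > score of a random negative); ties count 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical scores: two known GRS pairs and two non-GRS pairs
print(ranking_probability([0.9, 0.8], [0.1, 0.85]))
```

The pairwise loop costs O(n·m) and is meant only to show the definition. For tens of thousands of training pairs, a sort-based (Mann-Whitney rank-sum) computation gives the same value more cheaply.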

artificial intelligence, machine learning, solovyov & verkhodanov

doi: 10.3847/0067-0049/224/2/18

1603.06895

Country: North America > United States > California (0.92)

Genre: Research Report (0.82)

Industry:

Government > Regional Government > North America Government > United States Government (0.92)
Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.66)

#artificialintelligence, May-8-2016, 08:50:32 GMT

The 7 Best Data Science and Machine Learning Podcasts -- The Startup

Data science and machine learning have long been interests of mine, but now that I'm working on Fuzzy.io I need to keep on top of all the news in both fields. My preferred way to do this is by listening to podcasts. I've listened to a bunch of machine learning and data science podcasts in the last few months, so I thought I'd share my favorites: Every other week, they release a 10–15 minute episode where hosts Kyle and Linda Polich give a short primer on topics like k-means clustering, natural language processing and decision tree learning, often using analogies related to their pet parrot, Yoshi. This is the only place where you'll learn about k-means clustering via placement of parrot droppings.

artificial intelligence, decision tree learning, science and machine learning podcast

Industry: Education > Educational Setting > Online (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

AAAI Conferences, May-8-2016

Learning Decision Trees from Histogram Data Using Multiple Subsets of Bins

Gurung, Ram B. (Stockholm University) | Lindgren, Tony (Stockholm University) | Boström, Henrik (Stockholm University)

The standard approach to learning decision trees from histogram data is to treat the bins as independent variables. However, as the underlying dependencies among the bins might not be completely exploited by this approach, an algorithm has been proposed for learning decision trees from histogram data by considering all bins simultaneously while partitioning examples at each node of the tree. Although the algorithm has been demonstrated to improve predictive performance, its computational complexity has turned out to be a major bottleneck, in particular for histograms with a large number of bins. In this paper, we instead propose a sliding window approach to select subsets of the bins to be considered simultaneously while partitioning examples. This significantly reduces the number of possible splits to consider, allowing substantially larger histograms to be handled. We also propose to evaluate the original bins independently, in addition to evaluating the subsets of bins, when performing splits. This ensures that the information obtained by treating bins simultaneously is an additional gain compared to what is considered by the standard approach. Experiments applying the new algorithm to both synthetic and real-world datasets show positive results in terms of predictive performance without excessive computational cost.
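The candidate-generation idea in this abstract (contiguous windows of bins, plus every original bin on its own) can be sketched as follows. The fixed window width, the step size of one bin, and the helper name `candidate_bin_subsets` are assumptions for illustration; the abstract does not specify them. The split search performed inside each subset is not shown.

```python
def candidate_bin_subsets(n_bins, window):
    """Hypothetical helper: bin-index subsets evaluated at a node.

    Returns every contiguous window of `window` bins (step 1), followed by
    each original bin on its own, so single-bin splits are still considered.
    """
    if not 1 <= window <= n_bins:
        raise ValueError("window must be between 1 and n_bins")
    windows = [tuple(range(start, start + window)) for start in range(n_bins - window + 1)]
    singles = [(i,) for i in range(n_bins)]
    # With window == 1 the windows already are the single bins; avoid duplicates.
    return windows + singles if window > 1 else singles


subsets = candidate_bin_subsets(5, 3)
print(subsets)
```

With 5 bins and a window of 3, this yields 8 candidate subsets (3 windows and 5 single bins), compared with 2**5 - 1 = 31 non-empty subsets. That gap widens quickly as histograms grow, which is the source of the savings the abstract describes.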

histogram data, learning decision tree, multiple subset

AAAI Conferences

The Twenty-Ninth International Flairs Conference

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.80)

AAAI Conferences, May-8-2016

Multiplicative Factorization of Multi-Valued NIN-AND Tree Models

Xiang, Yang (University of Guelph) | Jin, Yiting (University of Guelph)

A multi-valued Non-Impeding Noisy-AND (NIN-AND) tree model has linear complexity and is more expressive than common Causal Independence Models (CIMs). We formulate a Multiplicative Factorization (MF) for multi-valued NIN-AND Tree (NAT) models. In comparison with the MF for binary NAT models (which has an undirected tree structure), the proposed MF is a hybrid, multiply connected graphical model. Although a NAT is made of two types of NIN-AND gates, we show that a sound and space-efficient MF requires multiple types of gate MFs, and therefore significantly more sophisticated parameterization and integration of gate MFs, as well as soundness analysis. We show that the formulated MF is exact and its space complexity is linear in the number $n$ of causes per effect. Based on the proposed MF, we extend the scheme for lazy propagation (LP) with binary NAT-modeled Bayesian Networks (BNs) to multi-valued NAT-modeled BNs. We show that the extended scheme is more powerful than LP based on the MF of noisy-MAX, and we demonstrate that it allows significantly more efficient LP in both space and time.
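As background for the two gate types mentioned above, here is a minimal sketch of how a binary NAT turns single-cause probabilities into a joint causal probability. In this binary setting, a direct NIN-AND gate models undermining: the joint probability of success is the product of the inputs' success probabilities. A dual gate models reinforcement: the joint probability of failure is the product of the inputs' failure probabilities. The nested-tuple tree encoding and all function names are hypothetical. The sketch covers only binary causes and effects and does not implement the paper's multi-valued case or its multiplicative factorization.

```python
from math import prod


def direct_gate(probs):
    # Undermining: the effect occurs only if every input succeeds
    return prod(probs)


def dual_gate(probs):
    # Reinforcement: the effect fails only if every input fails
    return 1.0 - prod(1.0 - p for p in probs)


def evaluate_nat(node):
    """Evaluate a binary NAT given as a hypothetical nested structure.

    A leaf is a float: P(effect | this cause alone is active).
    An internal node is (gate_type, [children]) with gate_type "direct" or "dual".
    """
    if isinstance(node, (int, float)):
        return float(node)
    gate, children = node
    values = [evaluate_nat(child) for child in children]
    if gate == "direct":
        return direct_gate(values)
    if gate == "dual":
        return dual_gate(values)
    raise ValueError(f"unknown gate type: {gate}")


# c1 and c2 undermine each other; together they reinforce c3.
nat = ("dual", [("direct", [0.8, 0.7]), 0.5])
p = evaluate_nat(nat)
print(p)
```

Here the direct gate gives 0.8 × 0.7 = 0.56, and the dual gate gives 1 - (1 - 0.56)(1 - 0.5) = 0.78. Each binary cause contributes one leaf parameter, which illustrates the linear parameter count that the abstract refers to.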

decision tree learning, machine learning, multi-valued nin-and tree model

AAAI Conferences

The Twenty-Ninth International Flairs Conference

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.60)

#artificialintelligence, May-2-2016, 00:35:37 GMT

Something is wrong in the way #MachineLearning is being taught to #Developers

The last few years have seen an explosion of interest in Machine Learning (ML) technology and potential applications. Machine Learning is the unsung hero that powers many applications, systems, sensors, devices, and products. Today, Machine Learning is so pervasive that we can often assume its presence in most of the applications and systems without having to specifically call it out. In simple terms, machine learning is a computer's ability to learn from data, and it is one of the most useful tools we have to develop intelligent systems and applications. Machine learning is used widely today for all kinds of tasks, from churn prediction in large companies, to web search, to medical diagnostics, to robotics.

application, artificial intelligence, machine learning

Country:

Asia > Middle East > UAE > Dubai Emirate > Dubai (0.06)
North America > Canada > Ontario > Middlesex County > London (0.05)

Genre: Instructional Material (0.71)

Industry: Education (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.50)