Dimensionality Reduction Ensembles
Farrelly, Colleen M.
Ensemble learning has had many successes in supervised learning, but it remains rare in unsupervised learning and dimensionality reduction. This study explores dimensionality reduction ensembles, using principal component analysis and manifold learning techniques to capture linear, nonlinear, local, and global features in the original dataset. Dimensionality reduction ensembles are tested first on simulated data and then on two real medical datasets using random forest classifiers; results suggest the efficacy of this approach, with accuracies approaching those obtained on the full dataset. Limitations include the computational cost of some of the stronger-performing algorithms, which may be ameliorated through distributed computing and the development of more efficient versions of these algorithms.
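To make the idea concrete, below is a minimal sketch of a dimensionality reduction ensemble, assuming scikit-learn and a built-in dataset rather than the medical data used in the study: several linear and manifold embeddings are concatenated and passed to a random forest classifier. Component counts and neighborhood sizes are illustrative assumptions.

```python
# Sketch only: embeddings are fit on all data purely for brevity,
# which is not how one would evaluate this in practice.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap, LocallyLinearEmbedding
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Each reducer captures a different flavor of structure:
# PCA (linear/global), Isomap (nonlinear/global), LLE (nonlinear/local).
reducers = [
    PCA(n_components=5),
    Isomap(n_components=5, n_neighbors=10),
    LocallyLinearEmbedding(n_components=5, n_neighbors=10),
]

# Concatenate the low-dimensional embeddings to form the ensemble features.
Z = np.hstack([r.fit_transform(X) for r in reducers])

rf = RandomForestClassifier(n_estimators=500, random_state=0)
print("ensemble embedding accuracy:", cross_val_score(rf, Z, y, cv=5).mean())
print("full dataset accuracy:     ", cross_val_score(rf, X, y, cv=5).mean())
```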
Deep vs. Diverse Architectures for Classification Problems
Farrelly, Colleen M.
This study compares various superlearner and deep learning architectures (machine-learning-based and neural-network-based) for classification problems across several simulated and industrial datasets to assess performance and computational efficiency, as both approaches have attractive theoretical convergence properties. Superlearner formulations outperform other methods at small to moderate sample sizes (500-2500) on datasets with nonlinear or mixed linear/nonlinear predictor relationships, while deep neural networks perform well on datasets with linear predictor relationships at all sample sizes. This suggests faster convergence of superlearners than of deep neural network architectures on many messy, real-world classification problems. Superlearners also yield interpretable models, allowing users to examine important signals in the data; in addition, they offer flexible formulation, letting users retain good performance with low-computational-cost base algorithms. K-nearest-neighbor (KNN) regression also demonstrates improvements within the superlearner framework: KNN superlearners consistently outperform both deep architectures and plain KNN regression, suggesting that superlearners may better capture local and global geometric features by using a variety of algorithms to probe the data space.
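The following is a minimal sketch of a superlearner-style stacked ensemble, here approximated with scikit-learn's StackingClassifier on simulated data; the base learners, meta-learner, and sample size are illustrative assumptions, not the study's exact configuration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Moderate sample size, mixed predictor relationships: roughly the regime
# where the abstract reports superlearners performing well.
X, y = make_classification(n_samples=1500, n_features=20, n_informative=8,
                           random_state=0)

base_learners = [
    ("logit", LogisticRegression(max_iter=1000)),    # linear signal
    ("knn", KNeighborsClassifier(n_neighbors=15)),   # local geometry
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),  # nonlinear
]

# Out-of-fold base-learner predictions feed a simple meta-learner.
superlearner = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)

print("superlearner accuracy:", cross_val_score(superlearner, X, y, cv=5).mean())
```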
Extensions of Morse-Smale Regression with Application to Actuarial Science
Farrelly, Colleen M.
The problem of subgroups is ubiquitous in scientific research (e.g., disease heterogeneity, spatial distributions in ecology), and piecewise regression is one way to deal with this phenomenon. Morse-Smale regression offers a way to partition the regression function based on the level sets of a defined function and that function's basins of attraction. This topologically based piecewise regression algorithm has shown promise in its initial applications, but the implementation in the literature to date has been limited to elastic net and generalized linear regression. Nonparametric methods, such as random forest or conditional inference trees, may provide better prediction and insight by modeling interaction terms and other nonlinear relationships between predictors and a given outcome. This study explores the use of several machine learning algorithms within a Morse-Smale piecewise regression framework, including boosted regression with linear base learners, homotopy-based LASSO, conditional inference trees, random forest, and a wide neural network framework called extreme learning machines. Simulations on Tweedie regression problems with varying Tweedie parameter and dispersion suggest that many machine learning approaches to Morse-Smale piecewise regression improve on the original algorithm's performance, particularly for outcomes with lower dispersion and linear or mixed linear/nonlinear predictor relationships. On a real actuarial problem, several of these new algorithms perform as well as or better than the original Morse-Smale regression algorithm, and most yield information on the nature of predictor relationships within each partition, giving insight into differences between dataset partitions.
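The sketch below illustrates only the piecewise structure of the approach: partition the data, then fit a separate nonparametric learner within each partition. In Morse-Smale regression the partition comes from the Morse-Smale complex of a smoothed response surface; the level-set-style binning used here is a crude stand-in for illustration, and the data are simulated.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Stand-in partition (NOT the Morse-Smale complex): smooth the outcome with
# KNN, then bin the smoothed values into rough "level sets".
smooth = KNeighborsRegressor(n_neighbors=25).fit(X_tr, y_tr)
edges = np.quantile(smooth.predict(X_tr), [0.33, 0.66])
part_tr = np.digitize(smooth.predict(X_tr), edges)
part_te = np.digitize(smooth.predict(X_te), edges)

# Fit one nonparametric base learner (random forest) per partition.
models = {p: RandomForestRegressor(n_estimators=300, random_state=0)
              .fit(X_tr[part_tr == p], y_tr[part_tr == p])
          for p in np.unique(part_tr)}

pred = np.array([models[p].predict(x.reshape(1, -1))[0]
                 for p, x in zip(part_te, X_te)])
print("piecewise RMSE:", np.sqrt(np.mean((pred - y_te) ** 2)))
```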
KNN Ensembles for Tweedie Regression: The Power of Multiscale Neighborhoods
Farrelly, Colleen M.
Very few K-nearest-neighbor (KNN) ensembles exist, despite the efficacy of this approach in regression, classification, and outlier detection. Those that do exist focus on bagging features rather than varying k or bagging observations, and it is unknown whether varying k or bagging observations can improve prediction. Given recent studies in topological data analysis, varying k may function like multiscale topological methods, providing stability and better prediction as well as increased ensemble diversity. This paper explores 7 KNN ensemble algorithms combining bagged features, bagged observations, and varied k to understand how each of these contributes to model fit. Specifically, these algorithms are tested on Tweedie regression problems through simulations and 6 real datasets; results are compared to state-of-the-art machine learning models including extreme learning machines, random forest, boosted regression, and Morse-Smale regression. Simulation results suggest gains from varying k above and beyond bagging features or samples, as well as robustness of KNN ensembles to the curse of dimensionality. KNN regression ensembles perform favorably against state-of-the-art algorithms and dramatically improve performance over plain KNN regression. Further, results on the real datasets suggest that varying k is a good strategy in general (particularly for difficult Tweedie regression problems) and that KNN regression ensembles often outperform state-of-the-art methods. These results for k-varying ensembles echo recent theoretical results in topological data analysis, where multidimensional filter functions and multiscale coverings provide stability and performance gains over single-dimensional filters and single-scale coverings. This opens up the possibility of leveraging multiscale neighborhoods and multiple measures of local geometry in ensemble methods.
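A minimal sketch of a k-varying KNN regression ensemble, assuming scikit-learn and simulated data: each member is fit on a bootstrap sample and a random feature subset with its own k, and member predictions are averaged. The ensemble size, k grid, and subsampling choices are illustrative assumptions, not the paper's exact design.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=1500, n_features=15, noise=15.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def rmse(pred, truth):
    return np.sqrt(np.mean((pred - truth) ** 2))

n_members, preds = 50, []
for _ in range(n_members):
    k = rng.choice([3, 5, 9, 15, 25, 41])                      # multiscale neighborhoods
    rows = rng.integers(0, len(X_tr), len(X_tr))                # bagged observations
    cols = rng.choice(X_tr.shape[1], size=10, replace=False)    # bagged features
    knn = KNeighborsRegressor(n_neighbors=k).fit(X_tr[rows][:, cols], y_tr[rows])
    preds.append(knn.predict(X_te[:, cols]))

print("single KNN RMSE:  ",
      rmse(KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr).predict(X_te), y_te))
print("KNN ensemble RMSE:", rmse(np.mean(preds, axis=0), y_te))
```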