Webster, Jennifer
Spatiotemporal k-means
Dorabiala, Olga, Webster, Jennifer, Kutz, Nathan, Aravkin, Aleksandr
The widespread use of sensor and data acquisition technologies, including IoT, GPS, RFID, LiDAR, satellite, and cellular networks, allows for, among other applications, the continuous monitoring of the positions of moving objects of interest. These technologies create rich spatiotemporal data found across many scientific and real-world domains, including ecologists' studies of collective animal behavior [13], the surveillance of large groups of people for suspicious activity [17], and traffic management [12]. Often, the data collected is large and unlabeled, motivating the development of unsupervised learning methods that can efficiently extract information about object behavior without human supervision. In this study, we propose spatiotemporal k-means (STKM), a clustering method able to analyze the multi-scale relationships within spatiotemporal data. Clustering is a major unsupervised data mining tool used to gain insight from unlabeled data by grouping objects according to some similarity measure [6, 11]. The most common unsupervised clustering methods include k-means, Gaussian mixture models, and hierarchical clustering [18], all of which are workhorse algorithms in the data science industry.
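As context for the baseline the paper builds on, the sketch below implements plain k-means (Lloyd's algorithm) in NumPy and applies it to a synthetic snapshot of object positions at one time step. This is not the STKM method itself, which additionally couples clusters across time; it is only the standard k-means component on which STKM is a variant. The data and the `kmeans` helper are illustrative assumptions, not from the paper.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate nearest-centroid assignment
    and centroid recomputation until the centroids stop moving."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # distance from every point to every centroid, shape (n, k)
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each centroid as the mean of its assigned points
        new = np.array([points[labels == j].mean(axis=0)
                        if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Synthetic positions of moving objects at a single time step:
# two well-separated groups, as a GPS sensor network might report.
rng = np.random.default_rng(1)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(20, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(20, 2))
points = np.vstack([group_a, group_b])

labels, centroids = kmeans(points, k=2)
```

A spatiotemporal extension would rerun this per time step while penalizing centroid movement between steps, which is the kind of multi-scale coupling the abstract refers to.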
Personalized Prognostic Models for Oncology: A Machine Learning Approach
Dooling, David, Kim, Angela, McAneny, Barbara, Webster, Jennifer
We have applied a little-known data transformation to subsets of the publicly available Surveillance, Epidemiology, and End Results (SEER) data of the National Cancer Institute (NCI) to make it suitable input for standard machine learning classifiers. This transformation properly treats the right-censored observations in the SEER data, and the resulting Random Forest and Multi-Layer Perceptron models predict full survival curves. Treating the 6-, 12-, and 60-month points of the resulting survival curves as three binary classifiers, the 18 resulting classifiers have AUC values ranging from 0.765 to 0.885. Further evidence that the models have generalized well from the training data is provided by the extremely high agreement between the random forest and neural network models' predictions on the 6-, 12-, and 60-month binary classifiers.
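The step of reading a predicted survival curve off at fixed horizons to obtain binary classifiers can be sketched as follows. The survival curves below are synthetic stand-ins, not outputs of the SEER-trained models, and the 0.5 decision threshold is an illustrative assumption; only the 6/12/60-month evaluation points come from the abstract.

```python
import numpy as np

# Hypothetical predicted survival probabilities S(t), one row per patient,
# evaluated at monthly time points t = 1..60 (synthetic, for illustration).
months = np.arange(1, 61)
surv_curves = np.array([
    np.clip(1.0 - 0.015 * months, 0.0, 1.0),  # slow decline
    np.clip(1.0 - 0.020 * months, 0.0, 1.0),  # moderate decline
    np.exp(-0.05 * months),                   # fast early decline
])

def binary_classifier_at(curves, months, horizon, threshold=0.5):
    """Treat each survival curve as a binary classifier at one horizon:
    predict 'survives past horizon' when S(horizon) >= threshold."""
    idx = np.searchsorted(months, horizon)  # column holding S(horizon)
    probs = curves[:, idx]
    return probs, (probs >= threshold).astype(int)

# One binary classifier per horizon, as in the abstract's 6/12/60-month split.
for horizon in (6, 12, 60):
    probs, preds = binary_classifier_at(surv_curves, months, horizon)
```

With true outcomes available at each horizon, the thresholded probabilities could then be scored with a standard AUC routine, which is how the per-horizon AUC values in the abstract would be computed.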