Performance Analysis
Bayesian Learning for Statistical Classification – Stats and Bots
A well-calibrated estimator for the conditional probabilities should obey this equation. Once we have derived a statistical classifier, we need to validate it on some test data. This data should be different from that used to train the classifier; otherwise, skill scores will be unduly optimistic. Holding data out in this way is the basis of cross-validation. The confusion matrix expresses everything about the accuracy of a discrete classifier over a given database, and you can use it to compose any possible skill score. Here, we are going to cover two that are rarely seen in the literature, but are nonetheless important for reasons that will become clear.
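As a rough illustration of how skill scores can be composed from a confusion matrix, the sketch below builds the matrix from held-out labels and derives two common scores from it. The function names, the toy labels, and the choice of accuracy and per-class recall are assumptions made for this example; they are not necessarily the two scores the article goes on to cover.

```python
from collections import Counter


def confusion_matrix(truth, predicted, n_classes):
    """Rows index the true class, columns index the predicted class."""
    counts = Counter(zip(truth, predicted))
    return [[counts[(i, j)] for j in range(n_classes)] for i in range(n_classes)]


def accuracy(matrix):
    """Fraction of test samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def recall_per_class(matrix):
    """For each true class, the fraction of its samples that were recovered."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Hypothetical held-out test labels and classifier outputs (not from the article).
truth = [0, 0, 0, 1, 1, 1, 1, 0]
predicted = [0, 0, 1, 1, 1, 0, 1, 0]

cm = confusion_matrix(truth, predicted, n_classes=2)
print(cm)
print(accuracy(cm))
print(recall_per_class(cm))
```

Both scores read only from the matrix, never from the raw labels, which is the sense in which the matrix carries everything needed to score a discrete classifier.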
Stacked Generalization: An Introduction to Super Learning
Stacked generalization is an ensemble method that allows researchers to combine several different prediction algorithms into one. Since its introduction in the early 1990s, the method has evolved several times into what is now known as "Super Learner". Super Learner uses V-fold cross-validation to build the optimal weighted combination of predictions from a library of candidate algorithms. Optimality is defined by a user-specified objective function, such as minimizing mean squared error or maximizing the area under the receiver operating characteristic curve. Although relatively simple in nature, use of the Super Learner by epidemiologists has been hampered by limitations in understanding conceptual and technical details.
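The core idea can be sketched in a few lines, under heavy simplifying assumptions: a library of only two candidate learners (a constant-mean predictor and a one-variable least-squares line), mean squared error as the objective, and a plain grid search over a single weight standing in for whatever optimiser a real Super Learner implementation uses. All names and the toy data are hypothetical.

```python
def fit_mean(xs, ys):
    """Candidate 1: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate 2: ordinary least-squares line in one variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: mean_y + slope * (x - mean_x)


def out_of_fold_predictions(xs, ys, learners, v):
    """V-fold cross-validation: each point is predicted by models that never saw it."""
    n = len(xs)
    preds = [[0.0] * n for _ in learners]
    for fold in range(v):
        train = [i for i in range(n) if i % v != fold]
        held_out = [i for i in range(n) if i % v == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for k, fit in enumerate(learners):
            model = fit(train_x, train_y)
            for i in held_out:
                preds[k][i] = model(xs[i])
    return preds


def best_weight(preds, ys, steps=100):
    """Grid-search the weight w on learner 0 (learner 1 gets 1 - w) minimising MSE."""
    best_mse, best_w = float("inf"), 0.0
    for s in range(steps + 1):
        w = s / steps
        mse = sum((w * a + (1 - w) * b - y) ** 2
                  for a, b, y in zip(preds[0], preds[1], ys)) / len(ys)
        if mse < best_mse:
            best_mse, best_w = mse, w
    return best_w, best_mse


# Toy data that is exactly linear, so the line learner should receive all the weight.
xs = [float(i) for i in range(12)]
ys = [2.0 * x + 1.0 for x in xs]

preds = out_of_fold_predictions(xs, ys, [fit_mean, fit_line], v=3)
weight_on_mean, cv_mse = best_weight(preds, ys)
print(weight_on_mean, cv_mse)
```

Choosing the weights on out-of-fold predictions rather than in-sample fits is what keeps the combination from simply rewarding the most overfitted candidate.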
A signature-based machine learning model for bipolar disorder and borderline personality disorder
Arribas, Imanol Perez, Saunders, Kate, Goodwin, Guy, Lyons, Terry
Mobile technologies offer opportunities for higher resolution monitoring of health conditions. This opportunity seems of particular promise in psychiatry where diagnoses often rely on retrospective and subjective recall of mood states. However, getting actionable information from these rather complex time series is challenging, and at present the implications for clinical care are largely hypothetical. This research demonstrates that, with well chosen cohorts (of bipolar disorder, borderline personality disorder, and control) and modern methods, it is possible to objectively learn to identify distinctive behaviour over short periods (20 reports) that effectively separates the cohorts. Participants with bipolar disorder or borderline personality disorder and healthy volunteers completed daily mood ratings using a bespoke smartphone app for up to a year. A signature-based machine learning model was used to classify participants on the basis of the interrelationship between the different mood items assessed and to predict subsequent mood. The signature methodology was significantly superior to earlier statistical approaches applied to this data in distinguishing the three participant groups, clearly placing 75% into their original groups on the basis of their reports. Subsequent mood ratings were correctly predicted with greater than 70% accuracy in all groups. Prediction of mood was most accurate in healthy volunteers (89-98%) compared to bipolar disorder (82-90%) and borderline personality disorder (70-78%).
How To Apply Data Science To Real Business Problems - Seattle Data Guy
Data science and statistics are not magic. They won't magically fix all of a company's problems. However, they are useful tools to help companies make more accurate decisions and automate repetitive work and choices that teams need to make. Machine learning and data science get referenced a lot when referring to natural language processing, image recognition and chat bots. However, they can also be applied to help managers make decisions, predict future revenues, segment markets, produce better content and diagnose patients more effectively. Below, we are going to discuss some case examples of statistics and applied data science algorithms that can help your business and team produce more accurate results. This doesn't require complex Hadoop clusters and cloud analytics. Let's get the basics going first, before we jump too far down the rabbit hole of technology and hype!
Learning Predictive Leading Indicators for Forecasting Time Series Systems with Unknown Clusters of Forecast Tasks
Gregorova, Magda, Kalousis, Alexandros, Marchand-Maillet, Stephane
We present a new method for forecasting systems of multiple interrelated time series. The method learns the forecast models while discovering leading indicators from within the system, which serve as good predictors and improve forecast accuracy, together with a cluster structure of the predictive tasks around these indicators. The method is based on the classical linear vector autoregressive model (VAR) and links the discovery of the leading indicators to inferring sparse graphs of Granger causality. We formulate a new constrained optimisation problem to promote the desired sparse structures across the models and the sharing of information amongst the learning tasks in a multi-task manner. We propose an algorithm for solving the problem and demonstrate, on a battery of synthetic and real-data experiments, the advantages of our new method over baseline VAR models as well as state-of-the-art sparse VAR learning methods.
Bardo: Emotion-Based Music Recommendation for Tabletop Role-Playing Games
Padovani, Rafael R. (Universidade Federal de Viçosa) | Ferreira, Lucas N. (University of California, Santa Cruz) | Lelis, Levi H. S. (Universidade Federal de Viçosa)
In this paper we introduce Bardo, a real-time intelligent system to automatically select the background music for tabletop role-playing games. Bardo uses an off-the-shelf speech recognition system to transform into text what the players say during a game session, and a supervised learning algorithm to classify the text into an emotion. Bardo then selects and plays as background music a song representing the classified emotion. We evaluate Bardo with a Dungeons and Dragons (D&D) campaign available on YouTube. Accuracy experiments show that a simple Naive Bayes classifier is able to obtain good prediction accuracy in our classification task. A user study in which people evaluated edited versions of the D&D videos suggests that Bardo's selections can be better than those used in the original videos of the campaign.
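The text-to-emotion step can be illustrated with a minimal multinomial Naive Bayes classifier using add-one smoothing. This is only a sketch of the general technique: the emotion labels, the training sentences, and the function names are invented for the example and are not taken from Bardo or its dataset.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(examples):
    """examples: list of (text, label). Collects the counts needed for prediction."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in examples:
        words = text.lower().split()
        label_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return label_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-posterior under the bag-of-words model."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical transcribed utterances with made-up emotion labels.
training = [
    ("the dragon attacks we must fight", "agitated"),
    ("roll for initiative and fight", "agitated"),
    ("we rest at the quiet inn", "calm"),
    ("a quiet walk through the village", "calm"),
]
model = train_naive_bayes(training)
print(classify("fight the dragon", model))
print(classify("a quiet inn", model))
```

In a pipeline like the one described, the predicted label would then be mapped to a song for that emotion; that mapping is outside the scope of this sketch.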
Recognizing Detailed Human Context In-the-Wild from Smartphones and Smartwatches
Vaizman, Yonatan, Ellis, Katherine, Lanckriet, Gert
The ability to automatically recognize a person's behavioral context can contribute to health monitoring, aging care and many other domains. Validating context recognition in-the-wild is crucial to promote practical applications that work in real-life settings. We collected over 300k minutes of sensor data with context labels from 60 subjects. Unlike previous studies, our subjects used their own personal phone, in any way that was convenient to them, and engaged in their routine in their natural environments. Unscripted behavior and unconstrained phone usage resulted in situations that are harder to recognize. We demonstrate how fusion of multi-modal sensors is important for resolving such cases. We present a baseline system, and encourage researchers to use our public dataset to compare methods and improve context recognition in-the-wild.
Detecting and Monitoring Diseases with Big Data - DataScience.US
To avoid epidemics such as the 2014 Ebola outbreak, early detection and diagnosis of diseases are critical. Even in more isolated cases, such as the development of cancer, early diagnosis can save lives. Big data can be used as a versatile tool to assist in the detection, monitoring, and diagnosis of bacterial diseases and cancer. Past efforts to combat outbreaks of disease typically focused on collecting physical information from laboratory test results and public health records to create predictive models of how the disease might spread. However, the big data model uses medical information, internet resources, social media, and other sources to enable real-time tracking of disease outbreaks.