 Accuracy


On formalizing fairness in prediction with machine learning

arXiv.org Machine Learning

Machine learning algorithms for prediction are increasingly being used in critical decisions affecting human lives. Various fairness formalizations, with no firm consensus yet, are employed to prevent such algorithms from systematically discriminating against people based on certain attributes protected by law. The aim of this article is to survey how fairness is formalized in the machine learning literature for the task of prediction and present these formalizations with their corresponding notions of distributive justice from the social sciences literature. We provide theoretical as well as empirical critiques of these notions from the social sciences literature and explain how these critiques limit the suitability of the corresponding fairness formalizations to certain domains. We also suggest two notions of distributive justice which address some of these critiques and discuss avenues for prospective fairness formalizations.
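As a concrete illustration of what a fairness formalization can look like, here is a minimal sketch of statistical parity, one widely cited criterion. The abstract does not say which formalizations the article surveys, so the choice of criterion is an assumption. The function name and the toy data are also invented for illustration.

```python
def statistical_parity_difference(predictions, groups, protected_value):
    """Positive-prediction rate of the protected group minus that of everyone else.

    A value of 0 means both groups receive favourable predictions at the same rate.
    """
    protected = [p for p, g in zip(predictions, groups) if g == protected_value]
    others = [p for p, g in zip(predictions, groups) if g != protected_value]
    if not protected or not others:
        raise ValueError("both groups must be non-empty")
    rate_protected = sum(protected) / len(protected)
    rate_others = sum(others) / len(others)
    return rate_protected - rate_others


# Toy data, invented for illustration: 1 = favourable prediction.
predictions = [1, 0, 0, 1, 1, 1, 0, 1]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
gap = statistical_parity_difference(predictions, groups, protected_value="a")
```

A negative gap means the protected group is predicted favourably less often than the rest of the population.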


Considerations of automated machine learning in clinical metabolic profiling: Altered homocysteine plasma concentration associated with metformin exposure

arXiv.org Machine Learning

With the maturation of metabolomics science and proliferation of biobanks, clinical metabolic profiling is an increasingly opportunistic frontier for advancing translational clinical research. Automated Machine Learning (AutoML) approaches provide an exciting opportunity to guide feature selection in agnostic metabolic profiling endeavors, where potentially thousands of independent data points must be evaluated. In previous research, AutoML using high-dimensional data of varying types has been demonstrably robust, outperforming traditional approaches. However, considerations for application in clinical metabolic profiling remain to be evaluated, particularly regarding the robustness of AutoML in identifying and adjusting for common clinical confounders. In this study, we present a focused case study regarding AutoML considerations for using the Tree-Based Pipeline Optimization Tool (TPOT) in metabolic profiling of exposure to metformin in a biobank cohort. First, we propose a tandem rank-accuracy measure to guide agnostic feature selection and corresponding threshold determination in clinical metabolic profiling endeavors. Second, while AutoML, using default parameters, showed a potential lack of sensitivity to low-effect confounding clinical covariates, we demonstrated residual training and adjustment of metabolite features as an easily applicable approach to ensure AutoML adjustment for potential confounding characteristics. Finally, we present increased homocysteine with long-term exposure to metformin as a potentially novel, non-replicated metabolite association suggested by TPOT, an association not identified in parallel clinical metabolic profiling endeavors. While considerations are recommended, including adjustment approaches for clinical confounders, AutoML presents an exciting tool to enhance clinical metabolic profiling and advance translational research endeavors.
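The abstract mentions residual adjustment of metabolite features but does not spell out the procedure. One common reading, sketched here as an assumption, is to regress each feature on a confounder and pass only the residuals to the downstream learner. The sketch uses a single confounder and ordinary least squares. The variable names and numbers are invented for illustration and are not taken from the study.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("confounder has zero variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residualize(feature, confounder):
    """Remove the linear effect of the confounder from the feature."""
    intercept, slope = fit_line(confounder, feature)
    return [f - (intercept + slope * c) for f, c in zip(feature, confounder)]


# Hypothetical values: a metabolite level that drifts upward with age.
age = [30.0, 40.0, 50.0, 60.0, 70.0]
metabolite = [1.1, 1.9, 3.2, 3.8, 5.0]
adjusted = residualize(metabolite, age)
```

By construction the adjusted feature is linearly uncorrelated with the confounder, so a learner trained on it cannot pick up that particular linear effect.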



Machine Unlearning: The Value of Imperfect Models

@machinelearnbot

A project manager once told me that "any job worth doing is worth doing poorly." I understood exactly what she meant, and she knew that I would understand, especially when she preceded our conversation with these words: "I wouldn't say this to everyone, but I know you will understand what I mean." The message was clear to me because I was a perfectionist (and hopefully I have learned over the years to be less of a perfectionist thanks to my project manager's wise counsel). As a perfectionist, I would strive for 100% completion and perfection on every project, every analysis, and every report. It would take me longer than most people to finish the analysis and report, and my manager understood why.


A Large Self-Annotated Corpus for Sarcasm

arXiv.org Artificial Intelligence

We introduce the Self-Annotated Reddit Corpus (SARC), a large corpus for sarcasm research and for training and evaluating systems for sarcasm detection. The corpus has 1.3 million sarcastic statements -- 10 times more than any previous dataset -- and many times more instances of non-sarcastic statements, allowing for learning in regimes of both balanced and unbalanced labels. Each statement is furthermore self-annotated -- sarcasm is labeled by the author and not an independent annotator -- and provided with user, topic, and conversation context. We evaluate the corpus for accuracy, compare it to previous related corpora, and provide baselines for the task of sarcasm detection.


Anatomical Pattern Analysis for decoding visual stimuli in human brains

arXiv.org Machine Learning

Background: A universal unanswered question in neuroscience and machine learning is whether computers can decode the patterns of the human brain. Multi-Voxel Pattern Analysis (MVPA) is a critical tool for addressing this question. However, previous MVPA methods face two challenges: decreasing sparsity and noise in the extracted features, and increasing the performance of prediction. Methods: To overcome these challenges, this paper proposes Anatomical Pattern Analysis (APA) for decoding visual stimuli in the human brain. This framework develops a novel anatomical feature extraction method and a new imbalance AdaBoost algorithm for binary classification. Further, it utilizes an Error-Correcting Output Codes (ECOC) method for multiclass prediction. APA can automatically detect active regions for each category of the visual stimuli. Moreover, it enables us to combine homogeneous datasets for applying advanced classification. Results and Conclusions: Experimental studies on 4 visual categories (words, consonants, objects and scrambled photos) demonstrate that the proposed approach achieves superior performance to state-of-the-art methods.
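To make the ECOC step concrete, the sketch below shows only the generic decoding idea. Each class gets a binary codeword, one binary classifier predicts each bit, and the predicted bit string is mapped to the class with the nearest codeword in Hamming distance. The codebook here is invented for illustration (the abstract does not give the paper's coding matrix), and the bit strings stand in for the outputs of binary classifiers such as the AdaBoost models.

```python
# Hypothetical 6-bit codebook for the four categories named in the abstract.
# Every pair of codewords differs in 4 positions, so one wrong bit is corrected.
CODEBOOK = {
    "words":      (1, 1, 1, 0, 0, 0),
    "consonants": (1, 0, 0, 1, 1, 0),
    "objects":    (0, 1, 0, 1, 0, 1),
    "scrambled":  (0, 0, 1, 0, 1, 1),
}


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(bits):
    """Return the class whose codeword is closest to the predicted bits."""
    return min(CODEBOOK, key=lambda label: hamming(CODEBOOK[label], bits))


# The third binary classifier is wrong here, yet the class is still recovered.
predicted = decode((1, 1, 0, 0, 0, 0))
```

The redundancy in the codewords is the design choice that matters: with a minimum pairwise distance of 4, any single binary classifier can err without changing the multiclass decision.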


WWE Hell In A Cell 2017: Predictions, Match Card For SmackDown PPV

International Business Times

The first "SmackDown Live" exclusive pay-per-view in two and a half months is set for Sunday night in Detroit. Eight matches are scheduled for WWE Hell in a Cell 2017, including the pre-show, and four titles will be on the line. Below are predictions for every match on the WWE Hell in a Cell card. Shane McMahon vs. Kevin Owens (Hell in a Cell Falls Count Anywhere Match) This has certainly been a feud worthy of a main-event match, with the story building slowly ever since Owens joined the blue brand in May. Anything can happen when there are no disqualifications, but the logical decision is to have the actual wrestler win. Shane had the biggest moment of WrestleMania 32 before losing to The Undertaker, and he held his own with AJ Styles before getting pinned at WrestleMania 33.


A signature-based machine learning model for bipolar disorder and borderline personality disorder

arXiv.org Machine Learning

Mobile technologies offer opportunities for higher resolution monitoring of health conditions. This opportunity seems of particular promise in psychiatry where diagnoses often rely on retrospective and subjective recall of mood states. However, getting actionable information from these rather complex time series is challenging, and at present the implications for clinical care are largely hypothetical. This research demonstrates that, with well chosen cohorts (of bipolar disorder, borderline personality disorder, and control) and modern methods, it is possible to objectively learn to identify distinctive behaviour over short periods (20 reports) that effectively separates the cohorts. Participants with bipolar disorder or borderline personality disorder and healthy volunteers completed daily mood ratings using a bespoke smartphone app for up to a year. A signature-based machine learning model was used to classify participants on the basis of the interrelationship between the different mood items assessed and to predict subsequent mood. The signature methodology was significantly superior to earlier statistical approaches applied to this data in distinguishing the three participant groups, clearly placing 75% into their original groups on the basis of their reports. Subsequent mood ratings were correctly predicted with greater than 70% accuracy in all groups. Prediction of mood was most accurate in healthy volunteers (89-98%) compared to bipolar disorder (82-90%) and borderline personality disorder (70-78%).
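The abstract does not define the signature. Assuming it refers to the path signature from rough path theory, the sketch below computes the first two levels for a piecewise-linear path, which gives a feel for how it captures the interrelationship between channels. The function name and the two-channel toy path are invented for illustration. The study's actual features, truncation depth, and preprocessing are not stated in the abstract.

```python
def signature_level_1_2(path):
    """First two signature levels of a piecewise-linear path.

    path: list of equal-length tuples, one point per observation.
    Level 1 is the total increment per channel. Level 2 holds the iterated
    integrals S[i][j], accumulated segment by segment via Chen's identity.
    """
    dim = len(path[0])
    level1 = [0.0] * dim
    level2 = [[0.0] * dim for _ in range(dim)]
    for start, end in zip(path, path[1:]):
        step = [e - s for s, e in zip(start, end)]
        for i in range(dim):
            for j in range(dim):
                # Cross term with the path so far, plus the segment's own term.
                level2[i][j] += level1[i] * step[j] + 0.5 * step[i] * step[j]
        for i in range(dim):
            level1[i] += step[i]
    return level1, level2


# Toy two-channel path: channel 0 rises first, then channel 1.
level1, level2 = signature_level_1_2([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
```

The asymmetry between `level2[0][1]` and `level2[1][0]` records which channel moved first, information that per-channel summaries such as means or total change do not carry.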


How To Apply Data Science To Real Business Problems - Seattle Data Guy

#artificialintelligence

Data science and statistics are not magic. They won't magically fix all of a company's problems. However, they are useful tools to help companies make more accurate decisions and automate repetitive work and choices that teams need to make. Machine learning and data science get referenced a lot in connection with natural language processing, image recognition and chatbots. However, they can also be applied to help managers make decisions, predict future revenues, segment markets, produce better content and diagnose patients more effectively. Below, we are going to discuss some case examples of statistics and applied data science algorithms that can help your business and team produce more accurate results. This doesn't require complex Hadoop clusters and cloud analytics. Let's get the basics going first, before we jump too far down the rabbit hole of technology and hype!


Learning Predictive Leading Indicators for Forecasting Time Series Systems with Unknown Clusters of Forecast Tasks

arXiv.org Machine Learning

We present a new method for forecasting systems of multiple interrelated time series. The method learns the forecast models jointly with two structures: leading indicators from within the system, which serve as good predictors and improve forecast accuracy, and a cluster structure of the predictive tasks around these indicators. The method is based on the classical linear vector autoregressive model (VAR) and links the discovery of the leading indicators to inferring sparse graphs of Granger causality. We formulate a new constrained optimisation problem to promote the desired sparse structures across the models and the sharing of information amongst the learning tasks in a multi-task manner. We propose an algorithm for solving the problem and document on a battery of synthetic and real-data experiments the advantages of our new method over baseline VAR models as well as the state-of-the-art sparse VAR learning methods.
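For reference, the classical VAR model and its link to Granger causality can be written as follows. This is the standard textbook form, not the paper's constrained optimisation problem, which the abstract does not state. The symbols (N series, lag order p, coefficient matrices A_k) are notation chosen here.

```latex
% VAR(p) for a system of N series, y_t \in \mathbb{R}^N
y_t = c + \sum_{k=1}^{p} A_k \, y_{t-k} + \varepsilon_t ,
\qquad A_k \in \mathbb{R}^{N \times N}

% Within this linear model, series j does not Granger-cause series i
% exactly when every lag coefficient from j to i vanishes:
(A_k)_{ij} = 0 \quad \text{for all } k = 1, \dots, p
```

Under this reading, a sparse Granger-causality graph corresponds to sparse coefficient matrices, and a leading indicator would be a series j whose entries are non-zero for many target series i.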