 Performance Analysis


Machine Learning Approaches to Predict 6-Month Mortality Among Patients With Cancer

#artificialintelligence

Question: Can machine learning algorithms identify oncology patients at risk of short-term mortality to inform timely conversations between patients and physicians regarding serious illness? Findings: In this cohort study of 26 525 patients seen in oncology practices within a large academic health system, machine learning algorithms accurately identified patients at high risk of 6-month mortality with good discrimination and positive predictive value. When the gradient boosting algorithm was applied in real time, most patients who were classified as having high risk were deemed appropriate by oncology clinicians for a conversation regarding serious illness. Meaning: In this study, machine learning algorithms accurately identified patients with cancer who were at risk of 6-month mortality, suggesting that these models could facilitate more timely conversations between patients and physicians regarding goals and values. Importance: Machine learning algorithms could identify patients with cancer who are at risk of short-term mortality. However, it is unclear how different machine learning algorithms compare and whether they could prompt clinicians to have timely conversations about treatment and end-of-life preferences. Objectives: To develop, validate, and compare machine learning algorithms that use structured electronic health record data before a clinic visit to predict mortality among patients with cancer. Design, Setting, and Participants: Cohort study of 26 525 adult patients who had outpatient oncology or hematology/oncology encounters at a large academic cancer center and 10 affiliated community practices between February 1, 2016, and July 1, 2016.


Top 7 Checkpoints To Consider During Machine Learning Production

#artificialintelligence

A major challenge for any company that is starting out in the realm of data-driven markets is the deployment of machine learning pipelines at full scale for its products. To get the most out of AI, it is necessary to build service-specific tools and frameworks in addition to the existing models. The best strategy varies from product to product, but the rubrics of machine learning stay the same. To democratise the use of machine learning, Google has condensed its years of research into a paper titled "A Rubric for ML Production Readiness", which lists its findings in the form of 28 specific tests that have shown promising results. The offline/online metric relationship can be measured in one or more small-scale A/B experiments using an intentionally degraded model.


Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency

arXiv.org Machine Learning

Current multi-reference style transfer models for Text-to-Speech (TTS) perform sub-optimally on disjoint datasets, where one dataset contains only a single style class for one of the style dimensions. These models generally fail to produce style transfer for the dimension that is underrepresented in the dataset. In this paper, we propose an adversarial cycle consistency training scheme with paired and unpaired triplets to ensure the use of information from all style dimensions. During training, we incorporate unpaired triplets with randomly selected reference audio samples and encourage the synthesized speech to preserve the appropriate styles using adversarial cycle consistency. We use this method to transfer emotion from a dataset containing four emotions to a dataset with only a single emotion. This results in a 78% improvement in style transfer (based on emotion classification) with minimal reduction in fidelity and naturalness. In subjective evaluations our method was consistently rated as closer to the reference style than the baseline. Synthesized speech samples are available at: https://sites.google.com/view/adv-cycle-consistent-tts


Toward a better trade-off between performance and fairness with kernel-based distribution matching

arXiv.org Machine Learning

As recent literature has demonstrated how classifiers often carry unintended biases toward some subgroups, deploying machine learned models to users demands careful consideration of the social consequences. How should we address this problem in a real-world system? How should we balance core performance and fairness metrics? In this paper, we introduce a MinDiff framework for regularizing classifiers toward different fairness metrics and analyze a technique with kernel-based statistical dependency tests. We run a thorough study on an academic dataset to compare the Pareto frontier achieved by different regularization approaches, and apply our kernel-based method to two large-scale industrial systems demonstrating real-world improvements.


Choosing a Machine Learning Model - KDnuggets

#artificialintelligence

The number of shiny models out there can be overwhelming, which means a lot of times people fall back on a few they trust the most and use them on all new problems. This can lead to sub-optimal results. Today we're going to learn how to quickly and efficiently narrow down the space of available models to find those that are most likely to perform best on your problem type. We'll also see how we can keep track of our models' performances using Weights and Biases and compare them. You can find the accompanying code here.


Incode raises $10 million to verify identities with AI

#artificialintelligence

Incode, a San Francisco startup developing what it describes as an omnichannel biometric identity platform, today announced that it's raised $10 million in seed funding from undisclosed investors. Founder and CEO Ricardo Amper said that the newfound capital will enable Incode to accelerate the development and rollout of its tools globally, some of which are already being used by major banks, financial institutions, governments, and retailers. "The modern consumer is all about experiences and convenience," said Amper. "What they want is a seamless, consistent and secure way to perform daily tasks like access their ATM, make payments, and access online accounts. Yet, what they get today is quite the opposite. The ecosystem is fragmented by multiple vendors and devices, making processes clunky and inefficient. That's precisely why we've built Incode Omni: to help companies provide a frictionless, secure and convenient experience for the next generation of consumers."


Securing machine learning models against adversarial attacks

#artificialintelligence

Beware: many defence methods can lead to gradient masking, whether intentional or not. Gradient masking does not guarantee adversarial robustness, and has been shown to be circumventable (Tramèr et al., 2017; Athalye et al., 2018). We hope this article provides helpful insights on how to defend against adversarial examples. Please feel free to provide suggestions in the comment section if we're missing something.


Torus Graphs for Multivariate Phase Coupling Analysis

arXiv.org Machine Learning

Angular measurements are often modeled as circular random variables, where there are natural circular analogues of moments, including correlation. Because a product of circles is a torus, a d-dimensional vector of circular random variables lies on a d-dimensional torus. For such vectors we present here a class of graphical models, which we call torus graphs, based on the full exponential family with pairwise interactions. The topological distinction between a torus and Euclidean space has several important consequences. Our development was motivated by the problem of identifying phase coupling among oscillatory signals recorded from multiple electrodes in the brain: oscillatory phases across electrodes might tend to advance or recede together, indicating coordination across brain areas. The data analyzed here consisted of 24 phase angles measured repeatedly across 840 experimental trials (replications) during a memory task, where the electrodes were in 4 distinct brain regions, all known to be active while memories are being stored or retrieved. In realistic numerical simulations, we found that a standard pairwise assessment, known as phase locking value, is unable to describe multivariate phase interactions, but that torus graphs can accurately identify conditional associations. Torus graphs generalize several more restrictive approaches that have appeared in various scientific literatures, and produced intuitive results in the data we analyzed. Torus graphs thus unify multivariate analysis of circular data and present fertile territory for future research.
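For readers unfamiliar with the pairwise baseline mentioned above, the phase locking value between two signals is commonly defined as the magnitude of the mean unit phasor of their phase differences across trials. The sketch below assumes that standard definition; the function name `phase_locking_value` and the sample angles are illustrative and are not taken from the paper.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Return |mean(exp(i * (a - b)))| over paired phase angles in radians.

    Hypothetical helper for illustration: 1.0 means the phase difference is
    identical on every trial, values near 0 mean no consistent relationship.
    """
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("need two non-empty sequences of equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


# Made-up phases: electrode B always lags electrode A by a quarter cycle.
electrode_a = [0.1, 1.3, 2.9, 4.0]
electrode_b = [a - math.pi / 2 for a in electrode_a]
locked = phase_locking_value(electrode_a, electrode_b)

# Made-up phases: the difference flips between 0 and pi, so the phasors cancel.
unlocked = phase_locking_value([0.0, math.pi], [0.0, 0.0])
```

Because this measure looks at one pair of signals at a time, it cannot separate direct coupling from coupling that is mediated by a third signal, which is the limitation the abstract attributes to it.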


Precision and Recall

#artificialintelligence

Imagine a machine learning algorithm is tasked with identifying the bananas within a bowl of fruit. In total, the bowl contains 10 pieces of fruit, 4 of which are bananas and 6 of which are apples. The algorithm determines that there are 5 bananas and 5 apples. The bananas that were identified correctly are known as true positives, while the items that were identified incorrectly as bananas are called false positives. In this example, there are 4 true positives and one false positive, making the algorithm's precision 4/5. Because all 4 actual bananas were found, there are no false negatives, and its recall is 4/4.
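The arithmetic for the fruit bowl can be checked with a short sketch. It assumes the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), where a false negative is a real banana the algorithm missed. The function name `precision_recall` is illustrative only.

```python
from fractions import Fraction


def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) as exact fractions.

    precision = TP / (TP + FP): share of predicted bananas that are bananas.
    recall    = TP / (TP + FN): share of actual bananas that were found.
    """
    precision = Fraction(true_positives, true_positives + false_positives)
    recall = Fraction(true_positives, true_positives + false_negatives)
    return precision, recall


actual_bananas = 4      # bananas really in the bowl
predicted_bananas = 5   # items the algorithm labelled as bananas
tp = 4                  # predicted bananas that really are bananas

fp = predicted_bananas - tp   # an apple labelled as a banana
fn = actual_bananas - tp      # real bananas the algorithm missed

precision, recall = precision_recall(tp, fp, fn)
print(f"precision = {precision}, recall = {recall}")
```

Note that the denominator of recall is the number of actual bananas (4), not the total number of fruit in the bowl (10).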


Bootstrapping deep music separation from primitive auditory grouping principles

arXiv.org Machine Learning

Separating an audio scene such as a cocktail party into constituent, meaningful components is a core task in computer audition. Deep networks are the state-of-the-art approach. They are trained on synthetic mixtures of audio made from isolated sound source recordings so that ground truth for the separation is known. However, the vast majority of available audio is not isolated. The brain uses primitive cues that are independent of the characteristics of any particular sound source to perform an initial segmentation of the audio scene. We present a method for bootstrapping a deep model for music source separation without ground truth by using multiple primitive cues. We apply our method to train a network on a large set of unlabeled music recordings from YouTube to separate vocals from accompaniment without the need for ground truth isolated sources or artificial training mixtures.