 Law


An iterative method for classification of binary data

arXiv.org Machine Learning

We consider the problem of performing classification when only binary measurements of data are available. This situation may arise due to the need for extreme compression of data or in the interest of hardware efficiency [11, 17, 18, 1]. Despite this extremely coarse quantization of the data, one can still perform learning tasks, such as classification, with high accuracy. The authors of [23] recently proposed a classification method for binary data, which they show to be reasonably accurate and sufficiently simple to allow for theoretical analysis in certain settings. Additionally, the predicted class can be approximately understood as the class whose binarized training data most closely and frequently matches that of the test point. As this approach will be the foundation of the work presented here, we discuss it in detail in the next section. Interpretability of algorithms and the ability to explain predictions are of increasing importance as machine learning algorithms are applied to an expanding range of problems in areas such as medicine, criminal justice, and finance [3, 2, 24]. Decisions made based on algorithmic predictions can have profound repercussions for both participating individuals and society at large. A major drawback of complex models such as deep neural networks [20, 15, 8, 19] is that it is extremely difficult to explain how or why such algorithms arrive at a specific prediction, see e.g.


Learning Optimized Risk Scores

arXiv.org Machine Learning

Risk scores are simple classification models that let users make quick risk predictions by adding and subtracting a few small numbers. These models are widely used in medicine and criminal justice, but are difficult to learn from data because they need to be calibrated, sparse, use small integer coefficients and obey application-specific constraints. In this paper, we present a new machine learning approach to learn risk scores. We formulate the risk score problem as a mixed integer nonlinear program, and present a new cutting plane algorithm for non-convex settings to efficiently recover its optimal solution. We improve our algorithm with specialized techniques to generate feasible solutions, narrow the optimality gap, and reduce data-related computation. Our approach can fit risk scores in a way that scales linearly in the number of samples, provides a certificate of optimality, and obeys real-world constraints without parameter tuning or post-processing. We illustrate the performance benefits of this approach through an extensive set of numerical experiments, where we compare risk scores built using our approach to those built using heuristic approaches. We also discuss the practical benefits of our approach through an application where we build a customized risk score for ICU seizure prediction in collaboration with the Massachusetts General Hospital.
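
To make "adding and subtracting a few small numbers" concrete, the following is a minimal sketch of how a finished risk score might be applied. The feature names, point values, and intercept are hypothetical and not taken from the paper, and the logistic mapping from score to risk is an assumption about the model form; the sketch shows only how such a score is used, not the cutting plane algorithm that learns it.

```python
import math

# Hypothetical integer point values (illustration only, not from the paper).
POINTS = {"age_over_60": 2, "prior_event": 3, "normal_lab_result": -1}
INTERCEPT = -3  # hypothetical offset, also an assumption


def risk_score(features):
    """Sum the integer points of every feature that is present (truthy)."""
    return sum(pts for name, pts in POINTS.items() if features.get(name, False))


def predicted_risk(score):
    """Map an integer score to a probability, assuming a logistic link."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + score)))


patient = {"age_over_60": True, "prior_event": True}
score = risk_score(patient)
print(score, round(predicted_risk(score), 3))
```

Because the coefficients are small integers, the score can be tallied by hand, and a lookup table from score to risk replaces the call to `predicted_risk`.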


Deep Recurrent Survival Analysis

arXiv.org Machine Learning

Survival analysis is a hotspot in statistical research for modeling time-to-event information while handling data censorship, and it has been widely used in applications such as clinical research, information systems, and other fields with survivorship bias. Many works have been proposed for survival analysis, ranging from traditional statistical methods to machine learning models. However, the existing methodologies either utilize counting-based statistics on the segmented data, or make a prior assumption on the event probability distribution w.r.t. time. Moreover, few works consider sequential patterns within the feature space. In this paper, we propose a Deep Recurrent Survival Analysis model which combines deep learning for conditional probability prediction at a fine-grained level of the data, and survival analysis for tackling the censorship. By capturing the time dependency through modeling the conditional probability of the event for each sample, our method predicts the likelihood of the true event occurrence and estimates the survival rate over time, i.e., the probability of the non-occurrence of the event, for the censored data. Meanwhile, without assuming any specific form of the event probability distribution, our model shows great advantages over previous works in fitting various sophisticated data distributions. In experiments on three real-world tasks from different fields, our model significantly outperforms the state-of-the-art solutions under various metrics.
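
The link between per-interval conditional event probabilities and the survival rate can be sketched as follows. The abstract does not give formulas, so this uses the standard discrete-time relation as an assumption: with conditional probability h_l of the event in interval l given survival so far, the survival rate is the running product of (1 - h_l). The input list is made up for illustration; in the paper's model these values would come from a recurrent network, which is not reproduced here.

```python
def survival_and_event_probs(hazards):
    """Turn conditional event probabilities into survival and event curves.

    hazards[l] is assumed to be P(event in interval l | no event before l).
    Returns (survival, event) where
      survival[l] = P(no event up to and including interval l)
      event[l]    = P(event happens exactly in interval l)
    """
    survival, event = [], []
    alive = 1.0  # probability of having survived all earlier intervals
    for h in hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("conditional probabilities must lie in [0, 1]")
        event.append(alive * h)
        alive *= 1.0 - h
        survival.append(alive)
    return survival, event


survival, event = survival_and_event_probs([0.5, 0.5])
print(survival, event)
```

For a censored sample observed up to interval l, only `survival[l]` is known to apply, which is why the survival curve is the quantity of interest for censored data.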


Learning Optimal Fair Policies

arXiv.org Machine Learning

We consider the problem of learning optimal policies from observational data in a way that satisfies certain fairness criteria. The issue of fairness arises where some covariates used in decision making are sensitive features, or are correlated with sensitive features. Nabi and Shpitser (2018) formalized fairness in the context of regression problems as constraining the causal effects of sensitive features along certain disallowed causal pathways. The existence of these causal effects may be called retrospective unfairness in the sense of already being present in the data before analysis begins, and may be due to discriminatory practices or the biased way in which variables are defined or recorded. In the context of learning policies, what we call prospective bias, i.e., the inappropriate dependence of learned policies on sensitive features, is also possible. In this paper, we use methods from causal and semiparametric inference to learn optimal policies in a way that addresses both retrospective bias in the data, and prospective bias due to the policy. In addition, our methods appropriately address statistical bias due to model misspecification and confounding bias, which are important in the estimation of path-specific causal effects from observational data. We apply our methods to both synthetic data and real criminal justice data.


Discriminative but Not Discriminatory: A Comparison of Fairness Definitions under Different Worldviews

arXiv.org Machine Learning

We mathematically compare three competing definitions of group-level nondiscrimination: demographic parity, equalized odds, and calibration. Using the theoretical framework of Friedler et al., we study the properties of each definition under various worldviews, which are assumptions about how, if at all, the observed data is biased. We prove that different worldviews call for different definitions of fairness, and we specify when it is appropriate to use demographic parity and equalized odds. In addition, we argue that calibration is unsuitable for the purpose of ensuring nondiscrimination. Finally, we define a worldview that is more realistic than the previously considered ones, and we introduce a new notion of fairness that is suitable for this worldview.
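
Two of the definitions named above can be illustrated with a small sketch. It assumes binary labels, binary predictions, and exactly two groups coded 0 and 1. It measures demographic parity as the gap in positive-prediction rates between groups, and equalized odds as the largest gap in positive-prediction rates between groups among samples with the same true label. These are common empirical formulations rather than the paper's formal definitions, and the toy data is invented.

```python
def _rate(values):
    """Fraction of ones in a list; NaN when the list is empty."""
    return sum(values) / len(values) if values else float("nan")


def demographic_parity_gap(y_pred, group):
    """|P(pred = 1 | group 0) - P(pred = 1 | group 1)|."""
    r0 = _rate([p for p, g in zip(y_pred, group) if g == 0])
    r1 = _rate([p for p, g in zip(y_pred, group) if g == 1])
    return abs(r0 - r1)


def equalized_odds_gap(y_true, y_pred, group):
    """Largest between-group gap in P(pred = 1 | true label, group)."""
    gaps = []
    for label in (0, 1):
        r0 = _rate([p for t, p, g in zip(y_true, y_pred, group)
                    if t == label and g == 0])
        r1 = _rate([p for t, p, g in zip(y_true, y_pred, group)
                    if t == label and g == 1])
        gaps.append(abs(r0 - r1))
    return max(gaps)


# Invented toy data: four samples per group.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
print(demographic_parity_gap(y_pred, group))
print(equalized_odds_gap(y_true, y_pred, group))
```

A gap of zero means the corresponding criterion holds exactly on the sample; which criterion is appropriate depends on the worldview, as the abstract argues.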


An Analysis of Hierarchical Text Classification Using Word Embeddings

arXiv.org Artificial Intelligence

Efficient distributed numerical word representation models (word embeddings) combined with modern machine learning algorithms have recently yielded considerable improvement on automatic document classification tasks. However, the effectiveness of such techniques has not yet been assessed for hierarchical text classification (HTC). This study investigates the application of those models and algorithms to this specific problem by means of experimentation and analysis. We trained classification models with prominent machine learning algorithm implementations---fastText, XGBoost, SVM, and Keras' CNN---and notable word embedding generation methods---GloVe, word2vec, and fastText---with publicly available data and evaluated them with measures specifically appropriate for the hierarchical context. FastText achieved an ${}_{LCA}F_1$ of 0.893 on a single-labeled version of the RCV1 dataset. An analysis indicates that using word embeddings and their variants is a very promising approach for HTC.


A Roadmap for the Value-Loading Problem

arXiv.org Artificial Intelligence

We analyze the value-loading problem. This is the problem of encoding moral values into an AI agent interacting with a complex environment. Like many before, we argue that this is both a major concern and an extremely challenging problem. Solving it will likely require years, if not decades, of multidisciplinary work by teams of top scientists and experts. Given how uncertain the timeline of human-level AI research is, we thus argue that a pragmatic partial solution should be designed as soon as possible. To this end, we propose a preliminary research program. This roadmap identifies several key steps. We hope that this will allow scholars, engineers and decision-makers to better grasp the upcoming difficulties, and to foresee how they can best contribute to the global effort.



IoT, AI and Blockchain: Catalysts for Digital Transformation

#artificialintelligence

The digital revolution has brought with it a new way of thinking about manufacturing and operations. Emerging challenges associated with logistics and energy costs are influencing global production and associated distribution decisions. Significant advances in technology, including big data and analytics, AI, Internet of Things, robotics and additive manufacturing, are shifting the capabilities and value proposition of global manufacturing. In response, manufacturing and operations require a digital renovation: the value chain must be redesigned and retooled and the workforce retrained. Total delivered cost must be analyzed to determine the best places to locate sources of supply, manufacturing and assembly operations around the world.


AI sucks at stopping online trolls spewing toxic comments

#artificialintelligence

New research has shown just how bad AI is at dealing with online trolls. Such systems struggle to automatically flag nudity and violence, don't understand text well enough to shoot down fake news and aren't effective at detecting abusive comments from trolls hiding behind their keyboards. A group of researchers from Aalto University and the University of Padua found this out when they tested seven state-of-the-art models used to detect hate speech. All of them failed to recognize foul language when subtle changes were made, according to a paper [PDF] on arXiv. Adversarial examples can be created automatically by using algorithms to misspell certain words, swap characters for numbers, add random spaces between words, or attach innocuous words such as 'love' to sentences.
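
The kinds of perturbation described above are simple to automate. The sketch below is an illustration only, not the researchers' code: the character-to-digit table, the function names, and the choice to insert spaces at random positions inside words are all assumptions made for the example.

```python
import random

# Hypothetical character-to-digit table (lowercase only).
DIGIT_FOR = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5"}


def misspell(word, rng):
    """Swap two adjacent inner characters; short words are left alone."""
    if len(word) < 4:
        return word
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def swap_characters_for_numbers(text):
    """Replace selected lowercase letters with look-alike digits."""
    return "".join(DIGIT_FOR.get(ch, ch) for ch in text)


def add_random_spaces(text, rate, rng):
    """Insert a space between adjacent non-space characters with probability rate."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else " "
        if ch != " " and nxt != " " and rng.random() < rate:
            out.append(" ")
    return "".join(out)


def append_innocuous_word(text, word="love"):
    """Attach a harmless word to the end of the sentence."""
    return f"{text} {word}"


rng = random.Random(0)  # fixed seed so runs are repeatable
print(swap_characters_for_numbers("idiot"))
print(add_random_spaces("bad word", 0.3, rng))
print(append_innocuous_word("you are bad"))
```

Each function leaves the text readable to a person while changing the character sequence a classifier sees, which is the effect the article describes.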