 Accuracy


Machine Learning with R – Barbara Fusinska

#artificialintelligence

Barbara started by introducing machine learning (ML), gave a brief overview of R and then discussed three examples: classifying handwritten digits, estimating values in a socio-economic dataset and clustering crimes in Chicago. ML is statistics on steroids. ML uses data to find a pattern, then uses that pattern (the model) to predict results from similar data. Barbara uses the example of classifying film genres as either action or romance based on the number of kicks and kisses. Barbara described supervised and unsupervised learning. Unsupervised learning is the "wild, wild west": we can't train the model against known answers, and it is much more difficult to understand how effective these models are. Back to supervised learning, it's important to choose good predicting factors; in the movie example perhaps the title, actors and script may have been better predictors than the number of kicks and kisses. Then you must choose the algorithm, tune it and finally make it useful and visible and get it into production. It's a hard job, especially when data scientists and software developers seem to be different tribes.
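
The kicks-and-kisses example can be sketched as a tiny supervised classifier. The summary does not say which algorithm the talk used, so k-nearest neighbours is an assumption here, and the film counts in `FILMS` are invented purely for illustration.

```python
import math
from collections import Counter

# Hypothetical training data: (kicks, kisses) -> genre. All counts are invented.
FILMS = [
    ((3, 104), "romance"),
    ((2, 100), "romance"),
    ((1, 81), "romance"),
    ((101, 10), "action"),
    ((99, 5), "action"),
    ((98, 2), "action"),
]


def classify(kicks, kisses, k=3):
    """Return the majority genre among the k nearest labelled films."""
    query = (kicks, kisses)
    # Sort the labelled films by Euclidean distance to the query point.
    by_distance = sorted(FILMS, key=lambda film: math.dist(film[0], query))
    votes = Counter(genre for _, genre in by_distance[:k])
    return votes.most_common(1)[0][0]


print(classify(18, 90))  # a film with few kicks and many kisses
print(classify(90, 8))   # a film with many kicks and few kisses
```

The two features are the only "predicting factors" the model sees, which is why a poor choice of features limits any algorithm applied afterwards.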


Redundancy Techniques for Straggler Mitigation in Distributed Optimization and Learning

arXiv.org Machine Learning

Performance of distributed optimization and learning systems is bottlenecked by "straggler" nodes and slow communication links, which significantly delay computation. We propose a distributed optimization framework where the dataset is "encoded" to have an over-complete representation with built-in redundancy, and the straggling nodes in the system are dynamically left out of the computation at every iteration; their loss is compensated by the embedded redundancy. We show that oblivious application of several popular optimization algorithms on encoded data, including gradient descent, L-BFGS, proximal gradient under data parallelism, and coordinate descent under model parallelism, converges to either approximate or exact solutions of the original problem when stragglers are treated as erasures. These convergence results are deterministic, i.e., they establish sample path convergence for arbitrary sequences of delay patterns or distributions on the nodes, and are independent of the tail behavior of the delay distribution. We demonstrate that equiangular tight frames have desirable properties as encoding matrices, and propose efficient mechanisms for encoding large-scale data. We implement the proposed technique on Amazon EC2 clusters, and demonstrate its performance over several learning problems, including matrix factorization, LASSO, ridge regression and logistic regression, and compare the proposed method with uncoded, asynchronous, and data replication strategies.
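
The idea of running plain gradient descent on encoded data while ignoring stragglers can be illustrated with a toy least-squares problem. This is a minimal sketch under stated assumptions, not the paper's implementation: the encoding matrix `S` is a simple Parseval tight frame (identity stacked on a normalized Hadamard matrix, so that `S^T S = I`) rather than an equiangular tight frame, the four "nodes" are simulated in one process, and the data are noiseless so that the exact solution is easy to check.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


# Encoding matrix S (8x4): identity stacked on a normalized 4x4 Hadamard
# matrix, scaled by 1/sqrt(2) so that S^T S = I (redundancy factor 2).
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
HADAMARD = [[0.5, 0.5, 0.5, 0.5],
            [0.5, -0.5, 0.5, -0.5],
            [0.5, 0.5, -0.5, -0.5],
            [0.5, -0.5, -0.5, 0.5]]
S = [[v / math.sqrt(2) for v in row] for row in IDENTITY + HADAMARD]

# Toy noiseless problem (assumed for illustration): y = X w_true exactly.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
W_TRUE = [2.0, -1.0]
Y = [[sum(x * w for x, w in zip(row, W_TRUE))] for row in X]

SX = matmul(S, X)                      # encoded features, 8x2
SY = [row[0] for row in matmul(S, Y)]  # encoded targets, length 8

NUM_NODES = 4       # each simulated node holds two encoded rows
ROWS_PER_NODE = 2


def encoded_gradient_descent(iterations=60, step=1.0 / 3.0, seed=0):
    """Gradient descent on the encoded problem, dropping one node per step."""
    rng = random.Random(seed)
    w = [0.0, 0.0]
    for _ in range(iterations):
        straggler = rng.randrange(NUM_NODES)  # treated as an erasure
        grad = [0.0, 0.0]
        for node in range(NUM_NODES):
            if node == straggler:
                continue
            start = node * ROWS_PER_NODE
            for r in range(start, start + ROWS_PER_NODE):
                residual = sum(a * b for a, b in zip(SX[r], w)) - SY[r]
                for j in range(2):
                    grad[j] += SX[r][j] * residual
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    return w


print(encoded_gradient_descent())
```

The step size `1/3` is the reciprocal of the largest eigenvalue of `X^T X` for this particular `X`. With noisy data the iterates would generally settle near, rather than exactly at, the least-squares solution, which corresponds to the "approximate" case in the abstract.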


Domain Adaptation on Graphs by Learning Aligned Graph Bases

arXiv.org Machine Learning

We propose a method for domain adaptation on graphs. Given sufficiently many observations of the label function on a source graph, we study the problem of transferring the label information from the source graph to a target graph for estimating the target label function. Our assumption about the relation between the two domains is that the frequency content of the label function, regarded as a graph signal, has similar characteristics over the source and the target graphs. We propose a method to learn a pair of coherent bases on the two graphs, such that the corresponding source and target graph basis vectors have similar spectral content, while "aligning" the two graphs at the same time so that the reconstructed source and target label functions have similar coefficients over the bases. Experiments on several types of data sets suggest that the proposed method compares quite favorably to reference domain adaptation methods. To the best of our knowledge, our treatment is the first to study the domain adaptation problem in a purely graph-based setting with no need for embedding the data in an ambient space. This feature is particularly convenient for many problems of interest concerning learning on graphs or networks.


Revisiting Classifier Two-Sample Tests

arXiv.org Machine Learning

The goal of two-sample tests is to assess whether two samples, $S_P \sim P^n$ and $S_Q \sim Q^m$, are drawn from the same distribution. Perhaps intriguingly, one relatively unexplored method to build two-sample tests is the use of binary classifiers. In particular, construct a dataset by pairing the $n$ examples in $S_P$ with a positive label, and by pairing the $m$ examples in $S_Q$ with a negative label. If the null hypothesis "$P = Q$" is true, then the classification accuracy of a binary classifier on a held-out subset of this dataset should remain near chance-level. As we will show, such Classifier Two-Sample Tests (C2ST) learn a suitable representation of the data on the fly, return test statistics in interpretable units, have a simple null distribution, and their predictive uncertainty allows one to interpret where $P$ and $Q$ differ. The goal of this paper is to establish the properties, performance, and uses of C2ST. First, we analyze their main theoretical properties. Second, we compare their performance against a variety of state-of-the-art alternatives. Third, we propose their use to evaluate the sample quality of generative models with intractable likelihoods, such as Generative Adversarial Networks (GANs). Fourth, we showcase the novel application of GANs together with C2ST for causal discovery.
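
The construction described above can be sketched in a few lines. This is an illustrative sketch, not the paper's code: the classifier is a one-dimensional nearest-centroid rule chosen only to keep the example dependency-free, and the function name `c2st` and the 50/50 train/test split are assumptions. For a fixed classifier under the null with equal sample sizes, the number of correct held-out predictions is Binomial($n_{te}$, 1/2), so the accuracy is approximately normal with mean 1/2 and variance $1/(4 n_{te})$; the p-value below uses that approximation.

```python
import math
import random


def c2st(sample_p, sample_q, seed=0):
    """Classifier two-sample test on 1-D samples; returns (accuracy, p_value)."""
    rng = random.Random(seed)
    # Label examples from P as 1 and examples from Q as 0, then shuffle.
    data = [(x, 1) for x in sample_p] + [(x, 0) for x in sample_q]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]

    # Nearest-centroid classifier: a stand-in for any binary classifier.
    pos = [x for x, label in train if label == 1]
    neg = [x for x, label in train if label == 0]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)

    def predict(x):
        return 1 if abs(x - mean_pos) < abs(x - mean_neg) else 0

    # Test statistic: accuracy on the held-out half.
    n_te = len(test)
    accuracy = sum(predict(x) == label for x, label in test) / n_te

    # One-sided p-value from the normal approximation N(1/2, 1/(4 n_te)).
    z = (accuracy - 0.5) / math.sqrt(0.25 / n_te)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return accuracy, p_value


rng = random.Random(1)
same = c2st([rng.gauss(0, 1) for _ in range(200)],
            [rng.gauss(0, 1) for _ in range(200)])
shifted = c2st([rng.gauss(0, 1) for _ in range(200)],
               [rng.gauss(3, 1) for _ in range(200)])
print("P = Q:    accuracy=%.3f p=%.3g" % same)
print("P != Q:   accuracy=%.3f p=%.3g" % shifted)
```

The statistic is reported as an accuracy, which is what makes the result readable in "interpretable units": a value near 0.5 means the classifier cannot tell the samples apart.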


Classifying Online Dating Profiles on Tinder using FaceNet Facial Embeddings

arXiv.org Machine Learning

A method to produce personalized classification models to automatically review online dating profiles on Tinder, based on the user's historical preference, is proposed. The method takes advantage of a FaceNet facial classification model to extract features which may be related to facial attractiveness. The embeddings from a FaceNet model were used as the features to describe an individual's face. A user reviewed 8,545 online dating profiles. For each reviewed online dating profile, a feature set was constructed from the profile images which contained just one face. Two approaches are presented to go from the set of features for each face to a set of profile features. A simple logistic regression trained on the embeddings from just 20 profiles could obtain a 65% validation accuracy. A point of diminishing marginal returns was identified to occur around 80 profiles, at which the model accuracy of 73% would only improve marginally after reviewing a significant number of additional profiles. Index Terms: facial classification, facial attractiveness, online dating, classifying dating profiles. 1. Introduction: Online dating has become commonplace in today's society.


False Discovery Rate Control via Debiased Lasso

arXiv.org Machine Learning

We consider the problem of variable selection in high-dimensional statistical models where the goal is to report a set of variables, out of many predictors $X_1, \dotsc, X_p$, that are relevant to a response of interest. For linear high-dimensional models, where the number of parameters exceeds the number of samples $(p>n)$, we propose a procedure for variable selection and prove that it controls the directional false discovery rate (FDR) below a pre-assigned significance level $q\in [0,1]$. We further analyze the statistical power of our framework and show that for designs with subgaussian rows and a common precision matrix $\Omega\in\mathbb{R}^{p\times p}$, if the minimum nonzero parameter $\theta_{\min}$ satisfies $$\sqrt{n} \theta_{\min} - \sigma \sqrt{2(\max_{i\in [p]}\Omega_{ii})\log\left(\frac{2p}{qs_0}\right)} \to \infty\,,$$ then this procedure achieves asymptotic power one. Our framework is built upon the debiasing approach and assumes the standard condition $s_0 = o(\sqrt{n}/(\log p)^2)$, where $s_0$ indicates the number of true positives among the $p$ features. Notably, this framework achieves exact directional FDR control without any assumption on the amplitude of unknown regression parameters, and does not require any knowledge of the distribution of covariates or the noise level. We test our method in synthetic and real data experiments to assess its performance and to corroborate our theoretical results.


Delayed Impact of Fair Machine Learning

arXiv.org Machine Learning

Fairness in machine learning has predominantly been studied in static classification settings without concern for how decisions change the underlying population over time. Conventional wisdom suggests that fairness criteria promote the long-term well-being of those groups they aim to protect. We study how static fairness criteria interact with temporal indicators of well-being, such as long-term improvement, stagnation, and decline in a variable of interest. We demonstrate that even in a one-step feedback model, common fairness criteria in general do not promote improvement over time, and may in fact cause harm in cases where an unconstrained objective would not. We completely characterize the delayed impact of three standard criteria, contrasting the regimes in which these exhibit qualitatively different behavior. In addition, we find that a natural form of measurement error broadens the regime in which fairness criteria perform favorably. Our results highlight the importance of measurement and temporal modeling in the evaluation of fairness criteria, suggesting a range of new challenges and trade-offs.


Climate Change Pushes Ticks Into Canada, Bringing Lyme Disease (and Confusion) With Them

Mother Jones

This story was originally published by Undark and appears here as part of the Climate Desk collaboration. Joanne Seiff, a resident of Manitoba, contracted Lyme disease a couple of years ago but didn't remember pulling off the tick that bit her; nor did she have the telltale bullseye rash of a tick bite. Her husband Jeff Marcus, who grew up in New York's Hudson Valley, about an hour and a half from the eponymous town of Lyme, Connecticut, recognized her symptoms immediately because Lyme disease was common there. Canadian doctors, however, were not convinced. "Even though we had been telling people for months that she had Lyme disease and that all she needed was about four weeks of antibiotics, we were seeing specialist after specialist, and getting the same run-around," Marcus says.


WWE Fastlane 2018: Start Time, Live Stream Info, Matches For PPV Before WrestleMania

International Business Times

The "SmackDown Live" roster will compete Sunday night in Columbus, Ohio with a few matches that will determine what wrestlers will defend titles at the year's biggest event. Fans can watch Fastlane on pay-per-view for $54.99 or through a live stream with WWE Network, for which a monthly subscription costs $9.99. The kickoff show begins at 7 p.m. EDT on WWE Network and the actual PPV starts at 8 p.m. EDT. AJ Styles (WWE Championship), Charlotte Flair (SmackDown Women's Championship), Bobby Roode (United States Championship) and The Usos (SmackDown Tag Team Championship) will all put their belts on the line. There's a good chance that all four champions will leave Fastlane as winners.


A Neural Network Architecture Combining Gated Recurrent Unit (GRU) and Support Vector Machine (SVM) for Intrusion Detection in Network Traffic Data

arXiv.org Machine Learning

Gated Recurrent Unit (GRU) is a recently-developed variation of the long short-term memory (LSTM) unit, both of which are types of recurrent neural network (RNN). Through empirical evidence, both models have been proven to be effective in a wide variety of machine learning tasks such as natural language processing (Wen et al., 2015), speech recognition (Chorowski et al., 2015), and text classification (Yang et al., 2016). Conventionally, like most neural networks, both of the aforementioned RNN variants employ the Softmax function as their final output layer for prediction, and the cross-entropy function for computing the loss. In this paper, we present an amendment to this norm by introducing a linear support vector machine (SVM) as the replacement for Softmax in the final output layer of a GRU model. Furthermore, the cross-entropy function shall be replaced with a margin-based function. While there have been similar studies (Alalshekmubarak & Smith, 2013; Tang, 2013), this proposal is primarily intended for binary classification on intrusion detection using the 2013 network traffic data from the honeypot systems of Kyoto University. Results show that the GRU-SVM model outperforms the conventional GRU-Softmax model. The proposed model reached a training accuracy of ~81.54% and a testing accuracy of ~84.15%, while the latter was able to reach a training accuracy of ~63.07% and a testing accuracy of ~70.75%. In addition, the juxtaposition of these two final output layers indicates that the SVM would outperform Softmax in prediction time, a theoretical implication which was supported by the actual training and testing time in the study.
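
The swap from cross-entropy to a margin-based loss can be made concrete for the binary case. The sketch below is an illustration only: it operates on raw output scores (standing in for the final linear layer applied to the GRU state), it uses labels in {-1, +1}, and the abstract does not specify the exact margin-based function, so both the hinge loss and its squared variant shown here are assumptions.

```python
import math


def logistic_loss(scores, labels):
    """Binary cross-entropy written for labels in {-1, +1}."""
    return sum(math.log1p(math.exp(-y * s))
               for s, y in zip(scores, labels)) / len(scores)


def hinge_loss(scores, labels):
    """Mean hinge loss: zero once an example clears the margin of 1."""
    return sum(max(0.0, 1.0 - y * s)
               for s, y in zip(scores, labels)) / len(scores)


def squared_hinge_loss(scores, labels):
    """Mean squared hinge loss, a smoother margin-based alternative."""
    return sum(max(0.0, 1.0 - y * s) ** 2
               for s, y in zip(scores, labels)) / len(scores)


def predict(scores):
    """SVM-style prediction: the sign of the score, with no normalization."""
    return [1 if s >= 0 else -1 for s in scores]


scores = [2.0, -0.5]   # hypothetical outputs of the final linear layer
labels = [1, -1]       # +1 = intrusion, -1 = normal (illustrative coding)
print(logistic_loss(scores, labels))
print(hinge_loss(scores, labels))
print(squared_hinge_loss(scores, labels))
print(predict(scores))
```

Prediction with the margin-based layer only needs the sign of the score, whereas Softmax requires exponentiation and normalization; this is consistent with the abstract's remark about prediction time.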