
 Accuracy


Effective injury prediction in professional soccer with GPS data and machine learning

arXiv.org Machine Learning

Injuries have a great impact on professional soccer, due to their large influence on team performance and the considerable costs of rehabilitation for players. Existing studies in the literature provide just a preliminary understanding of which factors mostly affect injury risk, while an evaluation of the potential of statistical models in forecasting injuries is still missing. In this paper, we propose a multidimensional approach to injury prediction in professional soccer which is based on GPS measurements and machine learning. By using GPS tracking technology, we collect data describing the training workload of players in a professional soccer club during a season. We show that our injury predictors are both accurate and interpretable by providing a set of case studies of interest to soccer practitioners. Our approach opens a novel perspective on injury prevention, providing a set of simple and practical rules for evaluating and interpreting the complex relations between injury risk and training performance in professional soccer.


Stopword removal (surprisingly) decreases accuracy of naive-bayes model

#artificialintelligence

Stop word removal typically strips out words such as "a, an, the, it". This is often beneficial when we are classifying by topic, since topics are well described by nouns and adjectives. However, some text classification tasks are more abstract. Consider classifying fiction and non-fiction articles on the same topic: what would the difference between these two writing styles be? They would probably use the same nouns, but what about the frequency of "the" vs "an" or "he" vs "they"?
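The point about function words can be illustrated with a minimal sketch. It assumes a tiny hypothetical stop word list and two made-up sentences that share the same nouns and verbs; it is not taken from the original post.

```python
from collections import Counter

# Hypothetical stop word list, chosen only for this illustration.
STOP_WORDS = {"a", "an", "the", "it", "he", "they"}

def bag_of_words(text, remove_stop_words=False):
    """Return word counts for a text, optionally dropping stop words."""
    tokens = text.lower().split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return Counter(tokens)

# Two made-up sentences with the same content words but different function words.
fiction = "he saw the dragon and he fled the castle"
report = "they saw a dragon and they fled a castle"

# With stop words kept, the two feature vectors differ.
with_stops_differ = bag_of_words(fiction) != bag_of_words(report)

# With stop words removed, the two feature vectors are identical,
# so a bag-of-words classifier has nothing left to separate them.
without_stops_differ = bag_of_words(fiction, True) != bag_of_words(report, True)

print(with_stops_differ, without_stops_differ)
```

In this toy case the only signal separating the two styles lives in the stop words, which is one plausible reason removing them could lower accuracy.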


Regularizing deep networks using efficient layerwise adversarial training

arXiv.org Machine Learning

Adversarial training has been shown to regularize deep neural networks in addition to increasing their robustness to adversarial examples. However, its impact on very deep state of the art networks has not been fully investigated. In this paper, we present an efficient approach to perform adversarial training by perturbing intermediate layer activations and study the use of such perturbations as a regularizer during training. We use these perturbations to train very deep models such as ResNets and show improvement in performance both on adversarial and original test data. Our experiments highlight the benefits of perturbing intermediate layer activations compared to perturbing only the inputs. The results on CIFAR-10 and CIFAR-100 datasets show the merits of the proposed adversarial training approach. Additional results on WideResNets show that our approach provides significant improvement in classification accuracy for a given base model, outperforming dropout and other base models of larger size.


Finding Significant Combinations of Continuous Features

arXiv.org Machine Learning

This problem is relevant in a broad range of applications including natural language processing, statistical genetics, and healthcare. To date, this problem of feature selection (Guyon and Elisseeff, 2003) has been extensively studied in machine learning, including the recent advances in selective inference (Taylor and Tibshirani, 2015), a technique that can assess the statistical significance of features selected by linear models such as the Lasso (Lee et al., 2016). However, current approaches have a crucial limitation: they can only find single features or linear combinations of features, and it is still an open problem to find patterns, that is, combinations of features with multiplicative effect. A relevant line of research towards this goal is significant pattern mining (Llinares-López et al., 2015; Papaxanthos et al., 2016; Terada et al., 2013), which tries to find statistically associated feature combinations while controlling the family-wise error rate (FWER), that is, the probability of detecting one or more false positive patterns. However, all existing methods for significant pattern mining only apply to combinations of binary or discrete features, and none of the methods can handle real-valued data, although such data is common in many applications. If we binarize data beforehand in order to use significant pattern mining approaches, the resulting binarization-based method cannot distinguish correlated and uncorrelated features (see Figure 1 for an example). Subgroup discovery (Atzmueller, 2015; Herrera et al., 2011; Novak et al., 2009) also has the same goal of finding associated feature combinations, but the existing methods are also designed for discrete data, which means that binarization is required (Grosskreutz and Rüping, 2009) for real-valued data and the above problem still exists.
To date, there is no method that can find all combinations of continuous features that are significantly associated with an output variable and that accounts for the inherent multiple testing problem.
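To make the multiple testing problem concrete, the following is a minimal sketch of FWER under the simplifying assumption of independent tests, together with the standard Bonferroni correction. Bonferroni is used here only as a familiar baseline; it is not the method proposed in the paper, and the function names are hypothetical.

```python
def fwer_independent(alpha, m):
    """FWER when m independent true-null tests are each run at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

def bonferroni_threshold(alpha, m):
    """Per-test threshold that keeps FWER at or below alpha for m tests."""
    return alpha / m

def significant(p_values, alpha=0.05):
    """Indices of p-values that pass the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [i for i, p in enumerate(p_values) if p <= threshold]

# Testing 100 candidate patterns at 0.05 each almost guarantees a false positive.
uncorrected = fwer_independent(0.05, 100)

# Testing each at 0.05 / 100 brings the FWER back under 0.05.
corrected = fwer_independent(bonferroni_threshold(0.05, 100), 100)

print(round(uncorrected, 3), round(corrected, 3))
print(significant([0.001, 0.02, 0.04, 0.3]))
```

Because the number of feature combinations grows exponentially with the number of features, a plain correction of this kind quickly becomes very conservative, which is part of what makes the problem hard.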


WWE Backlash 2017: Live Stream Info, Start Time, Match Card For PPV And NXT TakeOver: Chicago

International Business Times

The action starts Saturday night with NXT TakeOver: Chicago, followed by Backlash 2017 Sunday night with members of the "SmackDown Live" roster. In total, 13 matches are scheduled for the two cards, both of which have 8 p.m. EDT start times. NXT TakeOver: Chicago can only be seen on WWE Network, which costs $9.99 per month. Fans can watch Backlash on the network or by ordering the pay-per-view for $54.99. New subscribers to the network can watch both shows with a free live stream, given that they won't be charged for the first month.


40 Interview Questions asked at Startups in Machine Learning / Data Science

@machinelearnbot

This article was posted by Manish Saraswat on Analytics Vidhya. Manish, who works in marketing and Data Science at Analytics Vidhya, believes that education can change this world. R, Data Science and Machine Learning keep him busy. Machine learning and data science are being looked at as the drivers of the next industrial revolution happening in the world today. This also means that there are numerous exciting startups looking for data scientists.


Learning Feature Nonlinearities with Non-Convex Regularized Binned Regression

arXiv.org Machine Learning

Recently, substantial progress has been made on the problem of high-dimensional sparse linear models [22]. In particular, Lasso has been shown to be remarkably successful, and is statistically well-behaved and generates interpretable solutions. However, in the presence of non-linearity (i.e., the relation between the covariates and response is nonlinear), boosted decision trees, deep learning models, and kernel methods are regarded as the most effective models that deliver substantial performance boost over linear models; however, their interpretability is limited. As a result, there is a significant gap between the statistical performance and the interpretability, and it is often desirable to have computationally efficient algorithms that learn interpretable models without sacrificing statistical guarantees. This raises a natural question that we aim to tackle: Is there any algorithm which has similar statistical performance to complex models, while still retaining much of the interpretability of Lasso? In this paper, we answer the above question affirmatively and propose a novel way of learning the feature non-linearities with provable statistical and computational guarantees.


CDS Rate Construction Methods by Machine Learning Techniques

arXiv.org Machine Learning

Regulators require financial institutions to estimate counterparty default risks from liquid CDS quotes for the valuation and risk management of OTC derivatives. However, the vast majority of counterparties do not have liquid CDS quotes and need proxy CDS rates. Existing methods cannot account for counterparty-specific default risks; we propose to construct proxy CDS rates by associating each illiquid counterparty with a liquid CDS proxy using machine learning techniques. After testing 156 classifiers from the 8 most popular classifier families, we found that some classifiers achieve highly satisfactory accuracy rates. Furthermore, we have rank-ordered the performances and investigated performance variations amongst and within the 8 classifier families. This paper is, to the best of our knowledge, the first systematic study of CDS proxy construction by machine learning techniques, and the first systematic classifier comparison study based entirely on financial market data. Its findings both confirm and contrast with the existing classifier performance literature. Given the typically highly correlated nature of financial data, we investigated the impact of correlation on classifier performance. The techniques used in this paper should be of interest to financial institutions seeking a CDS proxy method, and can serve for proxy construction for other financial variables. Some directions for future research are indicated.


CardiacNET: Segmentation of Left Atrium and Proximal Pulmonary Veins from MRI Using Multi-View CNN

arXiv.org Machine Learning

Anatomical and biophysical modeling of the left atrium (LA) and proximal pulmonary veins (PPVs) is important for clinical management of several cardiac diseases. Magnetic resonance imaging (MRI) allows qualitative assessment of the LA and PPVs through visualization. However, there is a strong need for an advanced image segmentation method to be applied to cardiac MRI for quantitative analysis of the LA and PPVs. In this study, we address this unmet clinical need by exploring a new deep learning-based segmentation strategy for quantification of the LA and PPVs with high accuracy and heightened efficiency. Our approach is based on a multi-view convolutional neural network (CNN) with an adaptive fusion strategy and a new loss function that allows fast and more accurate convergence of the backpropagation-based optimization. After training our network from scratch using more than 60K 2D MRI images (slices), we evaluated our segmentation strategy on the STACOM 2013 cardiac segmentation challenge benchmark. Qualitative and quantitative evaluations, obtained from the segmentation challenge, indicate that the proposed method achieved state-of-the-art sensitivity (90%), specificity (99%), precision (94%), and efficiency levels (10 seconds on GPU, and 7.5 minutes on CPU).
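For readers unfamiliar with the reported metrics, the sketch below shows how sensitivity, specificity, and precision are conventionally computed from a binary segmentation mask. The function name and the ten-pixel toy masks are hypothetical and are not taken from the paper or the challenge data.

```python
def segmentation_metrics(pred, truth):
    """Sensitivity, specificity, and precision for flat binary masks.

    Assumes both masks have the same length and that every denominator
    (positives, negatives, predicted positives) is non-zero.
    """
    tp = sum(1 for p, t in zip(pred, truth) if p and t)          # correctly labelled foreground
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)  # correctly labelled background
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)      # background marked as foreground
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)      # foreground that was missed
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }

# Toy masks: 1 marks a foreground pixel, 0 marks background.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = segmentation_metrics(pred, truth)
print(metrics)
```

In segmentation tasks the background usually dominates, so specificity tends to be high even for mediocre masks; sensitivity and precision are typically the more informative of the three.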


Email Spam Filtering: An Implementation with Python and Scikit-learn

@machinelearnbot

Text mining (deriving information from text) is a wide field which has gained popularity with the huge volume of text data being generated. A number of applications, such as sentiment analysis, document classification, topic classification, text summarization, and machine translation, have been automated using machine learning models. Spam filtering is a beginner's example of a document classification task, which involves classifying an email as spam or non-spam (a.k.a. ham). The Spam box in your Gmail account is the best example of this. So let's get started building a spam filter on a publicly available mail corpus.
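As a rough illustration of the classification task described above, here is a minimal multinomial naive Bayes sketch written with the Python standard library only. It is a stand-in for illustration, not the article's scikit-learn implementation, and the four training messages are made up rather than drawn from any mail corpus.

```python
import math
from collections import Counter

def train(docs):
    """Collect per-label word counts from (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in text.lower().split():
            counts[token] += 1
            vocab.add(token)
    return word_counts, label_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior under naive Bayes."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up training messages for illustration only.
model = train([
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting schedule for monday", "ham"),
    ("lunch with the team monday", "ham"),
])

print(predict(model, "claim free money"))
print(predict(model, "team meeting monday"))
```

A real filter would be trained on thousands of messages and would usually add preprocessing such as punctuation stripping and lemmatization before counting words.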