Performance Analysis
eSports Pro-Players Behavior During the Game Events: Statistical Analysis of Data Obtained Using the Smart Chair
Smerdov, Anton, Burnaev, Evgeny, Somov, Andrey
Today's competition between professional eSports teams is so strong that in-depth analysis of players' performance is crucial for creating a powerful team. There are two main approaches to such an estimation: obtaining features and metrics directly from the in-game data, or collecting detailed information about the player, including data on his/her physical training. While the correlation between the player's skill and in-game data has already been covered in many papers, there are very few works related to the analysis of an eSports athlete's skill through his/her physical behavior. We propose a smart chair platform that collects data on the person's behavior on the chair using an integrated accelerometer, gyroscope and magnetometer. We extract the important game events to define the players' physical reactions to them. The obtained data are used for training machine learning models in order to distinguish between low-skilled and high-skilled players. We extract and identify the key features during the game and discuss the results. INTRODUCTION. Nowadays eSports is a rapidly growing industry with more than billion players involved worldwide.
The Anatomy of a Cryptocurrency Pump-and-Dump Scheme
Xu, Jiahua, Livshits, Benjamin
While pump-and-dump schemes have attracted the attention of cryptocurrency observers and regulators alike, this paper represents the first detailed empirical query of pump-and-dump activities in cryptocurrency markets. We present a case study of a recent pump-and-dump event, investigate 412 pump-and-dump activities organized in Telegram channels from June 17, 2018 to February 26, 2019, and discover patterns in crypto-markets associated with pump-and-dump schemes. We then build a model that predicts the pump likelihood of all coins listed in a crypto-exchange prior to a pump. The model exhibits high precision as well as robustness, and can be used to create a simple yet very effective trading strategy, which we empirically demonstrate can generate a return as high as 60% on small retail investments within a span of two and a half months. The study provides a proof of concept for strategic crypto-trading and sheds light on the application of machine learning for crime detection.
Consistent Feature Construction with Constrained Genetic Programming for Experimental Physics
Cherrier, Noëlie, Poli, Jean-Philippe, Defurne, Maxime, Sabatié, Franck
A good feature representation is a determinant factor to achieve high performance for many machine learning algorithms in terms of classification. This is especially true for techniques that do not build complex internal representations of data (e.g. decision trees, in contrast to deep neural networks). To transform the feature space, feature construction techniques build new high-level features from the original ones. Among these techniques, Genetic Programming is a good candidate to provide interpretable features required for data analysis in high energy physics. Classically, original features or higher-level features based on physics first principles are used as inputs for training. However, physicists would benefit from an automatic and interpretable feature construction for the classification of particle collision events. Our main contribution consists in combining different aspects of Genetic Programming and applying them to feature construction for experimental physics. In particular, to be applicable to physics, dimensional consistency is enforced using grammars. Results of experiments on three physics datasets show that the constructed features can bring a significant gain to the classification accuracy. To the best of our knowledge, it is the first time a method is proposed for interpretable feature construction with units of measurement, and that experts in high-energy physics validate the overall approach as well as the interpretability of the built features.
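The dimensional-consistency constraint mentioned above can be illustrated with a minimal sketch. It does not reproduce the authors' grammar-based method: it only checks, after the fact, whether a candidate feature expressed as a small expression tree respects units of measurement. The unit encoding (a tuple of exponents over two hypothetical base dimensions, energy and length), the function name `unit_of`, and the feature names are all assumptions made for illustration.

```python
from typing import Dict, Tuple, Union

# A unit is a tuple of integer exponents over two hypothetical base
# dimensions (energy, length); (1, 0) is an energy, (0, 0) is dimensionless.
Unit = Tuple[int, ...]
# An expression is either a feature name or a tuple (operator, left, right).
Expr = Union[str, Tuple[str, "Expr", "Expr"]]


def unit_of(expr: Expr, feature_units: Dict[str, Unit]) -> Unit:
    """Return the unit of an expression tree; raise ValueError if inconsistent."""
    if isinstance(expr, str):
        return feature_units[expr]
    op, left, right = expr
    left_unit = unit_of(left, feature_units)
    right_unit = unit_of(right, feature_units)
    if op in ("+", "-"):
        # Addition and subtraction only make sense between identical units.
        if left_unit != right_unit:
            raise ValueError(f"cannot apply {op} to units {left_unit} and {right_unit}")
        return left_unit
    if op == "*":
        # Multiplication adds the exponents of each base dimension.
        return tuple(a + b for a, b in zip(left_unit, right_unit))
    if op == "/":
        # Division subtracts the exponents of each base dimension.
        return tuple(a - b for a, b in zip(left_unit, right_unit))
    raise ValueError(f"unknown operator {op}")
```

With hypothetical features `E1` and `E2` (energies) and `d` (a length), `unit_of(("/", ("+", "E1", "E2"), "d"), units)` yields energy per length, whereas `("+", "E1", "d")` is rejected. A grammar, as used in the paper, prevents such invalid candidates from being generated in the first place instead of filtering them afterwards.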
An algorithm could play a major role in helping radiologists diagnose cancer early, accurately
Breast cancer is the leading cause of cancer-related death among women, and it is difficult to diagnose. Nearly 1 in 10 cancers is misdiagnosed as not cancerous; on the other hand, the more mammograms a woman has, the greater the chance she will see a false positive result and face an unnecessary invasive procedure--most likely a biopsy. More accurate diagnostic techniques are emerging. But what if instead we relied on the guidance of an algorithm? Assad Oberai, Hughes Professor in the Aerospace and Mechanical Engineering Department at the USC Viterbi School of Engineering, asked this exact question in a recent paper published in ScienceDirect.
Using Wasserstein-2 regularization to ensure fair decisions with Neural-Network classifiers
Risser, Laurent, Vincenot, Quentin, Couellan, Nicolas, Loubes, Jean-Michel
In this paper, we propose a new method to build fair Neural-Network classifiers by using a constraint based on the Wasserstein distance. More specifically, we detail how to efficiently compute the gradients of Wasserstein-2 regularizers for Neural-Networks. The proposed strategy is then used to train Neural-Networks decision rules which favor fair predictions. Our method fully takes into account two specificities of Neural-Networks training: (1) The network parameters are indirectly learned based on automatic differentiation and on the loss gradients, and (2) batch training is the gold standard to approximate the parameter gradients, as it requires a reasonable amount of computations and it can efficiently explore the parameters space. Results are shown on synthetic data, as well as on the UCI Adult Income Dataset. Our method is shown to perform well compared with 'ZafarICWWW17' and linear-regression with Wasserstein-1 regularization, as in 'JiangUAI19', in particular when non-linear decision rules are required for accurate predictions.
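As a rough illustration of the quantity being regularized, the sketch below computes the squared Wasserstein-2 distance between two one-dimensional empirical distributions of equal size, which reduces to the mean squared difference between sorted samples. Applying it to the prediction scores of two groups is an assumption about how such a penalty could be used; the function name is hypothetical, and the sketch does not reproduce the paper's gradient computation or batch-training scheme.

```python
from typing import Sequence


def wasserstein2_squared(scores_a: Sequence[float], scores_b: Sequence[float]) -> float:
    """Squared Wasserstein-2 distance between two equal-size 1D samples.

    In one dimension the optimal transport plan matches the i-th smallest
    value of one sample with the i-th smallest value of the other.
    """
    if len(scores_a) == 0 or len(scores_a) != len(scores_b):
        raise ValueError("samples must be non-empty and of equal size")
    sorted_a = sorted(scores_a)
    sorted_b = sorted(scores_b)
    return sum((x - y) ** 2 for x, y in zip(sorted_a, sorted_b)) / len(sorted_a)
```

A value of zero means the two groups receive identically distributed scores; larger values indicate a larger gap, which a fairness penalty of this kind would push the network to reduce.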
With Malice Towards None: Assessing Uncertainty via Equalized Coverage
Romano, Yaniv, Barber, Rina Foygel, Sabatti, Chiara, Candès, Emmanuel J.
We are increasingly turning to machine learning systems to support human decisions. While decision makers may be subject to many forms of prejudice and bias, the promise and hope is that machines would be able to make more equitable decisions. Unfortunately, whether because they are fitted on already biased data or otherwise, there are concerns that some of these data-driven recommendation systems treat members of different classes differently, perpetrating biases, providing different degrees of utilities, and inducing disparities. The examples that have emerged are quite varied: 1. Criminal justice: courts in the United States use COMPAS--a commercially available algorithm to assess a criminal defendant's likelihood of becoming a recidivist--to help them decide who should receive parole, based on records collected through the criminal justice system. In 2016 ProPublica analyzed COMPAS and "found that black defendants were far more likely than white defendants to be incorrectly judged to be at a higher risk of recidivism, while white defendants were more likely than black defendants to be incorrectly flagged as low risk" [1].
TAPER: Time-Aware Patient EHR Representation
Darabi, Sajad, Kachuee, Mohammad, Fazeli, Shayan, Sarrafzadeh, Majid
Effective representation learning of electronic health records is a challenging task and is becoming more important as the availability of such data is becoming pervasive. The data contained in these records are irregular and contain multiple modalities such as notes and medical codes. They are preempted by medical conditions the patient may have, and are typically recorded by medical staff. Accompanying codes are notes containing valuable information about patients beyond the structured information contained in electronic health records. We use transformer networks and the recently proposed BERT language model to embed these data streams into a unified vector representation. The presented approach effectively encodes a patient's visit data into a single distributed representation, which can be used for downstream tasks. Our model demonstrates superior performance and generalization on mortality, readmission and length of stay tasks using the publicly available MIMIC-III ICU dataset. Electronic health records (EHR) are commonly adopted in hospitals to improve patient care. In an intensive care unit (ICU), various data sources are collected on a daily basis as preempted by medical staff as the patient undergoes care in the unit. The collected data consists of data from different modalities: medical codes such as diagnoses, which are standardized by well-organized ontologies like the International Classification of Disease (ICD). Additionally, lab tests and bedside monitoring devices are used to collect signals, each of which is collected at varying frequencies for a quantitative measure of the patient care.
A Machine Learning Approach for Smartphone-based Sensing of Roads and Driving Style
Road transportation is of critical importance for a nation, having profound effects on the economy and on the health and lifestyle of its people. With the growth of cities and populations come bigger demands for mobility and safety, creating new problems and magnifying those of the past. New tools are needed to face the challenge: to keep roads in good condition, keep their users safe, and minimize the impact on the environment. This dissertation is concerned with road quality assessment and aggressive driving, two important problems in road transportation, approached in the context of Intelligent Transportation Systems by using Machine Learning techniques to analyze acceleration time series acquired with smartphone-based opportunistic sensing to automatically detect, classify, and characterize events of interest. Two aspects of road quality assessment are addressed: the detection and the characterization of road anomalies. For the first, the most widely cited works in the literature are compared and proposals capable of equal or better performance are presented, removing the reliance on threshold values and reducing the computational cost and dimensionality of previous proposals. For the second, new approaches for the estimation of pothole depth and the functional condition of speed reducers are shown. The new problem of pothole depth ranking is introduced, using a learning-to-rank approach to sort acceleration signals by the depth of the potholes that they reflect. The classification of aggressive driving maneuvers is done with automatic feature extraction, finding characteristically shaped subsequences in the signals to be more effective discriminants than conventional descriptors calculated over time windows. Finally, all the previously mentioned tasks are combined to produce a robust road transport evaluation platform.
Towards automated symptoms assessment in mental health
Activity and motion analysis has the potential to be used as a diagnostic tool for mental disorders. However, to date, little work has been performed in turning stratification measures of activity into useful symptom markers. The research presented in this thesis has focused on the identification of objective activity and behaviour metrics that could be useful for the analysis of mental health symptoms in the above-mentioned dimensions. Particular attention is given to the analysis of objective differences between disorders, as well as identification of clinical episodes of mania and depression in bipolar patients, and deterioration in borderline personality disorder patients. A principled framework is proposed for mHealth monitoring of psychiatric patients, based on measurable changes in behaviour, represented in physical activity time series, collected via mobile and wearable devices. The framework defines methods for direct computational analysis of symptoms in disorganisation and psychomotor dimensions, as well as measures for indirect assessment of mood, using patterns of physical activity, sleep and circadian rhythms. The approach of computational behaviour analysis proposed in this thesis has the potential for early identification of clinical deterioration in ambulatory patients, and allows for the specification of distinct and measurable behavioural phenotypes, thus enabling better understanding and treatment of mental disorders.
Maximum Relevance and Minimum Redundancy Feature Selection Methods for a Marketing Machine Learning Platform
Zhao, Zhenyu, Anand, Radhika, Wang, Mallory
In machine learning applications for online product offerings and marketing strategies, there are often hundreds or thousands of features available to build such models. Feature selection is an essential method in such applications, serving multiple objectives: improving the prediction accuracy by eliminating irrelevant features, accelerating the model training and prediction speed, reducing the monitoring and maintenance workload for the feature data pipeline, and providing better model interpretation and diagnosis capability. However, selecting an optimal feature subset from a large feature space is considered an NP-complete problem. The mRMR (Minimum Redundancy and Maximum Relevance) feature selection framework addresses this problem by selecting the relevant features while controlling for the redundancy within the selected features. This paper describes the approach to extend, evaluate, and implement the mRMR feature selection methods for classification problems in a marketing machine learning platform at Uber that automates creation and deployment of targeting and personalization models at scale. This study first extends the existing mRMR methods by introducing a non-linear feature redundancy measure and a model-based feature relevance measure. Then an extensive empirical evaluation is performed for eight different feature selection methods, using one synthetic dataset and three real-world marketing datasets at Uber to cover different use cases. Based on the empirical results, the selected mRMR method is implemented in production for the marketing machine learning platform. A description of the production implementation is provided and an online experiment deployed through the platform is discussed.
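The greedy idea behind mRMR can be sketched in a few lines. This is an illustrative variant only, not the method deployed at Uber: relevance is taken to be the absolute Pearson correlation between a feature and the target, redundancy is the mean absolute Pearson correlation with the features already selected, and the two are combined by subtraction. These measures, and the names `pearson` and `mrmr_select`, are assumptions made for the example; the paper itself evaluates several other relevance and redundancy measures.

```python
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length samples (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def mrmr_select(features: Dict[str, Sequence[float]],
                target: Sequence[float], k: int) -> List[str]:
    """Greedily pick up to k features by relevance minus mean redundancy."""
    relevance = {name: abs(pearson(col, target)) for name, col in features.items()}
    selected: List[str] = []
    remaining = list(features)

    def score(name: str) -> float:
        # The first pick has nothing to be redundant with.
        if not selected:
            return relevance[name]
        redundancy = sum(abs(pearson(features[name], features[s]))
                         for s in selected) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Given a feature, an exact rescaled copy of it, and a weaker but uncorrelated feature, this sketch picks one of the two duplicates first and then prefers the uncorrelated feature over the remaining duplicate, which is the behaviour the redundancy term is meant to produce. Each greedy step costs one pass over the remaining features, which avoids the exhaustive search over subsets.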