Performance Analysis
Bayesian Multi Plate High Throughput Screening of Compounds
Shterev, Ivo D., Dunson, David B., Chan, Cliburn, Sempowski, Gregory D.
High throughput screening of compounds (chemicals) is an essential part of drug discovery [7], involving thousands to millions of compounds, with the purpose of identifying candidate hits. Most statistical tools, including the industry standard B-score method, work on individual compound plates and do not exploit cross-plate correlation or statistical strength among plates. We present a new statistical framework for high throughput screening of compounds based on Bayesian nonparametric modeling. The proposed approach is able to identify candidate hits from multiple plates simultaneously, sharing statistical strength among plates and providing more robust estimates of compound activity. It can flexibly accommodate arbitrary distributions of compound activities and is applicable to any plate geometry. The algorithm provides a principled statistical approach for hit identification and false discovery rate control. Experiments demonstrate significant improvements in hit identification sensitivity and specificity over the B-score method, which is highly sensitive to threshold choice. The framework is implemented as an efficient R extension package BHTSpack and is suitable for large scale data sets.
Floyd Mayweather, Conor McGregor Fall Short Of PPV Record With 2017 Fight
It turns out that Floyd Mayweather and Conor McGregor didn't have the biggest fight of all time after all. Their boxing match on Aug. 26 reportedly generated 4.4 million pay-per-view buys, falling just shy of the record-setting 4.6 million PPV buys sold by Mayweather and Manny Pacquiao on May 2, 2015. During the two-month build towards the fight, which included a four-city press tour in the middle of July, there was talk that the undefeated boxer and the UFC star might set a new mark and possibly approach five million buys. Mayweather and McGregor will have to settle for second-best, according to BoxingScene, and an official announcement could come later this week. Finishing at No. 2 is an achievement, nonetheless, as the bout sits comfortably ahead of the third-best selling fight in history.
The most secure way to lock your phone, revealed
People should stop using patterns to unlock their devices, researchers have warned. A new study has found that it's a lot easier for people who might be looking over your shoulder as you unlock your phone to memorise a pattern than a passcode. So-called "shoulder surfing attacks" can be easy for a criminal to plan and execute, but you can protect yourself by switching to a PIN code and increasing its length from four digits to six, the researchers say. They got over 1,000 volunteers to act as attackers, challenging them to memorise a range of unlocking authentications (four- and six-digit PINs, and four- and six-length patterns with and without tracing lines) by watching a victim over their shoulder from a variety of angles. The 5-inch Nexus 5 and 6-inch OnePlus One were the two handsets used in the study, as the researchers say they "are similar to a wide variety of displays and form factors available on the market today, for both Android and iPhone".
Quantifying the relation between performance and success in soccer
Pappalardo, Luca, Cintia, Paolo
The availability of massive data about sports activities now offers the opportunity to quantify the relation between performance and success. In this study, we analyze more than 6,000 games and 10 million events in six European leagues and investigate this relation in soccer competitions. We discover that a team's position in a competition's final ranking is significantly related to its typical performance, as described by a set of technical features extracted from the soccer data. Moreover, we find that, while victories and defeats can be explained by the team's performance during a game, it is difficult to detect draws by using a machine learning approach. We then simulate the outcomes of an entire season of each league relying only on technical data, i.e. excluding the goals scored, exploiting a machine learning model trained on data from past seasons. The simulation produces a team ranking (the PC ranking) which is close to the actual ranking, suggesting that a complex systems' view on soccer has the potential of revealing hidden patterns regarding the relation between performance and success.
Active learning in annotating micro-blogs dealing with e-reputation
Cossu, Jean-Valère, Molina-Villegas, Alejandro, Tello-Signoret, Mariana
Elections unleash strong political views on Twitter, but what do people really think about politics? Opinion and trend mining on micro-blogs dealing with politics has recently attracted researchers in several fields including Information Retrieval and Machine Learning (ML). Since the performance of ML and Natural Language Processing (NLP) approaches is limited by the amount and quality of data available, one promising alternative for some tasks is the automatic propagation of expert annotations. This paper intends to develop a so-called active learning process for automatically annotating French language tweets that deal with the image (i.e., representation, web reputation) of politicians. Our main focus is on the methodology followed to build an original annotated dataset expressing opinions about two French politicians over time. We therefore review state of the art NLP-based ML algorithms to automatically annotate tweets using a manual initiation step as bootstrap. This paper focuses on key issues in active learning when building a large annotated data set from noisy data, where the noise is introduced by human annotators, the abundance of data, and the label distribution across data and entities. In turn, we show that Twitter characteristics such as the author's name or hashtags can be considered as the bearing point to not only improve automatic systems for Opinion Mining (OM) and Topic Classification but also to reduce noise in human annotations. However, a later thorough analysis shows that reducing noise might induce the loss of crucial information.
Scan $B$-Statistic for Kernel Change-Point Detection
Li, Shuang, Xie, Yao, Dai, Hanjun, Song, Le
Detecting the emergence of an abrupt change-point is a classic problem in statistics and machine learning. Kernel-based nonparametric statistics, which require fewer assumptions on the distributions than parametric approaches and can handle high-dimensional data, have been used for this task. In this paper we focus on the scenario when the amount of background data is large, and propose two related computationally efficient kernel-based statistics for change-point detection, which are inspired by the recently developed $B$-statistics. A novel theoretical result of the paper is the characterization of the tail probability of these statistics using the change-of-measure technique, which focuses on characterizing the tail of the detection statistics rather than obtaining their asymptotic distribution under the null distribution. Such approximations are crucial to control the false alarm rate, which corresponds to the significance level in offline change-point detection and the average-run-length in online change-point detection. Our approximations are shown to be highly accurate. Thus, they provide a convenient way to find detection thresholds for both offline and online cases without the need to resort to the more expensive simulations or bootstrapping. We show that our methods perform well on both synthetic data and real data.
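To make the idea of a block-based kernel statistic concrete, here is a minimal Python sketch in the spirit of the abstract above. It is not the authors' exact statistic: it averages an unbiased MMD^2 estimate between a candidate block and several blocks drawn from the background data, and it omits the variance normalization and threshold approximation the paper develops. The function names, the Gaussian kernel, and the bandwidth value are all assumptions chosen for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of the squared maximum mean discrepancy (MMD^2)."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Exclude diagonal terms k(x_i, x_i) from the within-sample averages.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()


def block_averaged_mmd2(reference, test_block, n_blocks, bandwidth=1.0, seed=0):
    """Average MMD^2 between test_block and n_blocks disjoint reference blocks.

    Hypothetical helper: each reference block has the same size as test_block
    and is sampled without replacement from the background data.
    """
    block_size = len(test_block)
    if n_blocks * block_size > len(reference):
        raise ValueError("not enough reference data for the requested blocks")
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(reference), size=n_blocks * block_size, replace=False)
    blocks = reference[idx].reshape(n_blocks, block_size, -1)
    return float(
        np.mean([mmd2_unbiased(blk, test_block, bandwidth) for blk in blocks])
    )


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    background = rng.normal(0.0, 1.0, size=(200, 2))
    unchanged = rng.normal(0.0, 1.0, size=(20, 2))
    shifted = rng.normal(3.0, 1.0, size=(20, 2))
    print("no change:", block_averaged_mmd2(background, unchanged, n_blocks=10))
    print("mean shift:", block_averaged_mmd2(background, shifted, n_blocks=10))
```

Working on small blocks keeps each kernel matrix cheap to compute, which is the practical appeal when the background data set is large. A detection threshold would still have to come from an approximation such as the one the abstract describes, or from simulation.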
Improving your statistical inferences Coursera
About this course: This course aims to help you to draw better statistical inferences from empirical research. First, we will discuss how to correctly interpret p-values, effect sizes, confidence intervals, Bayes Factors, and likelihood ratios, and how these statistics answer different questions you might be interested in. Then, you will learn how to design experiments where the false positive rate is controlled, and how to decide upon the sample size for your study, for example in order to achieve high statistical power. Subsequently, you will learn how to interpret evidence in the scientific literature given widespread publication bias, for example by learning about p-curve analysis. Finally, we will talk about how to do philosophy of science, theory construction, and cumulative science, including how to perform replication studies, why and how to pre-register your experiment, and how to share your results following Open Science principles.
Lift (data mining) - Wikipedia
In data mining and association rule learning, lift is a measure of the performance of a targeting model (association rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random choice targeting model. A targeting model is doing a good job if the response within the target is much better than the average for the population as a whole. Lift is simply the ratio of these values: target response divided by average response. For example, suppose a population has an average response rate of 5%, but a certain model (or rule) has identified a segment with a response rate of 20%. Then that segment would have a lift of 4.0 (20%/5%).
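The ratio described above can be written as a short Python sketch. The function names and the count-based helper are illustrative assumptions, not part of any particular library.

```python
def lift(target_response_rate, average_response_rate):
    """Lift = response rate in the targeted segment / population average rate."""
    if average_response_rate <= 0:
        raise ValueError("average response rate must be positive")
    return target_response_rate / average_response_rate


def lift_from_counts(segment_responders, segment_size,
                     population_responders, population_size):
    """Hypothetical helper: compute lift from raw responder counts."""
    return lift(segment_responders / segment_size,
                population_responders / population_size)


if __name__ == "__main__":
    # The example from the text: a 20% segment rate against a 5% average.
    print(lift(0.20, 0.05))
    # The same rates expressed as counts: 200 of 1,000 vs. 500 of 10,000.
    print(lift_from_counts(200, 1000, 500, 10000))
```

A lift above 1 means the segment responds better than the population as a whole, and a lift of exactly 1 means the model does no better than random targeting.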
Editor's picks: The many applications of Machine Learning in banking
Can robots and data stop banks terror financing? Buying a new printer from ISIS is probably not how many people envision their stationery shopping to proceed. But it was only a month ago that the FBI announced it had found that a senior Islamic State (ISIS) official had sent money to an alleged operative based in the US via a global financial network that used fake eBay sales to mask payments. This is a timely reminder about how vulnerable businesses can be to terrorist financing. Gurjeet Singh, co-founder and Executive Chairman of Ayasdi, spoke to bobsguide about the challenges of compliance with anti-money laundering, the characteristics of AI, and how AI is vastly improving false positive rates on suspicious security reports. It's a Catch-22: to get financial credit, you need a credit history; to get a credit history, someone has to give you credit.