Performance Analysis
Floyd Mayweather vs. Conor McGregor: Ticket Prices, PPV Cost For 2017 Fight
The fight between Floyd Mayweather and Conor McGregor promises to be the biggest bout of 2017, though it won't be cheap to watch. Buying the fight on pay-per-view will cost nearly $100, and no tickets can be had for less than $500. The upcoming Aug. 26 bout has been compared to the fight between Mayweather and Manny Pacquiao on May 2, 2015. That fight generated a record 4.6 million buys, even though it cost $99.95 in HD. Mayweather-McGregor will also cost $99.95,
Kernel Method for Detecting Higher Order Interactions in multi-view Data: An Application to Imaging, Genetics, and Epigenetics
Alam, Md. Ashad, Lin, Hui-Yi, Calhoun, Vince, Wang, Yu-Ping
In this study, we tested the interaction effect of multimodal datasets using a novel method called the kernel method for detecting higher order interactions among biologically relevant multi-view data. Using a semiparametric method on a reproducing kernel Hilbert space (RKHS), we used a standard mixed-effects linear model and derived a score-based variance component statistic that tests for higher order interactions between multi-view data. The proposed method offers an intangible framework for the identification of higher order interaction effects (e.g., three way interaction) between genetics, brain imaging, and epigenetic data. Extensive numerical simulation studies were first conducted to evaluate the performance of this method. Finally, this method was evaluated using data from the Mind Clinical Imaging Consortium (MCIC) including single nucleotide polymorphism (SNP) data, functional magnetic resonance imaging (fMRI) scans, and deoxyribonucleic acid (DNA) methylation data, respectively, in schizophrenia patients and healthy controls. We treated each set of gene-derived SNPs, each region of interest (ROI), and each set of gene-derived DNA methylation as a single testing unit, and these units are combined into triplets for evaluation. In addition, cardiovascular disease risk factors such as age, gender, and body mass index were assessed as covariates on hippocampal volume and compared between triplets. Our method identified $13$ triplets ($p$-values $\leq 0.001$) that included $6$ gene-derived SNPs, $10$ ROIs, and $6$ gene-derived DNA methylations that correlated with changes in hippocampal volume, suggesting that these triplets may be important in explaining schizophrenia-related neurodegeneration. With strong evidence ($p$-values $\leq 0.000001$), the triplet ({\bf MAGI2, CRBLCrus1.L, FBXO28}) has the potential to distinguish schizophrenia patients from the healthy control variations.
On Measuring and Quantifying Performance: Error Rates, Surrogate Loss, and an Example in SSL
Loog, Marco, Krijthe, Jesse H., Jensen, Are C.
The aim of semi-supervised learning is to improve supervised learners by exploiting potentially large amounts of, typically easier to obtain, unlabeled data [1]. Up to now, however, semi-supervised learners have reported mixed results when it comes to such improvements: it is not always the case that semi-supervision results in lower expected error rates. On the contrary, severely deteriorated performances have been observed in empirical studies and theory shows that improvement guarantees can often only be provided under rather stringent conditions [2-5]. Now, the principal suggestion put forward in this chapter is that, when dealing with semi-supervised learning, one may not only want to study the (expected) error rates these classifiers produce, but also to measure the classifiers' performances by means of the intrinsic loss they may be optimizing in the first place. That is, for classification routines that optimize a so-called surrogate loss at training time, which is what many machine learning and Bayesian decision theoretic approaches do [6, 7], we propose to also investigate how this loss behaves on the test set, as this can provide us with an alternative view on the classifier's behavior that a mere error rate cannot capture. In fact, though the main example is concerned with semi-supervision, we would like to argue that in other learning scenarios, similar considerations might be beneficial. For instance, in active learning [8], where rather than sampling randomly from one's input data to provide these instances with labels, one aims to do the sampling in a systematic way, trying to keep labeling cost as low as one can or, similarly, to learn from as few labeled examples as possible. Also here it may (or, we believe, it should) be of interest to not only compare the error rates that different approaches (e.g.
Influence of Resampling on Accuracy of Imbalanced Classification
Burnaev, Evgeny, Erofeev, Pavel, Papanov, Artem
Generally, accurate prediction of the minor class is crucial, but it is hard to achieve since there is not much information about the minor class. One approach to deal with this problem is to preliminarily resample the dataset, i.e., add new elements to the dataset or remove existing ones. Resampling can be done in various ways, which raises the problem of choosing the most appropriate one. In this paper we experimentally investigate the impact of resampling on classification accuracy, compare resampling methods, and highlight key points and difficulties of resampling.
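As a rough illustration of what "adding new elements to the dataset" can mean, here is a minimal sketch of random oversampling, one common resampling scheme. The abstract does not say which methods the paper compares, so the function name `random_oversample` and the toy data are assumptions made purely for illustration.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen items of each smaller class until every
    class has as many items as the largest class.

    Illustrative sketch only; not taken from the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # Pool of original items belonging to this class.
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Hypothetical toy dataset: four majority items, one minority item.
X = [[0.1], [0.2], [0.3], [0.4], [0.9]]
y = [0, 0, 0, 0, 1]
X_res, y_res = random_oversample(X, y)
balanced_counts = Counter(y_res)
print(balanced_counts)
```

Oversampling by duplication adds no new information about the minor class; it only changes the class proportions the classifier sees, which is one reason the choice of resampling method matters.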
How an artificial brain could help us outsmart hackers - Artificial Intelligence
The big conceptual difference between deep learning and traditional machine learning is that deep learning is the first, and currently the only, learning method that is capable of training directly on the raw data (e.g., the pixels in our face recognition example), without any need for feature extraction. When applying traditional machine learning, it is necessary to first convert the computer files from raw bytes to a list of features (e.g., important API calls, etc.), and only then is this list of features fed into the machine learning module. Additionally, unlike traditional machine learning, which reaches a performance ceiling as the number of files it is trained on increases, deep learning can effectively improve as the datasets grow, to the extent of hundreds of millions of malicious and legitimate files. The results of benchmarks that compare the performance of deep learning vs traditional machine learning in cybersecurity show that deep learning results in a considerably higher detection rate and a lower false positive rate. As malware developers use more advanced methods to create new malware, the gap between the detection rates of deep learning vs traditional machine learning will grow wider; and in coming years it will be critical to rely on deep learning in order to have a realistic chance of foiling the most sophisticated attacks.
Residual Value Forecasting Using Asymmetric Cost Functions
Dress, Korbinian, Lessmann, Stefan, von Mettenheim, Hans-Jörg
Leasing is a popular channel to market new cars. Pricing a leasing contract is complicated because the leasing rate embodies an expectation of the residual value of the car after contract expiration. To aid lessors in their pricing decisions, the paper develops resale price forecasting models. A peculiarity of the leasing business is that forecast errors entail different costs. Identifying effective ways to address this characteristic is the main objective of the paper. More specifically, the paper contributes to the literature through i) consolidating and integrating previous work in forecasting with asymmetric cost of error functions, ii) systematically evaluating previous approaches and comparing them to a new approach, and iii) demonstrating that forecasting with asymmetric cost of error functions enhances the quality of decision support in car leasing. For example, under the assumption that the cost of overestimating resale prices is twice that of the opposite error, incorporating the corresponding cost asymmetry into forecast model development reduces decision costs by about eight percent, compared to a standard forecasting model. Higher asymmetry produces even larger improvements.
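To make the idea of an asymmetric cost of error concrete, here is a minimal sketch that assumes a piecewise-linear cost in which overestimating the resale price is charged at twice the rate of underestimating it. The abstract does not specify the functional form used in the paper, so the linear form, the function names, and the resale prices below are all hypothetical.

```python
def asymmetric_cost(forecast, actual, over_cost=2.0, under_cost=1.0):
    """Piecewise-linear cost (an assumed form): overestimation is charged
    at over_cost per unit of error, underestimation at under_cost."""
    error = forecast - actual
    return over_cost * error if error > 0 else under_cost * (-error)


def average_cost(forecast, actuals, over_cost=2.0, under_cost=1.0):
    """Mean asymmetric cost of one constant forecast over observed prices."""
    total = sum(asymmetric_cost(forecast, a, over_cost, under_cost) for a in actuals)
    return total / len(actuals)


# Hypothetical observed resale prices for comparable cars.
resale_prices = [9000, 10000, 11000, 12000, 13000, 14000]

# A cost-agnostic forecast: the plain average.
mean_forecast = sum(resale_prices) / len(resale_prices)

# A cost-aware forecast: grid search for the value with the lowest average cost.
candidates = range(9000, 14001, 100)
best_forecast = min(candidates, key=lambda f: average_cost(f, resale_prices))

print(mean_forecast, average_cost(mean_forecast, resale_prices))
print(best_forecast, average_cost(best_forecast, resale_prices))
```

Under this assumed cost, the cost-aware forecast sits below the plain average: when overestimation is the more expensive mistake, it pays to forecast conservatively.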
Submodular Variational Inference for Network Reconstruction
Chen, Lin, Crawford, Forrest W, Karbasi, Amin
In real-world and online social networks, individuals receive and transmit information in real time. Cascading information transmissions (e.g. phone calls, text messages, social media posts) may be understood as a realization of a diffusion process operating on the network, and its branching path can be represented by a directed tree. The process only traverses and thus reveals a limited portion of the edges. The network reconstruction/inference problem is to infer the unrevealed connections. Most existing approaches derive a likelihood and attempt to find the network topology maximizing the likelihood, a problem that is highly intractable. In this paper, we focus on the network reconstruction problem for a broad class of real-world diffusion processes, exemplified by a network diffusion scheme called respondent-driven sampling (RDS). We prove that under realistic and general models of network diffusion, the posterior distribution of an observed RDS realization is a Bayesian log-submodular model. We then propose VINE (Variational Inference for Network rEconstruction), a novel, accurate, and computationally efficient variational inference algorithm, for the network reconstruction problem under this model. Crucially, we do not assume any particular probabilistic model for the underlying network. VINE recovers any connected graph with high accuracy as shown by our experimental results on real-life networks.
WWE Great Balls Of Fire 2017: Live Stream Info, Start Time, Match Card For 'Monday Night Raw' PPV
Four WWE superstars are unofficially in the running to be in the main event of SummerSlam 2017 on Aug. 20. The biggest match since WrestleMania 33 will be determined Sunday night at Great Balls of Fire 2017, WWE's next "Monday Night Raw" pay-per-view. Great Balls of Fire 2017 is scheduled to start at 8 p.m. EDT, and the pre-show gets underway at 7:30 p.m. EDT. Ordering the event on PPV costs $54.99, but fans can also watch the event with a live stream on the WWE Network. A subscription to the network costs $9.99 per month, though new subscribers get the first month free.
UFC 213 Betting Odds, PPV Info For Amanda Nunes vs. Valentina Shevchenko And Entire Fight Card
For the first time in 2017, Amanda Nunes will defend her UFC women's bantamweight championship. She'll put the belt on the line Saturday night in the main event of UFC 213 in Las Vegas. Nunes hasn't stepped inside the octagon since she needed 48 seconds to defeat Ronda Rousey on Dec. 30. She headlined the pay-per-view in her previous two title fights, and it'll cost fans $69.99 to order Saturday's fight. The PPV starts at 10 p.m. EDT, though fans can watch the event with a live stream online at ufc.tv for $59.99.
Estimating network edge probabilities by neighborhood smoothing
Zhang, Yuan, Levina, Elizaveta, Zhu, Ji
The estimation of probabilities of network edges from the observed adjacency matrix has important applications to predicting missing links and network denoising. It has usually been addressed by estimating the graphon, a function that determines the matrix of edge probabilities, but this is ill-defined without strong assumptions on the network structure. Here we propose a novel, computationally efficient method, based on neighborhood smoothing, to estimate the expectation of the adjacency matrix directly, without making the structural assumptions that graphon estimation requires. The neighborhood smoothing method requires little tuning, has a competitive mean-squared error rate, and outperforms many benchmark methods on link prediction in simulated and real networks.