Accuracy
WWE Elimination Chamber 2017: Predictions, Match Card For Final 'SmackDown' PPV Before WrestleMania 33
The road to WrestleMania 33 is in full swing, but there are still two pay-per-views on the schedule before the biggest WWE show of 2017. The first one is Elimination Chamber Sunday night, featuring the "SmackDown Live" roster. Elimination Chamber will likely establish multiple WrestleMania title matches, starting with the WWE World Championship match. Bray Wyatt, Baron Corbin, The Miz, AJ Styles and Dean Ambrose will all compete against John Cena for the No. 1 belt on "SmackDown Live." The SmackDown Women's Championship and the SmackDown Tag Team Championships will also be on the line at the PPV in Phoenix.
Learning detectors of malicious web requests for intrusion detection in network traffic
Machlica, Lukas, Bartos, Karel, Sofka, Michal
This paper proposes a generic classification system designed to detect security threats based on the behavior of malware samples. The system relies on statistical features computed from proxy log fields to train detectors using a database of malware samples. The behavior detectors serve as basic reusable building blocks of the multi-level detection architecture. The detectors identify malicious communication exploiting encrypted URL strings and domains generated by a Domain Generation Algorithm (DGA), which are frequently used in Command and Control (C&C), phishing, and click fraud. Surprisingly, very precise detectors can be built given only a limited amount of information extracted from a single proxy log. This way, the computational requirements of the detectors are kept low, which allows for deployment on a wide range of security devices without depending on traffic context such as DNS logs, Whois records, webpage content, etc. Results on several weeks of live traffic from 100+ companies having 350k+ hosts show correct detection with a precision exceeding 95% of malicious flows, 95% of malicious URLs and 90% of infected hosts. In addition, a comparison with a signature and rule-based solution shows that our system is able to detect a significant amount of new threats.
So you want to be a Data Scientist?
Microsoft has a mission, "To empower every person and every organization on the planet to achieve more," and these are great words, but actions speak louder than words. You will also hear the phrase "Democratizing AI" used increasingly over the following months; in other words, making Artificial Intelligence available to everyone. I have had the pleasure of working with some of Microsoft's finest over the last two days at a Hackathon in London, covering a number of aspects of Machine Learning, Cognitive Computing and Data Analytics. This post makes some of that learning accessible and available to you. Whether you are just curious or want to start a Data Science journey, this will get you started on your first data science experiment! You need nothing to get started: no previous knowledge, no credit card, just curiosity and a little patience.
Using Digital Fingerprints And Deep Learning To Fight Online Harassment
Activists protest against the gang-rape of a 16-year-old girl in Brazil last year – videos and images of the assault were posted to social media and circulated widely, revictimizing the woman. As Facebook defended itself in a German court against claims that it does too little to counter abusive content on its platform, one of its lawyers made the intriguing claim: "There are billions of postings each day. You want us to employ a sort of wonder machine to detect each misuse. Such a machine doesn't exist." For a technology company that has invested heavily in deep learning and filtering technologies, and that has repeatedly run afoul of free speech advocates for its aggressive stance on content removal, this is certainly a curious claim to make.
Efficient fetal-maternal ECG signal separation from two channel maternal abdominal ECG via diffusion-based channel selection
Li, Ruilin, Frasch, Martin G., Wu, Hau-tieng
There is a need for affordable, widely deployable maternal-fetal ECG monitors to improve maternal and fetal health during pregnancy and delivery. Building on diffusion-based channel selection, we present the mathematical formalism and clinical validation of an algorithm capable of accurately separating maternal and fetal ECG from a two-channel signal acquired over the maternal abdomen.
Understanding machine learning #3: Confusion matrix - not all errors are equal
One of the most typical tasks in machine learning is classification. It may seem that evaluating the effectiveness of such a model is easy. Let's assume that we have a model which, based on historical data, predicts whether a client will pay back credit obligations. We evaluate 100 bank customers and our model guesses correctly in 93 instances. That may appear to be a good result – but is it really?
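To see why 93 correct guesses out of 100 may not be enough, here is a minimal sketch. The excerpt gives only the 93/100 figure, so the class split (90 clients who repay, 10 who default) and the individual error counts below are assumptions made up for illustration.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count outcome pairs for a binary task where 1 = default, 0 = repays."""
    counts = Counter(zip(actual, predicted))
    return {
        "tp": counts[(1, 1)],  # defaulters correctly flagged
        "fn": counts[(1, 0)],  # defaulters the model missed
        "fp": counts[(0, 1)],  # good clients wrongly flagged
        "tn": counts[(0, 0)],  # good clients correctly cleared
    }


# Hypothetical ground truth: 90 clients repay, 10 default.
actual = [0] * 90 + [1] * 10
# Hypothetical model output: 1 good client is wrongly flagged,
# and only 4 of the 10 defaulters are caught.
predicted = [0] * 89 + [1] * 1 + [0] * 6 + [1] * 4

cm = confusion_matrix(actual, predicted)
accuracy = (cm["tp"] + cm["tn"]) / len(actual)
recall = cm["tp"] / (cm["tp"] + cm["fn"])
precision = cm["tp"] / (cm["tp"] + cm["fp"])

print(cm)
print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
```

Under these assumed numbers the model is right 93 times out of 100, yet it misses 6 of the 10 defaulters. Accuracy alone hides that, which is why the four cells of the confusion matrix are worth inspecting separately.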
Landmark-Based Plan Recognition
Pereira, Ramon Fraga, Meneguzzi, Felipe
Recognition of goals and plans using incomplete evidence from action execution can be done efficiently by using planning techniques. In many applications it is important to recognize goals and plans not only accurately, but also quickly. In this paper, we develop a heuristic approach for recognizing plans based on planning techniques that rely on ordering constraints to filter candidate goals from observations. These ordering constraints are called landmarks in the planning literature, which are facts or actions that cannot be avoided to achieve a goal. We show the applicability of planning landmarks in two settings: first, we use it directly to develop a heuristic-based plan recognition approach; second, we refine an existing planning-based plan recognition approach by pre-filtering its candidate goals. Our empirical evaluation shows that our approach is not only substantially more accurate than the state-of-the-art in all available datasets, it is also an order of magnitude faster.
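The idea of filtering candidate goals by landmarks can be sketched as follows. This is an illustrative toy, not the authors' implementation: it assumes the landmarks of each goal have already been extracted by a planner, it treats landmarks as plain fact names, and the scoring rule (fraction of a goal's landmarks seen in the observations) and all goal and fact names are hypothetical.

```python
def rank_goals(landmarks_by_goal, observed_facts):
    """Rank candidate goals by the fraction of their landmarks already observed."""
    observed = set(observed_facts)
    scores = {}
    for goal, landmarks in landmarks_by_goal.items():
        if landmarks:
            scores[goal] = len(landmarks & observed) / len(landmarks)
        else:
            scores[goal] = 0.0  # no landmarks known: no evidence either way
    # Highest score first; ties keep their original order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def filter_goals(ranking, threshold):
    """Keep only goals whose landmark score reaches the threshold."""
    return [goal for goal, score in ranking if score >= threshold]


# Hypothetical pre-computed landmarks for three candidate goals.
landmarks_by_goal = {
    "make_tea": {"have_water", "water_boiled", "have_teabag"},
    "make_coffee": {"have_water", "water_boiled", "have_ground_coffee", "filter_in_place"},
    "make_toast": {"have_bread", "toaster_on"},
}
# Facts inferred from the (incomplete) observations so far.
observed_facts = ["have_water", "have_teabag"]

ranking = rank_goals(landmarks_by_goal, observed_facts)
candidates = filter_goals(ranking, threshold=0.5)
print(ranking)
print(candidates)
```

Because the score needs only set intersections, it can be computed without running a planner for every candidate goal, which is one plausible reason a landmark-based filter can be fast.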
High-Dimensional Regularized Discriminant Analysis
Ramey, John A., Stein, Caleb K., Young, Phil D., Young, Dean M.
Regularized discriminant analysis (RDA), proposed by Friedman (1989), is a widely popular classifier that lacks interpretability and is impractical for high-dimensional data sets. Here, we present an interpretable and computationally efficient classifier called high-dimensional RDA (HDRDA), designed for the small-sample, high-dimensional setting. For HDRDA, we show that each training observation, regardless of class, contributes to the class covariance matrix, resulting in an interpretable estimator that borrows from the pooled sample covariance matrix. Moreover, we show that HDRDA is equivalent to a classifier in a reduced-feature space with dimension approximately equal to the training sample size. As a result, the matrix operations employed by HDRDA are computationally linear in the number of features, making the classifier well-suited for high-dimensional classification in practice. We demonstrate that HDRDA is often superior to several sparse and regularized classifiers in terms of classification accuracy with three artificial and six real high-dimensional data sets. Also, timing comparisons between our HDRDA implementation in the sparsediscrim R package and the standard RDA formulation in the klaR R package demonstrate that as the number of features increases, the computational runtime of HDRDA is drastically smaller than that of RDA.
A SMART Stochastic Algorithm for Nonconvex Optimization with Applications to Robust Machine Learning
Aravkin, Aleksandr, Davis, Damek
Abstract: In this paper, we show how to transform any optimization problem that arises from fitting a machine learning model into one that (1) detects and removes contaminated data from the training set while (2) simultaneously fitting the trimmed model on the uncontaminated data that remains. To solve the resulting nonconvex optimization problem, we introduce a fast stochastic proximal-gradient algorithm that incorporates prior knowledge through nonsmooth regularization.
Keywords: Stochastic algorithms · Nonsmooth, nonconvex optimization · Trimmed estimators
This work was funded by the Washington Research Foundation Data Science Professorship. This material is based upon work supported by the National Science Foundation under Award No. 1502405.
A. Aravkin, Department of Applied Mathematics, University of Washington, Seattle, WA 98195-4322, USA. Email: saravkin@uw.edu
1 Introduction: Potential outliers in datasets can be identified in several ways. For higher-dimensional data, several tests involving order statistics exist (so called L-estimators [23]), such as the three-sigma rule for Gaussian data, or trimming strategies for disregarding points that are furthest away from the mean. After potential outliers are removed from a dataset, models are fit on the remaining data. After fitting the model, potential outliers are again identified and removed and another model is fit [33].
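The alternating remove-then-refit procedure described in the introduction can be sketched as below. This is not the paper's stochastic proximal-gradient algorithm, which performs trimming and fitting simultaneously; it is a toy version of the alternating baseline, assuming the simplest possible model (a single mean) and a hypothetical parameter `keep` for the number of points retained.

```python
def trimmed_mean_fit(data, keep, max_iters=100):
    """Alternate between fitting a mean and keeping the `keep` closest points."""
    kept = list(data)  # start by trusting every point
    for _ in range(max_iters):
        # Fit step: the "model" is just the mean of the currently kept points.
        center = sum(kept) / len(kept)
        # Trim step: keep the points with the smallest residuals.
        new_kept = sorted(data, key=lambda x: abs(x - center))[:keep]
        if sorted(new_kept) == sorted(kept):
            break  # the kept set is stable, so refitting changes nothing
        kept = new_kept
    return sum(kept) / len(kept), kept


# Hypothetical sample: five inliers near 1.0 and two gross outliers.
data = [1.0, 1.2, 0.9, 1.1, 0.8, 10.0, 12.0]
center, kept = trimmed_mean_fit(data, keep=5)
print(center, sorted(kept))
```

On this sample the untrimmed mean is pulled toward the outliers, while the loop settles on the five inliers after one trimming pass. Alternating schemes of this kind are heuristics, and the result depends on the choice of `keep`.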