Accuracy
Introduction to Formal Concept Analysis and Its Applications in Information Retrieval and Related Fields
This paper is a tutorial on Formal Concept Analysis (FCA) and its applications. FCA is an applied branch of Lattice Theory, a mathematical discipline that enables the formalisation of concepts as basic units of human thinking and the analysis of data in object-attribute form. Having originated in the early 1980s, FCA has become, over the last three decades, a popular human-centred tool for knowledge representation and data analysis with numerous applications. Since the tutorial was specially prepared for RuSSIR 2014, the FCA topics covered include Information Retrieval with a focus on visualisation aspects, Machine Learning, Data Mining and Knowledge Discovery, Text Mining, and several others.
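To make "object-attribute form" concrete, here is a minimal sketch of the two FCA derivation operators and a naive enumeration of formal concepts, i.e. pairs (A, B) where B is exactly the set of attributes shared by the objects in A, and A is exactly the set of objects having all attributes in B. The toy context (`duck`, `eagle`, `fish`) and all function names are hypothetical illustrations, not taken from the tutorial. The enumeration closes every attribute subset, which is exponential and only suitable for tiny contexts; practical FCA tools use dedicated algorithms such as NextClosure.

```python
from itertools import combinations

# Hypothetical toy formal context: each object maps to the attributes it has.
CONTEXT = {
    "duck": {"flies", "swims", "has_feathers"},
    "eagle": {"flies", "has_feathers"},
    "fish": {"swims"},
}
ATTRIBUTES = set().union(*CONTEXT.values())


def common_attributes(objects):
    """Derivation A': attributes shared by every object in A."""
    if not objects:
        return set(ATTRIBUTES)  # the empty object set shares every attribute
    return set.intersection(*(CONTEXT[g] for g in objects))


def objects_having(attrs):
    """Derivation B': objects that have every attribute in B."""
    return {g for g, m in CONTEXT.items() if attrs <= m}


def formal_concepts():
    """Naively enumerate all concepts (extent, intent) by closing each attribute subset."""
    concepts = set()
    attrs = sorted(ATTRIBUTES)
    for r in range(len(attrs) + 1):
        for subset in combinations(attrs, r):
            extent = objects_having(set(subset))
            intent = common_attributes(extent)
            concepts.add((frozenset(extent), frozenset(intent)))
    return concepts


if __name__ == "__main__":
    for extent, intent in sorted(formal_concepts(), key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))
```

For this context the sketch finds four concepts; ordering them by extent inclusion gives the concept lattice that FCA tools visualise.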
Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification without Disparate Mistreatment
Zafar, Muhammad Bilal, Valera, Isabel, Rodriguez, Manuel Gomez, Gummadi, Krishna P.
Automated data-driven decision-making systems are increasingly being used to assist, or even replace, humans in many settings. These systems function by learning from historical decisions, often taken by humans. In order to maximize the utility of these systems (or, classifiers), their training involves minimizing the errors (or, misclassifications) over the given historical data. However, it is quite possible that the optimally trained classifier makes decisions for people belonging to different social groups with different misclassification rates (e.g., misclassification rates for females are higher than for males), thereby placing these groups at an unfair disadvantage. To account for and avoid such unfairness, we introduce in this paper a new notion of unfairness, disparate mistreatment, which is defined in terms of misclassification rates. We then propose intuitive measures of disparate mistreatment for decision boundary-based classifiers, which can be easily incorporated into their formulation as convex-concave constraints. Experiments on synthetic as well as real-world datasets show that our methodology is effective at avoiding disparate mistreatment, often at a small cost in terms of accuracy.
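Since disparate mistreatment is defined in terms of group-wise misclassification rates, a classifier can be audited for it after training by comparing error rates across groups. The sketch below, assuming binary labels and a binary sensitive attribute both coded as 0/1, reports the absolute gaps in false positive rate and false negative rate. The function names are hypothetical, and this only measures the disparity; it does not implement the paper's convex-concave training constraints.

```python
def error_rates(y_true, y_pred):
    """Return (false positive rate, false negative rate) for labels in {0, 1}."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    fpr = fp / negatives if negatives else 0.0
    fnr = fn / positives if positives else 0.0
    return fpr, fnr


def disparate_mistreatment(y_true, y_pred, group):
    """Absolute FPR gap and FNR gap between the groups coded 0 and 1."""
    rates = {}
    for g in (0, 1):
        idx = [i for i, gi in enumerate(group) if gi == g]
        rates[g] = error_rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
    fpr_gap = abs(rates[0][0] - rates[1][0])
    fnr_gap = abs(rates[0][1] - rates[1][1])
    return fpr_gap, fnr_gap


if __name__ == "__main__":
    y_true = [0, 0, 1, 1, 0, 0, 1, 1]
    y_pred = [0, 0, 1, 1, 1, 0, 0, 1]
    group = [0, 0, 0, 0, 1, 1, 1, 1]
    print(disparate_mistreatment(y_true, y_pred, group))  # (0.5, 0.5)
```

In the example, group 0 is classified perfectly while group 1 has one false positive and one false negative, so both gaps are 0.5 even though overall accuracy is 75%.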
Probing the Geometry of Data with Diffusion Fréchet Functions
Martínez, Diego Hernán Díaz, Lee, Christine H., Kim, Peter T., Mio, Washington
Many complex ecosystems, such as those formed by multiple microbial taxa, involve intricate interactions amongst various sub-communities. The most basic relationships are frequently modeled as co-occurrence networks in which the nodes represent the various players in the community and the weighted edges encode levels of interaction. In this setting, the composition of a community may be viewed as a probability distribution on the nodes of the network. This paper develops methods for modeling the organization of such data, as well as their Euclidean counterparts, across spatial scales. Using the notion of diffusion distance, we introduce diffusion Fréchet functions and diffusion Fréchet vectors associated with probability distributions on Euclidean space and the vertex set of a weighted network, respectively. We prove that these functional statistics are stable with respect to the Wasserstein distance between probability measures, thus yielding robust descriptors of their shapes. We apply the methodology to investigate bacterial communities in the human gut, seeking to characterize divergence from intestinal homeostasis in patients with Clostridium difficile infection (CDI) and the effects of fecal microbiota transplantation, a treatment used in CDI patients that has proven to be significantly more effective than traditional treatment with antibiotics. The proposed method proves useful in deriving a biomarker that might help elucidate the mechanisms that drive these processes.
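A diffusion Fréchet vector assigns to each vertex x of a weighted network the expected squared diffusion distance from x to a vertex drawn from the community distribution p. The sketch below assumes choices the abstract does not specify: the heat kernel K_t = exp(-tL) of the unnormalized Laplacian L = D - W, diffusion distance d_t(x, y) taken as the Euclidean distance between rows x and y of K_t, and V_t(x) = sum_y d_t(x, y)^2 p(y). The paper's exact normalizations may differ, and the function names are hypothetical.

```python
import numpy as np


def heat_kernel(W, t):
    """exp(-t L) for the unnormalized Laplacian L = D - W (an assumed choice)."""
    L = np.diag(W.sum(axis=1)) - W
    evals, evecs = np.linalg.eigh(L)  # L is symmetric for an undirected weighted graph
    return evecs @ np.diag(np.exp(-t * evals)) @ evecs.T


def diffusion_frechet_vector(W, p, t):
    """V_t(x) = sum_y d_t(x, y)^2 p(y), with d_t(x, y) = ||K_t[x] - K_t[y]||."""
    K = heat_kernel(W, t)
    diff = K[:, None, :] - K[None, :, :]  # pairwise differences of kernel rows
    D2 = np.sum(diff ** 2, axis=-1)       # squared diffusion distances
    return D2 @ p


if __name__ == "__main__":
    # Hypothetical 4-node path network with weighted edges and a community profile p.
    W = np.array([[0.0, 1.0, 0.0, 0.0],
                  [1.0, 0.0, 2.0, 0.0],
                  [0.0, 2.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0, 0.0]])
    p = np.array([0.1, 0.4, 0.4, 0.1])
    for t in (0.5, 2.0):  # larger t probes coarser spatial scales
        print(t, diffusion_frechet_vector(W, p, t))
```

Evaluating the vector at several values of t is one way to read the data "across spatial scales": small t emphasises local structure, while large t smooths it out.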
WWE Fastlane 2017: Start Time, Live Stream And PPV Info. Will WWE Superstars Make Surprise Returns?
WrestleMania 33 is less than a month away, but there is still one more pay-per-view before WWE's biggest event of 2017. WWE Fastlane will feature the top wrestlers from "Monday Night Raw" at Milwaukee's Bradley Center Sunday night. The final PPV before WrestleMania is often uneventful, but that likely won't be the case this year. Not only are a few titles expected to change hands, but a couple of WWE superstars could make surprise returns. Goldberg will challenge Kevin Owens for the WWE Universal Championship, which Owens has held for more than six months.
Meet the startup that's bootstrapping robot learning with human brainpower
A secretive Canadian startup called Kindred AI is teaching robots how to perform difficult dexterous tasks at superhuman speeds by pairing them with human "pilots" wearing virtual-reality headsets and holding motion-tracking controllers. The technology offers a fascinating glimpse of how humans might work in synchronization with machines in the future, and it shows how tapping into human capabilities might amplify the capabilities of automated systems. For all the worry over robots and artificial intelligence eliminating jobs, there are plenty of things that machines still cannot do. The company demonstrated the hardware to MIT Technology Review last week, and says it plans to launch a product aimed at retailers in the coming months. The long-term ambitions are far grander.
Conor McGregor Net Worth: UFC Star's Earnings, PPV Sales Could Land Him Big Payday Against Floyd Mayweather
The biggest holdup in making the boxing match between Conor McGregor and Floyd Mayweather a reality appears to be money. Both fighters are looking to make nine-figure paydays for what would undoubtedly become one of the highest-selling pay-per-views in history. Mayweather wants a guaranteed paycheck of $100 million to come out of retirement, and there is a precedent for the boxer earning such a large purse. It's how much he was paid to face Manny Pacquiao in 2015, and he eventually made more than twice that amount after receiving a cut of the PPV revenue. McGregor has asked for a $100 million payday as well, but Mayweather isn't ready to agree to those terms. While McGregor is the UFC's biggest star, he's never had a purse of more than $3 million or even come close to a nine-figure payday when factoring in his cut of the PPV revenue.
Optimization of distributions differences for classification
Bonyadi, Mohammad Reza, Tieng, Quang M., Reutens, David C.
In this paper we introduce a new classification algorithm called Optimization of Distributions Differences (ODD). The algorithm aims to find a transformation from the feature space to a new space in which instances of the same class are as close as possible to one another while the gravity centers of the classes are as far as possible from one another. This aim is formulated as a multiobjective optimization problem that is solved by a hybrid of an evolutionary strategy and the Quasi-Newton method. The choice of the transformation function is flexible: it can be any continuous function on the feature space. We experiment with a linear and a non-linear transformation in this paper. We show that the algorithm can outperform six other state-of-the-art classification methods, namely naive Bayes, support vector machines, linear discriminant analysis, multi-layer perceptrons, decision trees, and k-nearest neighbors, on 12 standard classification datasets. Our results show that the method is less sensitive to class imbalance than these methods. We also show that ODD maintains its performance better than the other classification methods on these datasets and hence offers better generalization ability.
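The two competing goals described above can be made concrete by evaluating them for a candidate linear transformation z = xA. The abstract does not give ODD's exact objective formulas, so this sketch assumes two illustrative proxies: within-class spread, measured as the mean squared distance of each transformed instance to its own class center (to be minimized), and between-class separation, measured as the mean squared distance between pairs of class centers (to be maximized). It does not implement the evolutionary strategy or Quasi-Newton search, only the quantities such a search would trade off. The name `odd_objectives` is hypothetical.

```python
import numpy as np


def odd_objectives(X, y, A):
    """Evaluate two illustrative ODD-style objectives for the linear map Z = X @ A.

    Returns (within, between):
      within  - mean squared distance of instances to their class center (minimize)
      between - mean squared distance between pairs of class centers (maximize)
    """
    Z = X @ A
    classes = np.unique(y)
    centers = np.array([Z[y == c].mean(axis=0) for c in classes])
    within = np.mean([np.sum((Z[y == c] - centers[i]) ** 2, axis=1).mean()
                      for i, c in enumerate(classes)])
    diffs = centers[:, None, :] - centers[None, :, :]
    iu = np.triu_indices(len(classes), k=1)  # each unordered pair of classes once
    between = np.mean(np.sum(diffs ** 2, axis=-1)[iu])
    return within, between


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]])
    y = np.array([0, 0, 1, 1])
    print(odd_objectives(X, y, np.eye(2)))                 # identity map
    print(odd_objectives(X, y, np.array([[1.0], [0.0]])))  # project onto the first feature
```

Projecting onto the first feature keeps the centers 4 units apart while collapsing each class to a point, which is the kind of transformation the multiobjective search favors.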
WrestleMania 33 Matches: Will AJ Styles Face Shane McMahon, Bray Wyatt Or Randy Orton At WWE's Biggest 2017 PPV?
With WrestleMania 33 just a month away, it's been clear for a few weeks what WWE's top superstars will be doing at the biggest pay-per-view of 2017. A match between Brock Lesnar and Goldberg has been confirmed, while The Undertaker is likely to face Roman Reigns. The future of AJ Styles, however, hasn't been as clear. Aside from various rumors, the longest-reigning WWE Champion of the past year didn't have an obvious opponent for WrestleMania 33. The latest edition of "SmackDown Live," however, has provided the spark for Styles' next feud.
Scalable Learning of Non-Decomposable Objectives
Eban, Elad ET., Schain, Mariano, Mackey, Alan, Gordon, Ariel, Saurous, Rif A., Elidan, Gal
Modern retrieval systems are often driven by an underlying machine learning model. The goal of such systems is to identify and possibly rank the few most relevant items for a given query or context. Thus, such systems are typically evaluated using a ranking-based performance metric such as the area under the precision-recall curve, the $F_\beta$ score, or precision at fixed recall. Ideally, such systems would be trained to optimize the metric of interest. In practice, due to the scalability limitations of existing approaches for optimizing such objectives, large-scale retrieval systems are instead trained to maximize classification accuracy, in the hope that performance as measured via the true objective will also be favorable. In this work we present a unified framework that, using straightforward building block bounds, allows for highly scalable optimization of a wide range of ranking-based objectives. We demonstrate the advantage of our approach on several real-life retrieval problems that are significantly larger than those considered in the literature, while achieving substantial improvement in performance over the accuracy-objective baseline.
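"Building block bounds" can be illustrated with hinge surrogates for the true and false positive counts of a thresholded scorer that predicts positive when its score s > 0. Because 1[s > 0] >= 1 - max(0, 1 - s) and 1[s > 0] <= max(0, 1 + s), summing these over positives and negatives gives a lower bound on TP and an upper bound on FP; since precision TP/(TP + FP) increases with TP and decreases with FP, plugging in the bounds yields a lower bound on precision. This is a minimal sketch under those assumptions with hypothetical function names, not the paper's full training framework, which optimizes such bounds with respect to model parameters.

```python
def hinge(z):
    """max(0, z)."""
    return max(0.0, z)


def count_bounds(scores, labels):
    """Hinge bounds on TP and FP for the classifier 1[s > 0]; labels are in {0, 1}."""
    tp_lower = sum(1.0 - hinge(1.0 - s) for s, l in zip(scores, labels) if l == 1)
    fp_upper = sum(hinge(1.0 + s) for s, l in zip(scores, labels) if l == 0)
    return tp_lower, fp_upper


def precision_lower_bound(scores, labels):
    """Lower bound on precision, valid when tp_lower > 0."""
    tp_lower, fp_upper = count_bounds(scores, labels)
    if tp_lower <= 0:
        return 0.0  # the bound is uninformative here
    return tp_lower / (tp_lower + fp_upper)


def true_counts(scores, labels):
    """Exact (TP, FP) counts, for comparison with the bounds."""
    tp = sum(1 for s, l in zip(scores, labels) if s > 0 and l == 1)
    fp = sum(1 for s, l in zip(scores, labels) if s > 0 and l == 0)
    return tp, fp


if __name__ == "__main__":
    scores = [2.0, 0.5, -0.5, 0.3, -1.5]
    labels = [1, 1, 1, 0, 0]
    print(true_counts(scores, labels), count_bounds(scores, labels))
    print(precision_lower_bound(scores, labels))
```

Unlike the exact counts, both bounds are continuous, piecewise-linear functions of the scores, which is what makes them usable inside gradient-based training at scale.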
Building and Measuring Privacy-Preserving Predictive Blacklists
Melis, Luca, Pyrgelis, Apostolos, De Cristofaro, Emiliano
Collaborative security initiatives are increasingly advocated to improve the timeliness and effectiveness of threat mitigation. Among these, collaborative predictive blacklisting (CPB) aims to forecast attack sources based on alerts contributed by multiple organizations that might be targeted in similar ways. Alas, CPB proposals thus far have focused only on improving hit counts and have overlooked the impact of collaboration on false positives and false negatives. Moreover, sharing threat intelligence often raises important privacy, confidentiality, and liability issues. In this paper, we first provide a comprehensive measurement analysis of two state-of-the-art CPB systems: one that uses a trusted central party to collect alerts [Soldo et al., Infocom'10] and a peer-to-peer one relying on controlled data sharing [Freudiger et al., DIMVA'15], studying the impact of collaboration on both correct and incorrect predictions. Then, we present a novel privacy-friendly approach that significantly improves over previous work, achieving a better balance of true and false positive rates, while minimizing information disclosure. Finally, we present an extension that allows our system to scale to very large numbers of organizations.
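The difference between counting hits and measuring both correct and incorrect predictions can be shown by scoring a predicted blacklist against the attack sources actually observed in the following window. The sketch assumes a known universe of candidate addresses so that true negatives, and hence a false positive rate, are defined; the paper's exact evaluation protocol may differ. The addresses come from the reserved documentation range 192.0.2.0/24, and `blacklist_metrics` is a hypothetical name.

```python
def blacklist_metrics(predicted, observed, universe):
    """Confusion counts and rates for a predicted blacklist vs. later observed attackers."""
    tp = len(predicted & observed)               # hit count: blacklisted and attacked
    fp = len(predicted - observed)               # blacklisted but never attacked
    fn = len(observed - predicted)               # attackers the blacklist missed
    tn = len(universe - predicted - observed)    # neither blacklisted nor attacking
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "tpr": tpr, "fpr": fpr}


if __name__ == "__main__":
    universe = {f"192.0.2.{i}" for i in range(1, 11)}
    predicted = {"192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"}
    observed = {"192.0.2.2", "192.0.2.3", "192.0.2.5"}
    print(blacklist_metrics(predicted, observed, universe))
```

A larger shared blacklist can raise the hit count while also inflating false positives, which is why reporting only hits can overstate the benefit of collaboration.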