positive event
Exploring the Performance of Continuous-Time Dynamic Link Prediction Algorithms
Romero, Raphaël, Buyl, Maarten, De Bie, Tijl, Lijffijt, Jefrey
Dynamic Link Prediction (DLP) addresses the prediction of future links in evolving networks. However, accurately portraying the performance of DLP algorithms poses challenges that might impede progress in the field. Importantly, common evaluation pipelines usually calculate ranking or binary classification metrics, where the scores of observed interactions (positives) are compared with those of randomly generated ones (negatives). However, a single metric is not sufficient to fully capture the differences between DLP algorithms, and is prone to overly optimistic performance evaluation. Instead, an in-depth evaluation should reflect performance variations across different nodes, edges, and time segments. In this work, we contribute tools to perform such a comprehensive evaluation. (1) We propose Birth-Death diagrams, a simple but powerful visualization technique that illustrates the effect of time-based train-test splitting on the difficulty of DLP on a given dataset. (2) We describe an exhaustive taxonomy of negative sampling methods that can be used at evaluation time. (3) We carry out an empirical study of the effect of the different negative sampling strategies. Our comparison between heuristics and state-of-the-art memory-based methods on various real-world datasets confirms a strong effect of using different negative sampling strategies on the test Area Under the Curve (AUC). Moreover, we conduct a visual exploration of the prediction, with additional insights on which different types of errors are prominent over time.
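The evaluation setup described in this abstract, where scores of observed interactions are compared with scores of randomly generated negatives, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the uniform random sampler, the toy recurrence heuristic, and the names `auc`, `sample_random_negatives` and `recurrence_score` are all assumptions made for the example, and the data is made up.

```python
import random


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def sample_random_negatives(nodes, observed, k, rng):
    """Draw k ordered node pairs uniformly at random, skipping observed edges.

    Assumed strategy for illustration; other negative sampling strategies
    would replace this function.
    """
    negatives = []
    while len(negatives) < k:
        u, v = rng.sample(nodes, 2)
        if (u, v) not in observed:
            negatives.append((u, v))
    return negatives


def recurrence_score(history, u, v):
    """Toy heuristic: how often (u, v) occurred in the training history."""
    return history.count((u, v))


# Made-up toy data: training events precede test events in time.
nodes = ["a", "b", "c", "d"]
train_events = [("a", "b"), ("a", "b"), ("b", "c"), ("c", "d")]
test_events = [("a", "b"), ("b", "c"), ("a", "d")]

rng = random.Random(0)
negatives = sample_random_negatives(nodes, set(test_events), k=5, rng=rng)
pos = [recurrence_score(train_events, u, v) for u, v in test_events]
neg = [recurrence_score(train_events, u, v) for u, v in negatives]
test_auc = auc(pos, neg)
```

Swapping `sample_random_negatives` for a different sampler changes `neg` and therefore `test_auc`, which is the kind of sensitivity the abstract reports.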
- North America > United States > New York > New York County > New York City (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
Unbiased Filtering Of Accidental Clicks in Verizon Media Native Advertising
Kaplan, Yohay, Krasne, Naama, Shtoff, Alex, Somekh, Oren
Verizon Media (VZM) native advertising is one of VZM's largest and fastest-growing businesses, reaching a run-rate of several hundred million USD in the past year. Driving the VZM native models that are used to predict event probabilities, such as click and conversion probabilities, is OFFSET, a feature-enhanced, collaborative-filtering-based event-prediction algorithm. In this work we focus on the challenge of predicting click-through rates (CTR) when we are aware that some of the clicks have short dwell-time and are defined as accidental clicks. An accidental click implies little affinity between the user and the ad, so predicting that similar users will click on the ad is inaccurate. Therefore, it may be beneficial to remove clicks with dwell-time lower than a predefined threshold from the training set. However, we cannot ignore these positive events, as filtering them out will cause the model to under-predict. Previous approaches have tried applying filtering and then adding corrective biases to the CTR predictions, but did not yield revenue lifts and therefore were not adopted. In this work, we present a new approach where the positive weight of the accidental clicks is distributed among all of the negative events (skips), based on their likelihood of causing accidental clicks, as predicted by an auxiliary model. These likelihoods are taken as the correct labels of the negative events, shifting our training away from using only binary labels while adopting a binary cross-entropy loss function in our training process. After showing offline performance improvements, the modified model was tested online serving VZM native users, and provided a 1.18% revenue lift over the production model, which is agnostic to accidental clicks.
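The idea of training on non-binary labels with a binary cross-entropy loss can be sketched as below. This is a hedged illustration only: the abstract does not give the exact redistribution rule, so the proportional scheme in `soft_labels_for_skips`, the function names, and the numbers are assumptions, not the OFFSET implementation.

```python
import math


def soft_labels_for_skips(aux_likelihoods, num_accidental_clicks):
    """Spread the weight of removed accidental clicks over the skips.

    Assumed scheme: each skip receives a share proportional to the auxiliary
    model's likelihood for it. Labels are clipped to 1.0, so the total mass
    is preserved only when no clipping occurs.
    """
    total = sum(aux_likelihoods)
    if total == 0:
        return [0.0 for _ in aux_likelihoods]
    return [min(1.0, num_accidental_clicks * q / total) for q in aux_likelihoods]


def binary_cross_entropy(label, prob, eps=1e-12):
    """Cross-entropy for a label in [0, 1]; prob is clamped for numerical safety."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(prob) + (1.0 - label) * math.log(1.0 - prob))


# Made-up example: three skips share the weight of one removed accidental click.
labels = soft_labels_for_skips([0.1, 0.3, 0.6], num_accidental_clicks=1)
losses = [binary_cross_entropy(y, 0.2) for y in labels]
```

With a soft label `y`, the loss is minimized when the predicted probability equals `y`, so the skips carry a small positive signal instead of a hard zero.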
- Asia > Middle East > Israel > Haifa District > Haifa (0.04)
- Oceania > Australia (0.04)
- North America > United States > Texas > Irion County (0.04)
- Information Technology > Networks (0.61)
- Information Technology > Services (0.46)
New Perspectives on the Evaluation of Link Prediction Algorithms for Dynamic Graphs
Romero, Raphaël, De Bie, Tijl, Lijffijt, Jefrey
There is a fast-growing body of research on predicting future links in dynamic networks, with many new algorithms. Some benchmark data exists, and performance evaluations commonly rely on comparing the scores of observed network events (positives) with those of randomly generated ones (negatives). These evaluation measures depend on both the predictive ability of the model and, crucially, the type of negative samples used. Besides, as is generally the case with temporal data, prediction quality may vary over time. This creates a complex evaluation space. In this work, we catalog the possibilities for negative sampling and introduce novel visualization methods that can yield insight into prediction performance and the dynamics of temporal networks. We leverage these visualization tools to investigate the effect of negative sampling on the predictive performance, at the node and edge level. We validate empirically, on datasets extracted from recent benchmarks, that the error is typically not evenly distributed across different data segments. Finally, we argue that such visualization tools can serve as powerful guides to evaluate dynamic link prediction methods at different levels.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Data Science > Data Mining (0.89)
- Information Technology > Information Management > Search (0.65)
Data-driven Models to Anticipate Critical Voltage Events in Power Systems
De Caro, Fabrizio, Collin, Adam J., Vaccaro, Alfredo
This paper explores the effectiveness of data-driven models to predict voltage excursion events in power systems using simple categorical labels. By treating the prediction as a categorical classification task, the workflow is characterized by a low computational and data burden. A proof-of-concept case study on a real portion of the Italian 150 kV sub-transmission network, which hosts a significant amount of wind power generation, demonstrates the general validity of the proposal and offers insight into the strengths and weaknesses of several widely utilized prediction models for this application.
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.05)
- Europe > Italy (0.05)
- North America > United States > Mississippi (0.04)
- Energy > Power Industry (1.00)
- Energy > Renewable > Wind (0.35)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Segment-level Metric Learning for Few-shot Bioacoustic Event Detection
Few-shot bioacoustic event detection is a task that detects the occurrence time of a novel sound given a few examples. Previous methods employ metric learning to build a latent space with the labeled part of different sound classes, also known as positive events. In this study, we propose a segment-level few-shot learning framework that utilizes both the positive and negative events during model optimization. Training with negative events, which are larger in volume than positive events, can increase the generalization ability of the model. In addition, we use transductive inference on the validation set during training for better adaptation to novel classes.
Zap: Making Predictions Based on Online User Behavior
Chervonyi, Yuri, Harabor, Dragos, Zhang, Brian, Sacks, Josh
This paper introduces Zap, a generic machine learning pipeline for making predictions based on online user behavior. Zap combines well known techniques for processing sequential data with more obscure techniques such as Bloom filters, bucketing, and model calibration into an end-to-end solution. The pipeline creates website- and task-specific models without knowing anything about the structure of the website. It is designed to minimize the amount of website-specific code, which is realized by factoring all website-specific logic into example generators. New example generators can typically be written up in a few lines of code.
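For readers unfamiliar with Bloom filters, one of the techniques this abstract names, here is a minimal sketch of the data structure. How Zap actually uses Bloom filters is not stated in the abstract; the class below, its parameters, and the page-path example are assumptions chosen purely for illustration.

```python
import hashlib


class BloomFilter:
    """Compact set membership test with possible false positives
    but no false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch simple; a real filter would pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical usage: remember which pages a user has visited.
visited = BloomFilter()
for page in ["/home", "/cart", "/checkout"]:
    visited.add(page)
```

A lookup such as `"/cart" in visited` is always true for added items, while an item that was never added is usually, but not always, reported as absent.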
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
Performance measures in Azure ML: Accuracy, Precision, Recall and F1 Score.
This is the first of three articles about performance measures and graphs for binary learning models in Azure ML. Binary learning models are models that predict just one of two outcomes: positive or negative. These models are very well suited to driving decisions, such as whether to administer a certain drug to a patient or to include a lead in a targeted marketing campaign. This first article lays the foundation by covering several statistical measures: accuracy, precision, recall and F1 score. These measures require a solid understanding of the two types of prediction errors, which we will also cover: false positives and false negatives. In the second article we'll discuss the ROC curve and the related AUC measure. We'll also look at another graph in Azure ML called the Precision/Recall curve.
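The four measures named above follow directly from the counts of true and false positives and negatives. The sketch below uses their standard definitions in plain Python; it does not call any Azure ML API, and the function names and example labels are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1; undefined ratios fall back to 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up example: 3 actual positives, 5 actual negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
scores = binary_metrics(y_true, y_pred)
```

Here there are 2 true positives, 1 false positive, 1 false negative and 4 true negatives, so accuracy is 0.75 while precision, recall and F1 are each 2/3.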
Dimensions of Self-Expression in Facebook Status Updates
Kramer, Adam D. I. (Facebook, Inc.) | Chung, Cindy K. (The University of Texas at Austin)
We describe the dimensions along which Facebook users tend to express themselves via status updates using the semi-automated text analysis approach, the Meaning Extraction Method (MEM). First, we examined dimensions of self-expression in all status updates from a sample of four million Facebook users from four English-speaking countries (the United States, Canada, the United Kingdom, and Australia) in order to examine how these countries vary in their self-expressions. All four countries showed a basic three-component structure, indicating that the medium is a stronger influence than country characteristics or demographics on how people use Facebook status updates. In each country, people vary in terms of the extent to which they use Informal Speech, share Positive Events, and discuss School in their Facebook status updates. Together, these factors tell us how users differ in their self-expression, and thus illustrate meaningful use cases for the product: Talking about what’s going on tends to be positive, and people vary in terms of the extent to which their status updates are short, slangy emotional expressions and topics regarding school. The specific words that define these factors showed subtle differences across countries: The use of profanity indicates fewer school words (but only in Australia), whereas the UK shows greater use of slang terms (rather than profanity) when speaking informally. The MEM also identified English-language dialects as a meaningful dimension along which the countries varied. In sum, beyond simply indicating topicality of posts, this study provides insight into how status updates are used for self-expression. We discuss several theoretical frameworks that could produce these results, and more broadly discuss the generation of theoretical frameworks from wholly empirical data (such as naturalistic Internet speech) using the MEM.
- Oceania > Australia (0.46)
- North America > Canada (0.26)
- North America > United States > Texas > Travis County > Austin (0.14)
- Information Technology > Services (1.00)
- Education (1.00)
A computational model of affects
Due to the complexity and interdisciplinarity of affective phenomena, attempts to define them have often been unsatisfactory. This article provides a simple logical structure in which affective concepts can be defined. The set of affects defined is similar to the set of emotions covered in the OCC model [1], but the model presented in this article is fully computationally defined, whereas the OCC model depends on undefined concepts. Following Matthis [2], affects are seen as unconscious, emotions as preconscious and feelings as conscious. Affects are thus a superclass of emotions and feelings with regard to consciousness.
- Europe > Finland > Uusimaa > Helsinki (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Finland > Northern Savo > Kuopio (0.04)