Goto

Collaborating Authors

 Performance Analysis


Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems

Journal of Artificial Intelligence Research

Many aspects of the design of efficient crowdsourcing processes, such as defining workers bonuses, fair prices and time limits of the tasks, involve knowledge of the likely duration of the task at hand. In this work we introduce a new timesensitive Bayesian aggregation method that simultaneously estimates a tasks duration and obtains reliable aggregations of crowdsourced judgments. Our method, called BCCTime, uses latent variables to represent the uncertainty about the workers completion time, the tasks duration and the workers accuracy. To relate the quality of a judgment to the time a worker spends on a task, our model assumes that each task is completed within a latent time window within which all workers with a propensity to genuinely attempt the labelling task (i.e., no spammers) are expected to submit their judgments. In contrast, workers with a lower propensity to valid labelling, such as spammers, bots or lazy labellers, are assumed to perform tasks considerably faster or slower than the time required by normal workers. Specifically, we use efficient message-passing Bayesian inference to learn approximate posterior probabilities of (i) the confusion matrix of each worker, (ii) the propensity to valid labelling of each worker, (iii) the unbiased duration of each task and (iv) the true label of each task. Using two real- world public datasets for entity linking tasks, we show that BCCTime produces up to 11% more accurate classifications and up to 100% more informative estimates of a tasks duration compared to stateoftheart methods.


Identifying Depression on Twitter

arXiv.org Machine Learning

Social media has recently emerged as a premier method to disseminate information online. Through these online networks, tens of millions of individuals communicate their thoughts, personal experiences, and social ideals. We therefore explore the potential of social media to predict, even prior to onset, Major Depressive Disorder (MDD) in online personas. We employ a crowdsourced method to compile a list of Twitter users who profess to being diagnosed with depression. Using up to a year of prior social media postings, we utilize a Bag of Words approach to quantify each tweet. Lastly, we leverage several statistical classifiers to provide estimates to the risk of depression. Our work posits a new methodology for constructing our classifier by treating social as a text-classification problem, rather than a behavioral one on social media platforms. By using a corpus of 2.5M tweets, we achieved an 81% accuracy rate in classification, with a precision score of .86. We believe that this method may be helpful in developing tools that estimate the risk of an individual being depressed, can be employed by physicians, concerned individuals, and healthcare agencies to aid in diagnosis, even possibly enabling those suffering from depression to be more proactive about recovering from their mental health.


Scalable Link Prediction in Dynamic Networks via Non-Negative Matrix Factorization

arXiv.org Artificial Intelligence

We propose a scalable temporal latent space model for link prediction in dynamic social networks, where the goal is to predict links over time based on a sequence of previous graph snapshots. The model assumes that each user lies in an unobserved latent space and interactions are more likely to form between similar users in the latent space representation. In addition, the model allows each user to gradually move its position in the latent space as the network structure evolves over time. We present a global optimization algorithm to effectively infer the temporal latent space, with a quadratic convergence rate. Two alternative optimization algorithms with local and incremental updates are also proposed, allowing the model to scale to larger networks without compromising prediction accuracy. Empirically, we demonstrate that our model, when evaluated on a number of real-world dynamic networks, significantly outperforms existing approaches for temporal link prediction in terms of both scalability and predictive power.


8 Tactics to Combat Imbalanced Classes in Your Machine Learning Dataset - Machine Learning Mastery

#artificialintelligence

You are working on your dataset. You create a classification model and get 90% accuracy immediately. You dive a little deeper and discover that 90% of the data belongs to one class. This is an example of an imbalanced dataset and the frustrating results it can cause. In this post you will discover the tactics that you can use to deliver great results on machine learning datasets with imbalanced data.


Latent Variable Discovery Using Dependency Patterns

arXiv.org Machine Learning

The causal discovery of Bayesian networks is an active and important research area, and it is based upon searching the space of causal models for those which can best explain a pattern of probabilistic dependencies shown in the data. However, some of those dependencies are generated by causal structures involving variables which have not been measured, i.e., latent variables. Some such patterns of dependency "reveal" themselves, in that no model based solely upon the observed variables can explain them as well as a model using a latent variable. That is what latent variable discovery is based upon. Here we did a search for finding them systematically, so that they may be applied in latent variable discovery in a more rigorous fashion.


Untangling AdaBoost-based Cost-Sensitive Classification. Part II: Empirical Analysis

arXiv.org Artificial Intelligence

A lot of approaches, each following a different strategy, have been proposed in the literature to provide AdaBoost with cost-sensitive properties. In the first part of this series of two papers, we have presented these algorithms in a homogeneous notational framework, proposed a clustering scheme for them and performed a thorough theoretical analysis of those approaches with a fully theoretical foundation. The present paper, in order to complete our analysis, is focused on the empirical study of all the algorithms previously presented over a wide range of heterogeneous classification problems. The results of our experiments, confirming the theoretical conclusions, seem to reveal that the simplest approach, just based on cost-sensitive weight initialization, is the one showing the best and soundest results, despite having been recurrently overlooked in the literature.


An experiment in trying to predict Google rankings

#artificialintelligence

Machine learning is quickly becoming an indispensable tool for many large companies. Everyone has, for sure, heard about Google's AI algorithm beating the World Champion in Go, as well as technologies like RankBrain, but machine learning does not have to be a mystical subject relegated to the domain of math researchers. There are many approachable libraries and technologies that show promise of being very useful to any industry that has data to play with. Machine learning also has the ability to turn traditional website marketing and SEO on its head. Late last year, my colleagues and I (rather naively) began an experiment in which we threw several popular machine learning algorithms at the task of predicting ranking in Google. We ended up with an assembly that achieved 41 percent true positive and 41 percent true negative on our data set.


An experiment in trying to predict Google rankings

#artificialintelligence

Machine learning is quickly becoming an indispensable tool for many large companies. Everyone has, for sure, heard about Google's AI algorithm beating the World Champion in Go, as well as technologies like RankBrain, but machine learning does not have to be a mystical subject relegated to the domain of math researchers. There are many approachable libraries and technologies that show promise of being very useful to any industry that has data to play with. Machine learning also has the ability to turn traditional website marketing and SEO on its head. Late last year, my colleagues and I (rather naively) began an experiment in which we threw several popular machine learning algorithms at the task of predicting ranking in Google. We ended up with an assembly that achieved 41 percent true positive and 41 percent true negative on our data set. In the following paragraphs, I will take you through our experiment, and I will also discuss a few important libraries and technologies that are important for SEOs to begin understanding.


Left/Right Hand Segmentation in Egocentric Videos

arXiv.org Artificial Intelligence

Wearable cameras allow people to record their daily activities from a user-centered (First Person Vision) perspective. Due to their favorable location, wearable cameras frequently capture the hands of the user, and may thus represent a promising usermachine interaction tool for different applications. Existent First Person Vision methods handle hand segmentation as a backgroundforeground problem, ignoring two important facts: i) hands are not a single "skin-like" moving element, but a pair of interacting cooperative entities, ii) close hand interactions may lead to hand-to-hand occlusions and, as a consequence, create a single hand-like segment. These facts complicate a proper understanding of hand movements and interactions. Our approach extends traditional background-foreground strategies, by including a hand-identification step (left-right) based on a Maxwell distribution of angle and position. Hand-to-hand occlusions are addressed by exploiting temporal superpixels. The experimental results show that, in addition to a reliable left/right hand-segmentation, our approach considerably improves the traditional background-foreground hand-segmentation. Keywords: Hand-Segmentation, Hand-identification, Egocentric Vision, First Person Vision 1. Introduction The recent widespread availability of wearable devices has quickly attracted the interest of researchers, computer scientists and high-tech companies [1]. The 90's idea of a body-worn device that is always ready to be used is nowadays possible, and its potential applicability to real problems is evident. In general, the wearable sensor that most attracted researchers' attention is the video camera: while enjoying a unique position to record what the user is seeing, it suffers from important issues and technical challenges [2]. Images and videos recorded from this perspective are commonly referred to as First-Person Vision (FPV) or Egocentric videos [2].


Introduction to Naive Bayes

#artificialintelligence

I think there's a rule somewhere that says "You can't call yourself a data scientist until you've used a Naive Bayes classifier". This article is my attempt at laying the groundwork for Naive Bayes in a practical and intuitive fashion. Let's start with a problem to motivate our formulation of Naive Bayes. Suppose we own a professional networking site similar to LinkedIn. Users sign up, type some information about themselves, and then roam the network looking for jobs/connections/etc. Until recently, we only required users to enter their current job title, but now we're asking them what industry they work in.