Collaborating Authors

Ghasedi Dizaji

AAAI Conferences

Crowdsourcing provides an efficient platform for employing human skills in sentiment analysis, which is a difficult task for automatic language models because of the large variations in context, writing style, viewpoint, and so on. However, standard crowdsourcing aggregation models are inadequate when the number of crowd labels per worker is too small to train their parameters, or when it is not feasible to collect labels for every sample in a large dataset. In this paper, we propose a novel hybrid model that exploits both crowd and text data for sentiment analysis, consisting of a generative crowdsourcing aggregation model and a deep sentimental autoencoder. The two sub-models are combined within a probabilistic framework rather than heuristically. We introduce a unified objective function that incorporates the objectives of both sub-models, and derive an efficient optimization algorithm to solve the resulting problem jointly. Experimental results indicate that our model outperforms state-of-the-art models, especially when crowd labels are scarce.


Improving Quality of Crowdsourced Labels via Probabilistic Matrix Factorization

AAAI Conferences

In crowdsourced relevance judging, each crowd worker typically judges only a small number of examples, yielding a sparse and imbalanced set of judgments in which relatively few workers influence output consensus labels, particularly with simple consensus methods like majority voting. We show how probabilistic matrix factorization (PMF), a standard approach in collaborative filtering, can be used to infer missing worker judgments such that all workers influence output labels. Given complete worker judgments inferred by PMF, we evaluate impact in unsupervised and supervised scenarios. In the supervised case, we consider both weighted voting and worker selection strategies based on worker accuracy. Experiments on a synthetic data set and a real turk data set with crowd judgments from the 2010 TREC Relevance Feedback Track show that the PMF approach is promising and merits further investigation and analysis.
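
The idea of completing a sparse worker-by-example judgment matrix before voting can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes binary labels, fits a generic low-rank factorization by full-batch gradient descent on the regularized squared error over observed entries, and then takes an unweighted majority vote over the completed matrix. The function names `pmf_complete` and `consensus_labels`, the toy data, and all hyperparameters (`rank`, `lam`, `lr`, `epochs`) are assumptions made for the example.

```python
import numpy as np

def pmf_complete(R, mask, rank=2, lam=0.1, lr=0.05, epochs=500, seed=0):
    """Fill in a workers x examples label matrix with a low-rank model.

    R    : array of labels (values at unobserved positions are ignored)
    mask : boolean array, True where a worker actually judged an example
    """
    rng = np.random.default_rng(seed)
    n_workers, n_examples = R.shape
    U = 0.1 * rng.standard_normal((n_workers, rank))   # worker factors
    V = 0.1 * rng.standard_normal((n_examples, rank))  # example factors
    for _ in range(epochs):
        E = mask * (R - U @ V.T)            # residual on observed entries only
        U_next = U + lr * (E @ V - lam * U)
        V = V + lr * (E.T @ U - lam * V)
        U = U_next
    return U @ V.T

def consensus_labels(completed):
    """Unweighted vote: every worker contributes an (inferred) judgment."""
    return (completed.mean(axis=0) >= 0.5).astype(int)

# Toy data (hypothetical): 4 workers, 5 examples, 12 observed judgments.
R = np.array([[1, 0, 0, 1, 0],
              [1, 0, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [1, 0, 0, 1, 0]], dtype=float)
mask = np.array([[1, 1, 0, 1, 0],
                 [1, 0, 1, 0, 1],
                 [0, 1, 1, 1, 0],
                 [1, 0, 0, 1, 1]], dtype=bool)

completed = pmf_complete(R, mask)
labels = consensus_labels(completed)
fit_err = np.mean((completed[mask] - R[mask]) ** 2)   # error on observed entries
base_err = np.mean(R[mask] ** 2)                      # error of predicting all zeros
```

A supervised variant along the lines the abstract mentions would replace the plain mean in `consensus_labels` with a mean weighted by each worker's estimated accuracy; that weighting is not shown here.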


DATELINE: Deep Plackett-Luce Model with Uncertainty Measurements

arXiv.org Machine Learning

The aggregation of k-ary preferences is a long-standing and important problem with many real-world applications, such as peer grading, presidential elections, and restaurant ranking. Variants of the Plackett-Luce model have been applied to aggregate k-ary preferences. However, two pressing issues remain in the current variants. First, most of them ignore feature information; that is, they consider k-ary preferences instead of instance-dependent k-ary preferences. Second, these variants barely consider the uncertainty in k-ary preferences provided by agnostic crowds. In this paper, we propose Deep plAckeTt-luce modEL wIth uNcertainty mEasurements (DATELINE), which addresses both issues simultaneously. To address the first issue, we employ deep neural networks that map each instance to its ranking score in the Plackett-Luce model. To address the second issue, we present a weighted Plackett-Luce model in which the weight is a dynamic uncertainty vector measuring worker quality. More importantly, we provide theoretical guarantees for DATELINE to justify its robustness.
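
For readers unfamiliar with the Plackett-Luce model, the sketch below computes its standard log-likelihood for a k-ary ranking and a simple worker-weighted sum of such terms. It is an illustration under stated assumptions, not the DATELINE method: the scores are given as a plain array rather than produced by a deep network, and the scalar per-worker weight multiplying each log-likelihood is an assumed stand-in for the paper's dynamic uncertainty vector, whose exact form the abstract does not specify. The function names are hypothetical.

```python
import numpy as np

def pl_log_likelihood(scores, ranking):
    """Log-probability of `ranking` (item indices, best first) under
    Plackett-Luce with log-worth `scores`: at each position the chosen
    item competes against all items not yet placed."""
    s = np.asarray(scores, dtype=float)[list(ranking)]
    ll = 0.0
    for i in range(len(s)):
        tail = s[i:]
        m = tail.max()  # subtract the max for a stable log-sum-exp
        ll += s[i] - (m + np.log(np.exp(tail - m).sum()))
    return ll

def weighted_pl_log_likelihood(scores, rankings, weights):
    """Sum of per-worker log-likelihoods, each scaled by a worker weight
    (assumed here to be a scalar in [0, 1] reflecting worker quality)."""
    return sum(w * pl_log_likelihood(scores, r)
               for r, w in zip(rankings, weights))

# Hypothetical example: three items, two workers.
scores = [2.0, 1.0, 0.0]
good = [0, 1, 2]   # ranking that agrees with the scores
bad = [2, 1, 0]    # reversed ranking
total = weighted_pl_log_likelihood(scores, [good, bad], [1.0, 0.2])
```

Down-weighting a worker shrinks that worker's contribution to the objective, so a ranking from a low-quality worker pulls the fitted scores less than one from a reliable worker.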



Pooling of Causal Models under Counterfactual Fairness via Causal Judgement Aggregation

arXiv.org Artificial Intelligence

In this paper we consider the problem of combining multiple probabilistic causal models, provided by different experts, under the requirement that the aggregated model satisfy the criterion of counterfactual fairness. We build upon the work on causal models and fairness in machine learning, and we express the problem of combining multiple models within the framework of opinion pooling. We propose two simple algorithms, grounded in the theory of counterfactual fairness and causal judgment aggregation, that are guaranteed to generate aggregated probabilistic causal models respecting the criterion of fairness, and we compare their behaviors on a toy case study.