The problem of estimating event truths from conflicting agent opinions is investigated. An autoencoder learns the complex relationships between event truths, agent reliabilities and agent observations. A Bayesian network model is proposed to guide the learning of the autoencoder by modeling the dependence of agent reliabilities corresponding to different data samples. At the same time, it also models the social relationships between agents in the network. The proposed approach is unsupervised and is applicable when ground truth labels of events are unavailable. A variational inference method is used to jointly estimate the hidden variables in the Bayesian network and the parameters in the autoencoder. Simulations and experiments on real data suggest that the proposed method performs better than several other inference methods, including majority voting, the Bayesian Classifier Combination (BCC) method, the Community BCC method, and the recently proposed VISIT method.
We address the problem of latent truth discovery, LTD for short, where the goal is to discover the underlying true values of entity attributes in the presence of noisy, conflicting or incomplete information. Despite a multitude of algorithms to address the LTD problem that can be found in literature, only little is known about their overall performance with respect to effectiveness (in terms of truth discovery capabilities), efficiency and robustness. A practical LTD approach should satisfy all these characteristics so that it can be applied to heterogeneous datasets of varying quality and degrees of cleanliness. We propose a novel algorithm for LTD that satisfies the above requirements. The proposed model is based on Restricted Boltzmann Machines, thus coined LTD-RBM. In extensive experiments on various heterogeneous and publicly available datasets, LTD-RBM is superior to state-of-the-art LTD techniques in terms of an overall consideration of effectiveness, efficiency and robustness.
Twitter has emerged as a new application paradigm of sensing the physical environment by using human as sensors. These human sensed observations are often viewed as binary claims (either true or false). A fundamental challenge on Twitter is how to ascertain the credibility of claims and the reliability of sources without the prior knowledge on either of them beforehand. This challenge is referred to as truth discovery. An important limitation exists in the current Twitter-based truth discovery solutions: they did not explore the theme relevance aspect of claims and the correct claims identified by their solutions can be completely irrelevant to the theme of interests. In this paper, we present a new analytical model that explicitly considers the theme relevance feature of claims in the solutions of truth discovery problem on Twitter. The new model solves a bi-dimensional estimation problem to jointly estimate the correctness and theme relevance of claims as well as the reliability and theme awareness of sources. The new model is compared with the discovery solutions in current literature using three real world datasets collected from Twitter during recent disastrous and emergent events: Paris attack, Oregon shooting, and Baltimore riots, all in 2015. The new model was shown to be effective in terms of finding both correct and relevant claims.
Nowadays, crowd sensing becomes increasingly more popular due to the ubiquitous usage of mobile devices. However, the quality of such human-generated sensory data varies significantly among different users. To better utilize sensory data, the problem of truth discovery, whose goal is to estimate user quality and infer reliable aggregated results through quality-aware data aggregation, has emerged as a hot topic. Although the existing truth discovery approaches can provide reliable aggregated results, they fail to protect the private information of individual users. Moreover, crowd sensing systems typically involve a large number of participants, making encryption or secure multi-party computation based solutions difficult to deploy. To address these challenges, in this paper, we propose an efficient privacy-preserving truth discovery mechanism with theoretical guarantees of both utility and privacy. The key idea of the proposed mechanism is to perturb data from each user independently and then conduct weighted aggregation among users' perturbed data. The proposed approach is able to assign user weights based on information quality, and thus the aggregated results will not deviate much from the true results even when large noise is added. We adapt local differential privacy definition to this privacy-preserving task and demonstrate the proposed mechanism can satisfy local differential privacy while preserving high aggregation accuracy. We formally quantify utility and privacy trade-off and further verify the claim by experiments on both synthetic data and a real-world crowd sensing system.
I am profiling APIs as part of my partnership with Streamdata.io, and my continued API Stack work. As part of my work, I am creating OpenAPI, Postman Collections, and APIs.json indexes for APIs in a variety of business sectors, and as I'm finishing up the profile for ParallelDots machine learning APIs, I am struck (again) by the importance of tags within OpenAPI definitions when it comes to defining what any API does, and something that will have significant effects on the growing machine learning, and artificial intelligence space. While profiling ParallelDots, I had to generate the OpenAPI definition from the Postman Collection they provide, which was void of any tags. I went through the handful of API paths, manually adding tags for each of the machine learning resources. Trying to capture what resources were available, allowing for the discovery, filtering, and execution of each individual machine learning model being exposed using a simple web API.