Latent truth discovery, LTD for short, refers to the problem of aggregating multiple claims from various sources in order to estimate the plausibility of statements about entities. In the absence of a ground truth, this problem is especially challenging when some sources provide conflicting claims and others provide no claims at all. In this work we provide an unsupervised stochastic inference procedure on top of a model that combines restricted Boltzmann machines with feed-forward neural networks to accurately infer the reliability of sources as well as the plausibility of statements about entities. In comparison to prior work, our approach stands out (1) by allowing the incorporation of arbitrary features about sources and claims, (2) by generalizing from a reliability per source towards a reliability function, and thus (3) enabling the estimation of source reliability even for sources that have provided no or very few claims, (4) by building on efficient and scalable stochastic inference algorithms, and (5) by outperforming the state-of-the-art by a considerable margin.
The problem of estimating event truths from conflicting agent opinions is investigated. An autoencoder learns the complex relationships between event truths, agent reliabilities and agent observations. A Bayesian network model is proposed to guide the learning of the autoencoder by modeling the dependence of agent reliabilities corresponding to different data samples. At the same time, it also models the social relationships between agents in the network. The proposed approach is unsupervised and is applicable when ground truth labels of events are unavailable. A variational inference method is used to jointly estimate the hidden variables in the Bayesian network and the parameters in the autoencoder. Simulations and experiments on real data suggest that the proposed method performs better than several other inference methods, including majority voting, the Bayesian Classifier Combination (BCC) method, the Community BCC method, and the recently proposed VISIT method.
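For reference, majority voting, the simplest of the baselines named above, can be sketched in a few lines of Python. This is a minimal illustration and not the evaluation code used in the experiments: the function name `majority_vote`, the nested-dictionary input layout, and the tie-breaking rule (the smallest label wins) are all assumptions introduced here.

```python
from collections import Counter


def majority_vote(observations):
    """Estimate each event's truth as the label most agents reported.

    `observations` maps event id -> {agent id -> reported label}.
    This layout is an assumption made for the sketch. Every agent is
    weighted equally, which is exactly what reliability-aware methods
    try to improve on.
    """
    estimates = {}
    for event, opinions in observations.items():
        if not opinions:
            # No agent observed this event, so no estimate is possible.
            estimates[event] = None
            continue
        counts = Counter(opinions.values())
        # Iterate labels in sorted order so ties resolve to the smallest
        # label deterministically (an arbitrary choice for this sketch).
        estimates[event] = max(sorted(counts), key=lambda label: counts[label])
    return estimates


# Hypothetical toy data: three agents, three binary events.
observations = {
    "e1": {"a1": 1, "a2": 1, "a3": 0},  # agents disagree
    "e2": {"a1": 0, "a3": 0},           # a2 gave no opinion
    "e3": {},                           # nobody observed e3
}
estimates = majority_vote(observations)
print(estimates)  # {'e1': 1, 'e2': 0, 'e3': None}
```

Because every opinion counts the same, a few unreliable agents can outvote a reliable one; that is the gap the reliability-modeling approaches are meant to close.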
When Facebook chief executive Mark Zuckerberg promised Congress that AI would help solve the problem of fake news, he revealed little in the way of how. New research brings us one step closer to figuring that out. In an extensive study that will be presented at a conference later this month, researchers from MIT, Qatar Computing Research Institute (QCRI), and Sofia University in Bulgaria tested over 900 possible variables for predicting a media outlet's trustworthiness--probably the largest set ever proposed. The researchers then trained a machine-learning model on different combinations of the variables to see which would produce the most accurate results. The best model accurately labeled news outlets with "low," "medium," or "high" factuality just 65% of the time.
Even at this early stage of the game, machine learning holds much promise, and is being applied to incredibly diverse fields – autonomous driving, medical screening, and supply-chain management. In many of these fields, the application of the technology has been extremely successful, predicting consumer demand and the outbreak of pandemics much more reliably than human intelligence. However, there remain some problems with the basic way in which machine learning works. ML algorithms require huge amounts of data, and data processing capability, to provide reliable predictions. Even if these resources are available, the algorithms can fail.
I am profiling APIs as part of my partnership with Streamdata.io and my continued API Stack work. As part of this work, I am creating OpenAPI definitions, Postman Collections, and APIs.json indexes for APIs in a variety of business sectors. As I finish up the profile for the ParallelDots machine learning APIs, I am struck (again) by the importance of tags within OpenAPI definitions when it comes to defining what any API does, something that will have significant effects on the growing machine learning and artificial intelligence space. While profiling ParallelDots, I had to generate the OpenAPI definition from the Postman Collection they provide, which was void of any tags. I went through the handful of API paths, manually adding tags for each of the machine learning resources, trying to capture what resources were available and to allow for the discovery, filtering, and execution of each individual machine learning model being exposed through a simple web API.
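The manual tagging pass can also be scripted. Here is a rough sketch in Python, assuming the OpenAPI definition has already been loaded into a dictionary. The helper name `add_tags`, the example paths `/sentiment` and `/keywords`, and the tag names are all made up for illustration and are not taken from the actual ParallelDots API; the only thing relied on from the OpenAPI specification is that each operation object can carry a `tags` list of strings.

```python
# HTTP methods that can hold an operation object inside an OpenAPI path item.
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def add_tags(openapi, tags_by_path):
    """Add tags to every operation under the given paths, in place.

    `tags_by_path` maps a path string to the list of tags to apply.
    Existing tags are kept and duplicates are not added twice.
    """
    for path, path_item in openapi.get("paths", {}).items():
        new_tags = tags_by_path.get(path, [])
        for method, operation in path_item.items():
            # Skip non-operation keys such as "parameters" or "summary".
            if method.lower() not in HTTP_METHODS or not isinstance(operation, dict):
                continue
            existing = operation.setdefault("tags", [])
            for tag in new_tags:
                if tag not in existing:
                    existing.append(tag)
    return openapi


# Hypothetical, untagged definition as it might come out of a Postman conversion.
spec = {
    "openapi": "3.0.0",
    "info": {"title": "Example ML API", "version": "1.0.0"},
    "paths": {
        "/sentiment": {"post": {"summary": "Score sentiment", "responses": {}}},
        "/keywords": {"post": {"summary": "Extract keywords", "responses": {}}},
    },
}

add_tags(spec, {
    "/sentiment": ["Sentiment", "Machine Learning"],
    "/keywords": ["Keywords", "Machine Learning"],
})
print(spec["paths"]["/sentiment"]["post"]["tags"])
```

Choosing the tags still takes a human who understands what each machine learning resource does; the script only saves the repetitive editing once that decision is made.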