Social Media


Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

Neural Information Processing Systems

The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels by integrating noisy, non-expert labeling from annotators. The classic Dawid-Skene estimator and its accompanying expectation-maximization (EM) algorithm have been widely used, but their theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but sample complexity is a hurdle for applying such approaches, since the tensor methods hinge on the availability of third-order statistics that are hard to estimate reliably given limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity.
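The second-order statistics the abstract refers to can be pictured with a minimal sketch: for a pair of annotators, the K x K co-occurrence matrix is simply the empirical joint distribution of their responses over commonly labeled items. This is a toy construction, not the paper's identification algorithm; the `pairwise_cooccurrence` helper and the `-1` missing-label convention are assumptions made for illustration.

```python
import numpy as np

def pairwise_cooccurrence(labels, num_classes):
    """Empirical K x K co-occurrence matrix for two annotators.

    labels: (2, N) integer array of responses; -1 marks a missing label.
    Returns the joint response frequency over items both annotators labeled.
    """
    a, b = labels
    both = (a >= 0) & (b >= 0)            # items labeled by both annotators
    C = np.zeros((num_classes, num_classes))
    for i, j in zip(a[both], b[both]):
        C[i, j] += 1
    return C / max(both.sum(), 1)         # normalize to a joint distribution

# toy example: two annotators, six items, two classes
labels = np.array([[0, 1, 1, 0, -1, 1],
                   [0, 1, 0, 0,  1, 1]])
C = pairwise_cooccurrence(labels, num_classes=2)
```

Because only pairs of responses are needed, each entry of `C` is estimated from every item the two annotators share, which is what gives pairwise statistics their sample-complexity advantage over third-order tensors.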


Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing

Neural Information Processing Systems

The Dawid-Skene estimator has been widely used for inferring true labels from the noisy labels provided by non-expert crowdsourcing workers. However, since the estimator maximizes a non-convex log-likelihood function, it is hard to justify its performance theoretically. In this paper, we propose an efficient two-stage algorithm for multi-class crowd labeling problems. The first stage uses a spectral method to obtain an initial estimate of the parameters; the second stage refines this estimate by running the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor.
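As a rough illustration of the Dawid-Skene model that both stages operate on, here is a toy EM loop alternating between posteriors over true labels and per-worker confusion matrices. Note the hedges: the paper initializes with a spectral method, whereas this sketch uses majority voting purely for brevity, and `dawid_skene_em` with its smoothing constants is an illustrative stand-in, not the paper's algorithm.

```python
import numpy as np

def dawid_skene_em(responses, num_classes, iters=20):
    """Toy Dawid-Skene EM. responses: (workers, items) ints, -1 = missing."""
    W, N = responses.shape
    K = num_classes
    # majority-vote initialization of the posterior over true labels
    q = np.full((N, K), 1e-6)
    for j in range(N):
        for v in responses[:, j]:
            if v >= 0:
                q[j, v] += 1
    q /= q.sum(axis=1, keepdims=True)
    for _ in range(iters):
        # M-step: per-worker confusion matrices pi[w, true, reported] and prior
        pi = np.full((W, K, K), 1e-6)
        for w in range(W):
            for j in range(N):
                if responses[w, j] >= 0:
                    pi[w, :, responses[w, j]] += q[j]
        pi /= pi.sum(axis=2, keepdims=True)
        rho = q.mean(axis=0)
        # E-step: posterior over true labels given current parameters
        logq = np.tile(np.log(rho), (N, 1))
        for w in range(W):
            for j in range(N):
                if responses[w, j] >= 0:
                    logq[j] += np.log(pi[w, :, responses[w, j]])
        q = np.exp(logq - logq.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
    return q.argmax(axis=1)

# three workers, four items; worker 2 errs on item 2
est = dawid_skene_em(np.array([[0, 1, 1, 0],
                               [0, 1, 0, 0],
                               [0, 1, 1, 0]]), num_classes=2)
```

The non-convexity mentioned in the abstract is visible here: the fixed point EM reaches depends on the initialization of `q`, which is exactly why a provably good first-stage estimate matters.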


Facebook Gives Workers a Chatbot to Appease That Prying Uncle

#artificialintelligence

The answers were put together by Facebook's public relations department, parroting what company executives have publicly said. And the chatbot has a name: the "Liam Bot." (The provenance of the name is unclear.) "Our employees regularly ask for information to use with friends and family on topics that have been in the news, especially around the holidays," a Facebook spokeswoman said. "We put this into a chatbot, which we began testing this spring." Facebook's reputation has been shredded by a string of scandals -- including how the site spreads disinformation and can be used to meddle in elections -- in recent years.


Deep Tech with Avrohom Gottheil

#artificialintelligence

We are living in an era of information overload: there is an overabundance of information, yet at the same time it is hard to ascertain what is credible and what is not. Technology changes rapidly, and innovation is on the rise. How can we educate ourselves to know: (1) What products are available on the market? INTERVIEW HIGHLIGHTS: This episode of #AskTheCEO features a presentation Avrohom Gottheil gave in New Delhi, India, at India's First Annual Deep Tech Summit, titled "Deep Tech for All." "Time is the new currency, and that is what's driving the mass adoption of voice-based technology in the marketplace," said Avrohom. [13:30] J. Dianne Dotson, science fiction writer and research scientist, shares how in the future we will be able to leverage AI to search global DNA databases, such as 23andMe, and analyze people's genomes for disease-causing proteins so that we can disable them and stop diseases from spreading, right at the source.


Multistage Campaigning in Social Networks

Neural Information Processing Systems

We consider control problems for multi-stage campaigning over social networks. A dynamic programming framework is employed to balance the high present reward against the large penalty on poor future outcomes in the presence of extensive uncertainties. In particular, we establish theoretical foundations for optimal campaigning over social networks in which user activities are modeled as a multivariate Hawkes process, and we derive a time-dependent linear relation between the intensity of exogenous events and several commonly used campaigning objective functions. We further develop a convex dynamic programming framework for determining the optimal intervention policy, which prescribes the required level of external drive at each stage to achieve the desired campaigning result. Experiments on both synthetic data and the real-world MemeTracker dataset show that our algorithm steers user activities for optimal campaigning far more accurately than baseline methods.
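The multivariate Hawkes model of user activity can be conveyed in a few lines: every past event excites the future intensity of every dimension through a decaying kernel, on top of a constant exogenous base rate. The exponential kernel and the `hawkes_intensity` helper below are illustrative assumptions, not the paper's exact parameterization.

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """Intensity of a d-dimensional Hawkes process at time t.

    events: list of (time, dim) pairs with time < t.
    mu: (d,) exogenous base rates; alpha: (d, d) excitation matrix
    (alpha[i, j] = how much an event on dim j excites dim i); beta: decay.
    """
    lam = mu.astype(float).copy()
    for t_k, j in events:
        if t_k < t:
            lam += alpha[:, j] * np.exp(-beta * (t - t_k))
    return lam

# two-dimensional toy network: one past event on each dimension
mu = np.array([0.5, 0.2])
alpha = np.array([[0.1, 0.3],
                  [0.2, 0.1]])
events = [(0.0, 0), (1.0, 1)]
lam = hawkes_intensity(2.0, events, mu, alpha, beta=1.0)
```

The linearity the abstract exploits is visible here: the intensity is affine in the exogenous rate `mu`, so raising the external drive shifts expected activity in a way amenable to convex dynamic programming.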


niderhoff/nlp-datasets

#artificialintelligence

Most of the data here is raw, unstructured text; if you are looking for annotated corpora or treebanks, refer to the sources at the bottom.
Blog Authorship Corpus: consists of the collected posts of 19,320 bloggers gathered from blogger.com in August 2004.
Amazon Fine Food Reviews [Kaggle]: consists of 568,454 food reviews Amazon users left up to October 2012.
ASAP Automated Essay Scoring [Kaggle]: for this competition, there are eight essay sets, each generated from a single prompt.


How To Implement A Chatbot For Your Webinar Campaign

#artificialintelligence

Chatbot expert Natasha Takahashi walks you through how to use a chatbot with your webinar campaign. Don't forget to register for Natasha's chatbot workshop on December 10th. RESOURCES to check out: Become a DigitalMarketer Insider (for free, always): http://bit.ly/2KqdSlc. Natasha also shares her findings, cutting-edge strategies, and results with the world; the other 50% of her time is spent as CMO and co-founder of School of Bots, the trusted chatbot resource for marketers and entrepreneurs.


Inference Aided Reinforcement Learning for Incentive Mechanism Design in Crowdsourcing

Neural Information Processing Systems

Incentive mechanisms for crowdsourcing are designed to incentivize financially self-interested workers to generate and report high-quality labels. Existing mechanisms are often developed as one-shot, static solutions that assume a certain level of knowledge about worker models (expertise levels, costs of exerting effort, etc.). In this paper, we propose a novel inference-aided reinforcement mechanism that acquires data sequentially and requires no such prior assumptions. Specifically, we first design a Gibbs-sampling-augmented Bayesian inference algorithm to estimate workers' labeling strategies from the labels collected at each step. Then we propose a reinforcement incentive learning (RIL) method, built on top of these estimates, to uncover how workers respond to different payments.


Thy Friend is My Friend: Iterative Collaborative Filtering for Sparse Matrix Estimation

Neural Information Processing Systems

The sparse matrix estimation problem consists of estimating the distribution of an $n\times n$ matrix $Y$, from a sparsely observed single instance of this matrix where the entries of $Y$ are independent random variables. This captures a wide array of problems; special instances include matrix completion in the context of recommendation systems, graphon estimation, and community detection in (mixed membership) stochastic block models. Inspired by classical collaborative filtering for recommendation systems, we propose a novel iterative, collaborative filtering-style algorithm for matrix estimation in this generic setting. We show that the mean squared error (MSE) of our estimator converges to $0$ at the rate of $O(d^2 (pn)^{-2/5})$ as long as $\omega(d^5 n)$ random entries from a total of $n^2$ entries of $Y$ are observed (uniformly sampled), $\mathbb{E}[Y]$ has rank $d$, and the entries of $Y$ have bounded support. The maximum squared error across all entries converges to $0$ with high probability as long as we observe a little more, $\Omega(d^5 n \ln^5(n))$ entries.
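The collaborative-filtering flavor of the approach can be conveyed with a toy estimator: declare two rows neighbors when they are close on their commonly observed entries, then fill each entry of a row by averaging the observed values among its neighbors. The `cf_estimate` helper and its distance threshold are illustrative assumptions, not the paper's iterative algorithm or its analysis.

```python
import numpy as np

def cf_estimate(Y, mask, radius=0.5):
    """Naive collaborative-filtering estimate of E[Y] from sparse entries.

    Y: (n, n) matrix of values; mask: boolean matrix of observed entries.
    Rows are neighbors if their mean squared difference on commonly
    observed columns is at most `radius`.
    """
    n = Y.shape[0]
    est = np.zeros_like(Y, dtype=float)
    for u in range(n):
        neighbors = []
        for v in range(n):
            common = mask[u] & mask[v]
            if common.sum() == 0:
                continue
            d = np.mean((Y[u, common] - Y[v, common]) ** 2)
            if d <= radius:
                neighbors.append(v)
        # fill each column with the average observed value among neighbors
        for j in range(n):
            obs = [Y[v, j] for v in neighbors if mask[v, j]]
            est[u, j] = np.mean(obs) if obs else 0.0
    return est

# rows 0 and 1 agree where observed, so row 1 fills the hole at (0, 2)
Y = np.array([[1., 1., 2.],
              [1., 1., 2.],
              [5., 5., 5.]])
mask = np.ones((3, 3), dtype=bool)
mask[0, 2] = False
est = cf_estimate(Y, mask)
```

The paper's contribution lies in making this neighbor-finding work at very sparse sampling rates by iterating over longer paths in the observation graph, which this one-hop sketch omits.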


A Minimax Optimal Algorithm for Crowdsourcing

Neural Information Processing Systems

We consider the problem of accurately estimating the reliability of workers based on noisy labels they provide, which is a fundamental question in crowdsourcing. We propose a novel lower bound on the minimax estimation error which applies to any estimation procedure. We further propose Triangular Estimation (TE), an algorithm for estimating the reliability of workers. TE has low complexity, may be implemented in a streaming setting when labels are provided by workers in real time, and does not rely on an iterative procedure. We prove that TE is minimax optimal and matches our lower bound.
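One way to picture a triangle-based reliability estimate: under a one-coin binary model with labels in {-1, +1}, the correlation between two workers factorizes as E[x_a x_b] = w_a w_b, so any triangle (a, b, c) yields w_a^2 = C_ab * C_ac / C_bc. The sketch below averages this identity over triangles; it illustrates the idea only and is not the paper's exact TE procedure, and `triangular_reliability` is a name invented here.

```python
import numpy as np

def triangular_reliability(X):
    """Estimate per-worker reliabilities w in [0, 1] from +/-1 labels.

    Assumes a one-coin binary model where E[x_a x_b] = w_a * w_b for
    distinct workers a, b. X: (workers, items) matrix of +/-1 responses.
    """
    W = X.shape[0]
    C = (X @ X.T) / X.shape[1]            # empirical pairwise correlations
    w = np.zeros(W)
    for a in range(W):
        # combine the identity w_a^2 = C_ab * C_ac / C_bc over all triangles
        vals = []
        for b in range(W):
            for c in range(W):
                if len({a, b, c}) == 3 and C[b, c] > 0:
                    vals.append(C[a, b] * C[a, c] / C[b, c])
        w[a] = np.sqrt(np.mean(vals)) if vals else 0.0
    return w

# three perfectly agreeing workers should all be estimated as reliability 1
X = np.array([[1, -1, 1, 1],
              [1, -1, 1, 1],
              [1, -1, 1, 1]])
w = triangular_reliability(X)
```

Because each update only reads pairwise correlation counts, an estimator of this shape can be maintained incrementally as labels stream in, consistent with the low-complexity, streaming-friendly properties the abstract claims for TE.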