# Social Media

### Crowdsourcing via Pairwise Co-occurrences: Identifiability and Algorithms

The data deluge comes with high demands for data labeling. Crowdsourcing (or, more generally, ensemble learning) techniques aim to produce accurate labels by integrating noisy labels from non-expert annotators. The classic Dawid-Skene estimator and its accompanying expectation-maximization (EM) algorithm have been widely used, but their theoretical properties are not fully understood. Tensor methods were proposed to guarantee identification of the Dawid-Skene model, but their sample complexity is a hurdle in practice, since these methods hinge on the availability of third-order statistics that are hard to estimate reliably from limited data. In this paper, we propose a framework using pairwise co-occurrences of the annotator responses, which naturally admits lower sample complexity.
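The second-order statistics the abstract refers to can be estimated directly by counting how often two annotators' labels co-occur. A minimal sketch (the function name, data layout, and missing-label convention are illustrative assumptions, not the paper's actual interface):

```python
import numpy as np

def pairwise_cooccurrence(responses, num_classes):
    """Estimate the co-occurrence matrix of a pair of annotators.

    responses: (num_items, 2) int array with the two annotators' labels
    per item, values in 0..num_classes-1, or -1 for a missing label.
    Returns a (num_classes, num_classes) matrix whose (k, l) entry
    estimates P(annotator 1 says k, annotator 2 says l).
    """
    M = np.zeros((num_classes, num_classes))
    count = 0
    for a, b in responses:
        if a >= 0 and b >= 0:  # both annotators labeled this item
            M[a, b] += 1
            count += 1
    return M / max(count, 1)

# Toy example: two annotators labeling 6 binary items.
resp = np.array([[0, 0], [0, 0], [1, 1], [1, 0], [0, 0], [1, 1]])
M = pairwise_cooccurrence(resp, num_classes=2)
```

Because each matrix needs only pairs of responses per item, these statistics can be estimated reliably from far fewer annotations than the third-order tensors mentioned above.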

### Spectral Methods meet EM: A Provably Optimal Algorithm for Crowdsourcing

The Dawid-Skene estimator has been widely used for inferring the true labels from the noisy labels provided by non-expert crowdsourcing workers. However, since the estimator maximizes a non-convex log-likelihood function, it is hard to theoretically justify its performance. In this paper, we propose a two-stage efficient algorithm for multi-class crowd labeling problems. The first stage uses the spectral method to obtain an initial estimate of the parameters; the second stage refines this estimate by running the EM algorithm. We show that our algorithm achieves the optimal convergence rate up to a logarithmic factor.
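The EM refinement stage for the Dawid-Skene model can be sketched as follows. This is a simplified illustration, not the paper's implementation: for brevity it initializes the label posterior with a soft majority vote in place of the spectral estimate the paper uses.

```python
import numpy as np

def dawid_skene_em(labels, num_classes, iters=20):
    """EM for the Dawid-Skene model (the refinement stage).

    labels: (num_items, num_workers) int array, entries in
    0..num_classes-1, or -1 for a missing response.
    Returns (posterior over true labels, per-worker confusion matrices).
    """
    n, m = labels.shape
    K = num_classes
    # Initialization: soft majority vote (stand-in for the spectral estimate).
    q = np.zeros((n, K))
    for i in range(n):
        for j in range(m):
            if labels[i, j] >= 0:
                q[i, labels[i, j]] += 1
    q /= q.sum(axis=1, keepdims=True)

    for _ in range(iters):
        # M-step: class priors and worker confusion matrices.
        pi = q.mean(axis=0)
        C = np.full((m, K, K), 1e-6)  # smoothing avoids log(0)
        for j in range(m):
            for i in range(n):
                if labels[i, j] >= 0:
                    C[j, :, labels[i, j]] += q[i]
            C[j] /= C[j].sum(axis=1, keepdims=True)
        # E-step: posterior over true labels given all worker responses.
        logq = np.tile(np.log(pi), (n, 1))
        for j in range(m):
            for i in range(n):
                if labels[i, j] >= 0:
                    logq[i] += np.log(C[j, :, labels[i, j]])
        q = np.exp(logq - logq.max(axis=1, keepdims=True))
        q /= q.sum(axis=1, keepdims=True)
    return q, C

# Toy example: 3 workers labeling 4 binary items; the majority vote is 0,1,0,1.
labels = np.array([[0, 0, 1], [1, 1, 1], [0, 0, 0], [1, 1, 0]])
q, C = dawid_skene_em(labels, num_classes=2)
```

The point of the spectral initialization in the paper is precisely that EM started from a provably good initial estimate converges at the optimal rate, whereas EM from an arbitrary start has no such guarantee.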

### Facebook Gives Workers a Chatbot to Appease That Prying Uncle

The answers were put together by Facebook's public relations department, parroting what company executives have publicly said. And the chatbot has a name: the "Liam Bot." (The provenance of the name is unclear.) "Our employees regularly ask for information to use with friends and family on topics that have been in the news, especially around the holidays," a Facebook spokeswoman said. "We put this into a chatbot, which we began testing this spring." Facebook's reputation has been shredded by a string of scandals -- including how the site spreads disinformation and can be used to meddle in elections -- in recent years.

### Deep Tech with Avrohom Gottheil

We are living in an era of information overload: there is an overabundance of information, yet at the same time it is hard to ascertain what is credible and what is not. Technology changes rapidly, and innovation is on the rise. How can we educate ourselves to know: (1) What products are available on the market? INTERVIEW HIGHLIGHTS: This episode of #AskTheCEO features a presentation Avrohom Gottheil gave in New Delhi, India, for India's First Annual Deep Tech Summit, titled Deep Tech for All. "Time is the new currency, and that is what's driving the mass adoption of voice-based technology in the marketplace," said Avrohom. [13:30] J. Dianne Dotson, Science Fiction Writer and Research Scientist, shares how in the future we will be able to leverage AI to search global DNA databases, such as 23andMe, and analyze people's genomes for disease-causing proteins so that we can disable them and stop diseases from spreading, right at the source.

### Multistage Campaigning in Social Networks

We consider control problems for multi-stage campaigning over social networks. The dynamic programming framework is employed to balance the high present reward against the large penalty on low future outcomes in the presence of extensive uncertainties. In particular, we establish theoretical foundations of optimal campaigning over social networks where the user activities are modeled as a multivariate Hawkes process, and we derive a time-dependent linear relation between the intensity of exogenous events and several commonly used objective functions of campaigning. We further develop a convex dynamic programming framework for determining the optimal intervention policy that prescribes the required level of external drive at each stage for the desired campaigning result. Experiments on both synthetic data and the real-world MemeTracker dataset show that our algorithm can steer the user activities for optimal campaigning much more accurately than baselines.
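The multivariate Hawkes model of user activity can be simulated with Ogata's thinning method. A minimal sketch with exponential kernels (the function name and parameterization are illustrative, not the paper's code); the baseline vector `mu` plays the role of the exogenous drive a campaigner would control:

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, rng=None):
    """Simulate a multivariate Hawkes process by Ogata's thinning method.

    mu:    (d,) baseline (exogenous) intensities.
    alpha: (d, d) excitation weights; alpha[i, j] is the jump in user i's
           intensity when user j acts.
    beta:  decay rate of the exponential influence kernel.
    Returns a list of (time, user) events on [0, T].
    """
    rng = rng or np.random.default_rng(0)
    d = len(mu)
    events = []
    t = 0.0
    while True:
        # Current intensities given past events; with exponential decay this
        # is an upper bound until the next event, so thinning is valid.
        lam = mu + sum(alpha[:, u] * np.exp(-beta * (t - s))
                       for s, u in events) if events else mu.copy()
        lam_bar = lam.sum()
        t += rng.exponential(1.0 / lam_bar)
        if t > T:
            break
        lam_new = mu + sum(alpha[:, u] * np.exp(-beta * (t - s))
                           for s, u in events) if events else mu.copy()
        # Accept the candidate time with probability lam_new.sum() / lam_bar.
        if rng.uniform() * lam_bar <= lam_new.sum():
            u = rng.choice(d, p=lam_new / lam_new.sum())
            events.append((t, u))
    return events

# Toy example: two mutually exciting users over a horizon of 10 time units.
mu = np.array([0.5, 0.5])
alpha = np.array([[0.1, 0.2], [0.2, 0.1]])
events = simulate_hawkes(mu, alpha, beta=1.0, T=10.0)
```

Raising `mu` at a given stage raises the expected event counts linearly, which is the time-dependent linear relation the abstract exploits for convex dynamic programming.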

### niderhoff/nlp-datasets

Most stuff here is raw, unstructured text data; if you are looking for annotated corpora or treebanks, refer to the sources at the bottom. Blog Authorship Corpus: consists of the collected posts of 19,320 bloggers gathered from blogger.com in August 2004. Amazon Fine Food Reviews [Kaggle]: consists of 568,454 food reviews Amazon users left up to October 2012. ASAP Automated Essay Scoring [Kaggle]: for this competition, there are eight essay sets, each generated from a single prompt.

### How To Implement A Chatbot For Your Webinar Campaign

Chatbot expert Natasha Takahashi walks you through how to use a chatbot with your webinar campaign. Don't forget to register for Natasha's chatbot workshop on December 10th. RESOURCES to check out: Become a DigitalMarketer Insider (for free, always): http://bit.ly/2KqdSlc
She also shares her findings, cutting-edge strategies, and results with the world. The other 50% of her time is spent as CMO and Co-Founder of School of Bots, the trusted chatbot resource for marketers & entrepreneurs.

### Iterative Collaborative Filtering for Sparse Matrix Estimation

The sparse matrix estimation problem consists of estimating the distribution of an $n\times n$ matrix $Y$, from a sparsely observed single instance of this matrix where the entries of $Y$ are independent random variables. This captures a wide array of problems; special instances include matrix completion in the context of recommendation systems, graphon estimation, and community detection in (mixed membership) stochastic block models. Inspired by classical collaborative filtering for recommendation systems, we propose a novel iterative, collaborative filtering-style algorithm for matrix estimation in this generic setting. We show that the mean squared error (MSE) of our estimator converges to $0$ at the rate of $O(d^2 (pn)^{-2/5})$ as long as $\omega(d^5 n)$ random entries from a total of $n^2$ entries of $Y$ are observed (uniformly sampled), $\mathbb{E}[Y]$ has rank $d$, and the entries of $Y$ have bounded support. The maximum squared error across all entries converges to $0$ with high probability as long as we observe a little more, $\Omega(d^5 n \ln^5(n))$ entries.
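The collaborative filtering idea can be sketched as a one-step neighbor-averaging procedure: measure row similarity on commonly observed columns, then fill each entry by averaging the most similar rows. This is a simplified illustration of the style of algorithm, not the paper's iterative estimator; the function name and interface are assumptions.

```python
import numpy as np

def cf_estimate(Y, observed, num_neighbors=5):
    """Collaborative-filtering-style estimate of E[Y].

    Y: (n, n) matrix; `observed` is a boolean mask of observed entries.
    For each pair of rows, distance is measured on commonly observed
    columns; each entry is then estimated by averaging that column over
    the most similar other rows.
    """
    n = Y.shape[0]
    est = np.zeros((n, n))
    for i in range(n):
        # Mean squared distance to every other row on shared columns.
        dist = np.full(n, np.inf)
        for k in range(n):
            if k == i:
                continue
            common = observed[i] & observed[k]
            if common.sum() > 0:
                dist[k] = np.mean((Y[i, common] - Y[k, common]) ** 2)
        nbrs = np.argsort(dist)[:num_neighbors]
        for j in range(n):
            vals = [Y[k, j] for k in nbrs
                    if observed[k, j] and np.isfinite(dist[k])]
            if vals:
                est[i, j] = np.mean(vals)
            elif observed[i, j]:
                est[i, j] = Y[i, j]
    return est

# Toy example: a block-constant 10x10 matrix (two communities of rows),
# fully observed, so neighbor averaging recovers it exactly.
n = 10
Y = np.vstack([np.ones((5, n)), np.zeros((5, n))])
mask = np.ones((n, n), dtype=bool)
est = cf_estimate(Y, mask, num_neighbors=4)
```

The paper's iterative algorithm extends this idea to the sparse regime, where a given pair of rows may share very few observed columns and similarities must instead be built up over longer paths in the observation graph.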