Researchers from Microsoft, along with a team from Arizona State University, have published a work that has outperformed the current state-of-the-art models that detect fake news. Though the prevalence and promotion of misinformation have been since time immemorial, today, thanks to the convenience for access provided by the internet, fake news is rampant and has affected healthy conversations. Given the rapidly evolving nature of news events and the limited amount of annotated data, state-of-the-art systems on fake news detection face challenges due to the lack of large numbers of annotated training instances that are hard to come by for early detection. In this work, the authors exploited multiple weak signals from different user engagements. They call this approach multi-source weak social supervision or MWSS.
The automatic identification of propaganda has gained significance in recent years due to technological and social changes in the way news is generated and consumed. That this task can be addressed effectively using BERT, a powerful new architecture which can be fine-tuned for text classification tasks, is not surprising. However, propaganda detection, like other tasks that deal with news documents and other forms of decontextualized social communication (e.g. sentiment analysis), inherently deals with data whose categories are simultaneously imbalanced and dissimilar. We show that BERT, while capable of handling imbalanced classes with no additional data augmentation, does not generalise well when the training and test data are sufficiently dissimilar (as is often the case with news sources, whose topics evolve over time). We show how to address this problem by providing a statistical measure of similarity between datasets and a method of incorporating cost-weighting into BERT when the training and test sets are dissimilar. We test these methods on the Propaganda Techniques Corpus (PTC) and achieve the second-highest score on sentence-level propaganda classification.
This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI. Machine learning, the subset of artificial intelligence that teaches computers to perform tasks through examples and experience, is a hot area of research and development. Many of the applications we use daily use machine learning algorithms, including AI assistants, web search and machine translation. Your social media news feed is powered by a machine learning algorithm. The recommended videos you see on YouTube and Netflix are the result of a machine learning model.
Millions of news articles are published online every day, which can be overwhelming for readers to follow. Grouping articles that are reporting the same event into news stories is a common way of assisting readers in their news consumption. However, it remains a challenging research problem to efficiently and effectively generate a representative headline for each story. Automatic summarization of a document set has been studied for decades, while few studies have focused on generating representative headlines for a set of articles. Unlike summaries, which aim to capture most information with least redundancy, headlines aim to capture information jointly shared by the story articles in short length, and exclude information that is too specific to each individual article. In this work, we study the problem of generating representative headlines for news stories. We develop a distant supervision approach to train large-scale generation models without any human annotation. This approach centers on two technical components. First, we propose a multi-level pre-training framework that incorporates massive unlabeled corpus with different quality-vs.-quantity balance at different levels. We show that models trained within this framework outperform those trained with pure human curated corpus. Second, we propose a novel self-voting-based article attention layer to extract salient information shared by multiple articles. We show that models that incorporate this layer are robust to potential noises in news stories and outperform existing baselines with or without noises. We can further enhance our model by incorporating human labels, and we show our distant supervision approach significantly reduces the demand on labeled data.
One of the main tasks in argument mining is the retrieval of argumentative content pertaining to a given topic. Most previous work addressed this task by retrieving a relatively small number of relevant documents as the initial source for such content. This line of research yielded moderate success, which is of limited use in a real-world system. Furthermore, for such a system to yield a comprehensive set of relevant arguments, over a wide range of topics, it requires leveraging a large and diverse corpus in an appropriate manner. Here we present a first end-to-end high-precision, corpus-wide argument mining system. This is made possible by combining sentence-level queries over an appropriate indexing of a very large corpus of newspaper articles, with an iterative annotation scheme. This scheme addresses the inherent label bias in the data and pinpoints the regions of the sample space whose manual labeling is required to obtain high-precision among top-ranked candidates. 1 Introduction Starting with the seminal work of Mochales Palau and Moens (2009), argument mining has mainly focused on the following tasks - identifying argumentative text segments within a given document; labeling these text segments according to the type of argument and its stance; and elucidating the discourse relations among the detected arguments. Typically, the considered documents were argumentative in nature, taken from a well defined domain, such as legal documents or student essays. More recently, some attention had been given to the corresponding retrieval task - given a controversial topic, retrieve arguments with a clear stance towards this topic. This is usually done by first retrieving - manually or automatically - documents relevant to the topic, and then using argument mining techniques to identify relevant argumentative segments therein. This documents-based approach was originally explored over Wikipedia (Levy et al. 2014; Rinott et al. 2015), and more recently over the entire Web (Stab et al. 2018). This approach is most suitable for topics of much controversy, where one can find documents directly addressing the debate, in which relevant argumentative text segments are abundant.Copyright c null 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org).
If I understand correctly, both CPC and AlexNet used the same set of training images. CPC just didn't use labels, while AlexNet did. So, what about instances where a self-supervised network can be trained on 10,000x as much data as would be economically feasible to label? In these cases, are supervised learning's days numbered? The application I'm personally most interested in is self-driving cars.
Methods that calculate dense vector representations for text have proven to be very successful for knowledge representation. We study how to estimate dense representations for multi-modal data (e.g., text, continuous, categorical). We propose Feat2Vec as a novel model that supports supervised learning when explicit labels are available, and self-supervised learning when there are no labels. Feat2Vec calculates embeddings for data with multiple feature types, enforcing that all embeddings exist in a common space. We believe that we are the first to propose a method for learning self-supervised embeddings that leverage the structure of multiple feature types. Our experiments suggest that Feat2Vec outperforms previously published methods, and that it may be useful for avoiding the cold-start problem.
Self-supervised learning uses way more supervisory signals than supervised learning, and enormously more than reinforcement learning. That's why calling it "unsupervised" is totally misleading. That's also why more knowledge about the structure of the world can be learned through self-supervised learning than from the other two paradigms: the data is unlimited, and amount of feedback provided by each example is huge.
The majority of conversations a dialogue agent sees over its lifetime occur after it has already been trained and deployed, leaving a vast store of potential training signal untapped. In this work, we propose the self-feeding chatbot, a dialogue agent with the ability to extract new training examples from the conversations it participates in. As our agent engages in conversation, it also estimates user satisfaction in its responses. When the conversation appears to be going well, the user's responses become new training examples to imitate. When the agent believes it has made a mistake, it asks for feedback; learning to predict the feedback that will be given improves the chatbot's dialogue abilities further. On the PersonaChat chit-chat dataset with over 131k training examples, we find that learning from dialogue with a self-feeding chatbot significantly improves performance, regardless of the amount of traditional supervision.