In a world where seeing is no longer believing, experts warn that society must take a multi-pronged approach to combat the potential harms of computer-generated media. As Bill Whitaker reports this week on 60 Minutes, artificial intelligence can manipulate faces and voices to make it look as if someone said something they never said. The result is videos of events that never happened, called "deepfakes." Often they look so real that viewers can't tell the difference. Just this month, Justin Bieber was fooled by a series of deepfake videos on the social video platform TikTok that appeared to show Tom Cruise.
WASHINGTON: Researchers at Georgetown University's Center for Security and Emerging Technology (CSET) are raising alarms about a powerful, now more widely available artificial intelligence technology that could be used to generate disinformation at a troubling scale. The warning comes after CSET researchers conducted experiments using the second and third versions of Generative Pre-trained Transformer (GPT-2 and GPT-3), a technology developed by the San Francisco company OpenAI. CSET researchers characterize GPT's text-generation capabilities as "autocomplete on steroids." "We don't often think of autocomplete as being very capable, but with these large language models, the autocomplete is really capable, and you can tailor what you're starting with to get it to write all sorts of things," Andrew Lohn, senior research fellow at CSET, said during a recent event where researchers discussed their findings.
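To make the "autocomplete" metaphor concrete, here is a minimal sketch of the most basic form of autocomplete: a bigram model that counts which word tends to follow which, then greedily extends a prompt. This is an illustrative assumption, vastly simpler than GPT-2 or GPT-3, which use large transformer networks trained on huge corpora; the function names `train_bigrams` and `autocomplete` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count, for each word, how often each possible next word follows it."""
    words = corpus.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def autocomplete(follows: dict[str, Counter], prompt: str, n_words: int) -> str:
    """Greedily append the most frequent next word, up to n_words times.

    The prompt is lower-cased to match the training counts.
    """
    words = prompt.lower().split()
    if not words:
        return ""
    for _ in range(n_words):
        candidates = follows.get(words[-1])
        if not candidates:
            break  # no known continuation for the last word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


follows = train_bigrams("the model writes text . the model writes code . the model reads text")
print(autocomplete(follows, "the", 2))
```

A large language model plays the same game of predicting what comes next, but it conditions on the whole preceding context rather than a single previous word, which is why its "autocomplete" can be steered by the prompt in the way Lohn describes.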
In 2021, innovations in machine learning have made many tasks more feasible, efficient, and precise than ever before. Based on the experience of MobiDev's AI team, we have listed the latest machine learning innovations likely to benefit businesses in 2021-2022:

Trend 1. TinyML

Sending data in a web request to a large server, having a machine learning algorithm process it there, and sending the result back can take time. A more desirable approach may be to run ML programs on edge devices, which can achieve lower latency, lower power consumption, and lower bandwidth requirements while helping preserve user privacy.

Trend 2. AutoML

AutoML brings improved data labeling tools to the table and makes it possible to tune neural network architectures automatically. Evgeniy Krasnokutsky, PhD, AI/ML Solution Architect at MobiDev, explains: "Traditionally, data labeling has been done manually by outsourced labor. This brings in a great deal of risk due to human error. Since AutoML aptly automates much of the labeling process, the risk of human error is much lower."
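Running models on edge devices, as in Trend 1 (TinyML), usually means shrinking them first. The text above does not name a technique, so the following is an assumption: a minimal sketch of symmetric post-training int8 quantization, one common way to cut model storage by about 4x relative to 32-bit floats. The function names are hypothetical, and real TinyML toolchains add per-channel scales, calibration, and integer-only kernels.

```python
from array import array


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor quantization of float weights to signed 8-bit ints."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # One scale for the whole tensor; map the largest magnitude to 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Approximately recover the original floats from the int8 values."""
    return [v * scale for v in q]


def storage_bytes(weights: list[float], q: list[int]) -> tuple[int, int]:
    """Bytes needed as 32-bit floats versus signed 8-bit integers."""
    return array("f", weights).itemsize * len(weights), array("b", q).itemsize * len(q)


weights = [0.5, -1.0, 0.25, 0.0, 0.9]
q, scale = quantize_int8(weights)
print(q, storage_bytes(weights, q))
```

Each reconstructed weight is off by at most half a quantization step (`scale / 2`), which is often an acceptable accuracy trade for the smaller memory footprint on a microcontroller.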
The recent emergence of artificial intelligence (AI)-powered media manipulation has widespread societal implications for journalism and democracy [7], national security [1], and art [8, 14]. AI models have the potential to scale misinformation to unprecedented levels by creating various forms of synthetic media [21]. For example, AI systems can synthesize realistic video portraits of an individual with full control of facial expressions, including eye and lip movement [11, 18, 34, 35, 36]; clone a speaker's voice from a few training samples and generate new, natural-sounding audio of something the speaker never said [2]; synthesize visually indicated sound effects [28]; generate high-quality, relevant text based on an initial prompt [31]; produce photorealistic images of a variety of objects from text inputs [5, 17, 27]; and generate photorealistic videos of people expressing emotions from only a single image [3, 40]. The technologies for producing machine-generated fake media online may outpace our ability to detect and respond to such media manually. We developed a neural network architecture that combines instance segmentation with image inpainting to automatically remove people and other objects from images [13, 39]. Figure 1 presents four examples of participant-submitted images and their transformations. The AI, which we call a "target object removal architecture," detects an object, removes it, and replaces its pixels with pixels that approximate what the background would look like without the object.
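The detect, remove, and fill steps described above can be illustrated with a deliberately simple, non-neural stand-in. The sketch below assumes a grayscale image stored as nested lists, replaces instance segmentation with a brightness threshold, and replaces the learned inpainting network with classical diffusion (harmonic) inpainting, where each masked pixel is repeatedly set to the average of its four neighbours. None of this is the authors' architecture; all function names are hypothetical.

```python
def segment_bright_object(image: list[list[float]], threshold: float) -> list[list[bool]]:
    """Stand-in for instance segmentation: mark pixels brighter than threshold."""
    return [[value > threshold for value in row] for row in image]


def inpaint_by_diffusion(image, mask, iterations: int = 200):
    """Fill masked pixels by repeatedly averaging their 4-neighbours (Jacobi steps)."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    known = [out[y][x] for y in range(h) for x in range(w) if not mask[y][x]]
    if not known:
        raise ValueError("mask covers the whole image; nothing to inpaint from")
    # Initialise the hole with the mean of the known background pixels.
    start = sum(known) / len(known)
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                out[y][x] = start
    for _ in range(iterations):
        nxt = [row[:] for row in out]
        for y in range(h):
            for x in range(w):
                if not mask[y][x]:
                    continue  # background pixels are never modified
                neighbours = [
                    out[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < h and 0 <= nx < w
                ]
                nxt[y][x] = sum(neighbours) / len(neighbours)
        out = nxt
    return out


def remove_object(image, threshold: float, iterations: int = 200):
    """Detect the object, remove it, and fill the hole from the background."""
    mask = segment_bright_object(image, threshold)
    return inpaint_by_diffusion(image, mask, iterations)


# Horizontal gradient background with a bright "object" pasted at the centre.
img = [[10.0 * x for x in range(5)] for _ in range(5)]
img[2][2] = 255.0
print(remove_object(img, threshold=200)[2])
```

Diffusion inpainting only reproduces smooth backgrounds; a learned inpainting model can also hallucinate texture and structure, which is what makes the results in Figure 1 hard for people to spot.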
This document summarizes our results for the NLP lecture at ETH in the spring semester of 2021. In this work, a BERT-based neural network model (Devlin et al., 2018) is applied to the JIGSAW dataset (Jigsaw/Conversation AI, 2019) to create a model that identifies hateful and toxic comments (strictly separated from offensive language) on online social platforms (English language), in this case Twitter. Three other neural network architectures and a GPT-2 model (Radford et al., 2019) are also applied to the provided dataset in order to compare these models. The trained BERT model is then applied to two other datasets to evaluate its generalisation power: another Twitter dataset (Tom Davidson, 2017) (Davidson et al., 2017) and the HASOC 2019 dataset (Thomas Mandl, 2019) (Mandl et al., 2019), which includes both Twitter and Facebook comments; we focus on the English HASOC 2019 data. In addition, we show that fine-tuning the trained BERT model on these two datasets, using transfer learning scenarios that retrain some or all layers, improves the predictive scores compared with simply applying the model pre-trained on the JIGSAW dataset. With our results, we obtain precision from 64% to around 90% while still achieving acceptable recall in the low 60% range or higher, suggesting that BERT is suitable for real use cases on social platforms.
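Since the results above are reported as precision and recall, a short sketch of how those two numbers are computed may help. The scores and labels below are made up for illustration (they are not from the report), and `predict` and `precision_recall` are hypothetical helpers; raising the decision threshold on a classifier's scores typically increases precision at the cost of recall.

```python
def predict(scores: list[float], threshold: float) -> list[int]:
    """Label a comment as toxic (1) when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def precision_recall(y_true: list[int], y_pred: list[int], positive: int = 1) -> tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical model scores and gold labels for ten comments.
scores = [0.95, 0.9, 0.8, 0.4, 0.3, 0.2, 0.7, 0.1, 0.35, 0.05]
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
for threshold in (0.3, 0.5, 0.75):
    print(threshold, precision_recall(y_true, predict(scores, threshold)))
```

On this toy data, moving the threshold from 0.3 to 0.75 raises precision from about 0.71 to 1.0 while recall drops from 1.0 to 0.6, the same kind of trade-off behind a precision range paired with a recall floor.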
Today more than ever, people are voicing concerns about bias in news media. Especially in the political arena, there are accusations of favouritism or disfavour in reporting, often expressed by emphasizing or ignoring certain political actors, policies, events, or topics. Is it possible to develop objective and transparent data-driven methods to identify such biases, rather than relying on subjective human judgements? MIT researchers Samantha D'Alonzo and Max Tegmark say "yes," and have proposed an automated method for measuring media bias. The proposed data-driven approach produces results that closely agree with human-judgement classifications of left-right and establishment biases.
Part-of-speech (POS) tagging is a familiar NLP task. State-of-the-art taggers routinely achieve token-level accuracies of over 97% on news body text, evidence that the problem is well understood. However, the register of English news headlines, "headlinese," is very different from the register of long-form text, causing POS tagging models to underperform on headlines. In this work, we automatically annotate news headlines with POS tags by projecting predicted tags from corresponding sentences in news bodies. We train a multi-domain POS tagger on both long-form and headline text and show that joint training on both registers improves over training on just one or naively concatenating training sets. We evaluate on a newly annotated corpus of 5,248 English news headlines from the Google sentence compression corpus, and show that our model yields a 23% relative error reduction per token and 19% per headline. In addition, we demonstrate that better headline POS tags can improve the performance of a syntax-based open information extraction system. We make POSH, the POS-tagged Headline corpus, available to encourage research in improved NLP models for news headlines.
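The tag-projection idea above can be sketched as follows, under a simplifying assumption that is not from the paper: headline tokens are aligned to body tokens by exact, case-insensitive string match, and each matched headline token inherits the tag a tagger predicted for the body token. The function name and example sentence are hypothetical.

```python
def project_tags(headline: list[str], body_tokens: list[str], body_tags: list[str]) -> list:
    """Copy predicted POS tags from a body sentence onto matching headline tokens.

    Tokens with no exact (case-insensitive) match in the body get None.
    """
    # Map each lower-cased body token to the tag of its first occurrence.
    tag_of: dict[str, str] = {}
    for token, tag in zip(body_tokens, body_tags):
        tag_of.setdefault(token.lower(), tag)
    return [tag_of.get(token.lower()) for token in headline]


headline = ["Mayor", "Vetoes", "Budget"]
body = ["The", "mayor", "vetoed", "the", "city", "budget", "on", "Monday"]
tags = ["DET", "NOUN", "VERB", "DET", "NOUN", "NOUN", "ADP", "PROPN"]
print(project_tags(headline, body, tags))
```

Here "Vetoes" stays unlabelled because headlines often shift verb tense relative to the body ("vetoed"); a practical projector would need a looser alignment, or would keep only headlines whose tokens are all covered.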
The advent of the World Wide Web and the rapid adoption of social media platforms (such as Facebook and Twitter) paved the way for information dissemination on a scale never before witnessed in human history. With the current usage of social media platforms, consumers are creating and sharing more information than ever before, some of which is misleading and bears no relation to reality. Automated classification of a text article as misinformation or disinformation is a challenging task. Even an expert in a particular domain has to explore multiple aspects before giving a verdict on the truthfulness of an article. In this work, we propose a machine learning ensemble approach for the automated classification of news articles. Our study explores different textual properties that can be used to distinguish fake content from real content. Using those properties, we train a combination of different machine learning algorithms with various ensemble methods and evaluate their performance on four real-world datasets. Experimental evaluation confirms the superior performance of our proposed ensemble learner approach compared with individual learners.

Besides other use cases, news outlets have benefited from the widespread use of social media platforms by providing updated news to their subscribers in near real time. The news media evolved from newspapers, tabloids, and magazines to digital forms such as online news platforms, blogs, social media feeds, and other digital media formats. It became easier for consumers to get the latest news at their fingertips.
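One simple ensemble method is hard (majority) voting: several learners each label an article, and the most common label wins. The sketch below assumes three hand-written rules on textual properties (exclamation marks, all-caps words, clickbait phrases) as hypothetical stand-ins for the trained learners and features in the study, which the text does not specify.

```python
from collections import Counter
from typing import Callable

Classifier = Callable[[str], str]


def exclamation_rule(text: str) -> str:
    """Hypothetical learner: many exclamation marks suggest fake content."""
    return "fake" if text.count("!") >= 2 else "real"


def caps_rule(text: str) -> str:
    """Hypothetical learner: a high share of all-caps words suggests fake content."""
    words = text.split()
    caps = sum(1 for w in words if w.isupper() and len(w) > 1)
    return "fake" if caps / max(len(words), 1) > 0.2 else "real"


def clickbait_rule(text: str) -> str:
    """Hypothetical learner: flags a few sensational phrases."""
    phrases = ("you won't believe", "shocking", "miracle")
    return "fake" if any(p in text.lower() for p in phrases) else "real"


def majority_vote(classifiers: list[Classifier], text: str) -> str:
    """Hard-voting ensemble: return the label predicted by most classifiers."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]


learners = [exclamation_rule, caps_rule, clickbait_rule]
print(majority_vote(learners, "Shocking result in local election"))
```

In this example the clickbait rule alone would flag the headline, but the other two learners outvote it. This is the intuition behind ensembles outperforming individual learners: independent errors tend to cancel out. Using an odd number of learners with two labels also avoids ties.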