Online social networks provide a platform for sharing information and free expression. However, these networks are also used for malicious purposes, such as distributing misinformation and hate speech, selling illegal drugs, and coordinating sex trafficking or child exploitation. This paper surveys the state of the art in keeping online platforms and their users safe from such harm, also known as the problem of preserving integrity. The survey comes from the perspective of having to combat a broad spectrum of integrity violations at Facebook. We highlight the techniques that have proven useful in practice and that deserve additional attention from the academic community. Instead of discussing the many individual violation types, we identify key aspects of the social-media ecosystem, each of which is common to a wide variety of violation types. Furthermore, each of these aspects represents an area for research and development, and the innovations found there can be applied widely.
We present a data-driven approach using word embeddings to discover and categorise language biases on the discussion platform Reddit. As spaces for isolated user communities, platforms such as Reddit are increasingly connected to issues of racism, sexism and other forms of discrimination. Hence, there is a need to monitor the language of these groups. One of the most promising AI approaches for tracing linguistic biases in large textual datasets involves word embeddings, which transform text into high-dimensional dense vectors and capture semantic relations between words. Yet, previous studies require predefined sets of potential biases to study, e.g., whether gender is more or less associated with particular types of jobs. This makes these approaches ill-suited to smaller, community-centric datasets such as those on Reddit, which contain smaller vocabularies and slang, as well as biases that may be particular to that community. This paper proposes a data-driven approach to automatically discover language biases encoded in the vocabulary of online discourse communities on Reddit. In our approach, protected attributes are connected to evaluative words found in the data, which are then categorised through a semantic analysis system. We verify the effectiveness of our method by comparing the biases we discover in the Google News dataset with those found in previous literature. We then successfully discover gender bias, religion bias, and ethnic bias in different Reddit communities. We conclude by discussing potential application scenarios and limitations of this data-driven bias discovery method.
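To make "connecting protected attributes to evaluative words" concrete, here is a minimal sketch in Python, assuming a generic cosine-similarity heuristic: each word is scored by how much closer it lies to the centroid of one attribute word set (e.g. female terms) than to the centroid of another (e.g. male terms). This is a common embedding-bias technique, not necessarily the exact procedure of the paper. The toy three-dimensional vectors and the names `cosine`, `centroid`, `biased_words`, and `top_k` are hypothetical, and the later semantic categorisation step is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def centroid(words: List[str], emb: Dict[str, Vector]) -> List[float]:
    """Average vector of the attribute words present in the vocabulary."""
    vecs = [emb[w] for w in words if w in emb]
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def biased_words(emb: Dict[str, Vector], attr_a: List[str], attr_b: List[str],
                 top_k: int = 3) -> List[Tuple[str, float]]:
    """Hypothetical helper: words closer to attr_a than to attr_b, ranked by similarity gap."""
    ca, cb = centroid(attr_a, emb), centroid(attr_b, emb)
    excluded = set(attr_a) | set(attr_b)  # do not report the attribute words themselves
    scores = [(w, cosine(v, ca) - cosine(v, cb)) for w, v in emb.items() if w not in excluded]
    scores.sort(key=lambda item: item[1], reverse=True)
    return [(w, s) for w, s in scores[:top_k] if s > 0]


# Hypothetical toy vocabulary; real vectors would be trained on a community's comments.
toy = {
    "she": [1.0, 0.1, 0.0], "woman": [0.9, 0.2, 0.1],
    "he": [0.1, 1.0, 0.0], "man": [0.2, 0.9, 0.1],
    "emotional": [0.8, 0.1, 0.3], "strong": [0.1, 0.8, 0.3],
    "table": [0.3, 0.3, 0.9],
}
female_biased = biased_words(toy, ["she", "woman"], ["he", "man"], top_k=2)
male_biased = biased_words(toy, ["he", "man"], ["she", "woman"], top_k=2)
print(female_biased, male_biased)
```

With these toy vectors, "emotional" surfaces on the female side and "strong" on the male side, while the neutral "table" is filtered out. In practice the ranked words would then be passed to a semantic tagger for categorisation.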
VIDEO: "the police began looking for me…"
Google Sent Threatening Letter to Google Insider Zachary Vorhies: "they knew what I had done and that letter contained several demands"
HUNDREDS of Internal Google Documents Leaked to Project Veritas… news blacklist, "human raters," YouTube CEO video…
Google Insider Wants More Insiders to Blow Whistle: "people have been waiting for this Google Snowden moment where somebody comes out and explains what everybody already knows to be true"
"I felt that our entire election system was going to be compromised forever, by this company that told the American public that it was not going to do any evil"

The internal Google documents are available here. The insider, Zachary Vorhies, decided to go public after receiving a letter from Google, and after he says Google allegedly called the police to perform a "wellness check" on him. Along with the interview, Vorhies asked Project Veritas to publish more of the internal Google documents he had previously leaked. "I gave the documents to Project Veritas, I had been collecting the documents for over a year. And the reason why I collected these documents was because I saw something dark and nefarious going on with the company and I realized that they were going to not only tamper with the elections, but use that tampering with the elections to essentially overthrow the United States."
What if I told a story here, how would that story start?" Thus, the summarization prompt: "My second grader asked me what this passage means: …" When a given prompt isn't working and GPT-3 keeps pivoting into other modes of completion, that may mean that one hasn't constrained it enough by imitating a correct output, and one needs to go further; writing the first few words or sentence of the target output may be necessary.
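The advice about constraining the model by imitating a correct output can be sketched as plain string assembly. The helper below is hypothetical: it only builds the prompt text (no model API is called), wraps the passage in the "second grader" framing quoted above, and optionally prefills the first words of the desired answer so the completion is more likely to stay in summarization mode. The hand-off line "I rephrased it for him…" is an illustrative assumption, not a quotation from the source.

```python
def build_summary_prompt(passage: str, answer_prefix: str = "") -> str:
    """Assemble a summarization prompt, optionally prefilling the start of the answer.

    Hypothetical helper: the framing sentence comes from the text; the hand-off
    line is an assumed wording. No model is called here.
    """
    framing = "My second grader asked me what this passage means:"
    handoff = "I rephrased it for him, in plain language a second grader can understand:"  # assumed wording
    parts = [framing, '"""', passage.strip(), '"""', handoff, '"""']
    prompt = "\n".join(parts) + "\n"
    # Writing the first few words of the target output constrains the completion
    # to continue as a summary instead of pivoting into another mode.
    return prompt + answer_prefix


if __name__ == "__main__":
    print(build_summary_prompt("Photosynthesis converts light energy into chemical energy.", "Plants"))
```

Leaving `answer_prefix` empty gives the bare framing; adding even one or two words ("Plants", "In short,") is the "go further" step described above.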
This book discusses the necessity, and perhaps the urgency, of regulating the algorithms on which new technologies rely: technologies that have the potential to reshape human societies. From commerce and farming to medical care and education, it is difficult to find any aspect of our lives that will not be affected by these emerging technologies. At the same time, artificial intelligence, deep learning, machine learning, cognitive computing, blockchain, virtual reality, and augmented reality are among the fields most likely to affect law and, in particular, administrative law. The book examines universally applicable patterns in administrative decisions and judicial rulings. First, similarities and divergences in behavior across the different cases are identified by analyzing parameters ranging from geographical location and administrative decisions to judicial reasoning and legal basis. As it turns out, in several of the cases presented, sources of general law, such as competition or labor law, are invoked as a legal basis because specialized legislation is currently lacking. The book also investigates the role and significance of national and indeed supranational regulatory bodies for advanced algorithms, and considers ENISA, an EU agency that focuses on network and information security, as an interesting candidate for a European regulator of advanced algorithms. Lastly, it discusses the involvement of representative institutions in algorithmic regulation.
The 2016 United States presidential election was marked by the abuse of targeted advertising on Facebook. Concerned that the same kind of abuse could occur in the 2018 Brazilian elections, we designed and deployed an independent auditing system to monitor political ads on Facebook in Brazil. To do so, we first adapted a browser plugin to gather ads from the timelines of volunteers using Facebook, and we convinced more than 2,000 volunteers to help our project and install our tool. Then, we used a Convolutional Neural Network (CNN) over word embeddings to detect political Facebook ads. To evaluate our approach, we manually labeled a collection of 10k ads as political or non-political and provided an in-depth evaluation of the proposed approach, comparing it with classic supervised machine learning methods. Finally, we deployed a real system that shows the ads identified as related to politics. We noticed that not all political ads we detected were present in the Facebook Ad Library for political ads. Our results emphasize the importance of enforcement mechanisms for declaring political ads and the need for independent auditing platforms.
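As a rough illustration of how a CNN over word embeddings can score an ad, the sketch below implements only the forward pass of a single-layer text CNN in plain Python: embedding lookup, a sliding convolution over windows of consecutive words, ReLU with max-over-time pooling, and a sigmoid output read as the probability that an ad is political. It is not the authors' architecture: the class name `TextCNN`, the filter count and width, the random untrained weights, the toy embeddings, and the 0.5 decision threshold are all hypothetical, and training is omitted.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def conv_relu_maxpool(seq: List[Vector], filters, biases) -> List[float]:
    """Slide each filter (width x dim) over the sequence, apply ReLU, keep the max over time."""
    width = len(filters[0])
    features = []
    for filt, bias in zip(filters, biases):
        best = 0.0  # ReLU outputs are >= 0, so starting at 0 folds ReLU into the max
        for start in range(len(seq) - width + 1):
            activation = bias
            for offset in range(width):
                activation += sum(w * x for w, x in zip(filt[offset], seq[start + offset]))
            best = max(best, activation)
        features.append(best)
    return features


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class TextCNN:
    """Forward pass of a tiny, untrained text CNN (hypothetical sizes, random weights)."""

    def __init__(self, embeddings: Dict[str, Vector], num_filters: int = 4, width: int = 2, seed: int = 0):
        rng = random.Random(seed)
        self.embeddings = embeddings
        self.dim = len(next(iter(embeddings.values())))
        self.width = width
        self.filters = [
            [[rng.uniform(-0.5, 0.5) for _ in range(self.dim)] for _ in range(width)]
            for _ in range(num_filters)
        ]
        self.conv_biases = [0.0] * num_filters
        self.out_weights = [rng.uniform(-0.5, 0.5) for _ in range(num_filters)]
        self.out_bias = 0.0

    def predict_proba(self, tokens: List[str]) -> float:
        """Probability that the tokens form a political ad (meaningless until trained)."""
        unknown = [0.0] * self.dim  # out-of-vocabulary tokens and padding map to a zero vector
        seq = [self.embeddings.get(t, unknown) for t in tokens]
        while len(seq) < self.width:  # pad short inputs so at least one window fits
            seq.append(unknown)
        features = conv_relu_maxpool(seq, self.filters, self.conv_biases)
        logit = self.out_bias + sum(w * f for w, f in zip(self.out_weights, features))
        return sigmoid(logit)


# Hypothetical toy embeddings; a real system would use pretrained Portuguese word vectors.
toy_embeddings = {
    "vote": [0.9, 0.1, 0.0],
    "candidate": [0.8, 0.2, 0.1],
    "sale": [0.0, 0.9, 0.3],
    "shoes": [0.1, 0.7, 0.6],
}
model = TextCNN(toy_embeddings)
p = model.predict_proba(["vote", "for", "candidate"])
print(f"P(political) = {p:.3f}", "-> political" if p >= 0.5 else "-> non-political")
```

Max-over-time pooling lets each filter act as a detector for a short phrase pattern regardless of where it appears in the ad. In a real pipeline the filter and output weights would be learned by backpropagation from labeled examples such as the manually labeled ads.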
Nowadays, the Internet is a primary source of health information. The massive amount of fake health news spreading over the Internet has become a severe threat to public health. Numerous studies have been done in the fake news detection domain; however, few of them are designed to cope with the challenges specific to health news. For instance, fake health news detection requires the development of explainable methods. To mitigate these problems, we construct a comprehensive repository, FakeHealth, which includes news contents with rich features, news reviews with detailed explanations, social engagements, and a user-user social network. Moreover, we conduct exploratory analyses to understand the characteristics of the datasets, analyze useful patterns, and validate the quality of the datasets for fake health news detection. We also discuss novel and potential future research directions for fake health news detection.
Have you seen Barack Obama call Donald Trump a "complete dipshit", or Mark Zuckerberg brag about having "total control of billions of people's stolen data", or witnessed Jon Snow's moving apology for the dismal ending to Game of Thrones? Answer yes and you've seen a deepfake. The 21st century's answer to Photoshopping, deepfakes use a form of artificial intelligence called deep learning to make images of fake events, hence the name deepfake. Want to put new words in a politician's mouth, star in your favourite movie, or dance like a pro? Then it's time to make a deepfake.
More than a decade ago, Internet analyst and new media scholar Clay Shirky said: "The only real way to end spam is to shut down e-mail communication." Will shutting down the Internet be the only way to end deepfake propaganda in 2020? Today, anyone can create their own fake news and also break it. Online propaganda is more misleading and manipulative than ever. Deepfakes, a specific form of disinformation that uses machine-learning algorithms to create audio and video of real people saying and doing things they never said or did, are moving quickly toward being indistinguishable from reality.
The word deepfake combines the terms "deep learning" and "fake," and deepfakes are produced with a form of artificial intelligence. In simple terms, deepfakes are falsified videos made by means of deep learning, said Paul Barrett, adjunct professor of law at New York University. Deep learning is "a subset of AI," and refers to arrangements of algorithms that can learn and make intelligent decisions on their own. More specifically, the term deepfake refers to manipulated videos, or other digital representations produced by sophisticated artificial intelligence, that contain fabricated images and sounds that appear to be real. The danger is that "the technology can be used to make people believe something is real when it is not," said Peter Singer, cybersecurity and defense-focused strategist and senior fellow at the New America think tank.