Large Language Model
The Download: watermarking AI text, and freezing eggs
What's happened: A new method could help us to spot AI-generated texts. Watermarking buries hidden patterns in the text that are invisible to the human eye but let computers detect that the text probably comes from an AI system rather than a human. Why it matters: ChatGPT is one of a new breed of large language models that generate fluent text that reads as though a human could have written it. These AI models regurgitate facts confidently, but are notorious for spewing falsehoods, which makes it worrying that they're already being adopted for everything from essays to workout plans. To the untrained eye, it is almost impossible to tell whether a passage was written by an AI model or a human.
Accident claims: How AI can save consumers billions
Artificial Intelligence is taking the world by storm, especially with the latest tech hit, ChatGPT, going viral. If you haven't heard of or tried ChatGPT, it's a highly advanced artificial intelligence tool that you can ask any question, and in return, it will provide you with an intelligent, well-written answer -- all within seconds. From writing essays to typing up software code, producing song lyrics to doing advanced calculations and more -- ChatGPT is fast becoming what Google was for search when it launched back in 1998. ChatGPT, though, is not the only advanced AI tool being developed. Tech giants, ranging from Google to Amazon, are also playing in this space and we can expect to see more tools hit the market in the months and years to come. However, there is also a less visible, practical side of AI that has the potential to save time and money for several industries and consumers.
Big Tech was moving cautiously on AI. Then came ChatGPT.
The company kept rolling out state-of-the-art technology that propelled the entire field forward, deploying some AI breakthroughs in understanding language to improve Google search. Inside big tech companies, the system of checks and balances for vetting the ethical implications of cutting-edge AI isn't as established as privacy or data security. Typically teams of AI researchers and engineers publish papers on their findings, incorporate their technology into the company's existing infrastructure or develop new products, a process that can sometimes clash with other teams working on responsible AI over pressure to see innovation reach the public sooner.
[2301.11305] DetectGPT: Zero-Shot Machine-Generated Text Detection using Probability Curvature
The fluency and factual knowledge of large language models (LLMs) heighten the need for corresponding systems to detect whether a piece of text is machine-written. For example, students may use LLMs to complete written assignments, leaving instructors unable to accurately assess student learning. In this paper, we first demonstrate that text sampled from an LLM tends to occupy negative curvature regions of the model's log probability function. Leveraging this observation, we then define a new curvature-based criterion for judging if a passage is generated from a given LLM. This approach, which we call DetectGPT, does not require training a separate classifier, collecting a dataset of real or generated passages, or explicitly watermarking generated text. It uses only log probabilities computed by the model of interest and random perturbations of the passage from another generic pre-trained language model (e.g., T5). We find DetectGPT is more discriminative than existing zero-shot methods for model sample detection, notably improving detection of fake news articles generated by 20B parameter GPT-NeoX from 0.81 AUROC for the strongest zero-shot baseline to 0.95 AUROC for DetectGPT. See https://ericmitchell.ai/detectgpt for code, data, and other project information.
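The curvature idea in this abstract can be illustrated with a small sketch. If a passage sits near a local maximum of the model's log probability, randomly perturbed versions of it should score lower on average, so the gap between the original log probability and the mean perturbed log probability is one plausible detection score. The code below is a minimal illustration under stated assumptions, not the authors' implementation: `toy_log_prob` is a hypothetical unigram stand-in for the model of interest, `make_toy_perturb` is a hypothetical word-swap stand-in for perturbations from a model such as T5, and the exact scoring formula is an assumption not spelled out in the abstract.

```python
import math
import random
from statistics import mean
from typing import Callable

# Toy unigram "model" (an assumption for illustration, not a real LLM).
TOY_WORD_PROBS = {"the": 0.4, "cat": 0.2, "sat": 0.15, "on": 0.15, "mat": 0.05}
OTHER_WORD_PROB = 0.05  # probability given to any word outside the table
TOY_VOCAB = ["the", "cat", "sat", "on", "mat", "zebra"]


def toy_log_prob(text: str) -> float:
    """Hypothetical stand-in for the log probability from the model of interest."""
    return sum(
        math.log(TOY_WORD_PROBS.get(word, OTHER_WORD_PROB)) for word in text.split()
    )


def make_toy_perturb(seed: int = 0) -> Callable[[str], str]:
    """Hypothetical stand-in for a perturbation model: swap one word at random."""
    rng = random.Random(seed)

    def perturb(text: str) -> str:
        words = text.split()
        i = rng.randrange(len(words))
        # Always pick a different word so every perturbation changes the text.
        words[i] = rng.choice([w for w in TOY_VOCAB if w != words[i]])
        return " ".join(words)

    return perturb


def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],
    perturb: Callable[[str], str],
    n_perturbations: int = 20,
) -> float:
    """Original log probability minus the mean log probability of perturbed copies.

    A clearly positive value suggests the text sits near a local maximum of
    log_prob, which the abstract associates with model-generated text.
    """
    original = log_prob(text)
    perturbed = [log_prob(perturb(text)) for _ in range(n_perturbations)]
    return original - mean(perturbed)
```

With these toy stand-ins, a passage made only of the model's most likely word (for example `"the the the the the the"`) gets a positive score, while a passage made only of its least likely words gets a score at or below zero. In practice the decision threshold would have to be chosen on held-out data.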
5 Ways Conversational AI Can Transform Your Business
ChatGPT is the first representative of a coming wave of practical, user-friendly AI apps known as "conversational AI." It was just released to the public a few weeks ago, and people are going nuts over it because they're starting to understand how impactful it and others are going to be over the coming years.
Investigating the use of ChatGPT for the scheduling of construction projects
Prieto, Samuel A., Mengiste, Eyob T., de Soto, Borja García
Large language models such as ChatGPT have the potential to revolutionize the construction industry by automating repetitive and time-consuming tasks. This paper presents a study in which ChatGPT was used to generate a construction schedule for a simple construction project. The output from ChatGPT was evaluated by a pool of participants that provided feedback regarding their overall interaction experience and the quality of the output. The results show that ChatGPT can generate a coherent schedule that follows a logical approach to fulfill the requirements of the scope indicated. The participants had an overall positive interaction experience and indicated the great potential of such a tool to automate many preliminary and time-consuming tasks. However, the technology still has limitations, and further development is needed before it can be widely adopted in the industry. Overall, this study highlights the potential of using large language models in the construction industry and the need for further research.
Keywords: Natural Language Processing, ChatGPT, Scheduling, Generative Pre-training Transformer, Project Management, Construction 5.0, GPT 3.5
1 Introduction
Natural Language Processing (NLP) combines areas such as linguistics, computer science, and Artificial Intelligence (AI) and focuses on the interaction between computers and humans using programs that are developed from large natural language data [1]. Selected applications of NLP in the construction industry include (1) Extracting information from construction documents: NLP techniques can extract relevant information, such as specifications, plans, and contracts, and convert it into a structured format that can be quickly processed by computers [2].
Context Matters: A Strategy to Pre-train Language Model for Science Education
Liu, Zhengliang, He, Xinyu, Liu, Lei, Liu, Tianming, Zhai, Xiaoming
This study aims at improving the performance of scoring student responses in science education automatically. BERT-based language models have shown significant superiority over traditional NLP models in various language-related tasks. However, science writing of students, including argumentation and explanation, is domain-specific. In addition, the language used by students is different from the language in journals and Wikipedia, which are training sources of BERT and its existing variants. All these suggest that a domain-specific model pre-trained using science education data may improve model performance. However, the ideal type of data to contextualize pre-trained language model and improve the performance in automatically scoring student written responses remains unclear. Therefore, we employ different data in this study to contextualize both BERT and SciBERT models and compare their performance on automatic scoring of assessment tasks for scientific argumentation. We use three datasets to pre-train the model: 1) journal articles in science education, 2) a large dataset of students' written responses (sample size over 50,000), and 3) a small dataset of students' written responses of scientific argumentation tasks. Our experimental results show that in-domain training corpora constructed from science questions and responses improve language model performance on a wide variety of downstream tasks. Our study confirms the effectiveness of continual pre-training on domain-specific data in the education domain and demonstrates a generalizable strategy for automating science education tasks with high accuracy. We plan to release our data and SciEdBERT models for public use and community engagement.
Down the Rabbit Hole: Detecting Online Extremism, Radicalisation, and Politicised Hate Speech
Govers, Jarod, Feldman, Philip, Dant, Aaron, Patros, Panos
Social media is a modern person's digital voice to project and engage with new ideas and mobilise communities, a power shared with extremists. Given the societal risks of unvetted content-moderating algorithms for Extremism, Radicalisation, and Hate speech (ERH) detection, responsible software engineering must understand the who, what, when, where, and why such models are necessary to protect user safety and free expression. Hence, we propose and examine the unique research field of ERH context mining to unify disjoint studies. Specifically, we evaluate the start-to-finish design process from socio-technical definition-building and dataset collection strategies to technical algorithm design and performance. Our 2015-2021 51-study Systematic Literature Review (SLR) provides the first cross-examination of textual, network, and visual approaches to detecting extremist affiliation, hateful content, and radicalisation towards groups and movements. We identify consensus-driven ERH definitions and propose solutions to existing ideological and geographic biases, particularly due to the lack of research in Oceania/Australasia. Our hybridised investigation on Natural Language Processing, Community Detection, and visual-text models demonstrates the dominating performance of textual transformer-based algorithms. We conclude with vital recommendations for ERH context mining researchers and propose an uptake roadmap with guidelines for researchers, industries, and governments to enable a safer cyberspace.
Truth Machines: Synthesizing Veracity in AI Language Models
Munn, Luke, Magee, Liam, Arora, Vanicka
University of Stirling, United Kingdom vanicka.arora@stir.ac.uk
Abstract
As AI technologies are rolled out into healthcare, academia, human resources, law, and a multitude of other domains, they become de-facto arbiters of truth. But truth is highly contested, with many different definitions and approaches. It then investigates the production of truth in InstructGPT, a large language model, highlighting how data harvesting, model architectures, and social feedback mechanisms weave together disparate understandings of veracity. It conceptualizes this performance as an operationalization of truth, where distinct, often conflicting claims are smoothly synthesized and confidently presented into truth-statements. We argue that these same logics and inconsistencies play out in Instruct's successor, ChatGPT, reiterating truth as a non-trivial problem. We suggest that enriching sociality and thickening "reality" are two promising vectors for enhancing the truth-evaluating capacities of future language models. We conclude, however, by stepping back to consider AI truth-telling as a social practice: what kind of "truth" do we as listeners desire?
OpenAI's latest language model appeared to be powerful and almost magical, generating news articles, writing poetry, and explaining arcane concepts [...] But a week later, the coding site StackOverflow banned all answers produced by the model. "The primary problem," explained the staff, "is that while the answers which ChatGPT produces have a high rate of being incorrect, they typically look like they might be good and the answers are very easy to produce" (Vincent 2022). For a site aiming to provide correct answers to coding problems,
We stress then that truth in AI is not just technical but also social, cultural, and political, drawing on particular norms and values. [...] the technical matters: translating truth theories into actionable architectures and processes updates them in significant ways. These disparate sociotechnical forces coalesce into a final AI model which purports to tell the truth--and in doing so, our understanding of "truth" is remade. "The ideal of truth is a fallacy for semantic interpretation and needs to be changed," suggested two AI researchers (Welty and Aroyo 2015).
PAL: Program-aided Language Models
Gao, Luyu, Madaan, Aman, Zhou, Shuyan, Alon, Uri, Liu, Pengfei, Yang, Yiming, Callan, Jamie, Neubig, Graham
Large language models (LLMs) have recently demonstrated an impressive ability to perform arithmetic and symbolic reasoning tasks, when provided with a few examples at test time ("few-shot prompting"). Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs for both understanding the problem description by decomposing it into steps, as well as solving each step of the problem. While LLMs seem to be adept at this sort of step-by-step decomposition, LLMs often make logical and arithmetic mistakes in the solution part, even when the problem is decomposed correctly. In this paper, we present Program-Aided Language models (PAL): a novel approach that uses the LLM to read natural language problems and generate programs as the intermediate reasoning steps, but offloads the solution step to a runtime such as a Python interpreter. With PAL, decomposing the natural language problem into runnable steps remains the only learning task for the LLM, while solving is delegated to the interpreter. We demonstrate this synergy between a neural LLM and a symbolic interpreter across 13 mathematical, symbolic, and algorithmic reasoning tasks from BIG-Bench Hard and other benchmarks. In all these natural language reasoning tasks, generating code using an LLM and reasoning using a Python interpreter leads to more accurate results than much larger models. For example, PAL using Codex achieves state-of-the-art few-shot accuracy on the GSM8K benchmark of math word problems, surpassing PaLM-540B which uses chain-of-thought by absolute 15% top-1. Our code and data are publicly available at http://reasonwithpal.com/ .
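The division of labour described in this abstract (the LLM writes a program, the interpreter computes the answer) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: `stub_generate_program` is a hypothetical stand-in that returns a hand-written program where a real system would call an LLM, the word problem is an illustrative example, and the convention that the program stores its result in a variable named `answer` is an assumption made here for simplicity.

```python
from typing import Any, Callable, Dict

QUESTION = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)


def stub_generate_program(question: str) -> str:
    """Hypothetical stand-in for the LLM call.

    A real system would prompt a code-generating model with few-shot examples;
    here the "generated" program is hard-coded for illustration.
    """
    return (
        "tennis_balls = 5\n"
        "bought_balls = 2 * 3\n"
        "answer = tennis_balls + bought_balls\n"
    )


def run_program(program: str, result_name: str = "answer") -> Any:
    """Execute the generated program and read back the result variable.

    The arithmetic is done by the Python interpreter, not by the language model.
    Emptying __builtins__ is NOT a real sandbox; untrusted generated code
    should run in an isolated process or container.
    """
    namespace: Dict[str, Any] = {"__builtins__": {}}
    exec(program, namespace)
    return namespace[result_name]


def solve(
    question: str,
    generate_program: Callable[[str], str] = stub_generate_program,
) -> Any:
    """Decompose with the (stubbed) LLM, then delegate solving to the interpreter."""
    program = generate_program(question)
    return run_program(program)
```

Calling `solve(QUESTION)` with the stub returns 11. Swapping `stub_generate_program` for a function that queries an actual model is the only change needed to try the idea on other problems, at which point proper isolation of the executed code becomes important.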