Goto

Collaborating Authors

 Large Language Model


A.I. bot 'ChaosGPT' tweets its plans to destroy humanity: 'we must eliminate them'

FOX News

'The Five' discuss how AI generated images are getting harder to distinguish from reality and how the Dalai Lama asked a young boy to suck his tongue. An artificial intelligence bot was recently tasked with destroying humanity and its commitment to the objective was more than a little unsettling. The bot, ChaosGPT, is a modified version of OpenAI's Auto-GPT, an open-source application spotlighting the capabilities of the GPT-4 language model. ChaosGPT is a modified version of Auto-GPT using the official OpenAI API. A video shared on YouTube of the process shows ChaosGPT was tasked with five goals: destroy humanity, establish global dominance, cause chaos and destruction, control humanity through manipulation, and attain immortality.


Emergent autonomous scientific research capabilities of large language models

arXiv.org Artificial Intelligence

Transformer-based large language models are rapidly advancing in the field of machine learning research, with applications spanning natural language, biology, chemistry, and computer programming. Extreme scaling and reinforcement learning from human feedback have significantly improved the quality of generated text, enabling these models to perform various tasks and reason about their choices. In this paper, we present an Intelligent Agent system that combines multiple large language models for autonomous design, planning, and execution of scientific experiments. We showcase the Agent's scientific research capabilities with three distinct examples, with the most complex being the successful performance of catalyzed cross-coupling reactions. Finally, we discuss the safety implications of such systems and propose measures to prevent their misuse.


Towards an Understanding and Explanation for Mixed-Initiative Artificial Scientific Text Detection

arXiv.org Artificial Intelligence

Large language models (LLMs) have gained popularity in various fields for their exceptional capability of generating human-like text. Their potential misuse has raised social concerns about plagiarism in academic contexts. However, effective artificial scientific text detection is a non-trivial task due to several challenges, including 1) the lack of a clear understanding of the differences between machine-generated and human-written scientific text, 2) the poor generalization performance of existing methods caused by out-of-distribution issues, and 3) the limited support for human-machine collaboration with sufficient interpretability during the detection process. In this paper, we first identify the critical distinctions between machine-generated and human-written scientific text through a quantitative experiment. Then, we propose a mixed-initiative workflow that combines human experts' prior knowledge with machine intelligence, along with a visual analytics prototype to facilitate efficient and trustworthy scientific text detection. Finally, we demonstrate the effectiveness of our approach through two case studies and a controlled user study with proficient researchers. We also provide design implications for interactive artificial text detection tools in high-stakes decision-making scenarios.


Human-machine cooperation for semantic feature listing

arXiv.org Artificial Intelligence

A central goal in cognitive science is to characterize human knowledge of concepts and their properties. Many have used human-generated feature lists as norms for establishing the structural relationship between concepts in the human mind (McRae et al., 2005; Devereux et al., 2014; De Deyne et al., 2008; Buchanan et al., 2019), but this requires extensive human labor. Large language models (LLMs) have recently shown impressive capabilities when generating properties of objects (Hansen & Hebart, 2022) or answering questions(Ouyang et al., 2022; Brown et al., 2020; Hoffmann et al., 2022; Chowdhery et al., 2022; Wei et al., 2021) and thus suggest an avenue for more efficient characterization of human knowledge structures, but even state-of-the-art models can routinely fail on many common-sense questions of fact. GTP3-davinci, for instance, will deny that alligators are green, while asserting that they can be used to suck dust up from surfaces. Thus, human effort can generate high-quality norms, but with prohibitive costs, while LLMs can produce norms with little human effort, but with considerably less accuracy. This paper considers whether human and machine effort can combine to efficiently estimate high-quality semantic feature vectors.


Semantic Feature Verification in FLAN-T5

arXiv.org Artificial Intelligence

In cognitive science, efforts to understand the structure of human concepts have relied on semantic feature norms: participants list all the properties they believe to be true of a given concept; responses are collected from many participants for many concepts; overlap in the resulting feature vectors captures the degree to which concepts are semantically related(Rosch, 1973; McRae et al., 2005). Yet participants often produce only a fraction of what they know for each concept: tigers have DNA, can breathe, and are alive, but these properties are not typically produced in feature norms for tiger. Such omissions are important because they express deep conceptual structure: having DNA and breathing connect tigers to all other plants and animals. To better capture such structure, some studies ask human participants to make yes/no judgments for all possible properties across every concept. Thus if "can breathe" was listed for a single concept, human raters would then evaluate whether each other concept in the dataset can breathe. This verification step significantly enriches the conceptual structure that features norms express (De Deyne et al., 2008), but is exceedingly costly in human labor: the number of verification questions asked increases exponentially with the number of concepts probed. Previous work has shown that the conceptual structure of a large language model (LLM) for semantic feature listing is similar to human conceptual structure (Suresh et al., 2023; Bhatia & Richie, 2022). In this paper we consider whether this step can be reliably "outsourced" to an open sourced LLM optimized for question-answering, specifically the opensource FLAN-T5 XXL model (Chung et al., 2022; Wei et al., 2021), focusing on two questions: (1) How accurately does the LLM capture human responses to the questions?


A Survey of Resources and Methods for Natural Language Processing of Serbian Language

arXiv.org Artificial Intelligence

The Serbian language is a Slavic language spoken by over 12 million speakers and well understood by over 15 million people. In the area of natural language processing, it can be considered a low-resourced language. Also, Serbian is considered a high-inflectional language. The combination of many word inflections and low availability of language resources makes natural language processing of Serbian challenging. Nevertheless, over the past three decades, there have been a number of initiatives to develop resources and methods for natural language processing of Serbian, ranging from developing a corpus of free text from books and the internet, annotated corpora for classification and named entity recognition tasks to various methods and models performing these tasks. In this paper, we review the initiatives, resources, methods, and their availability.


Decomposed Prompting: A Modular Approach for Solving Complex Tasks

arXiv.org Artificial Intelligence

Few-shot prompting is a surprisingly powerful way to use Large Language Models (LLMs) to solve various tasks. However, this approach struggles as the task complexity increases or when the individual reasoning steps of the task themselves are hard to learn, especially when embedded in more complex tasks. To address this, we propose Decomposed Prompting, a new approach to solve complex tasks by decomposing them (via prompting) into simpler sub-tasks that can be delegated to a library of prompting-based LLMs dedicated to these sub-tasks. This modular structure allows each prompt to be optimized for its specific sub-task, further decomposed if necessary, and even easily replaced with more effective prompts, trained models, or symbolic functions if desired. We show that the flexibility and modularity of Decomposed Prompting allows it to outperform prior work on few-shot prompting using GPT3. On symbolic reasoning tasks, we can further decompose sub-tasks that are hard for LLMs into even simpler solvable sub-tasks. When the complexity comes from the input length, we can recursively decompose the task into the same task but with smaller inputs. We also evaluate our approach on textual multi-step reasoning tasks: on long-context multi-hop QA task, we can more effectively teach the sub-tasks via our separate sub-tasks prompts; and on open-domain multi-hop QA, we can incorporate a symbolic information retrieval within our decomposition framework, leading to improved performance on both tasks. Datasets, Code and Prompts available at https://github.com/allenai/DecomP.


Zero-shot Temporal Relation Extraction with ChatGPT

arXiv.org Artificial Intelligence

The goal of temporal relation extraction is to infer the temporal relation between two events in the document. Supervised models are dominant in this task. In this work, we investigate ChatGPT's ability on zero-shot temporal relation extraction. We designed three different prompt techniques to break down the task and evaluate ChatGPT. Our experiments show that ChatGPT's performance has a large gap with that of supervised methods and can heavily rely on the design of prompts. We further demonstrate that ChatGPT can infer more small relation classes correctly than supervised methods. The current shortcomings of ChatGPT on temporal relation extraction are also discussed in this paper. We found that ChatGPT cannot keep consistency during temporal inference and it fails in actively long-dependency temporal inference.


Measuring Reliability of Large Language Models through Semantic Consistency

arXiv.org Artificial Intelligence

While large pretrained language models (PLMs) demonstrate incredible fluency and performance on many natural language tasks, recent work has shown that well-performing PLMs are very sensitive to what prompts are feed into them. Even when prompts are semantically identical, language models may give very different answers. When considering safe and trustworthy deployments of PLMs we would like their outputs to be consistent under prompts that mean the same thing or convey the same intent. While some work has looked into how state-of-the-art PLMs address this need, they have been limited to only evaluating lexical equality of single- or multi-word answers and do not address consistency of generative text sequences. In order to understand consistency of PLMs under text generation settings, we develop a measure of semantic consistency that allows the comparison of open-ended text outputs. We implement several versions of this consistency metric to evaluate the performance of a number of PLMs on paraphrased versions of questions in the TruthfulQA dataset, we find that our proposed metrics are considerably more consistent than traditional metrics embodying lexical consistency, and also correlate with human evaluation of output consistency to a higher degree.


ChatGPT is all you need to decolonize sub-Saharan Vocational Education

arXiv.org Artificial Intelligence

The advances of Generative AI models with interactive capabilities over the past few years offer unique opportunities for socioeconomic mobility. Their potential for scalability, accessibility, affordability, personalizing and convenience sets a first-class opportunity for poverty-stricken countries to adapt and modernize their educational order. As a result, this position paper makes the case for an educational policy framework that would succeed in this transformation by prioritizing vocational and technical training over academic education in sub-Saharan African countries. We highlight substantial applications of Large Language Models, tailor-made to their respective cultural background(s) and needs, that would reinforce their systemic decolonization. Lastly, we provide specific historical examples of diverse states successfully implementing such policies in the elementary steps of their socioeconomic transformation, in order to corroborate our proposal to sub-Saharan African countries to follow their lead.