Large Language Model
'Existential catastrophe' from AI is likely unavoidable, DeepMind researcher warns
Researchers from the University of Oxford and Google's artificial intelligence division DeepMind have claimed that there is a high probability of advanced forms of AI becoming "existentially dangerous to life on Earth". In a recent article in the peer-reviewed journal AI Magazine, the researchers warned that there would be "catastrophic consequences" if the development of certain AI agents continues. Leading philosophers like Oxford University's Nick Bostrom have previously spoken of the threat posed by advanced forms of artificial intelligence, though one of the authors of the new paper claimed such warnings did not go far enough.
Natural Language Inference Prompts for Zero-shot Emotion Classification in Text across Corpora
Plaza-del-Arco, Flor Miriam, Martín-Valdivia, María-Teresa, Klinger, Roman
Within textual emotion classification, the set of relevant labels depends on the domain and application scenario and might not be known at the time of model development. This conflicts with the classical paradigm of supervised learning in which the labels need to be predefined. A solution to obtain a model with a flexible set of labels is to use the paradigm of zero-shot learning as a natural language inference task, which has the additional advantage of not needing any labeled training data. This raises the question of how to prompt a natural language inference model for zero-shot emotion classification. Options for prompt formulations include the emotion name "anger" alone or the statement "This text expresses anger". With this paper, we analyze how sensitive a natural language inference-based zero-shot-learning classifier is to such changes to the prompt under consideration of the corpus: How carefully does the prompt need to be selected? We perform experiments on an established set of emotion datasets presenting different language registers according to different sources (tweets, events, blogs) with three natural language inference models and show that the choice of a particular prompt formulation indeed needs to fit the corpus. We show that this challenge can be tackled with combinations of multiple prompts. Such an ensemble is more robust across corpora than individual prompts and shows nearly the same performance as the individual best prompt for a particular corpus.
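The prompt-ensemble idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the two templates are the ones quoted in the abstract, while `ensemble_classify`, `toy_entail_prob`, and `TOY_CUES` are hypothetical names, and averaging entailment probabilities is only one assumed way to combine prompts. In practice `entail_prob` would wrap a real natural language inference model.

```python
from typing import Callable, Dict, List, Tuple

# The two prompt formulations quoted in the abstract.
TEMPLATES = ["{label}", "This text expresses {label}"]

# Hypothetical cue words, used only by the toy scorer below.
TOY_CUES = {"anger": {"furious", "angry"}, "joy": {"happy", "delighted"}}


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: returns a fake entailment probability."""
    words = set(premise.lower().split())
    for label, cues in TOY_CUES.items():
        if label in hypothesis.lower() and words & cues:
            return 0.9
    return 0.1


def ensemble_classify(
    text: str,
    labels: List[str],
    entail_prob: Callable[[str, str], float],
    templates: List[str] = TEMPLATES,
) -> Tuple[str, Dict[str, float]]:
    """Score each label by its mean entailment probability over all prompts."""
    scores = {}
    for label in labels:
        probs = [entail_prob(text, t.format(label=label)) for t in templates]
        scores[label] = sum(probs) / len(probs)
    best = max(scores, key=scores.get)
    return best, scores


best, scores = ensemble_classify(
    "I am furious about this", ["anger", "joy"], toy_entail_prob
)
print(best, scores)
```

Because the scorer is passed in as a function, the same ensemble logic can be reused with any entailment model or any other set of templates.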
CommunityLM: Probing Partisan Worldviews from Language Models
Jiang, Hang, Beeferman, Doug, Roy, Brandon, Roy, Deb
As political attitudes have diverged ideologically in the United States, political speech has diverged linguistically. The ever-widening polarization between the US political parties is accelerated by an erosion of mutual understanding between them. We aim to make these communities more comprehensible to each other with a framework that probes community-specific responses to the same survey questions using community language models (CommunityLM). In our framework, we identify committed partisan members for each community on Twitter and fine-tune LMs on the tweets authored by them. We then assess the worldviews of the two groups using prompt-based probing of their corresponding LMs, with prompts that elicit opinions about public figures and groups surveyed by the American National Election Studies (ANES) 2020 Exploratory Testing Survey. We compare the responses generated by the LMs to the ANES survey results, and find a level of alignment that greatly exceeds several baseline methods. Our work aims to show that we can use community LMs to query the worldview of any group of people given a sufficiently large sample of their social media discussions or media diet.
Pan More Gold from the Sand: Refining Open-domain Dialogue Training with Noisy Self-Retrieval Generation
Wang, Yihe, Li, Yitong, Wang, Yasheng, Mi, Fei, Zhou, Pingyi, Wang, Xin, Liu, Jin, Jiang, Xin, Liu, Qun
Real human conversation data are complicated, heterogeneous, and noisy, which makes building open-domain dialogue systems from them a challenging task. Such dialogue data nevertheless contain a wealth of information and knowledge that has not been fully explored. In this paper, we show that existing open-domain dialogue generation methods that memorize context-response paired data with autoregressive or encoder-decoder language models underutilize the training data. Different from current approaches, using external knowledge, we explore a retrieval-generation training framework that can take advantage of the heterogeneous and noisy training data by considering them as "evidence". In particular, we use BERTScore for retrieval, which improves the quality of both the evidence and the generation. Experiments over publicly available datasets demonstrate that our method can help models generate better responses, even though such training data are usually regarded as low-quality. This performance gain is comparable to, or even better than, that obtained by enlarging the training set. We also found that the model performance has a positive correlation with the relevance of the retrieved evidence. Moreover, our method performed well on zero-shot experiments, which indicates that our method can be more robust to real-world data.
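The retrieve-then-generate setup described in this abstract can be sketched roughly as below. This is an illustrative sketch under stated assumptions, not the paper's pipeline: token-overlap F1 is swapped in as a cheap stand-in for BERTScore, and the names `token_f1`, `retrieve_evidence`, and `build_input`, as well as the `[EVIDENCE]` and `[CONTEXT]` markers, are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (context, response) from the training data


def token_f1(a: str, b: str) -> float:
    """Token-overlap F1 between two strings (stand-in for BERTScore)."""
    ta, tb = a.lower().split(), b.lower().split()
    overlap = sum((Counter(ta) & Counter(tb)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(ta), overlap / len(tb)
    return 2 * precision * recall / (precision + recall)


def retrieve_evidence(
    context: str,
    corpus: List[Pair],
    k: int = 2,
    score: Callable[[str, str], float] = token_f1,
) -> List[Pair]:
    """Return the k training pairs whose context is most similar to the query."""
    ranked = sorted(corpus, key=lambda pair: score(context, pair[0]), reverse=True)
    return ranked[:k]


def build_input(context: str, evidence: List[Pair]) -> str:
    """Concatenate retrieved responses with the context (assumed input format)."""
    evidence_text = " ".join(f"[EVIDENCE] {resp}" for _, resp in evidence)
    return f"{evidence_text} [CONTEXT] {context}"


corpus = [
    ("do you like coffee", "yes, every morning"),
    ("what is the weather today", "sunny"),
    ("do you like tea", "not really"),
]
evidence = retrieve_evidence("do you like coffee or tea", corpus, k=2)
print(build_input("do you like coffee or tea", evidence))
```

The resulting string would be fed to the generation model during training, so that noisy but related pairs act as evidence rather than as targets to memorize.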
Google Deepmind Researcher Co-Authors Paper Saying AI Will Eliminate Humanity
Update: After publication, Google said in an email that this work was not done as part of co-author Marcus Hutter's work at DeepMind--rather, under his position at Australian National University--and that the DeepMind affiliation listed in the journal was an "error." Google sent the following statement: "DeepMind was not involved in this work and the paper's authors have requested corrections to reflect this. There are a wide range of views and academic interests at DeepMind, and many on our team also hold university professorships and pursue academic research separate to their work at DeepMind, through their university affiliations. While DeepMind was not involved in this work, we think deeply about the safety, ethics and wider societal impacts of AI and research and develop AI models that are safe, effective and aligned with human values. Alongside pursuing opportunities where AI can unlock widespread societal benefit, we also invest equal efforts in guarding against harmful uses."
Google's DeepMind Has a Long-term Goal of Artificial General Intelligence
When DeepMind, an Alphabet subsidiary, started off more than a decade ago, solving some of the most pressing research questions and problems with AI wasn't at the top of the company's mind. Instead, the company started off AI research with computer games. Every score and win was a measuring stick of success, and pointed to DeepMind's AI going in the right direction. "Five years ago, we conquered the game of Go. This was a great moment," said Colin Murdoch, the chief business officer, during a fireside chat on Tuesday at the AI Hardware Summit being held in Santa Clara, California.
PainPoints: A Framework for Language-based Detection of Chronic Pain and Expert-Collaborative Text-Summarization
Fadnavis, Shreyas, Dhurandhar, Amit, Norel, Raquel, Reinen, Jenna M, Agurto, Carla, Secchettin, Erica, Schweiger, Vittorio, Perini, Giovanni, Cecchi, Guillermo
Chronic pain is a pervasive disorder which is often very disabling and is associated with comorbidities such as depression and anxiety. Neuropathic Pain (NP) is a common sub-type which is often caused by nerve damage and has a known pathophysiology. Another common sub-type is Fibromyalgia (FM), which is described as musculoskeletal, diffuse pain that is widespread through the body. The pathophysiology of FM is poorly understood, making it very hard to diagnose. Standard medications and treatments for FM and NP differ from one another, and a misdiagnosis can cause an increase in symptom severity. To overcome this difficulty, we propose a novel framework, PainPoints, which accurately detects the sub-type of pain and generates clinical notes by summarizing the patient interviews. Specifically, PainPoints makes use of large language models to perform sentence-level classification of the text obtained from interviews of FM and NP patients with a reliable AUC of 0.83. Using a sufficiency-based interpretability approach, we explain how the fine-tuned model accurately picks up on the nuances that patients use to describe their pain. Finally, we generate summaries of these interviews via expert interventions by introducing a novel facet-based approach. PainPoints thus enables practitioners to add/drop facets and generate a custom summary based on the notion of "facet-coverage", which is also introduced in this work.
Can Offline Reinforcement Learning Help Natural Language Understanding?
Zhang, Ziqi, Wang, Yile, Zhang, Yue, Wang, Donglin
Pre-training has been a useful method for learning implicit transferable knowledge and it shows the benefit of offering complementary features across different modalities. Recent work mainly focuses on modalities such as image and text; for example, studies show that visual features learned from images can help visual-grounded language understanding. In this paper, we investigate the potential connection between offline reinforcement learning (RL) and language modeling (LM). Intuitively, RL and LM are similar in predicting the next states based on the current and previous states, which rely on both local and long-range dependency across states. To validate such an assumption, we pre-train on different offline RL tasks using Transformer and then evaluate these models on various language-related tasks. Experimental results show that our RL pre-trained models can give close performance compared with the models using the LM training objective, showing that there exist common useful features across these two modalities. To further explore the potential relationship, we investigate some factors such as Markov property and the sequential nature of RL trajectory.
Out of One, Many: Using Language Models to Simulate Human Samples
Argyle, Lisa P., Busby, Ethan C., Fulda, Nancy, Gubler, Joshua, Rytting, Christopher, Wingate, David
We propose and explore the possibility that language models can be studied as effective proxies for specific human sub-populations in social science research. Practical and research applications of artificial intelligence tools have sometimes been limited by problematic biases (such as racism or sexism), which are often treated as uniform properties of the models. We show that the "algorithmic bias" within one such tool -- the GPT-3 language model -- is instead both fine-grained and demographically correlated, meaning that proper conditioning will cause it to accurately emulate response distributions from a wide variety of human subgroups. We term this property "algorithmic fidelity" and explore its extent in GPT-3. We create "silicon samples" by conditioning the model on thousands of socio-demographic backstories from real human participants in multiple large surveys conducted in the United States. We then compare the silicon and human samples to demonstrate that the information contained in GPT-3 goes far beyond surface similarity. It is nuanced, multifaceted, and reflects the complex interplay between ideas, attitudes, and socio-cultural context that characterize human attitudes. We suggest that language models with sufficient algorithmic fidelity thus constitute a novel and powerful tool to advance understanding of humans and society across a variety of disciplines.
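The "silicon sample" procedure in this abstract, conditioning on a backstory and then comparing response distributions, can be sketched as follows. This is a sketch under stated assumptions: the backstory template, the profile fields, and the use of total variation distance are all hypothetical choices for illustration, since the abstract does not specify the prompt wording or the comparison metric.

```python
from collections import Counter
from typing import Dict, List


def backstory(profile: Dict[str, str]) -> str:
    """Render a first-person backstory from profile fields (assumed template)."""
    return (
        f"I am {profile['age']} years old. I live in {profile['state']}. "
        f"Politically, I am {profile['ideology']}."
    )


def conditioned_prompt(profile: Dict[str, str], question: str) -> str:
    """Prefix the survey question with the backstory to condition the model."""
    return f"{backstory(profile)} {question} My answer is"


def distribution(answers: List[str]) -> Dict[str, float]:
    """Empirical distribution over answer options."""
    if not answers:
        raise ValueError("answers must be non-empty")
    counts = Counter(answers)
    return {a: c / len(answers) for a, c in counts.items()}


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two answer distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


profile = {"age": "42", "state": "Ohio", "ideology": "moderate"}
print(conditioned_prompt(profile, "Do you support the proposal?"))

human = distribution(["yes", "yes", "yes", "no"])
silicon = distribution(["yes", "yes", "no", "no"])  # placeholder model answers
print(total_variation(human, silicon))
```

In a real study the `silicon` answers would come from sampling a language model once per backstory, and a small distance between the two distributions would indicate that the model tracks that subgroup closely.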
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
Xie, Yujia, Zhou, Luowei, Dai, Xiyang, Yuan, Lu, Bach, Nguyen, Liu, Ce, Zeng, Michael
People say, "A picture is worth a thousand words". Then how can we get the rich information out of the image? We argue that by using visual clues to bridge large pretrained vision foundation models and language models, we can do so without any extra cross-modal training. Thanks to the strong zero-shot capability of foundation models, we start by constructing a rich semantic representation of the image (e.g., image tags, object attributes / locations, captions) as a structured textual prompt, called visual clues, using a vision foundation model. Based on visual clues, we use a large language model to produce a series of comprehensive descriptions for the visual content, which are then verified by the vision model again to select the candidate that aligns best with the image. We evaluate the quality of generated descriptions by quantitative and qualitative measurement. The results demonstrate the effectiveness of such a structured semantic representation.
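The two steps described in this abstract, turning visual clues into a structured textual prompt and then picking the candidate description that best matches the image, can be sketched as below. This is a minimal sketch, not the authors' code: the `VisualClues` fields, the prompt layout, and the function names are hypothetical, and the image-text scorer is passed in as a plain function where a real system would call a vision model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VisualClues:
    tags: List[str]      # image-level tags
    objects: List[str]   # object attributes and locations, e.g. "brown dog (left)"
    captions: List[str]  # short captions from a vision model


def clues_to_prompt(clues: VisualClues) -> str:
    """Serialize the clues into a structured textual prompt (assumed layout)."""
    lines = [
        "Image tags: " + ", ".join(clues.tags),
        "Objects: " + "; ".join(clues.objects),
        "Captions: " + " | ".join(clues.captions),
        "Describe the image in a detailed paragraph:",
    ]
    return "\n".join(lines)


def select_best(candidates: List[str], image_text_score: Callable[[str], float]) -> str:
    """Keep the candidate that the (stand-in) vision scorer rates highest."""
    return max(candidates, key=image_text_score)


clues = VisualClues(
    tags=["dog", "park"],
    objects=["brown dog (left)", "red ball (center)"],
    captions=["a dog playing in a park"],
)
print(clues_to_prompt(clues))

# Toy scorer: counts how many tags a candidate mentions.
candidates = ["A dog runs in a park.", "A cat sleeps indoors."]
print(select_best(candidates, lambda text: sum(t in text.lower() for t in clues.tags)))
```

Keeping the prompt builder and the selector separate mirrors the abstract's point that no cross-modal training is needed: the language model only ever sees text, and the vision model is used only to produce clues and to rank candidates.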