 Large Language Model


ChatGPT Is More Likely to Be Perceived as Male Than Female

arXiv.org Artificial Intelligence

The two authors contributed equally to this work. Data, analysis code, and additional materials will be openly available at the project's Open Science Framework page if it is accepted for publication by a journal. We have no conflicts of interest to disclose. Correspondence concerning this article should be addressed to Jin Kim (Advanced Institute of Business, Tongji University, Shanghai, China).

We investigate how people perceive ChatGPT, and, in particular, how they assign human-like attributes such as gender to the chatbot. Across five pre-registered studies (N = 1,552), we find that people are more likely to perceive ChatGPT to be male than female. Specifically, people perceive male gender identity (1) following demonstrations of ChatGPT's core abilities (e.g., providing information or summarizing text), (2) in the absence of such demonstrations, and (3) across different methods of eliciting perceived gender (using various scales and asking to name ChatGPT). Moreover, we find that this seemingly default perception of ChatGPT as male can reverse when ChatGPT's feminine-coded abilities are highlighted (e.g., providing emotional support for a user).


GPT Paternity Test: GPT Generated Text Detection with GPT Genetic Inheritance

arXiv.org Artificial Intelligence

Large Language Models (LLMs) can generate texts that carry the risk of various misuses, including plagiarism, planting fake reviews on e-commerce platforms, or creating fake social media postings that can sway election results. Detecting whether a text is machine-generated has thus become increasingly important. While machine-learning-based detection strategies exhibit superior performance, they often lack generalizability, limiting their practicality. In this work, we introduce GPT Paternity Test (GPT-Pat), which reliably detects machine-generated text across varied datasets. Given a text under scrutiny, we leverage ChatGPT to generate a corresponding question and provide a re-answer to the question. By comparing the similarity between the original text and the generated re-answered text, it can be determined whether the text is machine-generated. GPT-Pat consists of a Siamese network to compute the similarity between the original text and the generated re-answered text and a binary classifier. Our method achieved an average accuracy of 94.57% on four generalization test sets, surpassing the state-of-the-art RoBERTa-based method by 12.34%. The accuracy drop of our method is only about half of that of the RoBERTa-based method when it is attacked by re-translation and polishing.
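The comparison step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: GPT-Pat uses a Siamese network and a binary classifier, whereas this sketch swaps in a bag-of-words cosine similarity and a fixed threshold. The function names and the `threshold` value are hypothetical, and the re-answered text is assumed to have been obtained beforehand by prompting ChatGPT.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def looks_machine_generated(original: str, re_answer: str, threshold: float = 0.6) -> bool:
    """Flag the text when it closely resembles the model's own re-answer.

    `threshold` is an assumed value for illustration, not one from the paper.
    """
    similarity = cosine_similarity(bag_of_words(original), bag_of_words(re_answer))
    return similarity >= threshold
```

The intuition carried over from the abstract is that a machine-generated text tends to sit closer to the model's re-answer than a human-written one does; the learned similarity and classifier in GPT-Pat replace the fixed threshold used here.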


Augmenting Autotelic Agents with Large Language Models

arXiv.org Artificial Intelligence

Humans learn to master open-ended repertoires of skills by imagining and practicing their own goals. This autotelic learning process, literally the pursuit of self-generated (auto) goals (telos), becomes more and more open-ended as the goals become more diverse, abstract and creative. The resulting exploration of the space of possible skills is supported by an inter-individual exploration: goal representations are culturally evolved and transmitted across individuals, in particular using language. Current artificial agents mostly rely on predefined goal representations corresponding to goal spaces that are either bounded (e.g. list of instructions), or unbounded (e.g. the space of possible visual inputs) but are rarely endowed with the ability to reshape their goal representations, to form new abstractions or to imagine creative goals. In this paper, we introduce a language model augmented autotelic agent (LMA3) that leverages a pretrained language model (LM) to support the representation, generation and learning of diverse, abstract, human-relevant goals. The LM is used as an imperfect model of human cultural transmission; an attempt to capture aspects of humans' common-sense, intuitive physics and overall interests. Specifically, it supports three key components of the autotelic architecture: 1) a relabeler that describes the goals achieved in the agent's trajectories, 2) a goal generator that suggests new high-level goals along with their decomposition into subgoals the agent already masters, and 3) reward functions for each of these goals. Without relying on any hand-coded goal representations, reward functions or curriculum, we show that LMA3 agents learn to master a large diversity of skills in a task-agnostic text-based environment.


X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated remarkable language abilities. GPT-4, based on advanced LLMs, exhibits extraordinary multimodal capabilities beyond previous visual language models. We attribute this to the use of more advanced LLMs compared with previous multimodal models. Unfortunately, the model architecture and training strategies of GPT-4 are unknown. To endow LLMs with multimodal capabilities, we propose X-LLM, which converts multi-modalities (images, speech, videos) into foreign languages using X2L interfaces and inputs them into a large language model (ChatGLM). Specifically, X-LLM aligns multiple frozen single-modal encoders and a frozen LLM using X2L interfaces, where "X" denotes multi-modalities such as image, speech, and videos, and "L" denotes languages. X-LLM's training consists of three stages: (1) Converting Multimodal Information: The first stage trains each X2L interface to align with its respective single-modal encoder separately to convert multimodal information into languages. (2) Aligning X2L representations with the LLM: single-modal encoders are aligned with the LLM through X2L interfaces independently. (3) Integrating multiple modalities: all single-modal encoders are aligned with the LLM through X2L interfaces to integrate multimodal capabilities into the LLM. Our experiments show that X-LLM demonstrates impressive multimodal chat abilities, sometimes exhibiting the behaviors of multimodal GPT-4 on unseen images/instructions, and yields an 84.5% relative score compared with GPT-4 on a synthetic multimodal instruction-following dataset. We also conduct quantitative tests on using LLM for ASR and multimodal ASR, hoping to promote the era of LLM-based speech recognition.


Reflective Linguistic Programming (RLP): A Stepping Stone in Socially-Aware AGI (SocialAGI)

arXiv.org Artificial Intelligence

This paper presents Reflective Linguistic Programming (RLP), a unique approach to conversational AI that emphasizes self-awareness and strategic planning. RLP encourages models to introspect on their own predefined personality traits, emotional responses to incoming messages, and planned strategies, enabling contextually rich, coherent, and engaging interactions. A striking illustration of RLP's potential involves a toy example, an AI persona with an adversarial orientation, a demon named 'Bogus' inspired by the children's fairy tale Hansel & Gretel. Bogus exhibits sophisticated behaviors, such as strategic deception and sensitivity to user discomfort, that spontaneously arise from the model's introspection and strategic planning. These behaviors are not pre-programmed or prompted, but emerge as a result of the model's advanced cognitive modeling. The potential applications of RLP in socially-aware AGI (Social AGI) are vast, from nuanced negotiations and mental health support systems to the creation of diverse and dynamic AI personas. Our exploration of deception serves as a stepping stone towards a new frontier in AGI, one filled with opportunities for advanced cognitive modeling and the creation of truly human 'digital souls'.


On the Limitations of Simulating Active Learning

arXiv.org Artificial Intelligence

Active learning (AL) is a human-and-model-in-the-loop paradigm that iteratively selects informative unlabeled data for human annotation, aiming to improve over random sampling. However, performing AL experiments with human annotations on-the-fly is a laborious and expensive process, thus unrealistic for academic research. An easy fix to this impediment is to simulate AL, by treating an already labeled and publicly available dataset as the pool of unlabeled data. In this position paper, we first survey recent literature and highlight the challenges across all different steps within the AL loop. We further unveil neglected caveats in the experimental setup that can significantly affect the quality of AL research. We continue with an exploration of how the simulation setting can govern empirical findings, arguing that it might be one of the answers behind the ever-posed question "why do active learning algorithms sometimes fail to outperform random sampling?". We argue that evaluating AL algorithms on available labeled datasets might provide a lower bound as to their effectiveness in real data. We believe it is essential to collectively shape the best practices for AL research, particularly as engineering advancements in LLMs push the research focus towards data-driven approaches (e.g., data efficiency, alignment, fairness). In light of this, we have developed guidelines for future work. Our aim is to draw attention to these limitations within the community, in the hope of finding ways to address them.
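The simulation setting the abstract refers to can be made concrete with a small sketch: labels that already exist are hidden, and "annotation" is simply a lookup. Everything here is an illustrative assumption rather than anything from the paper, including the 1-D nearest-centroid model, the margin-based acquisition function, and all function and parameter names.

```python
import random


def fit_centroids(labeled):
    """Nearest-centroid model on 1-D features: mean feature value per class."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def margin(centroids, x):
    """Gap between the two nearest centroids; a small gap means an uncertain point."""
    distances = sorted(abs(x - c) for c in centroids.values())
    return distances[1] - distances[0] if len(distances) > 1 else 0.0


def simulate_al(pool, hidden_labels, seed_idx, budget, strategy="uncertainty", rng=None):
    """Run a simulated AL loop and return the indices labeled, in query order."""
    rng = rng or random.Random(0)
    labeled_idx = list(seed_idx)
    unlabeled = [i for i in range(len(pool)) if i not in labeled_idx]
    for _ in range(budget):
        # Retrain on everything labeled so far.
        centroids = fit_centroids([(pool[i], hidden_labels[i]) for i in labeled_idx])
        if strategy == "uncertainty":
            pick = min(unlabeled, key=lambda i: margin(centroids, pool[i]))
        else:  # random-sampling baseline
            pick = rng.choice(unlabeled)
        unlabeled.remove(pick)
        # Simulated annotation: the label is revealed from the stored dataset.
        labeled_idx.append(pick)
    return labeled_idx
```

Running the same loop with `strategy="random"` gives the baseline that acquisition functions are usually compared against. Because the hidden labels come from a curated dataset, the sketch also shows why simulated results may not transfer directly to annotation on real, unlabeled data.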


Infor-Coef: Information Bottleneck-based Dynamic Token Downsampling for Compact and Efficient language model

arXiv.org Artificial Intelligence

The prevalence of Transformer-based pre-trained language models (PLMs) has led to their wide adoption for various natural language processing tasks. However, their excessive overhead leads to large latency and computational costs. Static compression methods allocate fixed computation to different samples, resulting in redundant computation. Dynamic token pruning methods selectively shorten the sequences but are unable to change the model size and hardly achieve the speedups of static pruning. In this paper, we propose a model acceleration approach for large language models that incorporates dynamic token downsampling and static pruning, optimized by the information bottleneck loss. Our model, Infor-Coef, achieves an 18x FLOPs speedup with an accuracy degradation of less than 8% compared to BERT. This work provides a promising approach to compress and accelerate transformer-based models for NLP tasks.
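To illustrate what "dynamic" means in dynamic token downsampling, here is a minimal sketch in which the number of tokens kept depends on the input rather than on a fixed budget. The abstract does not say how Infor-Coef selects tokens, so the per-token importance scores, the `threshold`, and the `always_keep` parameter are all hypothetical.

```python
def downsample_tokens(tokens, scores, threshold=0.5, always_keep=1):
    """Keep the first `always_keep` tokens plus every token scoring >= threshold.

    `scores` stands in for learned per-token importance values (an assumption);
    `always_keep` protects leading special tokens such as a classification token.
    Original token order is preserved.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    return [
        token
        for position, (token, score) in enumerate(zip(tokens, scores))
        if position < always_keep or score >= threshold
    ]
```

For example, `downsample_tokens(["[CLS]", "the", "movie", "was", "great"], [0.1, 0.2, 0.9, 0.3, 0.8])` keeps `["[CLS]", "movie", "great"]`, while an input whose tokens all score highly keeps its full length. This per-sample variation is what distinguishes dynamic approaches from static compression, which spends the same computation on every sample.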


Model-Generated Pretraining Signals Improves Zero-Shot Generalization of Text-to-Text Transformers

arXiv.org Artificial Intelligence

This paper explores the effectiveness of model-generated signals in improving zero-shot generalization of text-to-text Transformers such as T5. We study various designs to pretrain T5 using an auxiliary model to construct more challenging token replacements for the main model to denoise. Key aspects under study include the decoding target, the location of the RTD head, and the masking pattern. Based on these studies, we develop a new model, METRO-T0, which is pretrained using the redesigned ELECTRA-Style pretraining strategies and then prompt-finetuned on a mixture of NLP tasks. METRO-T0 outperforms all similar-sized baselines on prompted NLP benchmarks, such as T0 Eval and MMLU, and rivals the state-of-the-art T0-11B model with only 8% of its parameters. Our analysis of the model's neural activation and parameter sensitivity reveals that the effectiveness of METRO-T0 stems from a more balanced contribution of parameters and better utilization of their capacity. The code and model checkpoints are available at https://github.com/gonglinyuan/metro_t0.
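The ELECTRA-style signal mentioned in the abstract can be sketched as follows: an auxiliary model proposes replacement tokens, and a replaced token detection (RTD) target marks which positions now differ from the original. This only illustrates how such labels are constructed; it is not METRO-T0's training objective, and the function name and the `replacements` mapping (position to proposed token) are hypothetical.

```python
def corrupt_and_label(tokens, replacements):
    """Apply auxiliary-model proposals and build replaced-token-detection labels.

    `replacements` maps a position to the token proposed for it. A label of 1
    marks a position whose token differs from the original; a proposal that
    happens to equal the original token is labeled 0.
    """
    corrupted = list(tokens)
    for position, new_token in replacements.items():
        corrupted[position] = new_token
    labels = [int(new != old) for new, old in zip(corrupted, tokens)]
    return corrupted, labels
```

For instance, with tokens `["the", "chef", "cooked", "the", "meal"]` and proposals `{2: "ate", 4: "meal"}`, the corrupted sequence is `["the", "chef", "ate", "the", "meal"]` and the labels are `[0, 0, 1, 0, 0]`. The more plausible the auxiliary model's proposals, the harder the detection and denoising task becomes for the main model.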


When the tech boys start asking for new regulations, you know something's up | John Naughton

The Guardian

Watching the opening day of the US Senate hearings on AI brought to mind Marx's quip about history repeating itself, "the first time as tragedy, the second as farce". Some time ago we had the farce of the boss of Meta (née Facebook) explaining to a senator that his company made money from advertising. This week we had the tragedy of seeing senators quizzing Sam Altman, the new acceptable face of the tech industry. Well, as one of my kids, looking up from revising O-level classics, once explained to me: "It's when you can see the disaster coming but you can't do anything to stop it." The trigger moment was when Altman declared: "We think that regulatory interventions by government will be critical to mitigate the risks of increasingly powerful models."


A TikTok 'Car Theft' Challenge Is Costing Hyundai $200 Million

WIRED

Its absence left open a void in Google Play and Apple's App Store, which have been quietly filling with scam apps that sucker users into paying for weekly or monthly subscriptions, according to research from security firm Sophos. The official ChatGPT app, meanwhile, is free, and an Android version is arriving soon. But just because something is free doesn't make it good. Telly TV is offering 55-inch televisions for $0 to the first 500,000 people who join its reservation list. Of course, "free" comes with a catch: The company reserves the right to collect heaps of data about your viewing habits, and the TV includes a built-in camera that can track your movements.