Goto

Collaborating Authors

 Large Language Model


Artificial intelligence holds huge promise – and peril. Let's choose the right path Michael Osborne

The Guardian

The last few months have been by far the most exciting of my 17 years working on artificial intelligence. Among many other advances, OpenAI's ChatGPT – a type of AI known as a large language model – smashed records in January to become the fastest-growing consumer application of all time, achieving 100 million users in two months. No one knows for certain what's going to happen next with AI. There's too much going on, on too many fronts, behind too many closed doors. However, we do know that AI is now in the hands of the world, and, as a consequence, the world seems likely to be transformed.


Congress Really Wants to Regulate A.I., But No One Seems to Know How

The New Yorker

In February, 2019, OpenAI, a little-known artificial-intelligence company, announced that its large-language-model text generator, GPT-2, would not be released to the public "due to our concerns about malicious applications of the technology." Among the dangers, the company stated, was a potential for misleading news articles, online impersonation, and automating the production of abusive or faked social-media content and of spam and phishing content. As a consequence, Open AI proposed that "governments should consider expanding or commencing initiatives to more systematically monitor the societal impact and diffusion of AI technologies, and to measure the progression in the capabilities of such systems." This week, four years after that warning, members of the Senate Judiciary Subcommittee on Privacy, Technology, and the Law met to discuss "Oversight of A.I.: Rules for Artificial Intelligence." As has been the case with other tech hearings on the Hill, this one came after a new technology with the capacity to fundamentally alter our social and political lives was already in circulation. Like many Americans, the lawmakers became concerned about the pitfalls of large-language-model artificial intelligence in March, when OpenAI released GPT-4, the latest and most polished iteration of its text generator.


Lawmakers reveal AI concerns over 'future of humanity' following OpenAI CEO's Senate testimony

FOX News

Displaying bias, foreign adversaries like China becoming dominant, and outsmarting humans were all top artificial intelligence concerns for members of Congress. WASHINGTON, D.C. – Congressional lawmakers spouted an array of concerns about artificial intelligence after OpenAI CEO Sam Altman told a Senate subcommittee that he saw problems the technology could create. "The overall risk is allowing China to win the AI race, because obviously, China would use the technology to further their aims of global ambition and to export their model of total techno-totalitarian control, which is nightmarish and would make Orwell blush," Republican Rep. Mike Gallagher said. "The other risk is that we don't maintain control of the technology, somehow it escapes our control." OpenAI CEO Sam Altman told a Senate subcommittee Tuesday that he had concerns about artificial intelligence's possibilities.


UP5: Unbiased Foundation Model for Fairness-aware Recommendation

arXiv.org Artificial Intelligence

Recent advancements in foundation models such as large language models (LLM) have propelled them to the forefront of recommender systems (RS). Moreover, fairness in RS is critical since many users apply it for decision-making and demand fulfillment. However, at present, there is a lack of understanding regarding the level of fairness exhibited by recommendation foundation models and the appropriate methods for equitably treating different groups of users in foundation models. In this paper, we focus on user-side unfairness problem and show through a thorough examination that there is unfairness involved in LLMs that lead to unfair recommendation results. To eliminate bias from LLM for fairness-aware recommendation, we introduce a novel Unbiased P5 (UP5) foundation model based on Counterfactually-Fair-Prompting (CFP) techniques. CFP includes two sub-modules: a personalized prefix prompt that enhances fairness with respect to individual sensitive attributes, and a Prompt Mixture that integrates multiple counterfactually-fair prompts for a set of sensitive attributes. Experiments are conducted on two real-world datasets, MovieLens-1M and Insurance, and results are compared with both matching-based and sequential-based fairness-aware recommendation models. The results show that UP5 achieves better recommendation performance and meanwhile exhibits a high level of fairness.


i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data

arXiv.org Artificial Intelligence

The convergence of text, visual, and audio data is a key step towards human-like artificial intelligence, however the current Vision-Language-Speech landscape is dominated by encoder-only models which lack generative abilities. We propose closing this gap with i-Code V2, the first model capable of generating natural language from any combination of Vision, Language, and Speech data. i-Code V2 is an integrative system that leverages state-of-the-art single-modality encoders, combining their outputs with a new modality-fusing encoder in order to flexibly project combinations of modalities into a shared representational space. Next, language tokens are generated from these representations via an autoregressive decoder. The whole framework is pretrained end-to-end on a large collection of dual- and single-modality datasets using a novel text completion objective that can be generalized across arbitrary combinations of modalities. i-Code V2 matches or outperforms state-of-the-art single- and dual-modality baselines on 7 multimodal tasks, demonstrating the power of generative multimodal pretraining across a diversity of tasks and signals.


Improving Contrastive Learning of Sentence Embeddings from AI Feedback

arXiv.org Artificial Intelligence

Contrastive learning has become a popular approach in natural language processing, particularly for the learning of sentence embeddings. However, the discrete nature of natural language makes it difficult to ensure the quality of positive and negative sample pairs generated through data augmentation methods. Although supervised contrastive learning can produce more accurate sample pairs with human feedback labels, it still lacks fine-grained training signals. In this paper, we propose to improve \textbf{C}ontrastive \textbf{L}earning of sentence embeddings from \textbf{AI} \textbf{F}eedback \textbf{(CLAIF)}. Our method utilizes AI feedback from large pre-trained language models (LLMs) to construct sample pairs with fine-grained sample similarity scores to improve contrastive learning. Besides, we combine human feedback and AI feedback to provide better supervision signals for supervised contrastive learning of sentence embeddings. Experimental results show that our method achieves state-of-the-art performance on several semantic textual similarity (STS) and transfer learning tasks compared to other unsupervised and supervised contrastive learning methods.


Prompting Language Models for Linguistic Structure

arXiv.org Artificial Intelligence

Although pretrained language models (PLMs) can be prompted to perform a wide range of language tasks, it remains an open question how much this ability comes from generalizable linguistic understanding versus surface-level lexical patterns. To test this, we present a structured prompting approach for linguistic structured prediction tasks, allowing us to perform zero- and few-shot sequence tagging with autoregressive PLMs. We evaluate this approach on part-of-speech tagging, named entity recognition, and sentence chunking, demonstrating strong few-shot performance in all cases. We also find that while PLMs contain significant prior knowledge of task labels due to task leakage into the pretraining corpus, structured prompting can also retrieve linguistic structure with arbitrary labels. These findings indicate that the in-context learning ability and linguistic knowledge of PLMs generalizes beyond memorization of their training data.


M3KE: A Massive Multi-Level Multi-Subject Knowledge Evaluation Benchmark for Chinese Large Language Models

arXiv.org Artificial Intelligence

Large language models have recently made tremendous progress in a variety of aspects, e.g., cross-task generalization, instruction following. Comprehensively evaluating the capability of large language models in multiple tasks is of great importance. In this paper, we propose M3KE, a Massive Multi-Level Multi-Subject Knowledge Evaluation benchmark, which is developed to measure knowledge acquired by Chinese large language models by testing their multitask accuracy in zero- and few-shot settings. We have collected 20,477 questions from 71 tasks. Our selection covers all major levels of Chinese education system, ranging from the primary school to college, as well as a wide variety of subjects, including humanities, history, politics, law, education, psychology, science, technology, art and religion. All questions are multiple-choice questions with four options, hence guaranteeing a standardized and unified assessment process. We've assessed a number of state-of-the-art open-source Chinese large language models on the proposed benchmark. The size of these models varies from 335M to 130B parameters. Experiment results demonstrate that they perform significantly worse than GPT-3.5 that reaches an accuracy of ~ 48% on M3KE. The dataset is available at https://github.com/tjunlp-lab/M3KE.


OntoType: Ontology-Guided Zero-Shot Fine-Grained Entity Typing with Weak Supervision from Pre-Trained Language Models

arXiv.org Artificial Intelligence

Fine-grained entity typing (FET), which assigns entities in text with context-sensitive, fine-grained semantic types, will play an important role in natural language understanding. A supervised FET method, which typically relies on human-annotated corpora for training, is costly and difficult to scale. Recent studies leverage pre-trained language models (PLMs) to generate rich and context-aware weak supervision for FET. However, a PLM may still generate a mixture of rough and fine-grained types, or tokens unsuitable for typing. In this study, we vision that an ontology provides a semantics-rich, hierarchical structure, which will help select the best results generated by multiple PLM models and head words. Specifically, we propose a novel zero-shot, ontology-guided FET method, OntoType, which follows a type ontological structure, from coarse to fine, ensembles multiple PLM prompting results to generate a set of type candidates, and refines its type resolution, under the local context with a natural language inference model. Our experiments on the Ontonotes, FIGER, and NYT datasets using their associated ontological structures demonstrate that our method outperforms the state-of-the-art zero-shot fine-grained entity typing methods. Our error analysis shows that refinement of the existing ontology structures will further improve fine-grained entity typing.


CodeT5+: Open Code Large Language Models for Code Understanding and Generation

arXiv.org Artificial Intelligence

Large language models (LLMs) pretrained on vast source code have achieved prominent progress in code intelligence. However, existing code LLMs have two main limitations in terms of architecture and pretraining tasks. First, they often adopt a specific architecture (encoder-only or decoder-only) or rely on a unified encoder-decoder network for different downstream tasks. The former paradigm is limited by inflexibility in applications while in the latter, the model is treated as a single system for all tasks, leading to suboptimal performance on a subset of tasks. Secondly, they often employ a limited set of pretraining objectives which might not be relevant to some downstream tasks and hence result in substantial performance degrade. To address these limitations, we propose ``CodeT5+'', a family of encoder-decoder LLMs for code in which component modules can be flexibly combined to suit a wide range of downstream code tasks. Such flexibility is enabled by our proposed mixture of pretraining objectives to mitigate the pretrain-finetune discrepancy. These objectives cover span denoising, contrastive learning, text-code matching, and causal LM pretraining tasks, on both unimodal and bimodal multilingual code corpora. Furthermore, we propose to initialize CodeT5+ with frozen off-the-shelf LLMs without training from scratch to efficiently scale up our models, and explore instruction-tuning to align with natural language instructions. We extensively evaluate CodeT5+ on over 20 code-related benchmarks in different settings, including zero-shot, finetuning, and instruction-tuning. We observe state-of-the-art (SoTA) model performance on various code-related tasks, such as code generation and completion, math programming, and text-to-code retrieval tasks. Particularly, our instruction-tuned CodeT5+ 16B achieves new SoTA results on HumanEval code generation task against other open code LLMs.