Large Language Model
In the News: Jaishankar on 'ChatGPT' -- ORF America
The Life Cycle of Knowledge in Big Language Models: A Survey
Cao, Boxi, Lin, Hongyu, Han, Xianpei, Sun, Le
Knowledge plays a critical role in artificial intelligence. Recently, the extensive success of pre-trained language models (PLMs) has drawn significant attention to how knowledge can be acquired, maintained, updated and used by language models. Despite the enormous number of related studies, there is still no unified view of how knowledge circulates within language models throughout the learning, tuning, and application processes, which may prevent us from further understanding the connections between current progress or realizing existing limitations. In this survey, we revisit PLMs as knowledge-based systems by dividing the life cycle of knowledge in PLMs into five critical periods, and investigating how knowledge circulates when it is built, maintained and used. To this end, we systematically review existing studies of each period of the knowledge life cycle, summarize the main challenges and current limitations, and discuss future directions.
Exploring ChatGPT's Ability to Rank Content: A Preliminary Study on Consistency with Human Preferences
Ji, Yunjie, Gong, Yan, Peng, Yiping, Ni, Chao, Sun, Peiyan, Pan, Dongyu, Ma, Baochang, Li, Xiangang
As a natural language assistant, ChatGPT is capable of performing various tasks, including but not limited to article generation, code completion, and data analysis. Furthermore, ChatGPT has consistently demonstrated a remarkable level of accuracy and reliability in terms of content evaluation, exhibiting the capability of mimicking human preferences. To further explore ChatGPT's potential in this regard, a study is conducted to assess its ability to rank content. To do so, a test set of prompts covering a wide range of use cases is created, and five models are used to generate corresponding responses. ChatGPT is then instructed to rank the responses generated by these models. The results on the test set show that ChatGPT's ranking preferences are consistent with human preferences to a certain extent. This preliminary experimental finding implies that ChatGPT's zero-shot ranking capability could be used to reduce annotation pressure in a number of ranking tasks.
I-Tuning: Tuning Frozen Language Models with Image for Lightweight Image Captioning
Luo, Ziyang, Hu, Zhipeng, Xi, Yadong, Zhang, Rongsheng, Ma, Jing
Image Captioning is a traditional vision-and-language task that aims to generate the language description of an image. Recent studies focus on scaling up the model size and the amount of training data, which significantly increases the cost of model training. In contrast to these heavy-cost models, we introduce a lightweight image captioning framework (I-Tuning), which contains a small number of trainable parameters. We design a novel I-Tuning cross-attention module to connect the non-trainable pre-trained language decoder GPT2 and vision encoder CLIP-ViT. Since most parameters are not required to be updated during training, our framework is lightweight and fast. Experimental results on three image captioning benchmarks show that our framework achieves comparable or better performance than the large-scale baseline systems. At the same time, our models contain up to 10 times fewer trainable parameters and require much less training data than state-of-the-art baselines.
Meet in the Middle: A New Pre-training Paradigm
Nguyen, Anh, Karampatziakis, Nikos, Chen, Weizhu
Most language models (LMs) are trained and applied in an autoregressive left-to-right fashion, assuming that the next token only depends on the preceding ones. However, this assumption ignores the potential benefits of using the full sequence information during training, and the possibility of having context from both sides during inference. In this paper, we propose a new pre-training paradigm with techniques that jointly improve the training data efficiency and the capabilities of the LMs in the infilling task. The first is a training objective that aligns the predictions of a left-to-right LM with those of a right-to-left LM, trained on the same data but in reverse order. The second is a bidirectional inference procedure that enables both LMs to meet in the middle. We show the effectiveness of our pre-training paradigm with extensive experiments on both programming and natural language models, outperforming strong baselines.
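The first technique in the abstract above, a training objective that aligns a left-to-right LM with a right-to-left LM, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not say how disagreement is measured, so this sketch uses total variation distance, and the names `agreement_loss` and `weight` are hypothetical.

```python
import math
from typing import Sequence


def nll(probs: Sequence[float], target: int) -> float:
    """Negative log-likelihood of the target token under one distribution."""
    return -math.log(probs[target])


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance between two distributions over the same vocabulary."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def agreement_loss(
    fwd_probs: Sequence[Sequence[float]],
    bwd_probs: Sequence[Sequence[float]],
    targets: Sequence[int],
    weight: float = 1.0,
) -> float:
    """Average per-position loss: both models' NLL plus a disagreement penalty.

    fwd_probs[t] and bwd_probs[t] are the distributions each model predicts
    for the token at position t (assumed layout, not taken from the paper).
    """
    total = 0.0
    for p, q, y in zip(fwd_probs, bwd_probs, targets):
        total += nll(p, y) + nll(q, y) + weight * total_variation(p, q)
    return total / len(targets)


# Toy vocabulary of 3 tokens, sequence of 2 positions.
fwd = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
bwd = [[0.5, 0.4, 0.1], [0.1, 0.8, 0.1]]
print(agreement_loss(fwd, bwd, targets=[0, 1], weight=1.0))
```

With `weight=0.0` the two models are trained independently; a positive weight adds a penalty wherever their predictions for the same position differ.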
Transformer-based approaches to Sentiment Detection
Ojo, Olumide Ebenezer, Ta, Hoang Thang, Gelbukh, Alexander, Calvo, Hiram, Adebanji, Olaronke Oluwayemisi, Sidorov, Grigori
The use of transfer learning methods is largely responsible for the present breakthrough in Natural Language Processing (NLP) tasks across multiple domains. In order to solve the problem of sentiment detection, we examined the performance of four well-known state-of-the-art transformer models for text classification: Bidirectional Encoder Representations from Transformers (BERT), Robustly Optimized BERT Pre-training Approach (RoBERTa), a distilled version of BERT (DistilBERT), and a large bidirectional neural network architecture (XLNet). The performance of the four models in detecting disasters in text was compared. All the models performed well enough, indicating that transformer-based models are suitable for the detection of disasters in text. The RoBERTa transformer model performs best on the test dataset with a score of 82.6% and is highly recommended for quality predictions. Furthermore, we discovered that the learning algorithms' performance was influenced by the pre-processing techniques, the nature of words in the vocabulary, unbalanced labeling, and the model parameters.
Input-length-shortening and text generation via attention values
Tan, Neşet Özkan, Peng, Alex Yuxuan, Bensemann, Joshua, Bao, Qiming, Hartill, Tim, Gahegan, Mark, Witbrock, Michael
Identifying words that impact a task's performance more than others is a challenge in natural language processing. Transformer models have recently addressed this issue by incorporating an attention mechanism that assigns greater attention (i.e., relevance) scores to some words than others. Because of the attention mechanism's high computational cost, transformer models usually have an input-length limitation caused by hardware constraints. This limitation applies to many transformers, including the well-known Bidirectional Encoder Representations from Transformers (BERT) model. In this paper, we examined BERT's attention assignment mechanism, focusing on two questions: (1) How can attention be employed to reduce input length? (2) How can attention be used as a control mechanism for conditional text generation? We investigated these questions in the context of a text classification task. We discovered that BERT's early layers assign more critical attention scores for text classification tasks compared to later layers. We demonstrated that the first layer's attention sums could be used to filter tokens in a given sequence, considerably decreasing the input length while maintaining good test accuracy. We also applied filtering, which uses a compute-efficient semantic similarities algorithm, and discovered that retaining approximately 6% of the original sequence is sufficient to obtain 86.5% accuracy. Finally, we showed that we could generate data in a stable manner and indistinguishable from the original one by only using a small percentage (10%) of the tokens with high attention scores according to BERT's first layer.
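The token-filtering idea described in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions: "attention sum" is taken here to mean the total attention a token receives across all heads and query positions of one layer, the attention weights are supplied as plain nested lists rather than read from a real BERT model, and the names `attention_sums`, `filter_tokens`, and `keep_ratio` are hypothetical.

```python
from typing import List, Sequence


def attention_sums(attn: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Sum the attention each token receives.

    attn[h][i][j] is the weight head h assigns from query token i to key token j
    (assumed layout for a single layer).
    """
    n = len(attn[0])
    sums = [0.0] * n
    for head in attn:
        for row in head:
            for j, w in enumerate(row):
                sums[j] += w
    return sums


def filter_tokens(
    tokens: Sequence[str],
    attn: Sequence[Sequence[Sequence[float]]],
    keep_ratio: float,
) -> List[str]:
    """Keep the highest-scoring fraction of tokens, preserving original order."""
    scores = attention_sums(attn)
    k = max(1, round(len(tokens) * keep_ratio))
    top = sorted(range(len(tokens)), key=lambda j: scores[j], reverse=True)[:k]
    return [tokens[j] for j in sorted(top)]


# Toy example: one head, every query attends mostly to "storm" and "coast".
tokens = ["the", "storm", "hit", "the", "coast"]
row = [0.1, 0.4, 0.1, 0.1, 0.3]
attn = [[row] * len(tokens)]
print(filter_tokens(tokens, attn, keep_ratio=0.4))  # ['storm', 'coast']
```

Sorting the kept indices before returning keeps the surviving tokens in their original order, so the shortened sequence can still be passed to a classifier as text.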
Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective
Uchendu, Adaku, Le, Thai, Lee, Dongwon
Two interlocking research questions of growing interest and importance in privacy research are Authorship Attribution (AA) and Authorship Obfuscation (AO). Given an artifact, especially a text t in question, an AA solution aims to accurately attribute t to its true author out of many candidate authors while an AO solution aims to modify t to hide its true authorship. Traditionally, the notion of authorship and its accompanying privacy concern is only toward human authors. However, in recent years, due to the explosive advancements in Neural Text Generation (NTG) techniques in NLP, capable of synthesizing human-quality open-ended texts (so-called "neural texts"), one has to now consider authorships by humans, machines, or their combination. Due to the implications and potential threats of neural texts when used maliciously, it has become critical to understand the limitations of traditional AA/AO solutions and develop novel AA/AO solutions in dealing with neural texts. In this survey, therefore, we make a comprehensive review of recent literature on the attribution and obfuscation of neural text authorship from a Data ...
Figure 1: The figure illustrates the quadrant of research problems where (1) the GRAY quadrants are the focus of this survey, and (2) the BLACK box indicates the specialized binary AA problem to distinguish neural texts from human texts.
... released (e.g., FAIR [16, 82], CTRL [59], PPLM [25], T5 [94], Wu-Dao ...
GM is working on a ChatGPT-like digital assistant for cars
General Motors is working on an in-car digital assistant based on the same machine learning models that power ChatGPT. News of the development was first reported earlier this week by Semafor, with GM later sharing confirmation with Reuters. "ChatGPT is going to be in everything," GM Vice President Scott Miller told the outlet. Among other things, the automaker envisions the digital assistant supporting drivers in situations where they may have turned to their vehicle's owner's manual in the past. For instance, the assistant could show you how to replace your car's tire if it suffers a flat.
The ChatGPT list of lists: A collection of 3000+ prompts, examples, use-cases, tools, APIs, extensions, fails and other resources.
ChatGPT has passed a number of university or professional admission tests (this can also tell you something about the tests). The system can typically answer questions that require reasoning and knowledge of the world (even in depth) -- it cannot manipulate physical entities, interpret images or solve maths problems beyond simple arithmetic. Again, what is exciting for me is the incredible bandwidth of the system. There are probably only a few human beings who can directly pass medical, legal and business exams at this level. At the moment, however, ChatGPT mostly just passed; the grades weren't insanely great.