 Large Language Model


ChatGPT: Optimizing Language Models for Dialogue

#artificialintelligence

We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer followup questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests. ChatGPT is a sibling model to InstructGPT, which is trained to follow an instruction in a prompt and provide a detailed response. We are excited to introduce ChatGPT to get users' feedback and learn about its strengths and weaknesses. During the research preview, usage of ChatGPT is free.


Ripple CTO shuts down ChatGPT's XRP conspiracy theory

#artificialintelligence

Ripple's chief technology officer has responded to a conspiracy theory fabricated by Artificial Intelligence (AI) tool ChatGPT, which alleges the XRP Ledger (XRPL) is somehow being secretly controlled by Ripple. According to a Dec. 3 Twitter thread by user Stefan Huber, when asked a series of questions regarding the decentralization of Ripple's XRP Ledger, the ChatGPT bot suggested that while people could participate in the governance of the blockchain, Ripple has the "ultimate control" of XRPL. Asked how this is possible without the consensus of participants and its publicly-available code, the AI alleged that Ripple may have "abilities that are not fully disclosed in the public source code." At one point, the AI said "the ultimate decision-making power" for XRPL "still lies with Ripple Labs" and the company could make changes "even if those changes do not have the support of the supermajority of the participants in the network." It also contrasted the XRPL with Bitcoin (BTC) saying the latter was "truly decentralized."


Internet Sensation ChatGPT Crosses 1 Million Users In 5 Days

#artificialintelligence

Remember Her, the critically acclaimed 2014 American sci-fi rom-com? The Joaquin Phoenix-led movie was loved by movie-goers and tech enthusiasts across the globe. It tells the story of an introverted, lonely writer who buys an AI system to help with work, only to fall in love with it. Cut to 2022, and such an AI may no longer exist only on film. On December 2, a dialogue between a New York Times journalist and a Silicon Valley tech entrepreneur on free speech and censorship was shared by Elon Musk in a quick tweet to his 119.8 million followers, sending Internet users into a frenzy.


ChatGPT shrugged • TechCrunch

#artificialintelligence

ChatGPT is a new artificial intelligence (AI) tool that's designed to help people communicate with computers in a more natural and intuitive way -- using natural language processing (NLP) technology. But what's behind the development of ChatGPT and how can the tech be used to help humanity? That rather bland lede, which we've slightly edited, was generated by OpenAI's ChatGPT in response to a prompt by this (human) reporter. TechCrunch kicked off a conversation with the large language model by asking it to explain its function and purpose. We wanted to see if we could use the chatbot-style Q&A format it's most recently been embedded into to probe the tech and get AI to articulate some of its limitations.


The Future has Arrived - SuperSeed

#artificialintelligence

This week, the world changed forever. There are small things that shift all the time. But once in a while, there is a distinct fork in the road. A shift that – once made – charts a new course for the world. I'll get back to ChatGPT shortly.


CySecBERT: A Domain-Adapted Language Model for the Cybersecurity Domain

arXiv.org Artificial Intelligence

The field of cybersecurity is evolving fast. Experts need to be informed about past, current and, in the best case, upcoming threats, because attacks are becoming more advanced, targets bigger and systems more complex. As this cannot be addressed manually, cybersecurity experts need to rely on machine learning techniques. In the textual domain, pre-trained language models like BERT have been shown to be helpful by providing a good baseline for further fine-tuning. However, due to the domain knowledge and the many technical terms in cybersecurity, general language models might miss the gist of textual information, hence doing more harm than good. For this reason, we create a high-quality dataset and present a language model specifically tailored to the cybersecurity domain, which can serve as a basic building block for cybersecurity systems that deal with natural language. The model is compared with other models based on 15 different domain-dependent extrinsic and intrinsic tasks as well as general tasks from the SuperGLUE benchmark. On the one hand, the results of the intrinsic tasks show that our model improves the internal representation space of words compared to the other models. On the other hand, the extrinsic, domain-dependent tasks, consisting of sequence tagging and classification, show that the model is best in specific application scenarios, in contrast to the others. Furthermore, we show that our approach against catastrophic forgetting works, as the model is able to retrieve the previously trained domain-independent knowledge. The dataset used and the trained model are made publicly available.


Counterfactual reasoning: Do language models need world knowledge for causal understanding?

arXiv.org Artificial Intelligence

Current pre-trained language models have enabled remarkable improvements in downstream tasks, but it remains difficult to distinguish effects of statistical correlation from more systematic logical reasoning grounded on understanding of the real world. In this paper we tease these factors apart by leveraging counterfactual conditionals, which force language models to predict unusual consequences based on hypothetical propositions. We introduce a set of tests drawn from psycholinguistic experiments, as well as larger-scale controlled datasets, to probe counterfactual predictions from a variety of popular pre-trained language models. We find that models are consistently able to override real-world knowledge in counterfactual scenarios, and that this effect is more robust in case of stronger baseline world knowledge -- however, we also find that for most models this effect appears largely to be driven by simple lexical cues. When we mitigate effects of both world knowledge and lexical cues to test knowledge of linguistic nuances of counterfactuals, we find that only GPT-3 shows sensitivity to these nuances, though this sensitivity is also non-trivially impacted by lexical associative factors.


Modern French Poetry Generation with RoBERTa and GPT-2

arXiv.org Artificial Intelligence

We present a novel neural model for modern poetry generation in French. The model consists of two pretrained neural models that are fine-tuned for the poem generation task. The encoder of the model is based on RoBERTa, while the decoder is based on GPT-2. This way the model can benefit from the superior natural language understanding performance of RoBERTa and the good natural language generation performance of GPT-2. Our evaluation shows that the model can create French poetry successfully. On a 5-point scale, the lowest score of 3.57 was given by human judges to typicality and emotionality of the output poetry, while the best score of 3.79 was given to understandability.


Fast DistilBERT on CPUs

arXiv.org Artificial Intelligence

Transformer-based language models have become the standard approach to solving natural language processing tasks. However, industry adoption usually requires maximizing throughput while complying with certain latency constraints, which prevents Transformer models from being used in production. To address this gap, model compression techniques such as quantization and pruning may be used to improve inference efficiency. However, these compression techniques require specialized software to apply and deploy at scale. In this work, we propose a new pipeline for creating and running Fast Transformer models on CPUs, utilizing hardware-aware pruning, knowledge distillation, quantization, and our own Transformer inference runtime engine with optimized kernels for sparse and quantized operators. We demonstrate the efficiency of our pipeline by creating a Fast DistilBERT model showing minimal accuracy loss on the question-answering SQuADv1.1 benchmark, and throughput results under typical production constraints and environments. Our results outperform the existing state-of-the-art runtime, Neural Magic's DeepSparse, by up to 50% and deliver up to a 4.1x speedup over ONNX Runtime.
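
To make the two compression ideas named in this abstract concrete, here is a minimal sketch in plain Python. It is not the paper's pipeline: it assumes generic unstructured magnitude pruning and symmetric per-tensor int8 quantization as stand-ins for the hardware-aware pruning and quantization the authors describe, and the function names `magnitude_prune`, `quantize_int8`, and `dequantize` are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Assumption: unstructured magnitude pruning, used here only as a
    generic illustration (the abstract's pruning is hardware-aware).
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    # Indices sorted by absolute value; the first k are dropped.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127].

    Returns the integer codes and the scale needed to map them back.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


# Toy weight vector (illustrative values, not taken from any model).
weights = [0.5, -0.05, 1.27, 0.01, -0.8, 0.02]
pruned = magnitude_prune(weights, sparsity=0.5)
codes, scale = quantize_int8(pruned)
restored = dequantize(codes, scale)
```

After pruning, half of the entries are exactly zero, which is what a sparse kernel can skip; after quantization, each remaining weight is stored as a small integer plus one shared scale, so the reconstruction error per weight is bounded by roughly half the scale.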


Language Models of Code are Few-Shot Commonsense Learners

arXiv.org Artificial Intelligence

We address the general task of structured commonsense reasoning: given a natural language input, the goal is to generate a graph such as an event graph or a reasoning graph. To employ large language models (LMs) for this task, existing approaches "serialize" the output graph as a flat list of nodes and edges. Although feasible, these serialized graphs strongly deviate from the natural language corpora that LMs were pre-trained on, hindering LMs from generating them correctly. In this paper, we show that when we instead frame structured commonsense reasoning tasks as code generation tasks, pre-trained LMs of code are better structured commonsense reasoners than LMs of natural language, even when the downstream task does not involve source code at all. We demonstrate our approach across three diverse structured commonsense reasoning tasks. In all these natural language tasks, we show that using our approach, a code generation LM (CODEX) outperforms natural-LMs that are fine-tuned on the target task (e.g., T5) and other strong LMs such as GPT-3 in the few-shot setting.
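
The contrast between the two output formats can be illustrated with a small sketch. The exact prompt format used by the authors is not given in this abstract, so everything below is an assumption: the helper names `flat_serialization` and `code_serialization`, the `Node` class referenced in the generated text, and the toy "bake a cake" graph are all hypothetical.

```python
def to_class_name(goal):
    """Turn a goal phrase such as 'bake a cake' into 'BakeACake'."""
    return "".join(word.capitalize() for word in goal.split())


def flat_serialization(edges):
    """Baseline format: the graph as a flat list of edges in one string."""
    return "; ".join(f"{src} -> {dst}" for src, dst in edges)


def code_serialization(goal, edges):
    """Render the same graph as Python source text for a code LM prompt.

    Assumption: a class with one attribute per node and one `append`
    call per edge; `Node` is a hypothetical name that only appears
    inside the generated text and is never executed here.
    """
    # Collect nodes in order of first appearance.
    nodes = []
    for src, dst in edges:
        for name in (src, dst):
            if name not in nodes:
                nodes.append(name)
    var = {name: f"step{i}" for i, name in enumerate(nodes)}

    lines = [
        f"class {to_class_name(goal)}:",
        f"    goal = {goal!r}",
        "",
        "    def __init__(self):",
    ]
    for name in nodes:
        lines.append(f"        self.{var[name]} = Node({name!r})")
    for src, dst in edges:
        lines.append(f"        self.{var[src]}.children.append(self.{var[dst]})")
    return "\n".join(lines)


# Toy event graph (illustrative only).
edges = [("preheat oven", "mix batter"), ("mix batter", "bake")]
flat = flat_serialization(edges)
prompt = code_serialization("bake a cake", edges)
```

Both strings encode the same nodes and edges; the second one simply looks like the source code a code LM saw during pre-training, which is the framing the abstract argues for.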