 Large Language Model


The First ML Value Chain Landscape - KDnuggets

#artificialintelligence

TheSequence is an ML community that has worked with DeepMind, OpenAI, Google Brain, and many more. The world of ML and AI is growing so quickly that keeping up with all of the changes is difficult for professionals and aspiring professionals alike. The goal at TheSequence is to make people who are interested in the sector smarter about AI without having to read white papers and spend hours on end researching. They also offer other days of knowledge. On October 12th, 2022, TheSequence released the first-ever ML Value Chain Landscape shaped by data scientists.


Marketing using AI text generation must be carefully managed – Bestgamingpro

#artificialintelligence

OpenAI's flagship natural language processing (NLP) programme, GPT-3, has been available for two years now. Poems, essays, song lyrics, and even comprehensive manifestos could be generated by the AI language tool with just the barest of cues, and its readers were astounded by the quality of the writing produced. OpenAI's GPT-3 is a "foundation model" that was trained using what amounts to the whole of the internet (Wikipedia, Reddit, The New York Times, etc.). It draws on this massive body of information to determine the most likely response to each given prompt. Due to the massive scale of this training, only a select few of these foundation models exist.


Deep language algorithms predict semantic comprehension from brain activity - Scientific Reports

#artificialintelligence

Deep language algorithms, like GPT-2, have demonstrated remarkable abilities to process text, and now constitute the backbone of automatic translation, summarization and dialogue. However, whether these models encode information that relates to human comprehension still remains controversial. Here, we show that the representations of GPT-2 not only map onto the brain responses to spoken stories, but they also predict the extent to which subjects understand the corresponding narratives. To this end, we analyze 101 subjects recorded with functional Magnetic Resonance Imaging while listening to 70 min of short stories. We then fit a linear mapping model to predict brain activity from GPT-2’s activations. Finally, we show that this mapping reliably correlates ($\mathcal{R}=0.50$, $p<10^{-15}$) with subjects’ comprehension scores as assessed for each story. This effect peaks in the angular, medial temporal and supra-marginal gyri, and is best accounted for by the long-distance dependencies generated in the deep layers of GPT-2. Overall, this study shows how deep language models help clarify the brain computations underlying language comprehension.
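The final step described above, relating a mapping score to comprehension scores, can be illustrated with a small sketch. It assumes the reported statistic is a Pearson correlation, which the abstract does not state explicitly. The function name `pearson_r` and all numeric values are hypothetical and are not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy values, made up for illustration only (assumption, not study data):
# one brain-mapping score and one comprehension score per story.
brain_scores = [0.10, 0.15, 0.12, 0.22, 0.30]
comprehension = [0.55, 0.60, 0.50, 0.80, 0.90]
r = pearson_r(brain_scores, comprehension)
print(f"correlation between mapping and comprehension: {r:.2f}")
```

In the study itself the mapping scores would come from the fitted linear model; here they are simply typed in to keep the sketch self-contained.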


AI-generated essays are nothing to worry about (opinion)

#artificialintelligence

September 2022 was apparently the month artificial intelligence essay angst boiled over in academia, as various media outlets published opinion pieces lamenting the rise of AI writing systems that will ruin student writing and pave the way toward unprecedented levels of academic misconduct. Then, on Sept. 23, academic Twitter exploded into a bit of a panic on this topic. The firestorm was prompted by a post to the OpenAI subreddit where user Urdadgirl69 claimed to be getting straight A's with essays "written" using artificial intelligence. Professors on Reddit and Twitter alike expressed frustration and concern about how best to address the threat of AI essays. One of the most poignant and widely retweeted laments came from Redditor ahumanlikeyou, who wrote, "Grading something an AI wrote is an incredibly depressing waste of my life."


Referee: Reference-Free Sentence Summarization with Sharper Controllability through Symbolic Knowledge Distillation

arXiv.org Artificial Intelligence

We present Referee, a novel framework for sentence summarization that can be trained reference-free (i.e., requiring no gold summaries for supervision), while allowing direct control for compression ratio. Our work is the first to demonstrate that reference-free, controlled sentence summarization is feasible via the conceptual framework of Symbolic Knowledge Distillation (West et al., 2022), where latent knowledge in pre-trained language models is distilled via explicit examples sampled from the teacher models, further purified with three types of filters: length, fidelity, and Information Bottleneck. Moreover, we uniquely propose iterative distillation of knowledge, where student models from the previous iteration of distillation serve as teacher models in the next iteration. Starting off from a relatively modest set of GPT3-generated summaries, we demonstrate how iterative knowledge distillation can lead to considerably smaller, but better summarizers with sharper controllability. A useful by-product of this iterative distillation process is a high-quality dataset of sentence-summary pairs with varying degrees of compression ratios. Empirical results demonstrate that the final student models vastly outperform the much larger GPT3-Instruct model in terms of the controllability of compression ratios, without compromising the quality of resulting summarization.
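As a rough illustration of the length filter mentioned above, the sketch below keeps only sentence-summary pairs whose compression ratio falls under a threshold. It assumes the ratio is measured in whitespace-separated tokens, which may differ from the definition used in the paper. The names `compression_ratio` and `length_filter` are hypothetical, and the fidelity and Information Bottleneck filters are not shown.

```python
def compression_ratio(sentence, summary):
    """Summary length divided by sentence length, in whitespace tokens."""
    source_tokens = sentence.split()
    if not source_tokens:
        raise ValueError("source sentence is empty")
    return len(summary.split()) / len(source_tokens)


def length_filter(pairs, max_ratio):
    """Keep (sentence, summary) pairs whose ratio is at most max_ratio."""
    return [
        (sentence, summary)
        for sentence, summary in pairs
        if compression_ratio(sentence, summary) <= max_ratio
    ]


# Hypothetical candidate summaries sampled from a teacher model.
source = "The quick brown fox jumps over the lazy dog near the river"
candidates = [
    (source, "A fox jumps over a dog"),
    (source, "The quick brown fox jumps over the lazy dog near river"),
]
kept = length_filter(candidates, max_ratio=0.6)
print(kept)
```

Lowering `max_ratio` asks for more aggressive compression, which is one simple way to expose the kind of control over compression ratio that the abstract describes.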


Multilingual Auxiliary Tasks Training: Bridging the Gap between Languages for Zero-Shot Transfer of Hate Speech Detection Models

arXiv.org Artificial Intelligence

Zero-shot cross-lingual transfer learning has been shown to be highly challenging for tasks involving a lot of linguistic specificities or when a cultural gap is present between languages, such as in hate speech detection. In this paper, we highlight this limitation for hate speech detection in several domains and languages using strict experimental settings. Then, we propose to train on multilingual auxiliary tasks -- sentiment analysis, named entity recognition, and tasks relying on syntactic information -- to improve zero-shot transfer of hate speech detection models across languages. We show how hate speech detection models benefit from a cross-lingual knowledge proxy brought by auxiliary tasks fine-tuning and highlight these tasks' positive impact on bridging the hate speech linguistic and cultural gap between languages.


OpenStance: Real-world Zero-shot Stance Detection

arXiv.org Artificial Intelligence

Prior studies of zero-shot stance detection identify the attitude of texts towards unseen topics occurring in the same document corpus. Such task formulation has three limitations: (i) Single domain/dataset. A system is optimized on a particular dataset from a single domain; therefore, the resulting system cannot work well on other datasets; (ii) the model is evaluated on a limited number of unseen topics; (iii) it is assumed that part of the topics has rich annotations, which might be impossible in real-world applications. These drawbacks will lead to an impractical stance detection system that fails to generalize to open domains and open-form topics. This work defines OpenStance: open-domain zero-shot stance detection, aiming to handle stance detection in an open world with neither domain constraints nor topic-specific annotations. The key challenge of OpenStance lies in the open-domain generalization: learning a system with fully unspecific supervision but capable of generalizing to any dataset. To solve OpenStance, we propose to combine indirect supervision, from textual entailment datasets, and weak supervision, from data generated automatically by pre-trained Language Models. Our single system, without any topic-specific supervision, outperforms the supervised method on three popular datasets. To our knowledge, this is the first work that studies stance detection under the open-domain zero-shot setting. All data and code are publicly released.


Exploring Robustness of Prefix Tuning in Noisy Data: A Case Study in Financial Sentiment Analysis

arXiv.org Artificial Intelligence

The invention of transformer-based models such as BERT, GPT, and RoBERTa has enabled researchers and financial companies to fine-tune these powerful models and use them in different downstream tasks to achieve state-of-the-art performance. Recently, a lightweight alternative to fine-tuning (approximately 0.1% - 3% of the original model parameters), known as prefix tuning, has been introduced. This method freezes the model parameters and only updates the prefix to achieve performance comparable to full fine-tuning. Prefix tuning enables researchers and financial practitioners to achieve similar results with far fewer parameters. In this paper, we explore the robustness of prefix tuning when facing noisy data. Our experiments demonstrate that fine-tuning is more robust to noise than prefix tuning -- the latter method faces a significant decrease in performance on most corrupted data sets with increasing noise levels. Furthermore, prefix tuning has high variances in the F1 scores compared to fine-tuning in many corruption methods. We strongly advocate that caution should be taken when applying the state-of-the-art prefix tuning method to noisy data.
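To make the parameter savings concrete, here is a back-of-the-envelope sketch. It assumes a common formulation of prefix tuning in which each prefix position contributes one trainable key vector and one trainable value vector per transformer layer; the abstract does not specify the exact setup. The model configuration (12 layers, hidden size 768, 110 million parameters, prefix length 10) is a hypothetical example, not a figure from the paper.

```python
def prefix_param_count(prefix_len, num_layers, hidden_size):
    """Trainable prefix parameters under the assumed formulation:
    one key and one value vector per prefix position per layer."""
    return prefix_len * num_layers * 2 * hidden_size


def trainable_fraction(prefix_params, model_params):
    """Share of the frozen model's size that the prefix represents."""
    return prefix_params / model_params


# Hypothetical BERT-base-sized configuration (assumption for illustration).
prefix_params = prefix_param_count(prefix_len=10, num_layers=12, hidden_size=768)
fraction = trainable_fraction(prefix_params, model_params=110_000_000)
print(f"{prefix_params} prefix parameters, {fraction:.2%} of the model")
```

Under these assumed numbers the prefix falls within the 0.1% - 3% range quoted above; longer prefixes or extra reparameterization layers would raise the share.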


In-Context Learning for Few-Shot Dialogue State Tracking

arXiv.org Artificial Intelligence

Collecting and annotating task-oriented dialogues is time-consuming and costly; thus, zero-shot and few-shot learning could greatly benefit dialogue state tracking (DST). In this work, we propose an in-context learning (ICL) framework for zero-shot and few-shot learning DST, where a large pre-trained language model (LM) takes a test instance and a few exemplars as input, and directly decodes the dialogue state without any parameter updates. To better leverage a tabular domain description in the LM prompt, we reformulate DST into a text-to-SQL problem. We also propose a novel approach to retrieve annotated dialogues as exemplars. Empirical results on MultiWOZ show that our method IC-DST substantially outperforms previous fine-tuned state-of-the-art models in few-shot settings. In addition, we test IC-DST in zero-shot settings, in which the model only takes a fixed task instruction as input, finding that it outperforms previous zero-shot methods by a large margin.
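The prompt layout described above (a tabular domain description, a few exemplars, then the test instance) might be assembled roughly as follows. This is only a sketch: the function `build_icl_prompt`, the `hotel` table schema, and the exact prompt wording are hypothetical and are not taken from the IC-DST paper. No language model is called; the code only builds the text that would be sent to one.

```python
def build_icl_prompt(schema, exemplars, dialogue):
    """Assemble a text-to-SQL style prompt for in-context DST.

    schema    -- tabular domain description as a string
    exemplars -- list of (dialogue, sql) pairs used as demonstrations
    dialogue  -- the test dialogue whose state should be decoded
    """
    parts = ["-- Table description", schema.strip(), ""]
    for example_dialogue, example_sql in exemplars:
        parts += [f"Dialogue: {example_dialogue}", f"SQL: {example_sql}", ""]
    # The prompt ends with an open "SQL:" slot for the model to complete.
    parts += [f"Dialogue: {dialogue}", "SQL:"]
    return "\n".join(parts)


# Hypothetical schema and exemplar (assumptions for illustration).
schema = "CREATE TABLE hotel (name TEXT, area TEXT, pricerange TEXT);"
exemplars = [
    (
        "I need a cheap hotel in the north.",
        "SELECT * FROM hotel WHERE area = 'north' AND pricerange = 'cheap';",
    )
]
prompt = build_icl_prompt(schema, exemplars, "Find me an expensive hotel in the centre.")
print(prompt)
```

In a zero-shot setting the `exemplars` list would be empty, leaving only the table description and the test dialogue, which loosely mirrors the fixed-instruction setup mentioned in the abstract.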


IELM: An Open Information Extraction Benchmark for Pre-Trained Language Models

arXiv.org Artificial Intelligence

We introduce a new open information extraction (OIE) benchmark for pre-trained language models (LM). Recent studies have demonstrated that pre-trained LMs, such as BERT and GPT, may store linguistic and relational knowledge. In particular, LMs are able to answer "fill-in-the-blank" questions when given a pre-defined relation category. Instead of focusing on pre-defined relations, we create an OIE benchmark aiming to fully examine the open relational information present in the pre-trained LMs. We accomplish this by turning pre-trained LMs into zero-shot OIE systems. Surprisingly, pre-trained LMs are able to obtain competitive performance on both standard OIE datasets (CaRB and Re-OIE2016) and two new large-scale factual OIE datasets (TAC KBP-OIE and Wikidata-OIE) that we establish via distant supervision. For instance, the zero-shot pre-trained LMs outperform the F1 score of the state-of-the-art supervised OIE methods on our factual OIE datasets without needing to use any training sets. Our code and datasets are available at https://github.com/cgraywang/IELM