Large Language Model
Quadapter: Adapter for GPT-2 Quantization
Park, Minseop, You, Jaeseong, Nagel, Markus, Chang, Simyung
Transformer language models such as GPT-2 are difficult to quantize because outliers in their activations lead to large quantization errors. To adapt to these errors, one must use quantization-aware training, which entails fine-tuning with the same dataset and training pipeline as the original model. The datasets and training pipelines of pretrained language models, however, are often not accessible, forcing us to rely on arbitrary ones for fine-tuning. In that case, quantization-aware training is observed to overfit the model to the fine-tuning data. For quantization without overfitting, we introduce a quantization adapter (Quadapter), a small set of parameters that are learned to make activations quantization-friendly by scaling them channel-wise, while keeping the model parameters unchanged. By applying our method to the challenging task of quantizing GPT-2, we demonstrate that it effectively prevents overfitting and improves quantization performance.
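To illustrate why channel-wise scaling helps with activation outliers, here is a minimal sketch, not the Quadapter authors' code. It assumes symmetric per-tensor 8-bit fake quantization of activations only, divides each activation channel by a scale and multiplies the matching column of the next layer's weights by the same scale (so the float output is unchanged), and uses hand-picked scales where Quadapter would learn them. The helper names (`fake_quant`, `apply_channel_scale`, etc.) are hypothetical.

```python
# Sketch: channel-wise activation scaling folded inversely into the next
# layer's weights. Only activations are quantized in this toy example.

def fake_quant(values, bits=8):
    """Symmetric per-tensor uniform quantization: quantize, then dequantize."""
    qmax = 2 ** (bits - 1) - 1
    step = max(abs(v) for v in values) / qmax
    return [round(v / step) * step for v in values]


def matvec(w, x):
    """Dense matrix-vector product y = W x, with W given as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in w]


def apply_channel_scale(x, w, s):
    """Divide activation channel j by s[j] and multiply weight column j by s[j].

    In exact arithmetic, matvec(w_scaled, x_scaled) equals matvec(w, x).
    """
    x_scaled = [x_j / s_j for x_j, s_j in zip(x, s)]
    w_scaled = [[w_ij * s_j for w_ij, s_j in zip(row, s)] for row in w]
    return x_scaled, w_scaled


def max_abs_err(a, b):
    """Largest absolute elementwise difference between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


# Toy activation vector with one outlier channel (index 3).
x = [0.01, -0.02, 0.015, 40.0]
w = [[1.0, 2.0, -1.0, 0.5],
     [0.5, -1.0, 3.0, 0.1]]
# Hand-picked per-channel scales (assumption); Quadapter learns these instead.
s = [1.0, 1.0, 1.0, 1000.0]

y_ref = matvec(w, x)                      # float reference output
y_plain = matvec(w, fake_quant(x))        # quantize activations directly
x_s, w_s = apply_channel_scale(x, w, s)
y_scaled = matvec(w_s, fake_quant(x_s))   # quantize the rescaled activations

print("error without scaling:", max_abs_err(y_ref, y_plain))
print("error with scaling:   ", max_abs_err(y_ref, y_scaled))
```

Without scaling, the outlier stretches the quantization range so far that the small channels round to zero; after rescaling, all channels share a similar range and the output error drops sharply. Because scaling up weight columns can in turn make weight quantization harder, a real setup has to balance the two, which is one reason to learn the scales rather than fix them by hand.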
Misogyny classification of German newspaper forum comments
Petrak, Johann, Krenn, Brigitte
This paper presents work on detecting misogyny in comments posted to the forum of a large Austrian German-language newspaper. We describe the creation of a corpus of 6600 comments which were annotated with 5 levels of misogyny. The forum moderators were involved as experts in the creation of the annotation guidelines and the annotation of the comments. We also describe the results of training transformer-based classification models on that corpus, using both binarized and the original labels.
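The difference between binarized and original-label classification can be sketched as a label mapping. This is a minimal illustration under assumptions not stated in the abstract: the 5 levels are encoded as integers 0 to 4, level 0 means "not misogynous", and any higher level counts as misogynous. The label values below are made up.

```python
from collections import Counter

# Assumption: levels are encoded 0..4 and 0 means "not misogynous".
# The paper's actual label names and binarization rule may differ.
NOT_MISOGYNOUS = 0


def binarize(level: int) -> int:
    """Map a 5-level misogyny label to 1 (misogynous) or 0 (not misogynous)."""
    if not 0 <= level <= 4:
        raise ValueError(f"unexpected label level: {level}")
    return int(level != NOT_MISOGYNOUS)


# Hypothetical annotations, for illustration only.
levels = [0, 0, 3, 1, 0, 4, 2, 0]
binary = [binarize(level) for level in levels]

print(Counter(levels))  # label distribution for the original 5-way task
print(Counter(binary))  # label distribution for the binarized task
```

The same classifier architecture can then be trained with either `levels` (5 classes) or `binary` (2 classes) as targets.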
WikiWhy: Answering and Explaining Cause-and-Effect Questions
Ho, Matthew, Sharma, Aditya, Chang, Justin, Saxon, Michael, Levy, Sharon, Lu, Yujie, Wang, William Yang
As large language models (LLMs) grow larger and more sophisticated, assessing their "reasoning" capabilities in natural language grows more challenging. Recent question answering (QA) benchmarks that attempt to assess reasoning are often limited by a narrow scope of covered situations and subject matters. We introduce WikiWhy, a QA dataset built around a novel auxiliary task: explaining why an answer is true in natural language. WikiWhy contains over 9,000 "why" question-answer-rationale triples, grounded in Wikipedia facts across a diverse set of topics. Each rationale is a set of supporting statements connecting the question to the answer. WikiWhy serves as a benchmark for the reasoning capabilities of LLMs because it demands rigorous explicit rationales for each answer to demonstrate the acquisition of implicit commonsense knowledge, which is unlikely to be easily memorized. GPT-3 baselines achieve only 38.7% human-evaluated correctness in the end-to-end answer-and-explain setting, leaving significant room for future improvements.
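A question-answer-rationale triple, as described above, can be represented with a small data structure. The sketch below is hypothetical: the class and field names are not WikiWhy's actual schema, and the example is invented for illustration rather than taken from the dataset.

```python
from dataclasses import dataclass, field


@dataclass
class WhyTriple:
    """A "why" question, its answer, and a rationale (field names are hypothetical)."""
    question: str
    answer: str
    # Ordered supporting statements that connect the question to the answer.
    rationale: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the triple as an answer followed by its step-by-step explanation."""
        steps = "\n".join(f"- {s}" for s in self.rationale)
        return f"Q: {self.question}\nA: {self.answer}\nExplanation:\n{steps}"


# Invented example for illustration; not taken from WikiWhy.
example = WhyTriple(
    question="Why does ice float on liquid water?",
    answer="Ice is less dense than liquid water.",
    rationale=[
        "Water expands when it freezes into a crystal lattice.",
        "An object less dense than the fluid around it floats.",
    ],
)
print(example.render())
```

Keeping the rationale as an ordered list, rather than one free-text string, makes it straightforward to compare a model's generated explanation with the reference statement by statement.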
A Case for Business Process-Specific Foundation Models
Rizk, Yara, Venkateswaran, Praveen, Isahagian, Vatche, Muthusamy, Vinod
The inception of large language models has helped advance state-of-the-art performance on numerous natural language tasks. This has also opened the door for the development of foundation models for other domains and data modalities such as images, code, and music. In this paper, we argue that business process data representations have unique characteristics that warrant the development of a new class of foundation models to handle tasks like process mining, optimization, and decision making. These models should also tackle the unique challenges of applying AI to business processes, which include data scarcity, multi-modal representations, domain-specific terminology, and privacy concerns.
Improving the Cross-Lingual Generalisation in Visual Question Answering
Nooralahzadeh, Farhad, Sennrich, Rico
Although multilingually pretrained vision-language models have brought several benefits, recent benchmarks across various tasks and languages show poor cross-lingual generalisation when these models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross-lingual transfer. In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task, where models are fine-tuned on English visual-question data and evaluated on 7 typologically diverse languages. We improve cross-lingual transfer with three strategies: (1) we introduce a linguistic prior objective that augments the cross-entropy loss with a similarity-based loss to guide the model during training; (2) we learn a task-specific subnetwork that improves cross-lingual generalisation and reduces variance without modifying the model; and (3) we augment training examples using synthetic code-mixing to promote alignment of embeddings between source and target languages. Our experiments on xGQA using the pretrained multilingual multimodal transformers UC2 and M3P demonstrate the consistent effectiveness of the proposed fine-tuning strategy for 7 languages, outperforming existing transfer methods with sparse models. Code and data to reproduce our findings are publicly available.
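Strategy (3), synthetic code-mixing, can be illustrated with simple dictionary substitution: words in an English training question are randomly swapped for their translations. This is a generic sketch, not the paper's procedure; the toy English-to-German lexicon `EN_DE`, the function `code_mix`, and the `ratio` parameter are hypothetical, and a real setup would draw on bilingual lexicons for each target language.

```python
import random

# Hypothetical toy English->German lexicon, for illustration only.
EN_DE = {"color": "Farbe", "car": "Auto", "left": "links", "dog": "Hund"}


def code_mix(question: str, lexicon: dict[str, str], ratio: float,
             rng: random.Random) -> str:
    """Replace each lexicon word with its translation with probability `ratio`.

    Trailing punctuation (e.g. the final "?") is preserved.
    """
    mixed = []
    for tok in question.split():
        core = tok.rstrip("?.,!")
        suffix = tok[len(core):]
        key = core.lower()
        if key in lexicon and rng.random() < ratio:
            mixed.append(lexicon[key] + suffix)
        else:
            mixed.append(tok)
    return " ".join(mixed)


rng = random.Random(0)  # fixed seed so the augmentation is reproducible
print(code_mix("What color is the car left of the dog?", EN_DE, 0.5, rng))
```

Training on such mixed questions exposes the model to target-language words in familiar English contexts, which is the intuition behind using code-mixing to pull source- and target-language embeddings closer together.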
Large Language Models are Few-Shot Clinical Information Extractors
Agrawal, Monica, Hegselmann, Stefan, Lang, Hunter, Kim, Yoon, Sontag, David
A long-running goal of the clinical NLP community is the extraction of important variables trapped in clinical notes. However, roadblocks have included dataset shift from the general domain and a lack of public clinical corpora and annotations. In this work, we show that large language models, such as InstructGPT, perform well at zero- and few-shot information extraction from clinical text despite not being trained specifically for the clinical domain. Whereas text classification and generation performance have already been studied extensively in such models, here we additionally demonstrate how to leverage them to tackle a diverse set of NLP tasks which require more structured outputs, including span identification, token-level sequence classification, and relation extraction. Further, due to the dearth of available data to evaluate these systems, we introduce new datasets for benchmarking few-shot clinical information extraction based on a manual re-annotation of the CASI dataset for new tasks. On the clinical extraction tasks we studied, the GPT-3 systems significantly outperform existing zero- and few-shot baselines.
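Turning free-text LLM output into structured results such as spans involves two steps: building a few-shot prompt and mapping the generated answer back onto the source text. The sketch below shows one generic way to do both. The prompt format, the comma-separated answer convention, and the helpers `build_prompt` and `resolve_spans` are assumptions, not the authors' prompts or parsing code; no model is called, and `model_answer` is a hard-coded placeholder for a completion.

```python
def build_prompt(instruction: str, demos: list[tuple[str, str]], note: str) -> str:
    """Assemble a few-shot prompt: instruction, solved demonstrations, then the query."""
    parts = [instruction]
    for demo_note, demo_answer in demos:
        parts.append(f"Note: {demo_note}\nAnswer: {demo_answer}")
    parts.append(f"Note: {note}\nAnswer:")
    return "\n\n".join(parts)


def resolve_spans(note: str, answer: str) -> list[tuple[int, int]]:
    """Map a comma-separated model answer to (start, end) character offsets in the note.

    Items that do not occur verbatim in the note are dropped, since they
    cannot be grounded as spans of the source text.
    """
    spans = []
    for item in (part.strip() for part in answer.split(",")):
        if not item:
            continue
        start = note.find(item)
        if start != -1:
            spans.append((start, start + len(item)))
    return spans


# Hypothetical demonstration and query note.
note = "Pt started on metformin 500 mg and lisinopril."
prompt = build_prompt(
    "List every medication mentioned in the clinical note, separated by commas.",
    [("Continue aspirin 81 mg daily.", "aspirin")],
    note,
)
model_answer = "metformin, lisinopril, warfarin"  # placeholder for an LLM completion
spans = resolve_spans(note, model_answer)
print([note[s:e] for s, e in spans])
```

Dropping items that cannot be found in the note is a deliberately conservative choice: it discards hallucinated entities at the cost of missing paraphrased mentions.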
4 AI research trends everyone is (or will be) talking about
Using AI in the real world remains challenging in many ways. Organizations are struggling to attract and retain talent, build and deploy AI models, define and apply responsible AI practices, and understand and prepare to comply with regulatory frameworks. At the same time, the DeepMinds, Googles and Metas of the world are pushing ahead with their AI research. Their talent pool, experience and processes around operationalizing AI research rapidly and at scale put them on a different level from the rest of the world, creating a de facto AI divide. Here are 4 AI research trends that the tech giants are leading on, but that everyone else will be talking about and using in the near future.
Better Language Models Without Massive Compute – Google AI Blog
In recent years, language models (LMs) have become more prominent in natural language processing (NLP) research and are also becoming increasingly impactful in practice. Scaling up LMs has been shown to improve performance across a range of NLP tasks. For instance, scaling up language models can improve perplexity across seven orders of magnitude of model sizes, and new abilities such as multi-step reasoning have been observed to arise as a result of model scale. However, one of the challenges of continued scaling is that training new, larger models requires large amounts of computational resources. Moreover, new models are often trained from scratch and do not leverage the weights of previously existing models.
What is Open AI and What Does It Do? - Fronty
OpenAI is an artificial intelligence (AI) research organization, founded as a non-profit, dedicated to developing and applying AI for the benefit of humanity as a whole. Elon Musk, Sam Altman and others founded it in 2015; it is headquartered in San Francisco, California. OpenAI was founded partly out of its founders' fears that carelessness in developing, or misuse of, general-purpose AI could lead to disaster. The company focuses on fundamental advances in artificial intelligence and its capabilities. Its founders and other backers pledged $1 billion at the outset. Elon Musk left OpenAI's board in February 2018, citing potential conflicts of interest with his work at Tesla, the electric-vehicle company.