AITopics

Transformer language models such as GPT-2 are difficult to quantize because of outliers in activations leading to a large quantization error. To adapt to the error, one must use quantization-aware training, which entails a fine-tuning process based on the dataset and the training pipeline identical to those for the original model. Pretrained language models, however, often do not grant access to their datasets and training pipelines, forcing us to rely on arbitrary ones for fine-tuning. In that case, it is observed that quantization-aware training overfits the model to the fine-tuning data. For quantization without overfitting, we introduce a quantization adapter (Quadapter), a small set of parameters that are learned to make activations quantization-friendly by scaling them channel-wise. It keeps the model parameters unchanged. By applying our method to the challenging task of quantizing GPT-2, we demonstrate that it effectively prevents the overfitting and improves the quantization performance.

large language model, machine learning, natural language, (20 more...)

2211.16912

Country:

Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Petrak, Johann, Krenn, Brigitte

Misogyny classification of German newspaper forum comments

This paper presents work on detecting misogyny in the comments of a large Austrian German language newspaper forum. We describe the creation of a corpus of 6600 comments which were annotated with 5 levels of misogyny. The forum moderators were involved as experts in the creation of the annotation guidelines and the annotation of the comments. We also describe the results of training transformer-based classification models for both binarized and original label classification of that corpus.

classification, large language model, machine learning, (22 more...)

2211.17163

Country:

Europe > Austria > Vienna (0.14)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Media > News (0.71)
Information Technology (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

WikiWhy: Answering and Explaining Cause-and-Effect Questions

Ho, Matthew, Sharma, Aditya, Chang, Justin, Saxon, Michael, Levy, Sharon, Lu, Yujie, Wang, William Yang

As large language models (LLMs) grow larger and more sophisticated, assessing their "reasoning" capabilities in natural language grows more challenging. Recent question answering (QA) benchmarks that attempt to assess reasoning are often limited by a narrow scope of covered situations and subject matters. We introduce WikiWhy, a QA dataset built around a novel auxiliary task: explaining why an answer is true in natural language. WikiWhy contains over 9,000 "why" question-answer-rationale triples, grounded on Wikipedia facts across a diverse set of topics. Each rationale is a set of supporting statements connecting the question to the answer. WikiWhy serves as a benchmark for the reasoning capabilities of LLMs because it demands rigorous explicit rationales for each answer to demonstrate the acquisition of implicit commonsense knowledge, which is unlikely to be easily memorized. GPT-3 baselines achieve only 38.7% human-evaluated correctness in the end-to-end answer & explain condition, leaving significant room for future improvements.

explanation, large language model, machine learning, (20 more...)

2210.12152

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(18 more...)

Genre: Research Report (0.65)

Industry:

Energy (0.93)
Education (0.68)
Materials > Metals & Mining (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.38)

Rizk, Yara, Venkateswaran, Praveen, Isahagian, Vatche, Muthusamy, Vinod

A Case for Business Process-Specific Foundation Models

The inception of large language models has helped advance state-of-the-art performance on numerous natural language tasks. This has also opened the door for the development of foundation models for other domains and data modalities such as images, code, and music. In this paper, we argue that business process data representations have unique characteristics that warrant the development of a new class of foundation models to handle tasks like process mining, optimization, and decision making. These models should also tackle the unique challenges of applying AI to business processes which include data scarcity, multi-modal representations, domain specific terminology, and privacy concerns.

data mining, large language model, machine learning, (18 more...)

2210.14739

Genre:

Research Report (0.64)
Overview (0.46)

Industry:

Information Technology (0.90)
Banking & Finance > Loans (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Nooralahzadeh, Farhad, Sennrich, Rico

Improving the Cross-Lingual Generalisation in Visual Question Answering

While several benefits were realized for multilingual vision-language pretrained models, recent benchmarks across various tasks and languages showed poor cross-lingual generalisation when multilingually pre-trained vision-language models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross-lingual transfer. In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task, where models are fine-tuned on English visual-question data and evaluated on 7 typologically diverse languages. We improve cross-lingual transfer with three strategies: (1) we introduce a linguistic prior objective to augment the cross-entropy loss with a similarity-based loss to guide the model during training, (2) we learn a task-specific subnetwork that improves cross-lingual generalisation and reduces variance without model modification, (3) we augment training examples using synthetic code-mixing to promote alignment of embeddings between source and target languages. Our experiments on xGQA using the pretrained multilingual multimodal transformers UC2 and M3P demonstrate the consistent effectiveness of the proposed fine-tuning strategy for 7 languages, outperforming existing transfer methods with sparse models. Code and data to reproduce our findings are publicly available.

large language model, machine learning, question answering, (22 more...)

2209.02982

Country:

North America > Dominican Republic (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(8 more...)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)
(2 more...)

Large Language Models are Few-Shot Clinical Information Extractors

Agrawal, Monica, Hegselmann, Stefan, Lang, Hunter, Kim, Yoon, Sontag, David

A long-running goal of the clinical NLP community is the extraction of important variables trapped in clinical notes. However, roadblocks have included dataset shift from the general domain and a lack of public clinical corpora and annotations. In this work, we show that large language models, such as InstructGPT, perform well at zero- and few-shot information extraction from clinical text despite not being trained specifically for the clinical domain. Whereas text classification and generation performance have already been studied extensively in such models, here we additionally demonstrate how to leverage them to tackle a diverse set of NLP tasks which require more structured outputs, including span identification, token-level sequence classification, and relation extraction. Further, due to the dearth of available data to evaluate these systems, we introduce new datasets for benchmarking few-shot clinical information extraction based on a manual re-annotation of the CASI dataset for new tasks. On the clinical extraction tasks we studied, the GPT-3 systems significantly outperform existing zero- and few-shot baselines.

large language model, machine learning, natural language, (19 more...)

2205.12689

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Minnesota (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

#artificialintelligenceNov-29-2022, 21:40:27 GMT

4 AI research trends everyone is (or will be) talking about

Check out the on-demand sessions from the Low-Code/No-Code Summit to learn how to successfully innovate and achieve efficiency by upskilling and scaling citizen developers. Using AI in the real world remains challenging in many ways. Organizations are struggling to attract and retain talent, build and deploy AI models, define and apply responsible AI practices, and understand and prepare for regulatory framework compliance. At the same time, the DeepMinds, Googles and Metas of the world are pushing ahead with their AI research. Their talent pool, experience and processes around operationalizing AI research rapidly and at scale puts them on a different level from the rest of the world, creating a de facto AI divide. These are 4 AI research trends that the tech giants are leading on, but everyone else will be talking about and using in the near future.

deepmind

Industry:

Information Technology (0.43)
Government (0.43)

Technology:

Information Technology > Artificial Intelligence > Applied AI (0.63)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

#artificialintelligenceNov-29-2022, 18:07:21 GMT

A Chat with GPT-3 with Latest Text-Davinci-003

The following is a conversation with an AI assistant. The assistant is helpful, creative, clever, and very friendly. Human: Hello, who are you? AI: I am an AI created by OpenAI. How can I help you today?

latest text-davinci-003, myer and briggs type, reflection, (8 more...)

Genre: Personal > Interview (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

#artificialintelligenceNov-29-2022, 17:00:31 GMT

Better Language Models Without Massive Compute – Google AI Blog

In recent years, language models (LMs) have become more prominent in natural language processing (NLP) research and are also becoming increasingly impactful in practice. Scaling up LMs has been shown to improve performance across a range of NLP tasks. For instance, scaling up language models can improve perplexity across seven orders of magnitude of model sizes, and new abilities such as multi-step reasoning have been observed to arise as a result of model scale. However, one of the challenges of continued scaling is that training new, larger models requires great amounts of computational resources. Moreover, new models are often trained from scratch and do not leverage the weights from previously existing models.

fine-tuning, language model, objective, (14 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)

#artificialintelligenceNov-29-2022, 16:25:16 GMT

What is Open AI and What Does It Do? - Fronty

OpenAI is a non-profit research organization dedicated to developing and applying artificial intelligence (AI) for the benefit of humanity as a whole. Elon Musk and Sam Altman founded the company in 2015, headquartered in San Francisco, California. OpenAI was founded partly due to its founders' existential fears about the potential for a disaster caused by carelessness and misuse of general-purpose AI. The company focuses on fundamental advances in artificial intelligence and its capabilities. The company's two founders and other investors began with a $1 billion endowment. Elon Musk left the company in February 2018 due to potential conflicts with his work at Tesla, Nikola Tesla's electronics company.

artificial general intelligence, intelligence, openai, (10 more...)

Country: North America > United States > California > San Francisco County > San Francisco (0.58)

Industry: Leisure & Entertainment (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.99)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.99)