AITopics

Country:

North America > United States > California > San Francisco County > San Francisco (0.25)
North America > United States > Oklahoma > Tulsa County > Tulsa (0.05)
Europe > United Kingdom (0.05)
Asia > China (0.05)

Industry: Law > Litigation (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.78)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.74)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.58)

#artificialintelligence Nov-21-2022, 03:00:03 GMT

Productizing Large Language Models

Large Language Models (LLMs) are known for their near-magical ability to learn from very few examples -- as few as zero -- to create language wonders. LLMs can chat, write poetry, write code, and even do basic arithmetic. However, the same properties that make LLMs magical also make them challenging from an engineering perspective. At Replit we have deployed transformer-based language models of all sizes: 100M-parameter models for search and spam, 1-10B models for a code autocomplete product we call GhostWriter, and 100B models for features that require a higher reasoning ability. In this post we'll talk about what we've learned about building and hosting large language models.

ghostwriter, language model, llm, (13 more...)

Country: North America > United States (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

arXiv.org Artificial Intelligence Nov-21-2022

Extended Multilingual Protest News Detection -- Shared Task 1, CASE 2021 and 2022

Hürriyetoğlu, Ali, Mutlu, Osman, Duruşan, Fırat, Uca, Onur, Gürel, Alaeddin Selçuk, Radford, Benjamin, Dai, Yaoyao, Hettiarachchi, Hansi, Stoehr, Niklas, Nomoto, Tadashi, Slavcheva, Milena, Vargas, Francielle, Javid, Aaqib, Beyhan, Fatih, Yörük, Erdem

We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021 and consists of four subtasks: i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction. The CASE 2022 extension consists of expanding the test data with more data in previously available languages, namely, English, Hindi, Portuguese, and Spanish, and adding new test data in Mandarin, Turkish, and Urdu for Subtask 1, document classification. The training data from CASE 2021 in English, Portuguese and Spanish were utilized. Therefore, predicting document labels in Hindi, Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022 workshop accepts reports on systems developed for predicting test data of CASE 2021 as well. We observe that the best systems submitted by CASE 2022 participants achieve between 79.71 and 84.06 F1-macro for new languages in a zero-shot setting. The winning approaches are mainly ensembling models and merging data in multiple languages. The best two submissions on CASE 2021 data outperform submissions from last year for Subtask 1 and Subtask 2 in all languages. Only the following scenarios were not outperformed by new submissions on CASE 2021: Subtask 3 Portuguese and Subtask 4 English.
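For readers unfamiliar with the F1-macro metric quoted above, the following is a minimal sketch of how it is usually computed: one F1 score per class, then an unweighted mean. The function name and the toy labels are illustrative assumptions; this is not the shared task's scoring script.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[g] += 1  # gold class g was missed
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class that never appears as a hit or an error scores 0.0 here.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy document labels: 1 = protest news, 0 = other (made-up data).
gold = [1, 1, 0, 0]
pred = [1, 0, 0, 0]
score = macro_f1(gold, pred)
```

Because every class counts equally, a system that does well only on the majority class is penalized more than it would be under accuracy.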

large language model, natural language, text classification, (15 more...)

2211.1136

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
Asia > Middle East > Republic of Türkiye > Mersin Province > Mersin (0.04)
South America > Brazil > São Paulo (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.54)

Zhang, Tianjun, Wang, Xuezhi, Zhou, Denny, Schuurmans, Dale, Gonzalez, Joseph E.

TEMPERA: Test-Time Prompting via Reinforcement Learning

arXiv.org Artificial Intelligence Nov-21-2022

Careful prompt design is critical to the use of large language models in zero-shot or few-shot learning. As a consequence, there is a growing interest in automated methods to design optimal prompts. In this work, we propose TEst-tiMe Prompt Editing using Reinforcement leArning (TEMPERA). In contrast to prior prompt generation methods, TEMPERA can efficiently leverage prior knowledge, is adaptive to different queries, and provides an interpretable prompt for every query. To achieve this, we design a novel action space that allows flexible editing of the initial prompts covering a comprehensive set of commonly-used components like instructions, few-shot exemplars, and verbalizers. The proposed method achieves significant gains compared with recent SoTA approaches like prompt tuning, AutoPrompt, and RLPrompt, across a variety of tasks, including sentiment analysis, topic classification, natural language inference, and reading comprehension. Our method achieves an average 5.33x improvement in sample efficiency when compared to traditional fine-tuning methods.

With the recent advances in pre-training large language models (Brown et al., 2020; Fedus et al., 2021; Raffel et al., 2020; Chowdhery et al., 2022), prompting, or in-context learning, provides a data-efficient framework for performing NLU (Li & Liang, 2021; Shin et al., 2020b; Gao et al., 2020b). Such methods achieve impressive zero-shot and few-shot performance in many downstream tasks. However, the prompt often has to be carefully tuned to achieve consistent performance for each task (Lu et al., 2021). For example, prompt tuning aims to optimize a continuous prefix embedding via gradient descent and directly takes generated output from the frozen pre-trained language model (Lester et al., 2021; Liu et al., 2021b;a). In contrast, discrete prompt optimization focuses on constructing meaningful instructions, in-context exemplars and verbalizers (Brown et al., 2020; Gao et al., 2020b). Prior work often performs black-box optimization or applies RL-based methods for direct generation (Deng et al., 2022; Sun et al., 2022; Prasad et al., 2022).

large language model, machine learning, tempera, (17 more...)

2211.1189

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Alberta (0.14)
North America > United States > Washington > King County > Seattle (0.04)
(6 more...)

Genre: Research Report (0.65)

Industry:

Leisure & Entertainment (0.68)
Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.85)

#artificialintelligence Nov-20-2022, 20:02:37 GMT

Meta Trained an AI on 48M Science Papers. It Was Shut Down After 2 Days

In the first year of the pandemic, science happened at light speed. More than 100,000 papers were published on COVID in those first 12 months -- an unprecedented human effort that produced an unprecedented deluge of new information. It would have been impossible to read and comprehend every one of those studies. No human being could (and, perhaps, none would want to). Galactica is an artificial intelligence developed by Meta AI (formerly known as Facebook Artificial Intelligence Research) with the intention of using machine learning to "organize science."

galactica, information, lecture note, (13 more...)

Country: North America > United States > California > Alameda County > Berkeley (0.05)

Genre: Research Report (0.49)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

#artificialintelligence Nov-20-2022, 07:50:41 GMT

How to create a zero-shot learning text classifier using Hugging Face & Streamlit!

Today I'm excited to have the opportunity to contribute to the 30DaysofStreamlit challenge via this hands-on tutorial! We will create a zero-shot learning text classifier using Hugging Face's Inference API and DistilBART! With it you will have the mighty power to classify keyphrases on-the-fly, fast, and without any ML training! You can set these labels dynamically to anything. Zero-shot learning (ZSL) differs from traditional machine learning methods as it deals with the ability to recognise objects *without* any training samples. Yet it can build and train models efficiently with the help of transferring intelligence from previously seen categories and auxiliary information.
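As a rough illustration of the approach the tutorial describes, the sketch below builds a zero-shot classification request and reads the top label from a response. Several things here are assumptions rather than details from the tutorial: the model id `valhalla/distilbart-mnli-12-3` (the summary only says "Distilbart"), the endpoint URL, the response shape with parallel `labels` and `scores` lists, and the helper names `build_payload` and `top_label`. No network call is made in the block itself.

```python
# Assumption: a DistilBART checkpoint fine-tuned on MNLI; the tutorial's exact model is not stated.
MODEL_ID = "valhalla/distilbart-mnli-12-3"
# Assumption: hosted inference endpoint pattern; check the current Hugging Face docs before use.
API_URL = f"https://api-inference.huggingface.co/models/{MODEL_ID}"


def build_payload(text, labels, multi_label=False):
    """Build the JSON body for a zero-shot classification request (hypothetical helper)."""
    if not labels:
        raise ValueError("at least one candidate label is required")
    return {
        "inputs": text,
        "parameters": {
            "candidate_labels": list(labels),
            "multi_label": multi_label,
        },
    }


def top_label(response):
    """Return (label, score) with the highest score from an assumed response dict."""
    scores = response["scores"]
    best = max(range(len(scores)), key=scores.__getitem__)
    return response["labels"][best], scores[best]


payload = build_payload("running shoes for trail", ["sports", "finance"])

# Hand-written response in the assumed shape, for illustration only.
example_response = {
    "sequence": "running shoes for trail",
    "labels": ["sports", "finance"],
    "scores": [0.9, 0.1],
}
label, score = top_label(example_response)
```

In a real app the payload would be sent with an HTTP POST to `API_URL` using an API token in the `Authorization` header, and the labels would come from user input, which is what lets them change without retraining.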

streamlit cloud, widget, zero-shot learning text classifier, (9 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.84)

#artificialintelligence Nov-20-2022, 00:50:08 GMT

Stanford debuts first AI benchmark to help understand LLMs

In the world of artificial intelligence (AI) and machine learning (ML), 2022 has arguably been the year of foundation models, or AI models trained on a massive scale. From GPT-3 to DALL-E, from BLOOM to Imagen -- another day, it seems, another large language model (LLM) or text-to-image model. But until now, there have been no AI benchmarks to provide a standardized way to evaluate these models, which have developed at a rapidly accelerated pace over the past couple of years.

ai benchmark, benchmark, language model, (16 more...)

Industry: Automobiles & Trucks (0.30)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.35)

arXiv.org Artificial Intelligence Nov-20-2022

Modeling Fine-grained Information via Knowledge-aware Hierarchical Graph for Zero-shot Entity Retrieval

Wu, Taiqiang, Bai, Xingyu, Guo, Weigang, Liu, Weijie, Li, Siheng, Yang, Yujiu

Zero-shot entity retrieval, aiming to link mentions to candidate entities under the zero-shot setting, is vital for many tasks in Natural Language Processing. Most existing methods represent mentions/entities via the sentence embeddings of corresponding context from the Pre-trained Language Model. However, we argue that such coarse-grained sentence embeddings cannot fully model the mentions/entities, especially when the attention scores towards mentions/entities are relatively low. In this work, we propose GER, a Graph enhanced Entity Retrieval framework, to capture more fine-grained information as complementary to sentence embeddings. We extract the knowledge units from the corresponding context and then construct a mention/entity centralized graph. Hence, we can learn the fine-grained information about mention/entity by aggregating information from these knowledge units. To avoid the graph information bottleneck for the central mention/entity node, we construct a hierarchical graph and design a novel Hierarchical Graph Attention Network (HGAN). Experimental results on popular benchmarks demonstrate that our proposed GER framework performs better than previous state-of-the-art models. The code is available at https://github.com/wutaiqiang/GER-WSDM2023.

large language model, machine learning, mention entity, (19 more...)

2211.10991

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > Singapore > Central Region > Singapore (0.05)
(7 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.84)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)

Murty, Shikhar, Manning, Christopher D., Lundberg, Scott, Ribeiro, Marco Tulio

Fixing Model Bugs with Natural Language Patches

arXiv.org Artificial Intelligence Nov-20-2022

Current approaches for fixing systematic problems in NLP models (e.g. regex patches, finetuning on more data) are either brittle, or labor-intensive and liable to shortcuts. In contrast, humans often provide corrections to each other through natural language. Taking inspiration from this, we explore natural language patches -- declarative statements that allow developers to provide corrective feedback at the right level of abstraction, either overriding the model ("if a review gives 2 stars, the sentiment is negative") or providing additional information the model may lack ("if something is described as the bomb, then it is good"). We model the task of determining if a patch applies separately from the task of integrating patch information, and show that with a small amount of synthetic data, we can teach models to effectively use real patches on real data -- 1 to 7 patches improve accuracy by ~1-4 accuracy points on different slices of a sentiment analysis dataset, and F1 by 7 points on a relation extraction dataset. Finally, we show that finetuning on as many as 100 labeled examples may be needed to match the performance of a small set of language patches.

entity2, large language model, natural language, (19 more...)

2211.03318

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Hawaii (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.65)

Industry:

Consumer Products & Services > Restaurants (0.48)
Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.35)

arXiv.org Artificial Intelligence Nov-20-2022

The Stack: 3 TB of permissively licensed source code

Kocetkov, Denis, Li, Raymond, Allal, Loubna Ben, Li, Jia, Mou, Chenghao, Ferrandis, Carlos Muñoz, Jernite, Yacine, Mitchell, Margaret, Hughes, Sean, Wolf, Thomas, Bahdanau, Dzmitry, von Werra, Leandro, de Vries, Harm

Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (AI)--not only for natural language processing but also for code understanding and generation. To stimulate open and responsible research on LLMs for code, we introduce The Stack, a 3.1 TB dataset consisting of permissively licensed source code in 30 programming languages. We describe how we collect the full dataset, construct a permissively licensed subset, present a data governance plan, discuss limitations, and show promising results on text2code benchmarks by training 350M-parameter decoders on different Python subsets. We find that (1) near-deduplicating the data significantly boosts performance across all experiments, and (2) it is possible to match previously reported HumanEval and MBPP performance using only permissively licensed data. We make the dataset available at https://hf.co/BigCode, provide a tool called "Am I in The Stack" (https://hf.co/spaces/bigcode/in-the-stack) for developers to search The Stack for copies of their code, and provide a process for code to be removed from the dataset by following the instructions at https://www.bigcode-project.org/docs/about/the-stack/.
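To make "near-deduplicating" concrete, here is a minimal sketch of one common way to detect near-duplicate source files: compare sets of token shingles with Jaccard similarity and keep a file only if it is not too similar to one already kept. The abstract does not say how The Stack was deduplicated, so the method, the 0.85 threshold, the shingle size, and all function names are assumptions. The pairwise comparison is quadratic and would not scale to terabytes of code; it only illustrates the idea.

```python
import re


def shingles(code, k=5):
    """Set of k-token windows; whitespace is ignored by the tokenizer."""
    tokens = re.findall(r"\w+|[^\w\s]", code)
    if len(tokens) < k:
        return {tuple(tokens)}
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def near_deduplicate(files, threshold=0.85, k=5):
    """Keep the first file of each near-duplicate group (brute force, illustrative)."""
    kept, kept_sets = [], []
    for text in files:
        current = shingles(text, k)
        if all(jaccard(current, other) < threshold for other in kept_sets):
            kept.append(text)
            kept_sets.append(current)
    return kept


files = [
    "def add(a, b):\n    return a + b\n",
    "def add(a, b):\n    return a + b  \n",  # differs only in trailing whitespace
    "print('hello world')\n",
]
unique = near_deduplicate(files)
```

Here the second file tokenizes identically to the first, so only the first and third files survive.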

artificial intelligence, large language model, natural language, (15 more...)

2211.15533

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > Dominican Republic (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.93)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)