AITopics | Large Language Model

Large Language Model

A bot that watched 70,000 hours of Minecraft could unlock AI's next big thing

MIT Technology Review, Nov-25-2022, 12:32:52 GMT

The result is a breakthrough for a technique known as imitation learning, in which neural networks are trained to perform tasks by watching humans do them. Imitation learning can be used to train AI to control robot arms, drive cars or navigate webpages. There is a vast amount of video online showing people doing different tasks. By tapping into this resource, the researchers hope to do for imitation learning what GPT-3 did for large language models. "In the last few years we've seen the rise of this GPT-3 paradigm where we see amazing capabilities come from big models trained on enormous swathes of the internet," says Bowen Baker at OpenAI, one of the team behind the new Minecraft bot.

large language model, machine learning, natural language, (11 more...)

MIT Technology Review

Industry: Leisure & Entertainment > Games > Computer Games (0.71)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.63)

Transformer-based Model for Word Level Language Identification in Code-mixed Kannada-English Texts

Tonja, Atnafu Lambebo, Yigezu, Mesay Gemeda, Kolesnikova, Olga, Tash, Moein Shahiki, Sidorov, Grigori, Gelbuk, Alexander

arXiv.org Artificial Intelligence, Nov-25-2022

Code-mixed data currently receives a lot of attention in natural language processing (NLP) research. Language identification of code-mixed social media text has been an interesting problem of study in recent years due to the growing influence of social media on communication. This paper describes the system submitted by the Instituto Politécnico Nacional, Centro de Investigación en Computación (CIC) team to the CoLI-Kanglish shared task at ICON2022. We propose the use of a Transformer-based model for word-level language identification in code-mixed Kannada-English texts. On the CoLI-Kenglish dataset, the proposed model achieves a weighted F1-score of 0.84 and a macro F1-score of 0.61.
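A weighted F1-score well above the macro F1-score, as in the abstract above, usually points to an imbalanced label set where rare classes score worse than frequent ones. The following is a minimal sketch of how the two averages differ. The labels `kn` and `en` and the toy predictions are assumptions made up for illustration and are not taken from the paper or its dataset.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 weighted by how often each class occurs in y_true."""
    scores = per_class_f1(y_true, y_pred)
    support = Counter(y_true)
    return sum(scores[label] * support[label] for label in scores) / len(y_true)


# Hypothetical toy data: 8 frequent-class tokens, 2 rare-class tokens.
y_true = ["kn"] * 8 + ["en"] * 2
y_pred = ["kn"] * 8 + ["kn", "en"]  # one rare-class token is misclassified

macro = macro_f1(y_true, y_pred)
weighted = weighted_f1(y_true, y_pred)
print(f"macro F1 = {macro:.3f}, weighted F1 = {weighted:.3f}")
```

In this toy case the single error on the rare class pulls the macro average down more than the weighted average, because the weighted average is dominated by the frequent class.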

information retrieval, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2211.14459

Country:

Africa (0.05)
South America (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(3 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.79)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.72)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.56)

Solving math word problems with process- and outcome-based feedback

Uesato, Jonathan, Kushman, Nate, Kumar, Ramana, Song, Francis, Siegel, Noah, Wang, Lisa, Creswell, Antonia, Irving, Geoffrey, Higgins, Irina

arXiv.org Artificial Intelligence, Nov-25-2022

Recent work has shown that asking language models to generate reasoning steps improves performance on many reasoning tasks. When moving beyond prompting, this raises the question of how we should supervise such models: outcome-based approaches which supervise the final result, or process-based approaches which supervise the reasoning process itself? Differences between these approaches might naturally be expected not just in final-answer errors but also in reasoning errors, which can be difficult to detect and are problematic in many real-world domains such as education. We run the first comprehensive comparison between process- and outcome-based approaches trained on a natural language task, GSM8K. We find that pure outcome-based supervision produces similar final-answer error rates with less label supervision. However, for correct reasoning steps we find it necessary to use process-based supervision or supervision from learned reward models that emulate process-based feedback. In total, we improve the previous best results from 16.8% to 12.7% final-answer error and from 14.0% to 3.4% reasoning error among final-answer-correct solutions.

arxiv preprint arxiv, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2211.14275

Country: North America > United States > Maryland > Baltimore (0.04)

Genre:

Workflow (1.00)
Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

CodeExp: Explanatory Code Document Generation

Cui, Haotian, Wang, Chenglong, Huang, Junjie, Inala, Jeevana Priya, Mytkowicz, Todd, Wang, Bo, Gao, Jianfeng, Duan, Nan

arXiv.org Artificial Intelligence, Nov-25-2022

Developing models that can automatically generate detailed code explanations can greatly benefit software maintenance and programming education. However, existing code-to-text generation models often produce only high-level summaries of code that do not capture implementation-level choices essential for these scenarios. To fill this gap, we propose the code explanation generation task. We first conducted a human study to identify the criteria for high-quality explanatory docstrings for code. Based on that, we collected and refined a large-scale code docstring corpus and formulated automatic evaluation metrics that best match human assessments. Finally, we present a multi-stage fine-tuning strategy and baseline models for the task. Our experiments show that (1) our refined training dataset lets models achieve better performance in the explanation generation tasks compared to larger unrefined data (15x larger), and (2) fine-tuned models can generate well-structured long docstrings comparable to human-written ones. We envision that our training dataset, human-evaluation protocol, recommended metrics, and fine-tuning strategy can boost future code explanation research. The code and annotated data are available at https://github.com/subercui/CodeExp.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2211.15395

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Misleading the Covid-19 vaccination discourse on Twitter: An exploratory study of infodemic around the pandemic

Sharma, Shakshi, Sharma, Rajesh, Datta, Anwitaman

arXiv.org Artificial Intelligence, Nov-25-2022

In this work, we collect a moderate-sized representative corpus of tweets (approx. 200,000) pertaining to Covid-19 vaccination, spanning a period of seven months (September 2020 - March 2021). Following a Transfer Learning approach, we utilize the pre-trained Transformer-based XLNet model to classify tweets as Misleading or Non-Misleading and validate against a random subset of results manually. We build on this to study and contrast the characteristics of tweets in the corpus that are misleading in nature against non-misleading ones. This exploratory analysis enables us to design features (such as sentiments, hashtags, nouns, pronouns, etc.) that can, in turn, be exploited for classifying tweets as (Non-)Misleading using various ML models in an explainable manner. Specifically, several ML models are employed for prediction, with up to 90% accuracy, and the importance of each feature is explained using the SHAP Explainable AI (XAI) tool. While the thrust of this work is principally exploratory analysis in order to obtain insights on the online discourse on Covid-19 vaccination, we conclude the paper by outlining how these insights provide the foundations for a more actionable approach to mitigate misinformation. The curated dataset and code are made available (Github repository) so that the research community at large can reproduce, compare against, or build upon this work.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2108.10735

Country:

North America > United States (0.93)
Europe > Estonia > Tartu County > Tartu (0.04)
Asia > Singapore (0.04)
(8 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.34)

TRAC: A Textual Benchmark for Reasoning about Actions and Change

He, Weinan, Huang, Canming, Xiao, Zhanhao, Liu, Yongmei

arXiv.org Artificial Intelligence, Nov-25-2022

Reasoning about actions and change (RAC) is essential to understand and interact with the ever-changing environment. Previous AI research has shown the importance of fundamental and indispensable knowledge of actions, i.e., preconditions and effects. However, traditional methods rely on logical formalization, which hinders practical applications. With recent transformer-based language models (LMs), reasoning over text is desirable and seemingly feasible, leading to the question of whether LMs can effectively and efficiently learn to solve RAC problems. We propose four essential RAC tasks as a comprehensive textual benchmark and generate problems in a way that minimizes the influence of other linguistic requirements (e.g., grounding) to focus on RAC. The resulting benchmark, TRAC, encompassing problems of various complexities, facilitates a more granular evaluation of LMs, precisely targeting the structural generalization ability much needed for RAC. Experiments with three high-performing transformers indicate that additional efforts are needed to tackle challenges raised by TRAC.

large language model, logic & formal reasoning, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2211.1393

Country:

Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
(5 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

The Future of AI and Machine Learning with the Advent of GPT-4

#artificialintelligence, Nov-24-2022, 15:52:57 GMT

Machine learning (ML) is a process of teaching computers to learn from data, without being explicitly programmed. Machine learning is a field of computer science that began with the goal of creating intelligent algorithms that could learn from and make predictions on data. This process of learning from data is similar to the way humans learn. In supervised learning, the computer is given a set of training data, which includes both the input data (such as features or variables) and the desired output (such as labels). The goal is for the computer to learn a general rule that can be used to predict the output for new data.
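The supervised-learning setup described above can be illustrated with a small sketch: the training data pairs inputs with desired outputs, a general rule is fitted to those pairs, and the rule is then applied to an unseen input. The choice of a straight-line rule fitted by least squares, the function names `fit_line` and `predict`, and the toy numbers are all assumptions for illustration and do not come from the article.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to the training pairs by least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned rule to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training data: inputs (features) and desired outputs (labels).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)   # learn the general rule from examples
print(predict(model, 10.0))          # predict the output for unseen input
```

Real systems use far richer models than a line, but the pattern is the same: fit on labeled examples, then predict on new data.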

advent, ai and machine learning, computer, (2 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.40)

We could run out of data to train AI language programs

MIT Technology Review, Nov-24-2022, 09:28:27 GMT

The trouble is, the types of data typically used for training language models may be used up in the near future--as early as 2026, according to a paper by researchers from Epoch, an AI research and forecasting organization, that is yet to be peer reviewed. The issue stems from the fact that, as researchers build more powerful models with greater capabilities, they have to find ever more texts to train them on. Large language model researchers are increasingly concerned that they are going to run out of this sort of data, says Teven Le Scao, a researcher at AI company Hugging Face, who was not involved in Epoch's work. The issue stems partly from the fact that language AI researchers filter the data they use to train models into two categories: high quality and low quality. The line between the two categories can be fuzzy, says Pablo Villalobos, a staff researcher at Epoch and the lead author of the paper, but text from the former is viewed as better-written and is often produced by professional writers.

category, language model, train ai language program, (6 more...)

MIT Technology Review

Country: North America > United States > California (0.18)

Genre: Research Report (0.58)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.43)

What Does an AI Say to Another?

#artificialintelligence, Nov-24-2022, 02:45:14 GMT

I want to share an experiment with you. The latest posts have been a streak of not-so-good news, non-optimistic takes, and anti-hype arguments. I think it's paramount to talk about all that, but it's as important to let a positive vibe out every so often. Otherwise, we risk burning out--and I don't want that! That's why today I bring you a different perspective on AI.

gpt-3, gpt-3 and j1-jumbo, j1-jumbo, (10 more...)

#artificialintelligence

Genre: Personal > Interview (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.49)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

How Huge Protein Language Models Could Disrupt Structural Biology

#artificialintelligence, Nov-24-2022, 01:00:16 GMT

Then in the adjacent field of chemistry, I've presented how both DeepMind and Google are working on accelerating quantum calculations. Even TikTok seems to have plans to assist quantum calculations with ML methods, as it was recently hiring people trained in these areas.

language model, protein, sequence, (13 more...)

#artificialintelligence

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Communications > Social Media (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)
