AITopics

2208.03299

Country:

North America > Bermuda (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England (0.04)

Genre: Research Report (0.83)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence, Nov-16-2022

Reflect, Not Reflex: Inference-Based Common Ground Improves Dialogue Response Quality

Zhou, Pei, Cho, Hyundong, Jandaghi, Pegah, Lee, Dong-Ho, Lin, Bill Yuchen, Pujara, Jay, Ren, Xiang

Human communication relies on common ground (CG), the mutual knowledge and beliefs shared by participants, to produce coherent and interesting conversations. In this paper, we demonstrate that current response generation (RG) models produce generic and dull responses in dialogues because they act reflexively, failing to explicitly model CG, due both to the lack of CG in training data and to the standard RG training procedure. We introduce Reflect, a dataset that annotates dialogues with explicit CG (materialized as inferences approximating shared knowledge and beliefs) and solicits 9k diverse human-generated responses, each following one CG inference. Using Reflect, we showcase the limitations of current dialogue data and RG models: less than half of the responses in current data are rated as high quality (sensible, specific, and interesting), and models trained on this data produce even lower-quality responses, while most Reflect responses are judged high quality. Next, we analyze whether CG can help models produce better responses by using Reflect CG to guide RG models. Surprisingly, we find that simply prompting GPT3 to "think" about CG generates 30% more high-quality responses, showing the promise of integrating CG into the RG process.
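The "think about CG first" idea can be illustrated as a two-stage prompt: one prompt asks the model for an inference about shared knowledge, and a second prompt conditions the reply on that inference. This is a minimal sketch under stated assumptions: the helper names (`Turn`, `cg_inference_prompt`, `response_prompt`) and all prompt wording are hypothetical, not the paper's templates, and no model call is made.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    speaker: str
    text: str


def format_dialogue(history: List[Turn]) -> str:
    # One "Speaker: utterance" line per turn.
    return "\n".join(f"{t.speaker}: {t.text}" for t in history)


def cg_inference_prompt(history: List[Turn], cg_question: str) -> str:
    # Stage 1 (hypothetical wording): ask for an inference about what
    # both speakers know or believe before any reply is written.
    return (
        f"{format_dialogue(history)}\n\n"
        "Think about what both speakers already know or believe.\n"
        f"Question: {cg_question}\n"
        "Inference:"
    )


def response_prompt(history: List[Turn], inference: str, next_speaker: str) -> str:
    # Stage 2 (hypothetical wording): condition the next turn on the
    # stage-1 inference so the reply builds on explicit common ground.
    return (
        f"{format_dialogue(history)}\n\n"
        f"Common ground: {inference}\n"
        "Write the next turn so that it builds on this common ground.\n"
        f"{next_speaker}:"
    )
```

In use, the completion returned for the stage-1 prompt would be passed as `inference` to `response_prompt`; the LLM call itself is left out because the abstract does not tie the method to a specific API.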

large language model, machine learning, natural language

2211.09267

Country:

North America > United States > California (0.14)
Asia > Middle East > Jordan (0.05)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.53)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)
Information Technology > Communications > Social Media > Crowdsourcing (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.39)

#artificialintelligence, Nov-15-2022, 16:45:55 GMT

La veille de la cybersécurité ("Cybersecurity Watch")

Large language models (LLMs) have a dirty secret: they require vast amounts of energy to train and run. What's more, it's still a bit of a mystery exactly how big these models' carbon footprints really are. AI startup Hugging Face believes it's come up with a new, better way to calculate that more precisely, by estimating emissions produced during the model's whole life cycle rather than just during training. It could be a step toward more realistic data from tech companies about the carbon footprint of their AI products at a time when experts are calling for the sector to do a better job of evaluating AI's environmental impact. Hugging Face's work is published in a non-peer-reviewed paper.

hugging face, veille

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)

MIT Technology Review, Nov-15-2022, 12:00:00 GMT

Why we need to do a better job of measuring AI's carbon footprint

I've just published a story about the first attempt to calculate the broader emissions of one of the most popular AI products right now (large language models) and how it could help nudge the tech sector to do more to clean up its act. AI startup Hugging Face calculated the emissions of its large language model BLOOM, and its researchers found that the training process emitted 25 metric tons of carbon. However, those emissions doubled when they took the wider hardware and infrastructure costs of running the model into account. They published their work in a paper posted on arXiv that's yet to be peer reviewed. The finding in itself isn't hugely surprising, and BLOOM is way "cleaner" than large language models like OpenAI's GPT-3 and Meta's OPT, because it was trained on a French supercomputer powered by nuclear energy. Instead, the significance of this work is that it points to a better way to calculate AI models' climate impact, by going beyond just the training to the way they're used in the real world.
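The arithmetic behind "training plus everything else" can be sketched as a small life-cycle tally. This is a minimal illustration, assuming a simple additive model: operational emissions as energy times grid carbon intensity, plus a dictionary of non-training components. The names `emissions_tonnes` and `LifecycleEstimate` and the category labels are hypothetical. The article gives only the 25-ton training figure and says the total roughly doubled, so the example lumps everything else into a single 25-ton entry rather than inventing a breakdown.

```python
from dataclasses import dataclass, field
from typing import Dict


def emissions_tonnes(energy_kwh: float, grid_g_per_kwh: float) -> float:
    # Operational emissions: energy used times grid carbon intensity.
    # kWh * gCO2eq/kWh gives grams; divide by 1e6 for metric tons.
    return energy_kwh * grid_g_per_kwh / 1e6


@dataclass
class LifecycleEstimate:
    # All values in metric tons of CO2-equivalent.
    training: float
    # Everything beyond the training run (hardware, infrastructure, use);
    # category names are hypothetical placeholders.
    other: Dict[str, float] = field(default_factory=dict)

    def total(self) -> float:
        return self.training + sum(self.other.values())

    def ratio_to_training(self) -> float:
        # How much larger the life-cycle figure is than training alone.
        return self.total() / self.training


bloom = LifecycleEstimate(training=25.0, other={"non-training (lumped)": 25.0})
print(bloom.total(), bloom.ratio_to_training())  # prints: 50.0 2.0
```

Keeping the components in a dictionary makes it easy to see which stages a reported figure includes, which is the gap the article says training-only numbers leave open.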

large language model, machine learning, natural language

MIT Technology Review

Country: North America > Canada > Quebec > Montreal (0.06)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.58)

#artificialintelligence, Nov-15-2022, 04:20:44 GMT

Kue Balok Mang Salam (Promotional Sentence Created by GPT3)

Kue Balok Mang Salam is a cake shop in the Telkom University area that offers a variety of delicious and affordable cakes. The shop is open every day from 3pm to 10pm, and offers a variety of cakes, including chocolate, vanilla, and strawberry. Kue Balok Mang Salam is the perfect place to get your hands on delicious Sundanese specialties like "Kue Balok", "Mie Tek-Tek" and refreshing drinks like "Ice Chocolate", "Ice Matcha", "Lemon Tea" at an affordable price!

delicious kue balok, kue balok mang salam, promotional sentence

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.40)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)

Large Language Models and the Reverse Turing Test

Sejnowski, Terrence

Large Language Models (LLMs) have been transformative. They are pre-trained foundational models that are self-supervised and can be adapted with fine-tuning to a wide range of natural language tasks, each of which previously would have required a separate network model. This is one step closer to the extraordinary versatility of human language. GPT-3 and more recently LaMDA can carry on dialogs with humans on many topics after minimal priming with a few examples. However, there has been a wide range of reactions and debate on whether these LLMs understand what they are saying or exhibit signs of intelligence. This high variance is exhibited in three interviews with LLMs reaching wildly different conclusions. A new possibility was uncovered that could explain this divergence. What appears to be intelligence in LLMs may in fact be a mirror that reflects the intelligence of the interviewer, a remarkable twist that could be considered a Reverse Turing Test. If so, then by studying interviews we may be learning more about the intelligence and beliefs of the interviewer than the intelligence of the LLMs. As LLMs become more capable they may transform the way we interact with machines and how they interact with each other. Increasingly, LLMs are being coupled with sensorimotor devices. LLMs can talk the talk, but can they walk the walk? A road map for achieving artificial general autonomy is outlined with seven major improvements inspired by brain systems. LLMs could be used to uncover new insights into brain function by downloading brain data during natural behaviors.

large language model, machine learning, natural language

doi: 10.1162/neco_a_01563

2207.14382

Country:

Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.05)
Africa > Middle East > Egypt (0.05)
Atlantic Ocean > North Atlantic Ocean > English Channel (0.04)

Genre:

Personal > Interview (0.46)
Personal > Honors (0.46)

Industry:

Leisure & Entertainment > Games (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Education > Educational Setting (1.00)
Media (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Introducing Semantics into Speech Encoders

Xu, Derek, Dong, Shuyan, Wang, Changhan, Kim, Suyoun, Lin, Zhaojiang, Shrivastava, Akshat, Li, Shang-Wen, Tseng, Liang-Hsuan, Baevski, Alexei, Lin, Guan-Ting, Lee, Hung-yi, Sun, Yizhou, Wang, Wei

Recent studies find existing self-supervised speech encoders contain primarily acoustic rather than semantic information. As a result, pipelined supervised automatic speech recognition (ASR) to large language model (LLM) systems achieve state-of-the-art results on semantic spoken language tasks by utilizing rich semantic representations from the LLM. These systems come at the cost of labeled audio transcriptions, which are expensive and time-consuming to obtain. We propose a task-agnostic unsupervised way of incorporating semantic information from LLMs into self-supervised speech encoders without labeled audio transcriptions. By introducing semantics, we improve existing speech encoder spoken language understanding performance by over 10% on intent classification, with modest gains in named entity resolution and slot filling, and spoken question answering FF1 score by over 2%. Our unsupervised approach achieves performance similar to supervised methods trained on over 100 hours of labeled audio transcripts, demonstrating the feasibility of unsupervised semantic augmentations to existing speech encoders.
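In textless spoken question answering the answer is a time span in the audio, and FF1 most likely denotes a frame-level F1 between the predicted and gold spans. The sketch below shows that kind of overlap metric, assuming half-open spans in frame indices and an illustrative 20 ms frame shift; both the exact definition used in the paper and the helper names `frame_f1` and `seconds_to_frames` are assumptions.

```python
from typing import Tuple

# Half-open span [start, end) measured in frame indices.
Span = Tuple[int, int]


def seconds_to_frames(t: float, frame_shift_s: float = 0.02) -> int:
    # The 20 ms frame shift is an assumed, illustrative value.
    return round(t / frame_shift_s)


def frame_f1(pred: Span, gold: Span) -> float:
    """F1 over the frames shared by the predicted and gold answer spans."""
    overlap = max(0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    if overlap == 0:
        return 0.0
    precision = overlap / (pred[1] - pred[0])
    recall = overlap / (gold[1] - gold[0])
    return 2 * precision * recall / (precision + recall)
```

For example, a prediction covering frames 0 to 10 against a gold span of 5 to 15 shares 5 frames, giving precision 0.5, recall 0.5 and F1 0.5.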

artificial intelligence, large language model, natural language

2211.08402

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Taiwan (0.04)
Asia > South Korea > Gyeonggi-do > Suwon (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Flamingo: a Visual Language Model for Few-Shot Learning

Alayrac, Jean-Baptiste, Donahue, Jeff, Luc, Pauline, Miech, Antoine, Barr, Iain, Hasson, Yana, Lenc, Karel, Mensch, Arthur, Millican, Katie, Reynolds, Malcolm, Ring, Roman, Rutherford, Eliza, Cabi, Serkan, Han, Tengda, Gong, Zhitao, Samangooei, Sina, Monteiro, Marianne, Menick, Jacob, Borgeaud, Sebastian, Brock, Andrew, Nematzadeh, Aida, Sharifzadeh, Sahand, Binkowski, Mikolaj, Barreira, Ricardo, Vinyals, Oriol, Zisserman, Andrew, Simonyan, Karen

Building models that can be rapidly adapted to novel tasks using only a handful of annotated examples is an open challenge for multimodal machine learning research. We introduce Flamingo, a family of Visual Language Models (VLM) with this ability. We propose key architectural innovations to: (i) bridge powerful pretrained vision-only and language-only models, (ii) handle sequences of arbitrarily interleaved visual and textual data, and (iii) seamlessly ingest images or videos as inputs. Thanks to their flexibility, Flamingo models can be trained on large-scale multimodal web corpora containing arbitrarily interleaved text and images, which is key to endowing them with in-context few-shot learning capabilities. We perform a thorough evaluation of our models, exploring and measuring their ability to rapidly adapt to a variety of image and video tasks. These include open-ended tasks such as visual question-answering, where the model is prompted with a question which it has to answer; captioning tasks, which evaluate the ability to describe a scene or an event; and close-ended tasks such as multiple-choice visual question-answering. For tasks lying anywhere on this spectrum, a single Flamingo model can achieve a new state of the art with few-shot learning, simply by prompting the model with task-specific examples. On numerous benchmarks, Flamingo outperforms models fine-tuned on thousands of times more task-specific data.
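Few-shot prompting with interleaved images and text can be pictured as building one sequence in which support examples (image, answer) come first, followed by the query image and an open prompt for the model to complete. This is a minimal sketch of that input layout only, assuming hypothetical names (`ImageRef`, `few_shot_sequence`, `render`); the `<image>` placeholder and `Output:` wording are illustrative, not Flamingo's exact input format, and no model is involved.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass(frozen=True)
class ImageRef:
    # Stand-in for an image; a real VLM would receive pixel tensors here.
    path: str


Segment = Union[ImageRef, str]


def few_shot_sequence(
    support: Sequence[Tuple[ImageRef, str]],
    query: ImageRef,
    prompt: str = "Output:",
    sep: str = "\n",
) -> List[Segment]:
    """Interleave (image, answer) support examples, then append the query
    image and an open prompt that the model is expected to complete."""
    seq: List[Segment] = []
    for image, answer in support:
        seq.append(image)
        seq.append(f"{prompt} {answer}{sep}")
    seq.append(query)
    seq.append(prompt)
    return seq


def render(seq: Sequence[Segment], image_token: str = "<image>") -> str:
    # Replace each image with a placeholder token to show where the
    # visual inputs sit in the text stream.
    return "".join(image_token if isinstance(s, ImageRef) else s for s in seq)
```

With two captioned support images and one query, `render` yields `<image>Output: A cat.\n<image>Output: A dog.\n<image>Output:`, which makes the "prompt with task-specific examples" pattern concrete.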

large language model, machine learning, natural language

2204.14198

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry:

Education (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Multilingual and Multimodal Topic Modelling with Pretrained Embeddings

Zosa, Elaine, Pivovarova, Lidia

This paper presents M3L-Contrast -- a novel multimodal multilingual (M3L) neural topic model for comparable data that maps texts from multiple languages and images into a shared topic space. Our model is trained jointly on texts and images and takes advantage of pretrained document and image embeddings to abstract the complexities between different languages and modalities. As a multilingual topic model, it produces aligned language-specific topics, and as a multimodal model, it infers textual representations of semantic concepts in images. We demonstrate that our model is competitive with a zero-shot topic model in predicting topic distributions for comparable multilingual data and significantly outperforms a zero-shot model in predicting topic distributions for comparable texts and images. We also show that our model performs almost as well on unaligned embeddings as it does on aligned embeddings.
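A standard way to pull matched text and image pairs together in a shared space, while pushing mismatched pairs apart, is a symmetric InfoNCE contrastive loss. The sketch below implements that generic loss in plain Python, assuming `texts[i]` and `images[i]` form a matched pair within a batch. Whether M3L-Contrast uses exactly this form, and on which representations (for example topic distributions rather than raw embeddings), is an assumption not stated in the abstract; the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def _cross_entropy_diag(logits: List[List[float]]) -> float:
    # Mean cross-entropy where row i's correct "class" is column i,
    # using a numerically stable log-sum-exp.
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(
    texts: List[Vector], images: List[Vector], temperature: float = 0.1
) -> float:
    """Other pairs in the batch act as negatives; the text-to-image and
    image-to-text losses are averaged."""
    logits = [[cosine(t, v) / temperature for v in images] for t in texts]
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))
```

The loss is near zero when each text is most similar to its own image and grows when pairs are swapped, which is the behavior a shared multimodal space needs during training.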

large language model, natural language, topic model

2211.08057

Country:

Europe > Finland > Uusimaa > Helsinki (0.04)
Asia > Middle East > Jordan (0.04)
South America (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.82)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)

RobBERT-2022: Updating a Dutch Language Model to Account for Evolving Language Use

Delobelle, Pieter, Winters, Thomas, Berendt, Bettina

Large transformer-based language models, e.g. BERT and GPT-3, outperform previous architectures on most natural language processing tasks. Such language models are first pre-trained on gigantic corpora of text and later used as base models for fine-tuning on a particular task. Since the pre-training step is usually not repeated, base models are not up-to-date with the latest information. In this paper, we update RobBERT, a RoBERTa-based state-of-the-art Dutch language model, which was trained in 2019. First, the tokenizer of RobBERT is updated to include new high-frequency tokens present in the latest Dutch OSCAR corpus, e.g. corona-related words. Then we further pre-train the RobBERT model using this dataset. To evaluate whether our new model is a plug-in replacement for RobBERT, we introduce two additional criteria based on concept drift of existing tokens and alignment for novel tokens. We found that for certain language tasks this update results in a significant performance increase. These results highlight the benefit of continually updating a language model to account for evolving language use.
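The first step, finding frequent tokens in the new corpus that the old vocabulary lacks, can be sketched with a simple frequency count. This is a simplified illustration: RoBERTa-style models such as RobBERT use subword (BPE) tokenizers, so a real update works on subword merges, whereas this sketch counts whitespace-separated words. The function `frequent_new_tokens` and its thresholds are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set


def frequent_new_tokens(
    corpus: Iterable[str], vocab: Set[str], min_count: int = 2, top_k: int = 100
) -> List[str]:
    """Return up to top_k lowercase words that occur at least min_count times
    in the new corpus but are missing from the old vocabulary, most frequent
    first (ties broken alphabetically)."""
    counts = Counter(word for line in corpus for word in line.lower().split())
    candidates = [(w, c) for w, c in counts.items() if w not in vocab and c >= min_count]
    candidates.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in candidates[:top_k]]
```

On a toy corpus where "corona" and "lockdown" recur but were absent from a 2019 vocabulary, both would surface as candidates, mirroring the corona-related words the abstract mentions.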

large language model, machine learning, natural language

2211.08192

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.31)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)