 Large Language Model


AnyTOD: A Programmable Task-Oriented Dialog System

arXiv.org Artificial Intelligence

We propose AnyTOD, an end-to-end, zero-shot task-oriented dialog (TOD) system capable of handling unseen tasks without task-specific training. We view TOD as a program executed by a language model (LM), where program logic and ontology are provided by a designer as a schema. To enable generalization to unseen schemas and programs without prior training, AnyTOD adopts a neuro-symbolic approach. A neural LM keeps track of events occurring during a conversation, and a symbolic program implementing the dialog policy is executed to recommend the next actions AnyTOD should take. This approach drastically reduces data annotation and model training requirements, addressing the enduring challenge of rapidly adapting a TOD system to unseen tasks and domains. We demonstrate state-of-the-art results on STAR, ABCD and SGD benchmarks. We also demonstrate strong zero-shot transfer ability in low-resource settings, such as zero-shot on MultiWOZ. In addition, we release STARv2, an updated version of the STAR dataset with richer annotations, for benchmarking zero-shot end-to-end TOD models.
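A minimal sketch of the neuro-symbolic split described above, assuming a toy flight-booking schema. The schema, slot names, and the functions `track_state` and `policy` are hypothetical and not taken from AnyTOD; the neural LM tracker is replaced by a stub that receives already-extracted slot values, so only the symbolic policy step is illustrated.

```python
from dataclasses import dataclass, field

# Hypothetical designer-provided schema: ordered actions and the slots each needs.
SCHEMA = {
    "query_flight": ["origin", "destination", "date"],
    "book_flight": ["flight_id"],
}


@dataclass
class DialogState:
    """Conversation events tracked so far (the neural LM's job in AnyTOD)."""
    slots: dict = field(default_factory=dict)
    completed_actions: set = field(default_factory=set)


def track_state(state: DialogState, extracted_slots: dict) -> DialogState:
    """Stub for the neural tracker: merge slot values already extracted from a turn."""
    state.slots.update(extracted_slots)
    return state


def policy(state: DialogState) -> str:
    """Symbolic policy program: recommend the next action from schema + state."""
    for action, required in SCHEMA.items():
        if action in state.completed_actions:
            continue  # this step of the program is already done
        missing = [slot for slot in required if slot not in state.slots]
        # Request the first missing slot, otherwise run the action itself.
        return f"request_{missing[0]}" if missing else action
    return "goodbye"  # every action in the schema has completed


state = DialogState()
track_state(state, {"origin": "SFO"})
print(policy(state))  # request_destination
track_state(state, {"destination": "JFK", "date": "2023-02-14"})
print(policy(state))  # query_flight
state.completed_actions.add("query_flight")
print(policy(state))  # request_flight_id
```

Because the policy only reads the schema and the tracked state, swapping in a new schema changes the recommended actions without retraining anything, which is the property the abstract relies on for zero-shot transfer.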


Can Pretrained Language Models (Yet) Reason Deductively?

arXiv.org Artificial Intelligence

Acquiring factual knowledge with Pretrained Language Models (PLMs) has attracted increasing attention, showing promising performance in many knowledge-intensive tasks. Their good performance has led the community to believe that the models do possess a modicum of reasoning competence rather than merely memorising the knowledge. In this paper, we conduct a comprehensive evaluation of the learnable deductive (also known as explicit) reasoning capability of PLMs. Through a series of controlled experiments, we report two main findings. (i) PLMs inadequately generalise learned logic rules and perform inconsistently against simple adversarial surface form edits. (ii) While deductive reasoning fine-tuning does improve PLMs' performance on reasoning over unseen knowledge facts, it causes catastrophic forgetting of previously learnt knowledge. Our main results suggest that PLMs cannot yet perform reliable deductive reasoning, demonstrating the importance of controlled examinations and probing of PLMs' reasoning abilities; we reach beyond (misleading) task performance, revealing that PLMs are still far from human-level reasoning capabilities, even for simple deductive tasks.


EntityCS: Improving Zero-Shot Cross-lingual Transfer with Entity-Centric Code Switching

arXiv.org Artificial Intelligence

Accurate alignment between languages is fundamental for improving cross-lingual pre-trained language models (XLMs). Motivated by the natural phenomenon of code-switching (CS) in multilingual speakers, CS has been used as an effective data augmentation method that offers language alignment at the word- or phrase-level, in contrast to sentence-level via parallel instances. Existing approaches either use dictionaries or parallel sentences with word alignment to generate CS data by randomly switching words in a sentence. However, such methods can be suboptimal as dictionaries disregard semantics, and syntax might become invalid after random word switching. In this work, we propose EntityCS, a method that focuses on Entity-level Code-Switching to capture fine-grained cross-lingual semantics without corrupting syntax. We use Wikidata and English Wikipedia to construct an entity-centric CS corpus by switching entities to their counterparts in other languages. We further propose entity-oriented masking strategies during intermediate model training on the EntityCS corpus for improving entity prediction. Evaluation of the trained models on four entity-centric downstream tasks shows consistent improvements over the baseline with a notable increase of 10% in Fact Retrieval. We release the corpus and models to assist research on code-switching and enriching XLMs with external knowledge.
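A minimal sketch of entity-level code switching as described above: only annotated entity spans are replaced by their counterparts in another language, so the surrounding sentence and its syntax stay intact. The label table is a hypothetical stand-in for multilingual labels looked up via Wikidata, and the `code_switch` function and its span format are illustrative assumptions, not the EntityCS implementation.

```python
# Hypothetical multilingual labels keyed by Wikidata-style IDs (illustrative values).
ENTITY_LABELS = {
    "Q90": {"en": "Paris", "de": "Paris", "el": "Παρίσι"},
    "Q142": {"en": "France", "de": "Frankreich", "el": "Γαλλία"},
}


def code_switch(sentence, entity_spans, target_lang):
    """Replace each annotated entity span with its label in target_lang.

    entity_spans: list of (start, end, entity_id) character offsets into sentence.
    Non-entity text is copied unchanged, so the original syntax is preserved.
    """
    pieces = []
    cursor = 0
    for start, end, qid in sorted(entity_spans):
        pieces.append(sentence[cursor:start])  # text before the entity
        labels = ENTITY_LABELS.get(qid, {})
        # Fall back to the original surface form if no label exists.
        pieces.append(labels.get(target_lang, sentence[start:end]))
        cursor = end
    pieces.append(sentence[cursor:])  # text after the last entity
    return "".join(pieces)


sentence = "Paris is the capital of France."
spans = [(0, 5, "Q90"), (24, 30, "Q142")]
print(code_switch(sentence, spans, "de"))  # Paris is the capital of Frankreich.
print(code_switch(sentence, spans, "el"))  # Παρίσι is the capital of Γαλλία.
```

Contrast this with random word switching from a dictionary: here the switch points are entity boundaries, which is why the abstract argues syntax is not corrupted.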


Understanding Transformer Memorization Recall Through Idioms

arXiv.org Artificial Intelligence

To produce accurate predictions, language models (LMs) must balance between generalization and memorization. Yet, little is known about the mechanism by which transformer LMs employ their memorization capacity. When does a model decide to output a memorized phrase, and how is this phrase then retrieved from memory? In this work, we offer the first methodological framework for probing and characterizing recall of memorized sequences in transformer LMs. First, we lay out criteria for detecting model inputs that trigger memory recall, and propose idioms as inputs that typically fulfill these criteria. Next, we construct a dataset of English idioms and use it to compare model behavior on memorized vs. non-memorized inputs. Specifically, we analyze the internal prediction construction process by interpreting the model's hidden representations as a gradual refinement of the output probability distribution. We find that across different model sizes and architectures, memorized predictions are a two-step process: early layers promote the predicted token to the top of the output distribution, and upper layers increase model confidence. This suggests that memorized information is stored and retrieved in the early layers of the network. Last, we demonstrate the utility of our methodology beyond idioms in memorized factual statements. Overall, our work makes a first step towards understanding memory recall, and provides a methodological basis for future studies of transformer memorization.
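A toy illustration of reading hidden representations as a gradually refined output distribution. The abstract does not spell out the projection; this sketch assumes the common "logit lens" style of multiplying each layer's hidden state by the output (unembedding) matrix, and omits the final layer normalization real transformers apply. All vectors and the `trace_token` helper are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def project(hidden, unembed):
    """Logits = unembed @ hidden; unembed has one row per vocabulary token."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed]


def trace_token(hidden_states, unembed, token_id):
    """Per layer, return (rank, probability) of token_id; rank 0 means top-1."""
    trace = []
    for hidden in hidden_states:
        probs = softmax(project(hidden, unembed))
        rank = sum(1 for p in probs if p > probs[token_id])
        trace.append((rank, probs[token_id]))
    return trace


# Toy 3-token vocabulary with an identity unembedding and four "layers".
unembed = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
layers = [[1.0, 0.5, 0.0], [0.2, 1.0, 0.0], [0.0, 2.0, 0.0], [0.0, 4.0, 0.0]]
for layer, (rank, prob) in enumerate(trace_token(layers, unembed, token_id=1)):
    print(layer, rank, round(prob, 3))
```

In this toy trace token 1 reaches rank 0 at layer 1 and its probability keeps rising afterwards, mirroring the two-step pattern the abstract reports (promotion to the top, then increasing confidence).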


Mindmaps using ChatGPT and PlantUML

#artificialintelligence

In my previous two-part series on using Mermaid.js with ChatGPT to build system diagrams (part 1 and part 2), we looked at ChatGPT's ability to build sequence, activity, state and C4 models. PlantUML lets you create a wide range of diagrams from text descriptions.
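The prompts and diagrams from the post itself are not reproduced here. As a sketch of the target format ChatGPT would be asked to produce, the helper below renders a nested outline as PlantUML mindmap source, where `@startmindmap`/`@endmindmap` delimit the diagram and the number of `*` characters sets a node's depth. The outline contents and the `to_mindmap`/`mindmap_source` helpers are hypothetical.

```python
def to_mindmap(tree, depth=1):
    """Render a (label, children) tuple as PlantUML mindmap lines."""
    label, children = tree
    lines = ["*" * depth + " " + label]  # one '*' per level of depth
    for child in children:
        lines.extend(to_mindmap(child, depth + 1))
    return lines


def mindmap_source(tree):
    """Wrap the rendered lines in PlantUML's mindmap delimiters."""
    return "\n".join(["@startmindmap", *to_mindmap(tree), "@endmindmap"])


outline = ("Diagrams as code", [
    ("Mermaid.js", [("Sequence", []), ("State", [])]),
    ("PlantUML", [("Mindmap", [])]),
])
print(mindmap_source(outline))
```

Generating the text from a structure like this is also a convenient way to check that a mindmap ChatGPT returns has the nesting you expect before rendering it with PlantUML.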


ChatGPT is not wasteful of energy: 5 reasons

#artificialintelligence

Yep, naysayers are now focusing on ChatGPT's energy use. We just cracked the technology. The scaremongering about the energy use of these brand-new and undoubtedly breakthrough 'large language models' (LLMs) like ChatGPT is, IMO as a PhD AI developer and huge fan of sustainability, highly misguided for numerous reasons:


Google AI is about to kill OpenAI's Chatbot ChatGPT - The Aisan

#artificialintelligence

I love how people are talking about how ChatGPT is gonna replace Google while Google just quietly uses a more advanced dialogue system behind the scenes. ChatGPT is a big step in the right direction but they're still a few years behind Google.


We are still the Product in the AI era.

#artificialintelligence

The current consumer hype about AI, specifically ChatGPT, is shaking the status quo of many tech giants. Microsoft got in early with the company behind this upheaval, and it quickly moved to secure its position by setting out a product vision and announcing it publicly. Google, by contrast, widely believed to have the better technology, scrambled to claim a place in the hype, but its highly anticipated demonstration seemed haphazardly arranged and fell short of enthusiasts' expectations.


Exploring ChatGPT: The Advanced AI Language Model

#artificialintelligence

In recent years, the field of artificial intelligence has made significant advancements, and one of the most impressive examples of this progress is the language model known as ChatGPT. In this blog post, we'll take a closer look at what ChatGPT is, how it works, and what makes it stand out from other language models. ChatGPT is a large language model developed by OpenAI. It's a transformer-based neural network that has been trained on a massive amount of text data and can generate human-like responses to text inputs. ChatGPT is part of OpenAI's suite of advanced AI language models and is designed to be an all-purpose conversational AI that can handle a wide range of tasks, including answering questions, translating text, summarizing long documents, and much more.


Roses are red

#artificialintelligence

I had so much fun getting GPT-3 to generate simple one-line Valentine's Day cards last year that this year I decided to see if I could generate cards with more complicated messages. I focused on the classic "roses are red, violets are blue" rhyme, figuring that language models like GPT-3 would have seen lots of examples of this structure during their internet training. Rhyming poetry is notoriously difficult for text-generating algorithms, and I wanted to make it easy. To find out what I should draw, I appended "Illustration is of" to its text, and then created the card according to its instructions.