AITopics | Zheng, Yinhe

Zheng, Yinhe

A Survey on Out-of-Distribution Detection in NLP

Lang, Hao, Zheng, Yinhe, Li, Yixuan, Sun, Jian, Huang, Fei, Li, Yongbin

arXiv.org Artificial Intelligence, Dec-27-2023

Out-of-distribution (OOD) detection is essential for the reliable and safe deployment of machine learning systems in the real world. Great progress has been made over the past years. This paper presents the first review of recent advances in OOD detection with a particular focus on natural language processing approaches. First, we provide a formal definition of OOD detection and discuss several related fields. Second, we categorize recent algorithms into three classes according to the data they use: (1) OOD data available, (2) OOD data unavailable + in-distribution (ID) label available, and (3) OOD data unavailable + ID label unavailable. Third, we introduce datasets, applications, and metrics. Finally, we summarize existing work and present potential future research topics.

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2305.03236

Country: North America > United States > Wisconsin (0.14)

Genre:

Overview (1.00)
Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)

PIPPA: A Partially Synthetic Conversational Dataset

Gosling, Tear, Dale, Alpin, Zheng, Yinhe

arXiv.org Artificial Intelligence, Aug-10-2023

With the emergence of increasingly powerful large language models, there is a burgeoning interest in leveraging these models for casual conversation and role-play applications. However, existing conversational and role-playing datasets often fail to capture the diverse and nuanced interactions typically exhibited by real-world role-play participants. To address this limitation and contribute to the rapidly growing field, we introduce a partially synthetic dataset named PIPPA (Personal Interaction Pairs between People and AI). PIPPA is the result of a community-driven crowdsourcing effort involving a group of role-play enthusiasts. The dataset comprises over 1 million utterances distributed across 26,000 conversation sessions and provides a rich resource for researchers and AI developers to explore and refine conversational AI systems in the context of role-play scenarios.

artificial intelligence, chatbot, natural language

arXiv.org Artificial Intelligence

2308.05884

Country: Asia > China (0.14)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Domain Incremental Lifelong Learning in an Open World

Dai, Yi, Lang, Hao, Zheng, Yinhe, Yu, Bowen, Huang, Fei, Li, Yongbin

arXiv.org Artificial Intelligence, May-11-2023

Lifelong learning (LL) is an important ability for NLP models to learn new tasks continuously. Architecture-based approaches are reported to be effective implementations for LL models. However, it is non-trivial to extend previous approaches to domain incremental LL scenarios since they either require access to task identities in the testing phase or cannot handle samples from unseen tasks. In this paper, we propose Diana: a dynamic architecture-based lifelong learning model that tries to learn a sequence of tasks with a prompt-enhanced language model. Four types of hierarchically organized prompts are used in Diana to capture knowledge from different granularities. Specifically, we dedicate task-level prompts to capture task-specific knowledge to retain high LL performance and maintain instance-level prompts to learn knowledge shared across input samples to improve the model's generalization performance. Moreover, we dedicate separate prompts to explicitly model unseen tasks and introduce a set of prompt key vectors to facilitate knowledge sharing between tasks. Extensive experiments demonstrate that Diana outperforms state-of-the-art LL models, especially in handling unseen tasks. We release the code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/diana.

machine learning, meta prompt, natural language

arXiv.org Artificial Intelligence

2305.06555

Country:

North America > United States (1.00)
Europe (1.00)

Genre:

Research Report > Experimental Study (0.47)
Research Report > New Finding (0.46)

Industry: Education > Educational Setting > Continuing Education (0.61)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Long-Tailed Question Answering in an Open World

Dai, Yi, Lang, Hao, Zheng, Yinhe, Huang, Fei, Li, Yongbin

arXiv.org Artificial Intelligence, May-11-2023

Real-world data often have an open long-tailed distribution, and building a unified QA model supporting various tasks is vital for practical QA applications. However, it is non-trivial to extend previous QA approaches since they either require access to seen tasks with adequate samples or do not explicitly model samples from unseen tasks. In this paper, we define Open Long-Tailed QA (OLTQA) as learning from long-tailed distributed data and optimizing performance over seen and unseen QA tasks. We propose an OLTQA model that encourages knowledge sharing between head, tail and unseen tasks, and explicitly mines knowledge from a large pre-trained language model (LM). Specifically, we organize our model through a pool of fine-grained components and dynamically combine these components for an input to facilitate knowledge sharing. A retrieve-then-rerank frame is further introduced to select in-context examples, which guide the LM to generate text that expresses knowledge for QA tasks. Moreover, a two-stage training approach is introduced to pre-train the framework by knowledge distillation (KD) from the LM and then jointly train the frame and a QA model through an adaptive mutual KD method. On a large-scale OLTQA dataset we curate from 43 existing QA datasets, our model consistently outperforms the state-of-the-art. We release the code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/oltqa.

computational linguistic, machine learning, question answering

arXiv.org Artificial Intelligence

2305.06557

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.29)

Genre: Research Report (0.82)

Industry:

Education (0.47)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.43)

Out-of-Domain Intent Detection Considering Multi-turn Dialogue Contexts

Lang, Hao, Zheng, Yinhe, Hui, Binyuan, Huang, Fei, Li, Yongbin

arXiv.org Artificial Intelligence, May-4-2023

Out-of-Domain (OOD) intent detection is vital for practical dialogue systems, and it usually requires considering multi-turn dialogue contexts. However, most previous OOD intent detection approaches are limited to single dialogue turns. In this paper, we introduce a context-aware OOD intent detection (Caro) framework to model multi-turn contexts in OOD intent detection tasks. Specifically, we follow the information bottleneck principle to extract robust representations from multi-turn dialogue contexts. Two different views are constructed for each input sample, and the superfluous information not related to intent detection is removed using a multi-view information bottleneck loss. Moreover, we also explore utilizing unlabeled data in Caro. A two-stage training process is introduced to mine OOD samples from these unlabeled data, and these OOD samples are used to train the resulting model with a bootstrapping approach. Comprehensive experiments demonstrate that Caro establishes state-of-the-art performance on multi-turn OOD detection tasks by improving the F1-OOD score by over 29% compared to the previous best method.

artificial intelligence, detection, machine learning

arXiv.org Artificial Intelligence

2305.03237

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.56)

Empathetic Response Generation via Emotion Cause Transition Graph

Qian, Yushan, Wang, Bo, Lin, Ting-En, Zheng, Yinhe, Zhu, Ying, Zhao, Dongming, Hou, Yuexian, Wu, Yuchuan, Li, Yongbin

arXiv.org Artificial Intelligence, Feb-23-2023

Empathetic dialogue is a human-like behavior that requires the perception of both affective factors (e.g., emotion status) and cognitive factors (e.g., cause of the emotion). Whereas early work concerned only emotion status, the latest approaches also study emotion causes in empathetic dialogue. These approaches focus on understanding and duplicating emotion causes in the context to show empathy for the speaker. However, instead of only repeating the contextual causes, real empathic responses often demonstrate a logical and emotion-centered transition from the causes in the context to those in the responses. In this work, we propose an emotion cause transition graph to explicitly model the natural transition of emotion causes between two adjacent turns in empathetic dialogue. With this graph, the concept words of the emotion causes in the next turn can be predicted and used by a specifically designed concept-aware decoder to generate the empathic response. Automatic and human experimental results on the benchmark dataset demonstrate that our method produces more empathetic, coherent, informative, and specific responses than existing models.

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2302.11787

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.46)

Prompt Conditioned VAE: Enhancing Generative Replay for Lifelong Learning in Task-Oriented Dialogue

Zhao, Yingxiu, Zheng, Yinhe, Tian, Zhiliang, Gao, Chang, Yu, Bowen, Yu, Haiyang, Li, Yongbin, Sun, Jian, Zhang, Nevin L.

arXiv.org Artificial Intelligence, Nov-24-2022

Lifelong learning (LL) is vital for advanced task-oriented dialogue (ToD) systems. To address the catastrophic forgetting issue of LL, generative replay methods are widely employed to consolidate past knowledge with generated pseudo samples. However, most existing generative replay methods use only a single task-specific token to control their models. This scheme is usually not strong enough to constrain the generative model due to insufficient information involved. In this paper, we propose a novel method, prompt conditioned VAE for lifelong learning (PCLL), to enhance generative replay by incorporating tasks' statistics. PCLL captures task-specific distributions with a conditional variational autoencoder, conditioned on natural language prompts to guide the pseudo-sample generation. Moreover, it leverages a distillation process to further consolidate past knowledge by alleviating the noise in pseudo samples. Experiments on natural language understanding tasks of ToD systems demonstrate that PCLL significantly outperforms competitive baselines in building LL models.

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2210.07783

Country:

North America > United States > California (0.28)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education > Educational Setting > Continuing Education (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Semi-Supervised Lifelong Language Learning

Zhao, Yingxiu, Zheng, Yinhe, Yu, Bowen, Tian, Zhiliang, Lee, Dongkyu, Sun, Jian, Yu, Haiyang, Li, Yongbin, Zhang, Nevin L.

arXiv.org Artificial Intelligence, Nov-23-2022

Lifelong learning aims to accumulate knowledge and alleviate catastrophic forgetting when learning tasks sequentially. However, existing lifelong language learning methods only focus on the supervised learning setting. Unlabeled data, which can be easily accessed in real-world scenarios, are underexplored. In this paper, we explore a novel setting, semi-supervised lifelong language learning (SSLL), where a model learns sequentially arriving language tasks with both labeled and unlabeled data. We propose an unlabeled data enhanced lifelong learner to explore SSLL. Specifically, we dedicate task-specific modules to alleviate catastrophic forgetting and design two modules to exploit unlabeled data: (1) a virtual supervision enhanced task solver is constructed on a teacher-student framework to mine the underlying knowledge from unlabeled data; and (2) a backward augmented learner is built to encourage knowledge transfer from newly arrived unlabeled data to previous tasks. Experimental results on various language tasks demonstrate our model's effectiveness and superiority over competitive baselines under the new SSLL setting.

artificial intelligence, machine learning, unlabeled data

arXiv.org Artificial Intelligence

2211.1305

Country: Europe (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Education > Curriculum > Subject-Specific Education (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)

Improving Meta-learning for Low-resource Text Classification and Generation via Memory Imitation

Zhao, Yingxiu, Tian, Zhiliang, Yao, Huaxiu, Zheng, Yinhe, Lee, Dongkyu, Song, Yiping, Sun, Jian, Zhang, Nevin L.

arXiv.org Artificial Intelligence, Jul-14-2022

Building natural language processing (NLP) models is challenging in low-resource scenarios where only limited data are available. Optimization-based meta-learning algorithms achieve promising results in low-resource scenarios by adapting a well-generalized model initialization to handle new tasks. Nonetheless, these approaches suffer from the memorization overfitting issue, where the model tends to memorize the meta-training tasks while ignoring support sets when adapting to new tasks. To address this issue, we propose a memory imitation meta-learning (MemIML) method that enhances the model's reliance on support sets for task adaptation. Specifically, we introduce a task-specific memory module to store support set information and construct an imitation module to force query sets to imitate the behaviors of some representative support-set samples stored in the memory. A theoretical analysis is provided to prove the effectiveness of our method, and empirical results also demonstrate that our method outperforms competitive baselines on both text classification and generation tasks.

information, machine learning, natural language

arXiv.org Artificial Intelligence

2203.1167

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Transferable Persona-Grounded Dialogues via Grounded Minimal Edits

Wu, Chen Henry, Zheng, Yinhe, Mao, Xiaoxi, Huang, Minlie

arXiv.org Artificial Intelligence, Sep-16-2021

Grounded dialogue models generate responses that are grounded on certain concepts. Limited by the distribution of grounded dialogue data, models trained on such data face transferability challenges in terms of the data distribution and the type of grounded concepts. To address these challenges, we propose the grounded minimal editing framework, which minimally edits existing responses to be grounded on the given concept. Focusing on personas, we propose Grounded Minimal Editor (GME), which learns to edit by disentangling and recombining persona-related and persona-agnostic parts of the response. To evaluate persona-grounded minimal editing, we present the PersonaMinEdit dataset, and experimental results show that GME outperforms competitive baselines by a large margin. To evaluate the transferability, we experiment on the test set of BlendedSkillTalk and show that GME can edit dialogue models' responses to substantially improve their persona consistency while preserving the use of knowledge and empathy.

artificial intelligence, neural network, proceedings

arXiv.org Artificial Intelligence

2109.07713

Country:

Europe (1.00)
Asia (0.69)
North America > United States > Pennsylvania (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
