AITopics | Yao, Junfeng

Yao, Junfeng

Not All Languages are Equal: Insights into Multilingual Retrieval-Augmented Generation

Wu, Suhang, Tang, Jialong, Yang, Baosong, Wang, Ante, Jia, Kaidi, Yu, Jiawei, Yao, Junfeng, Su, Jinsong

arXiv.org Artificial Intelligence, Oct-29-2024

Retrieval-Augmented Language Models (RALMs) broaden their knowledge scope by incorporating external textual resources. However, the multilingual nature of global knowledge requires RALMs to handle diverse languages, a topic that has received limited research focus. In this work, we propose Futurepedia, a carefully crafted benchmark containing parallel texts across eight representative languages. We evaluate six multilingual RALMs using our benchmark to explore the challenges of multilingual RALMs. Experimental results reveal linguistic inequalities: 1) high-resource languages stand out in monolingual knowledge extraction; 2) Indo-European languages lead RALMs to provide answers directly from documents, alleviating the challenge of expressing answers across languages; 3) English benefits from RALMs' selection bias and speaks louder in multilingual knowledge selection. Based on these findings, we offer advice for improving multilingual retrieval-augmented generation. For monolingual knowledge extraction, careful attention must be paid to cascading errors from translating low-resource languages into high-resource ones. In cross-lingual knowledge transfer, encouraging RALMs to provide answers within documents in different languages can improve transfer performance. For multilingual knowledge selection, incorporating more non-English documents and repositioning English documents can help mitigate RALMs' selection bias. Through comprehensive experiments, we underscore the complexities inherent in multilingual RALMs and offer valuable insights for future research.

large language model, machine learning, natural language

2410.21970

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Media (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

A Learning Rate Path Switching Training Paradigm for Version Updates of Large Language Models

Wang, Zhihao, Liu, Shiyu, Huang, Jianheng, Wang, Zheng, Liao, Yixuan, Chen, Xiaoxin, Yao, Junfeng, Su, Jinsong

arXiv.org Artificial Intelligence, Oct-5-2024

Due to the continuous emergence of new data, version updates have become an indispensable requirement for Large Language Models (LLMs). The training paradigms for version updates of LLMs include pre-training from scratch (PTFS) and continual pre-training (CPT). Preliminary experiments demonstrate that PTFS achieves better pre-training performance, while CPT has lower training cost. Moreover, their performance and training cost gaps widen progressively with version updates. To investigate the underlying reasons for this phenomenon, we analyze the effect of learning rate adjustments during the two stages of CPT: preparing an initialization checkpoint and continual pre-training based on this checkpoint. We find that a large learning rate in the first stage and a complete learning rate decay process in the second stage are crucial for version updates of LLMs. Hence, we propose a learning rate path switching training paradigm. Our paradigm comprises one main path, where we pre-train an LLM with the maximal learning rate, and multiple branching paths, each of which corresponds to an update of the LLM with newly added training data. Extensive experiments demonstrate the effectiveness and generalization of our paradigm. In particular, when training four versions of LLMs, our paradigm reduces the total training cost to 58% of that of PTFS, while maintaining comparable pre-training performance.
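The two kinds of path described above can be pictured with a small schedule sketch. It assumes a linear warmup followed by a constant maximal rate on the main path, and a cosine decay on each branching path. The abstract only specifies a maximal learning rate and a complete decay, so the decay shape, the warmup, and every numeric value here (`max_lr`, `min_lr`, `warmup_steps`) are illustrative assumptions rather than the authors' settings.

```python
import math


def main_path_lr(step, max_lr=3e-4, warmup_steps=100):
    """Main path: warm up linearly, then hold the maximal learning rate.

    The warmup and the default values are assumptions for illustration.
    """
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    return max_lr


def branch_path_lr(branch_step, branch_total, max_lr=3e-4, min_lr=3e-5):
    """Branching path: decay completely from max_lr down to min_lr.

    A cosine shape is assumed; the source only asks for a complete decay.
    """
    progress = min(max(branch_step / branch_total, 0.0), 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


if __name__ == "__main__":
    # A version update branches off a main-path checkpoint and decays fully.
    for s in (0, 250, 500, 750, 1000):
        print(s, branch_path_lr(s, branch_total=1000))
```

Under this reading, each new version would start its branch from the latest main-path checkpoint, while the main path itself keeps training at the undecayed rate.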

large language model, machine learning, paradigm

2410.04103

Country: Asia > China (0.29)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Mitigating the Negative Impact of Over-association for Conversational Query Production

Wang, Ante, Song, Linfeng, Min, Zijun, Xu, Ge, Wang, Xiaoli, Yao, Junfeng, Su, Jinsong

arXiv.org Artificial Intelligence, Sep-29-2024

Conversational query generation aims at producing search queries from dialogue histories, which are then used to retrieve relevant knowledge from a search engine to help knowledge-based dialogue systems. Trained to maximize the likelihood of gold queries, previous models suffer from the data hunger issue, and they tend to both drop important concepts from dialogue histories and generate irrelevant concepts at inference time. We attribute these issues to the over-association phenomenon where a large number of gold queries are indirectly related to the dialogue topics, because annotators may unconsciously perform reasoning with their background knowledge when generating these gold queries. We carefully analyze the negative effects of this phenomenon on pretrained Seq2seq query producers and then propose effective instance-level weighting strategies for training to mitigate these issues from multiple perspectives. Experiments on two benchmarks, Wizard-of-Internet and DuSinc, show that our strategies effectively alleviate the negative effects and lead to significant performance gains (2%-5% across automatic metrics and human evaluation). Further analysis shows that our model selects better concepts from dialogue histories and is 10 times more data efficient than the baseline. The code is available at https://github.com/DeepLearnXMU/QG-OverAsso.
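Instance-level weighting can be read as replacing the plain average of per-example losses with a weighted one. The sketch below shows only that general mechanism. The abstract does not say how the weights are derived, so the `weights` argument is a placeholder supplied by the caller, and the function name `weighted_nll` is hypothetical.

```python
import math


def weighted_nll(gold_probs, weights):
    """Weighted mean of per-instance negative log-likelihoods.

    gold_probs: model probability assigned to each gold query (0 < p <= 1).
    weights:    non-negative instance weights (placeholder values; how they
                are computed is not specified in the abstract).
    """
    if len(gold_probs) != len(weights):
        raise ValueError("gold_probs and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("at least one weight must be positive")
    weighted_sum = sum(w * -math.log(p) for p, w in zip(gold_probs, weights))
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Down-weighting the second instance reduces its influence on the loss.
    print(weighted_nll([0.5, 0.25], [1.0, 1.0]))
    print(weighted_nll([0.5, 0.25], [1.0, 0.2]))
```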

large language model, machine learning, natural language

2409.19572

Country: Asia > China > Fujian Province (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal

Huang, Jianheng, Cui, Leyang, Wang, Ante, Yang, Chengyi, Liao, Xinting, Song, Linfeng, Yao, Junfeng, Su, Jinsong

arXiv.org Artificial Intelligence, May-25-2024

Large language models (LLMs) suffer from catastrophic forgetting during continual learning. Conventional rehearsal-based methods rely on previous training data to retain the model's ability, which may not be feasible in real-world applications. When conducting continual learning based on a publicly released LLM checkpoint, the original training data may be unavailable. To address this challenge, we propose a framework called Self-Synthesized Rehearsal (SSR) that uses the LLM to generate synthetic instances for rehearsal. Concretely, we first employ the base LLM for in-context learning to generate synthetic instances. Subsequently, we utilize the latest LLM to refine the instance outputs based on the synthetic inputs, preserving its acquired ability. Finally, we select diverse high-quality synthetic instances for rehearsal in future stages. Experimental results demonstrate that SSR achieves superior or comparable performance compared to conventional rehearsal-based approaches while being more data-efficient. In addition, SSR effectively preserves the generalization capabilities of LLMs in general domains.

large language model, machine learning, natural language

2403.01244

Country:

Asia > China (0.46)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

On the Information Redundancy in Non-Autoregressive Translation

Wang, Zhihao, Wang, Longyue, Su, Jinsong, Yao, Junfeng, Tu, Zhaopeng

arXiv.org Artificial Intelligence, May-4-2024

Token repetition is a typical form of multi-modal problem in fully non-autoregressive translation (NAT). In this work, we revisit the multi-modal problem in recently proposed NAT models. Our study reveals that these advanced models have introduced other types of information redundancy errors, which cannot be measured by the conventional metric - the continuous repetition ratio. By manually annotating the NAT outputs, we identify two types of information redundancy errors that correspond well to lexical and reordering multi-modality problems. Since human annotation is time-consuming and labor-intensive, we propose automatic metrics to evaluate the two types of redundant errors. Our metrics allow future studies to evaluate new methods and gain a more comprehensive understanding of their effectiveness.
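To make the limitation of the conventional metric concrete, here is a minimal sketch of a continuous repetition ratio, assumed to be the fraction of tokens identical to the token immediately before them. Exact definitions vary between papers (in particular the denominator), so treat this as one plausible reading rather than the metric used in the paper.

```python
def continuous_repetition_ratio(tokens):
    """Fraction of tokens that repeat the immediately preceding token.

    Assumption: the denominator is the total number of tokens.
    """
    if len(tokens) < 2:
        return 0.0
    repeats = sum(1 for prev, cur in zip(tokens, tokens[1:]) if prev == cur)
    return repeats / len(tokens)


if __name__ == "__main__":
    # Adjacent repetition is detected.
    print(continuous_repetition_ratio("we we go home".split()))
    # Redundancy that is not adjacent goes unnoticed by this metric.
    print(continuous_repetition_ratio("the cat the cat".split()))
```

The second call illustrates the point made in the abstract: an output can carry redundant information without containing any immediately repeated token, so a metric of this kind scores it as clean.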

artificial intelligence, machine translation, natural language

2405.02673

Country:

North America > United States (0.46)
Asia (0.29)

Genre: Research Report (0.40)

Industry:

Health & Medicine (0.68)
Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.48)

TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation

Wang, Yaoxiang, Wu, Zhiyong, Yao, Junfeng, Su, Jinsong

arXiv.org Artificial Intelligence, Feb-15-2024

The emergence of Large Language Models (LLMs) like ChatGPT has inspired the development of LLM-based agents capable of addressing complex, real-world tasks. However, these agents often struggle during task execution due to methodological constraints, such as error propagation and limited adaptability. To address this issue, we propose a multi-agent framework based on dynamic Task Decomposition and Agent Generation (TDAG). This framework dynamically decomposes complex tasks into smaller subtasks and assigns each to a specifically generated subagent, thereby enhancing adaptability in diverse and unpredictable real-world tasks. Simultaneously, existing benchmarks often lack the granularity needed to evaluate incremental progress in complex, multi-step tasks. In response, we introduce ItineraryBench in the context of travel planning, featuring interconnected, progressively complex tasks with a fine-grained evaluation system. ItineraryBench is designed to assess agents' abilities in memory, planning, and tool usage across tasks of varying complexity. Our experimental results reveal that TDAG significantly outperforms established baselines, showcasing its superior adaptability and context awareness in complex task scenarios.

large language model, machine learning, natural language

2402.10178

Country: Asia > China (0.29)

Genre: Research Report > New Finding (0.46)

Industry: Consumer Products & Services > Travel (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Revisiting Non-Autoregressive Translation at Scale

Wang, Zhihao, Wang, Longyue, Su, Jinsong, Yao, Junfeng, Tu, Zhaopeng

arXiv.org Artificial Intelligence, Jun-2-2023

In real-world systems, scaling has been critical for improving the translation quality in autoregressive translation (AT), which however has not been well studied for non-autoregressive translation (NAT). In this work, we bridge the gap by systematically studying the impact of scaling on NAT behaviors. Extensive experiments on six WMT benchmarks over two advanced NAT models show that scaling can alleviate the commonly-cited weaknesses of NAT models, resulting in better translation performance. To reduce the side-effect of scaling on decoding speed, we empirically investigate the impact of NAT encoder and decoder on the translation performance. Experimental results on the large-scale WMT20 En-De show that the asymmetric architecture (e.g. bigger encoder and smaller decoder) can achieve comparable performance with the scaling model, while maintaining the superiority of decoding speed with standard NAT models. To this end, we establish a new benchmark by validating scaled NAT models on the scaled dataset, which can be regarded as a strong baseline for future works. We release code and system outputs at https://github.com/DeepLearnXMU/Scaling4NAT.

machine learning, nat model, natural language

2305.16155

Country: Asia > China > Fujian Province (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

AAN+: Generalized Average Attention Network for Accelerating Neural Transformer

Zhang, Biao (University of Edinburgh) | Xiong, Deyi | Ge, Yubin | Yao, Junfeng | Yue, Hao | Su, Jinsong

Journal of Artificial Intelligence Research, Nov-6-2022

Transformer benefits from the high parallelization of attention networks in fast training, but it still suffers from slow decoding, partially due to the linear dependency O(m) of the decoder self-attention on previous target words at inference. In this paper, we propose a generalized average attention network (AAN+) that aims to speed up decoding by reducing the dependency from O(m) to O(1). We find that the learned self-attention weights in the decoder follow some patterns which can be approximated via a dynamic structure. Based on this insight, we develop AAN+, extending our previously proposed average attention (Zhang et al., 2018a, AAN) to support more general position- and content-based attention patterns. AAN+ only needs to maintain a small constant number of hidden states during decoding, ensuring its O(1) dependency. We apply AAN+ as a drop-in replacement for the decoder self-attention and conduct experiments on machine translation (with diverse language pairs), table-to-text generation and document summarization. With masking tricks and dynamic programming, AAN+ enables Transformer to decode sentences around 20% faster without substantially compromising training speed or generation performance. Our results further reveal the importance of localness (neighboring words) in AAN+ and its capability in modeling long-range dependency.
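The move from O(m) to O(1) is easiest to see on the plain cumulative average that the earlier average attention idea builds on: instead of re-reading all previous target positions at each step, the decoder keeps a running sum and a counter. The sketch below shows only that simple case, not the generalized AAN+ patterns, and the class name `RunningAverage` is a hypothetical illustration.

```python
class RunningAverage:
    """Cumulative average over decoder inputs with constant state per step.

    Only a running sum and a step counter are stored, so the work per
    decoding step does not grow with the number of previous positions.
    """

    def __init__(self, dim):
        self.total = [0.0] * dim
        self.count = 0

    def step(self, vector):
        """Consume the next input vector and return the average so far."""
        self.count += 1
        self.total = [t + v for t, v in zip(self.total, vector)]
        return [t / self.count for t in self.total]


if __name__ == "__main__":
    avg = RunningAverage(dim=2)
    print(avg.step([2.0, 4.0]))
    print(avg.step([4.0, 0.0]))
```

Each call to `step` returns the same value a full average over all previous positions would give, which is why the stored state can stay constant in size.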

machine learning, natural language, translation

doi: 10.1613/jair.1.13896

AI Access Foundation

Country:

Europe (1.00)
Asia > China (0.94)
North America > United States > Pennsylvania (0.28)

Genre: Research Report > New Finding (0.88)

Industry:

Leisure & Entertainment (0.67)
Media > Television (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Improving Tree-Structured Decoder Training for Code Generation via Mutual Learning

Xie, Binbin, Su, Jinsong, Ge, Yubin, Li, Xiang, Cui, Jianwei, Yao, Junfeng, Wang, Bin

arXiv.org Artificial Intelligence, May-31-2021

Code generation aims to automatically generate a piece of code given an input natural language utterance. Currently, among dominant models, it is treated as a sequence-to-tree task, where a decoder outputs a sequence of actions corresponding to the pre-order traversal of an Abstract Syntax Tree. However, such a decoder only exploits the preceding actions given by the pre-order traversal, which are insufficient to ensure correct action predictions. In this paper, we first thoroughly analyze the difference in context modeling between neural code generation models with different traversal-based decodings (pre-order traversal vs. breadth-first traversal), and then propose a mutual learning framework to jointly train these models. Under this framework, we continuously enhance both models via mutual distillation, which involves synchronous executions of two one-to-one knowledge transfers at each training step. More specifically, we alternately choose one model as the student and the other as its teacher, and require the student to fit the training data and the action prediction distributions of its teacher. By doing so, both models can fully absorb the knowledge from each other and thus can be improved simultaneously. Experimental results and in-depth analysis on several benchmark datasets demonstrate the effectiveness of our approach. We release our code at https://github.com/DeepLearnXMU/CGML.
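The student objective described above, fitting both the training data and the teacher's action distribution, can be sketched as a two-term loss. The sketch assumes a cross-entropy term on the gold action plus a KL divergence from the teacher's distribution to the student's, mixed by a hypothetical `distill_weight`. The abstract does not name the divergence or the weighting, so both are assumptions.

```python
import math


def cross_entropy(probs, gold_index):
    """Negative log-probability the model assigns to the gold action."""
    return -math.log(probs[gold_index])


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same actions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def student_loss(student_probs, teacher_probs, gold_index, distill_weight=1.0):
    """Data-fitting term plus a term pulling the student toward the teacher.

    distill_weight is an illustrative hyperparameter, not from the source.
    """
    data_term = cross_entropy(student_probs, gold_index)
    distill_term = kl_divergence(teacher_probs, student_probs)
    return data_term + distill_weight * distill_term


if __name__ == "__main__":
    preorder_model = [0.7, 0.2, 0.1]       # toy action distribution
    breadth_first_model = [0.5, 0.4, 0.1]  # toy action distribution
    # The roles alternate: each model serves as student and as teacher.
    print(student_loss(preorder_model, breadth_first_model, gold_index=0))
    print(student_loss(breadth_first_model, preorder_model, gold_index=0))
```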

deep learning, neural network, traversal

2105.14796

Country:

Asia > China > Fujian Province (0.14)
North America > United States > Illinois (0.14)
North America > United States > Massachusetts (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Automatic Programming (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Discriminative Reordering Model Adaptation via Structural Learning

Zhang, Biao (Xiamen University) | Su, Jinsong (Xiamen University) | Xiong, Deyi (Soochow University) | Duan, Hong (Xiamen University) | Yao, Junfeng (Xiamen University)

AAAI Conferences, Jul-15-2015

Reordering model adaptation remains a major challenge in statistical machine translation because reordering patterns of translation units often vary dramatically from one domain to another. In this paper, we propose a novel adaptive discriminative reordering model (DRM) based on structural learning, which can capture correspondences among reordering features from two different domains. Exploiting both in-domain and out-of-domain monolingual corpora, our model learns a shared feature representation for cross-domain phrase reordering. By incorporating features of this representation, the DRM trained on an out-of-domain corpus generalizes better to in-domain data. Experimental results on the NIST Chinese-English translation task show that our approach significantly outperforms a variety of baselines.

adaptation, inductive learning, machine translation

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country: Asia > China (0.47)

Genre: Research Report > Experimental Study (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.69)