AITopics | Liu, Xiaoqian

Collaborating Authors

Liu, Xiaoqian

Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

Luo, Yingfeng, Zheng, Tong, Mu, Yongyu, Li, Bei, Zhang, Qinghong, Gao, Yongqi, Xu, Ziqiang, Feng, Peinan, Liu, Xiaoqian, Xiao, Tong, Zhu, Jingbo

arXiv.org Artificial Intelligence, Mar-9-2025

The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems using a single pre-trained Transformer decoder, while encoder-decoder architectures, which were the standard in earlier NMT models, have received relatively less attention. In this paper, we explore translation models that are universal, efficient, and easy to optimize, by marrying the world of LLMs with the world of NMT. We apply LLMs to NMT encoding and leave the NMT decoder unchanged. We also develop methods for adapting LLMs to work better with the NMT decoder. Furthermore, we construct a new dataset involving multiple tasks to assess how well the machine translation system generalizes across various tasks. Evaluations on the WMT and our datasets show that our method matches or surpasses a range of baselines in terms of translation quality, while achieving 2.4× to 6.5× inference speedups and a 75% reduction in the memory footprint of the KV cache. The method also demonstrates strong generalization across a variety of translation-related tasks.

large language model, natural language, translation, (17 more...)

arXiv.org Artificial Intelligence

2503.06594

Country:

Europe (1.00)
North America (0.67)
Asia > China (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning

Liu, Xiaoqian, Wang, Ke, Li, Yongbin, Wu, Yuchuan, Ma, Wentao, Kong, Aobo, Huang, Fei, Jiao, Jianbin, Zhang, Junge

arXiv.org Artificial Intelligence, Feb-17-2025

Large Language Models (LLMs) have shown impressive reasoning capabilities in well-defined problems with clear solutions, such as mathematics and coding. However, they still struggle with complex real-world scenarios like business negotiations, which require strategic reasoning, an ability to navigate dynamic environments and align long-term goals amidst uncertainty. Existing methods for strategic reasoning face challenges in adaptability, scalability, and transferring strategies to new contexts. To address these issues, we propose explicit policy optimization (EPO) for strategic reasoning, featuring an LLM that provides strategies in open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior. To improve adaptability and policy transferability, we train the strategic reasoning model via multi-turn reinforcement learning (RL) using process rewards and iterative self-play, without supervised fine-tuning (SFT) as a preliminary step. Experiments across social and physical domains demonstrate EPO's ability to achieve long-term goal alignment through enhanced strategic reasoning, achieving state-of-the-art performance on social dialogue and web navigation tasks. Our findings reveal various collaborative reasoning mechanisms emergent in EPO and its effectiveness in generating novel strategies, underscoring its potential for strategic reasoning in real-world applications.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.12486

Country: North America > Mexico (0.28)

Genre: Research Report > New Finding (0.87)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

SDPO: Segment-Level Direct Preference Optimization for Social Agents

Kong, Aobo, Ma, Wentao, Zhao, Shiwan, Li, Yongbin, Wu, Yuchuan, Wang, Ke, Liu, Xiaoqian, Li, Qicheng, Qin, Yong, Huang, Fei

arXiv.org Artificial Intelligence, Jan-3-2025

Social agents powered by large language models (LLMs) can simulate human social behaviors but fall short in handling complex goal-oriented social dialogues. Direct Preference Optimization (DPO) has proven effective in aligning LLM behavior with human preferences across a variety of agent tasks. Existing DPO-based approaches for multi-turn interactions are divided into turn-level and session-level methods. The turn-level method is overly fine-grained, focusing exclusively on individual turns, while session-level methods are too coarse-grained, often introducing training noise. To address these limitations, we propose Segment-Level Direct Preference Optimization (SDPO), which focuses on specific key segments within interactions to optimize multi-turn agent behavior while minimizing training noise. Evaluations on the SOTOPIA benchmark demonstrate that SDPO-tuned agents consistently outperform both existing DPO-based methods and proprietary LLMs like GPT-4o, underscoring SDPO's potential to advance the social intelligence of LLM-based agents. We release our code and data at https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/SDPO.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.01821

Country:

Asia > Thailand (0.15)
Asia > China (0.14)
North America > United States (0.14)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Modular-based Strategy for Mitigating Gradient Conflicts in Simultaneous Speech Translation

Liu, Xiaoqian, Du, Yangfan, Wang, Jianjin, Ge, Yuan, Xu, Chen, Xiao, Tong, Chen, Guocheng, Zhu, Jingbo

arXiv.org Artificial Intelligence, Dec-30-2024

Simultaneous Speech Translation (SimulST) involves generating target language text while continuously processing streaming speech input, presenting significant real-time challenges. Multi-task learning is often employed to enhance SimulST performance but introduces optimization conflicts between primary and auxiliary tasks, potentially compromising overall efficiency. Existing model-level conflict resolution methods are not well suited to this task, which exacerbates inefficiencies and leads to high GPU memory consumption. To address these challenges, we propose a Modular Gradient Conflict Mitigation (MGCM) strategy that detects conflicts at a finer-grained modular level and resolves them utilizing gradient projection. Experimental results demonstrate that MGCM significantly improves SimulST performance, particularly under medium and high latency conditions, achieving a 0.68 BLEU score gain in offline tasks. Additionally, MGCM reduces GPU memory consumption by over 95% compared to other conflict mitigation methods, establishing it as a robust solution for SimulST tasks.
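
The abstract above names gradient projection applied per module but does not give the exact rule. As a rough illustration only, the sketch below assumes a PCGrad-style projection: when an auxiliary-task gradient has a negative dot product with the primary-task gradient for the same module, its component along the primary gradient is removed. The function names and the module names "encoder" and "decoder" are hypothetical and are not taken from the paper.

```python
def dot(u, v):
    """Dot product of two equal-length gradient vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_conflicting(g_main, g_aux):
    """Return g_aux with its conflicting component along g_main removed.

    A conflict is assumed to mean a negative dot product (PCGrad-style);
    this is an illustrative choice, not the paper's stated criterion.
    """
    d = dot(g_aux, g_main)
    if d >= 0:
        # No conflict: keep the auxiliary gradient unchanged.
        return list(g_aux)
    scale = d / dot(g_main, g_main)
    return [a - scale * m for a, m in zip(g_aux, g_main)]


def modular_projection(main_grads, aux_grads):
    """Apply the projection independently to each module.

    Both arguments map a (hypothetical) module name to a flat gradient list.
    """
    return {
        name: project_conflicting(main_grads[name], aux_grads[name])
        for name in main_grads
    }


if __name__ == "__main__":
    main = {"encoder": [1, 0], "decoder": [0, 2]}
    aux = {"encoder": [-1, 1], "decoder": [3, 1]}
    print(modular_projection(main, aux))
```

Checking conflicts per module rather than on the full concatenated gradient means that only the modules where the tasks actually disagree are modified, which is the finer granularity the abstract refers to.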

artificial intelligence, mitigating gradient conflict, simultaneous speech translation, (1 more...)

arXiv.org Artificial Intelligence

2409.15911

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.60)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.60)

Revisiting Interpolation Augmentation for Speech-to-Text Generation

Xu, Chen, Wang, Jie, Liu, Xiaoqian, Dong, Qianqian, Zhang, Chunliang, Xiao, Tong, Zhu, Jingbo, Man, Dapeng, Yang, Wu

arXiv.org Artificial Intelligence, Jun-22-2024

Speech-to-text (S2T) generation systems frequently face challenges in low-resource scenarios, primarily due to the lack of extensive labeled datasets. One emerging solution is constructing virtual training samples by interpolating inputs and labels, which has notably enhanced system generalization in other domains. Despite its potential, this technique's application in S2T tasks has remained under-explored. In this paper, we delve into the utility of interpolation augmentation, guided by several pivotal questions. Our findings reveal that employing an appropriate strategy in interpolation augmentation significantly enhances performance across diverse tasks, architectures, and data scales, offering a promising avenue for more robust S2T systems in resource-constrained settings.
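
Constructing virtual samples by interpolating inputs and labels is commonly done in the style of mixup. The sketch below is a minimal illustration under that assumption: two equal-length feature vectors and their label distributions are blended with a weight drawn from a Beta distribution. The function names, the `alpha` default, and the equal-length requirement are illustrative assumptions; the abstract does not specify the strategy the paper recommends for speech inputs.

```python
import random


def sample_lambda(alpha=1.0, rng=random):
    """Draw a mixing weight in [0, 1] from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


def interpolate_pair(x_a, y_a, x_b, y_b, lam):
    """Blend two (input, label) pairs into one virtual training sample.

    Inputs and labels are flat lists of floats; each pair of lists must
    have matching lengths (an assumption made to keep the sketch simple).
    """
    if len(x_a) != len(x_b) or len(y_a) != len(y_b):
        raise ValueError("inputs and labels must have matching lengths")
    x = [lam * a + (1 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y


if __name__ == "__main__":
    lam = sample_lambda(alpha=0.5, rng=random.Random(0))
    x, y = interpolate_pair([0.0, 2.0], [1.0, 0.0], [2.0, 0.0], [0.0, 1.0], lam)
    print(lam, x, y)
```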

machine learning, natural language, specaugment, (17 more...)

arXiv.org Artificial Intelligence

2406.15846

Country:

Asia (0.94)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Queensland (0.14)
North America > United States > Pennsylvania (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Education > Educational Setting > Online (0.54)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Recent Advances in End-to-End Simultaneous Speech Translation

Liu, Xiaoqian, Hu, Guoqiang, Du, Yangfan, He, Erfeng, Luo, YingFeng, Xu, Chen, Xiao, Tong, Zhu, Jingbo

arXiv.org Artificial Intelligence, Jun-1-2024

Simultaneous speech translation (SimulST) is a demanding task that involves generating translations in real-time while continuously processing speech input. This paper offers a comprehensive overview of the recent developments in SimulST research, focusing on four major challenges. Firstly, the complexities associated with processing lengthy and continuous speech streams pose significant hurdles. Secondly, satisfying real-time requirements presents inherent difficulties due to the need for immediate translation output. Thirdly, striking a balance between translation quality and latency constraints remains a critical challenge. Finally, the scarcity of annotated data adds another layer of complexity to the task. Through our exploration of these challenges and the proposed solutions, we aim to provide valuable insights into the current landscape of SimulST research and suggest promising directions for future exploration.

artificial intelligence, natural language, translation, (20 more...)

arXiv.org Artificial Intelligence

2406.00497

Country: Asia > China > Liaoning Province (0.14)

Genre:

Overview (0.48)
Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Position: Foundation Agents as the Paradigm Shift for Decision Making

Liu, Xiaoqian, Lou, Xingzhou, Jiao, Jianbin, Zhang, Junge

arXiv.org Artificial Intelligence, May-29-2024

Decision making demands intricate interplay between perception, memory, and reasoning to discern optimal policies. Conventional approaches to decision making face challenges related to low sample efficiency and poor generalization. In contrast, foundation models in language and vision have showcased rapid adaptation to diverse new tasks. Therefore, we advocate for the construction of foundation agents as a transformative shift in the learning paradigm of agents. This proposal is underpinned by the formulation of foundation agents with their fundamental characteristics and challenges motivated by the success of large language models (LLMs). Moreover, we specify the roadmap of foundation agents from large interactive data collection or generation, to self-supervised pretraining and adaptation, and knowledge and value alignment with LLMs. Lastly, we pinpoint critical research questions derived from the formulation and delineate trends for foundation agents supported by real-world use cases, addressing both technical and theoretical aspects to propel the field towards a more comprehensive and impactful future.

large language model, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2405.17009

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report > Experimental Study (0.34)

Industry:

Health & Medicine (1.00)
Leisure & Entertainment > Games > Computer Games (0.93)
Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Self-supervised Pretraining for Decision Foundation Model: Formulation, Pipeline and Challenges

Liu, Xiaoqian, Jiao, Jianbin, Zhang, Junge

arXiv.org Artificial Intelligence, Jan-5-2024

Decision-making is a dynamic process requiring perception, memory, and reasoning to make choices and find optimal policies. Traditional approaches to decision-making suffer from sample efficiency and generalization, while large-scale self-supervised pretraining has enabled fast adaptation with fine-tuning or few-shot learning in language and vision. We thus argue to integrate knowledge acquired from generic large-scale ...

Self-supervised pretraining has enabled large sequence models to realize few-shot or even zero-shot adaptation in natural language processing (NLP) [OpenAI, 2023] and computer vision (CV) tasks [Bai et al., 2023]. Through pretraining on large generic corpora or visual data (images and videos), knowledge about the world and human society is learned which can be utilized in various downstream task learning with few samples so as to improve sample efficiency and generalization.

large language model, machine learning, reinforcement learning, (20 more...)

arXiv.org Artificial Intelligence

2401.00031

Country: North America > United States (0.14)

Genre:

Research Report (0.50)
Overview (0.46)

Industry: Leisure & Entertainment > Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Benchmarking Continual Learning from Cognitive Perspectives

Liu, Xiaoqian, Zhang, Junge, Zhang, Mingyi, Yang, Peipei

arXiv.org Artificial Intelligence, Dec-6-2023

Continual learning addresses the problem of continuously acquiring and transferring knowledge without catastrophic forgetting of old concepts. While humans achieve continual learning via diverse neurocognitive mechanisms, there is a mismatch between cognitive properties and evaluation methods of continual learning models. First, the measurement of continual learning models mostly relies on evaluation metrics at a micro-level, which cannot characterize cognitive capacities of the model. Second, the measurement is method-specific, emphasizing model strengths in one aspect while obscuring potential weaknesses in other respects. To address these issues, we propose to integrate model cognitive capacities and evaluation metrics into a unified evaluation paradigm. We first characterize model capacities via desiderata derived from cognitive properties supporting human continual learning. The desiderata concern (1) adaptability in varying lengths of task sequence; (2) sensitivity to dynamic task variations; and (3) efficiency in memory usage and training time consumption. Then we design evaluation protocols for each desideratum to assess cognitive capacities of recent continual learning models. Experimental results show that none of the methods we consider satisfies all the desiderata, and all remain far from realizing truly continual learning. Although some methods exhibit some degree of adaptability and efficiency, no method is able to identify task relationships when encountering dynamic task variations, or achieve a trade-off in learning similarities and differences between tasks. Inspired by these results, we discuss possible factors that influence model performance in these desiderata and provide guidance for the improvement of continual learning models.

continual learning, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2312.03309

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report > New Finding (0.48)

Industry:

Education (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Cognitive Science > Cognitive Architectures (0.74)

Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition

Xu, Chen, Liu, Xiaoqian, He, Erfeng, Zhang, Yuhao, Dong, Qianqian, Xiao, Tong, Zhu, Jingbo, Man, Dapeng, Yang, Wu

arXiv.org Artificial Intelligence, Sep-21-2023

In this study, we present synchronous bilingual Connectionist Temporal Classification (CTC), an innovative framework that leverages dual CTC to bridge the gaps of both modality and language in the speech translation (ST) task. Utilizing transcript and translation as concurrent objectives for CTC, our model bridges the gap between audio and text as well as between source and target languages. Building upon the recent advances in CTC application, we develop an enhanced variant, BiL-CTC+, that establishes new state-of-the-art performance on the MuST-C ST benchmarks under resource-constrained scenarios. Intriguingly, our method also yields significant improvements in speech recognition performance, revealing the effect of cross-lingual learning on transcription and demonstrating its broad applicability. The source code is available at https://github.com/xuchennlp/S2T.

artificial intelligence, speech recognition, translation, (17 more...)

arXiv.org Artificial Intelligence

2309.12234

Country:

Asia > China (0.28)
North America > United States > Pennsylvania (0.14)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
