Xu, Chen
Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting
Yang, Dawei, He, Ning, Hu, Xing, Yuan, Zhihang, Yu, Jiangyong, Xu, Chen, Jiang, Zhe
Although neural networks have made remarkable advancements in various applications, they require substantial computational and memory resources. Network quantization is a powerful technique to compress neural networks, allowing for more efficient and scalable AI deployment. Recently, re-parameterization has emerged as a promising technique to enhance model performance while simultaneously alleviating the computational burden in various computer vision tasks. However, accuracy drops significantly when quantization is applied to re-parameterized networks. We identify that the primary challenge arises from the large variation in weight distribution across the original branches. To address this issue, we propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of weights, and develop an improved KL metric to determine optimal quantization scales for activations. To the best of our knowledge, ours is the first post-training quantization approach applicable to re-parameterized networks. For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss. The code is available at https://github.com/NeonHo/Coarse-Fine-Weight-Split.git
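The exact CFWS procedure is described in the paper; as a minimal NumPy sketch of why splitting a long-tailed weight distribution before quantization helps, consider the toy below (the magnitude threshold and symmetric int8 scheme are our assumptions, not the paper's method):

```python
import numpy as np

QMAX = 127  # symmetric int8 range

def fake_quant(x):
    """Quantize to int8 and de-quantize, returning the reconstruction."""
    max_abs = np.abs(x).max()
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    return np.clip(np.round(x / scale), -QMAX, QMAX) * scale

def coarse_fine_quant(w, threshold):
    """Split weights into large-magnitude (coarse) and small-magnitude
    (fine) parts, quantize each with its own scale, and recombine."""
    mask = np.abs(w) > threshold
    return fake_quant(np.where(mask, w, 0.0)) + fake_quant(np.where(mask, 0.0, w))

# A merged re-parameterized kernel often has a long-tailed distribution:
rng = np.random.default_rng(0)
w = np.concatenate([rng.normal(0.0, 0.02, 990), rng.normal(0.0, 1.0, 10)])

mse_naive = np.mean((w - fake_quant(w)) ** 2)
mse_split = np.mean((w - coarse_fine_quant(w, threshold=0.1)) ** 2)
print(f"one-scale MSE: {mse_naive:.2e}, split MSE: {mse_split:.2e}")
```

With a single scale, the few outliers force a large quantization step that wipes out the bulk of the small weights; quantizing the two groups separately keeps both resolvable.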
Rethinking and Improving Multi-task Learning for End-to-end Speech Translation
Zhang, Yuhao, Xu, Chen, Li, Bei, Chen, Hao, Xiao, Tong, Zhang, Chunliang, Zhu, Jingbo
Significant improvements in end-to-end speech translation (ST) have been achieved through the application of multi-task learning. However, the extent to which auxiliary tasks are consistent with the ST task, and how much this approach truly helps, have not been thoroughly studied. In this paper, we investigate the consistency between different tasks, considering different training stages and modules. We find that the textual encoder primarily facilitates cross-modal conversion, but the presence of noise in speech impedes the consistency between text and speech representations. Furthermore, we propose an improved multi-task learning (IMTL) approach for the ST task, which bridges the modal gap by mitigating the differences in length and representation. We conduct experiments on the MuST-C dataset. The results demonstrate that our method attains state-of-the-art performance. Moreover, when additional data is used, we achieve a new SOTA result on the MuST-C English-to-Spanish task with only 20.8% of the training time required by the current SOTA method.
Flow-based distributionally robust optimization
Xu, Chen, Lee, Jonghyeok, Cheng, Xiuyuan, Xie, Yao
We present a computationally efficient framework, called FlowDRO, for solving flow-based distributionally robust optimization (DRO) problems with Wasserstein uncertainty sets, aiming to find a continuous worst-case distribution (also called the least favorable distribution, LFD). Requiring the LFD to be continuous allows the algorithm to scale to problems with large sample sizes and improves the generalization capability of the induced robust algorithms. To tackle the computationally challenging infinite-dimensional optimization problem, we leverage flow-based models and continuous-time invertible transport maps between the data distribution and the target distribution, and we develop a Wasserstein proximal gradient flow algorithm. In theory, we establish the equivalence of the optimal-transport-map solution to the original formulation, as well as the dual form of the problem, through Wasserstein calculus and Brenier's theorem. In practice, we parameterize the transport maps by a sequence of neural networks progressively trained in blocks by gradient descent. Our computational framework is general, can handle high-dimensional data with large sample sizes, and can be useful for various applications. We demonstrate its usage in adversarial learning, distributionally robust hypothesis testing, and a new mechanism for data-driven distribution-perturbation differential privacy, where the proposed method gives strong empirical performance on real high-dimensional data.
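The paper's full algorithm trains continuous-time flows progressively in blocks; the following PyTorch sketch shows only the core Wasserstein-proximal objective for a single residual block, with the network, regularization weight, and training loop all being our own illustrative choices:

```python
import torch
import torch.nn as nn

def train_worst_case_map(loss_fn, x, reg=1.0, steps=200, lr=1e-3):
    """Fit a residual transport map T(x) = x + g(x) that pushes the data
    toward higher risk, penalized by the squared transport cost
    E||T(x) - x||^2 (a Wasserstein-2 proximal term)."""
    g = nn.Sequential(nn.Linear(x.shape[1], 64), nn.ReLU(),
                      nn.Linear(64, x.shape[1]))
    opt = torch.optim.Adam(g.parameters(), lr=lr)
    for _ in range(steps):
        x_adv = x + g(x)
        transport = ((x_adv - x) ** 2).sum(dim=1).mean()
        # Maximize risk minus transport cost == minimize its negation.
        obj = -loss_fn(x_adv) + reg * transport
        opt.zero_grad()
        obj.backward()
        opt.step()
    return g
```

For instance, with `loss_fn` returning the mean loss of a fixed classifier on `x_adv`, the learned map pushes the empirical distribution toward an approximate LFD while the proximal term keeps it inside the Wasserstein ball.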
Normalizing flow neural networks by JKO scheme
Xu, Chen, Cheng, Xiuyuan, Xie, Yao
Normalizing flows are a class of deep generative models for efficient sampling and likelihood estimation, achieving attractive performance particularly in high dimensions. The flow is often implemented using a sequence of invertible residual blocks. Existing works adopt special network architectures and regularization of flow trajectories. In this paper, we develop a neural ODE flow network called JKO-iFlow, inspired by the Jordan-Kinderlehrer-Otto (JKO) scheme, which unfolds the discrete-time dynamics of the Wasserstein gradient flow. The proposed method stacks residual blocks one after another, allowing efficient block-wise training of the residual blocks and avoiding sampling SDE trajectories as well as score matching or variational learning, thus reducing the memory load and the difficulty of end-to-end training. We also develop an adaptive time reparameterization of the flow network with a progressive refinement of the induced trajectory in probability space to further improve model accuracy. Experiments with synthetic and real data show that the proposed JKO-iFlow network achieves competitive performance compared with existing flow and diffusion models at a significantly reduced computational and memory cost.
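JKO-iFlow's actual objective and architecture are given in the paper; a minimal PyTorch sketch of what one block's JKO loss can look like (standard Gaussian target, exact per-sample Jacobian, and a toy network, all our assumptions) is:

```python
import torch
import torch.nn as nn

def jko_block_loss(block, x, step_size):
    """One JKO step toward a standard Gaussian target: negative
    log-likelihood of the pushed-forward samples plus a W2 proximal
    term (1/2h)||y - x||^2 that keeps the block's move small."""
    y = x + block(x)  # residual map (invertibility constraints omitted)
    # Exact log|det J| per sample; affordable in this low-dim sketch.
    jacs = torch.stack([
        torch.autograd.functional.jacobian(lambda v: v + block(v), xi,
                                           create_graph=True)
        for xi in x])
    logdet = torch.linalg.slogdet(jacs).logabsdet
    nll = 0.5 * (y ** 2).sum(dim=1) - logdet  # -log N(y; 0, I) up to a constant
    prox = ((y - x) ** 2).sum(dim=1) / (2.0 * step_size)
    return (nll + prox).mean()

# Blocks are trained one at a time on the previous block's output:
block = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 2))
x = torch.randn(128, 2) * 2.0 + 1.0
opt = torch.optim.Adam(block.parameters(), lr=1e-3)
loss = jko_block_loss(block, x, step_size=0.5)
loss.backward()
opt.step()
```

The proximal term is what makes block-wise training possible: each block only solves a small, local optimization instead of an end-to-end trajectory fit.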
Computing high-dimensional optimal transport by flow neural networks
Xu, Chen, Cheng, Xiuyuan, Xie, Yao
The problem of finding a transport map between two general distributions P and Q in high dimension is essential in statistics, optimization, and machine learning. When both distributions are only accessible via finite samples, the transport map needs to be learned from data. In spite of the modeling and computational challenges, this setting has applications in many fields. For example, transfer learning in domain adaptation aims to obtain a model on the target domain at a lower cost by making use of an existing pre-trained model on the source domain [Courty et al., 2014, 2017], and this can be achieved by transporting the source-domain samples to the target domain using the transport map. (Optimal) transport has also been applied to achieve model fairness [Silvia et al., 2020]: by transporting distributions corresponding to different sensitive attributes to a common distribution, an unfair model is calibrated to match certain desired fairness criteria (e.g., demographic parity [Jiang et al., 2020]). The transport map can also be used to provide intermediate interpolating distributions between P and Q. In density ratio estimation (DRE), this bridging facilitates the so-called "telescopic" DRE [Rhodes et al., 2020], which has been shown to be more accurate when P and Q differ significantly, as the identity sketched below illustrates. Furthermore, learning such a transport map between two sets of images can facilitate solving problems in computer vision, such as image restoration and image-to-image translation [Isola et al., 2017].
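A one-line sketch of the telescoping identity behind this approach (notation ours): with intermediate distributions $P = P_0, P_1, \ldots, P_K = Q$ provided by the transport map,
$$\frac{dP}{dQ}(x) \;=\; \prod_{k=0}^{K-1} \frac{dP_k}{dP_{k+1}}(x),$$
so each factor compares two nearby distributions and is easier to estimate from samples than the single ratio between the distant endpoints P and Q.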
Bridging the Gaps of Both Modality and Language: Synchronous Bilingual CTC for Speech Translation and Speech Recognition
Xu, Chen, Liu, Xiaoqian, He, Erfeng, Zhang, Yuhao, Dong, Qianqian, Xiao, Tong, Zhu, Jingbo, Man, Dapeng, Yang, Wu
In this study, we present synchronous bilingual Connectionist Temporal Classification (CTC), an innovative framework that leverages dual CTC to bridge the gaps of both modality and language in the speech translation (ST) task. Utilizing the transcript and the translation as concurrent objectives for CTC, our model bridges the gap between audio and text as well as between the source and target languages. Building upon recent advances in CTC application, we develop an enhanced variant, BiL-CTC+, that establishes new state-of-the-art performance on the MuST-C ST benchmarks under resource-constrained scenarios. Intriguingly, our method also yields significant improvements in speech recognition performance, revealing the effect of cross-lingual learning on transcription and demonstrating its broad applicability. The source code is available at https://github.com/xuchennlp/S2T.
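The precise placement of the two CTC objectives and the additional techniques in BiL-CTC+ are described in the paper and repository; a bare-bones PyTorch sketch of the dual-CTC idea, with both objectives applied to the same encoder states and all names ours, might look like:

```python
import torch.nn.functional as F

def dual_ctc_loss(enc_out, in_lens, src_head, tgt_head,
                  src_tokens, src_lens, tgt_tokens, tgt_lens):
    """Apply two CTC objectives to the encoder states: one aligned to
    the source transcript, one to the target translation."""
    # enc_out: (T, B, H); each head projects hidden states to its vocabulary.
    src_logp = F.log_softmax(src_head(enc_out), dim=-1)
    tgt_logp = F.log_softmax(tgt_head(enc_out), dim=-1)
    loss_src = F.ctc_loss(src_logp, src_tokens, in_lens, src_lens,
                          blank=0, zero_infinity=True)
    loss_tgt = F.ctc_loss(tgt_logp, tgt_tokens, in_lens, tgt_lens,
                          blank=0, zero_infinity=True)
    return loss_src + loss_tgt
```

Supervising the same speech encoder with both the source-language and target-language token sequences is what couples the modality gap (audio vs. text) with the language gap (source vs. target) in a single training signal.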
Recent Advances in Direct Speech-to-text Translation
Xu, Chen, Ye, Rong, Dong, Qianqian, Zhao, Chengqi, Ko, Tom, Wang, Mingxuan, Xiao, Tong, Zhu, Jingbo
Recently, speech-to-text translation has attracted increasing attention, and many studies have emerged rapidly. In this paper, we present a comprehensive survey on direct speech translation, aiming to summarize the current state-of-the-art techniques. First, we categorize the existing research into three directions based on the main challenges: modeling burden, data scarcity, and application issues. To tackle the problem of modeling burden, two main structures have been proposed: the encoder-decoder framework (Transformer and its variants) and multitask frameworks. For the challenge of data scarcity, recent work resorts to many sophisticated techniques, such as data augmentation, pre-training, knowledge distillation, and multilingual modeling. We then analyze and summarize the application issues, which include real-time translation, segmentation, named entities, gender bias, and code-switching. Finally, we discuss some promising directions for future work.
Modality Adaption or Regularization? A Case Study on End-to-End Speech Translation
Han, Yuchen, Xu, Chen, Xiao, Tong, Zhu, Jingbo
Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The commonplace "modality gap" between speech and text data often leads to inconsistent inputs between pre-training and fine-tuning. However, we observe that this gap occurs in the early stages of fine-tuning but does not have a major impact on the final performance. On the other hand, we find that there is another gap, which we call the "capacity gap": high-resource tasks (such as ASR and MT) always require a large model to fit, and when that model is reused for a low-resource task (E2E ST), it yields sub-optimal performance due to over-fitting. In a case study, we find that regularization plays a more important role than the well-designed modality adaptation method, achieving 29.0 BLEU for en-de and 40.3 BLEU for en-fr on the MuST-C dataset. Code and models are available at https://github.com/hannlp/TAB.
PolyVoice: Language Models for Speech to Speech Translation
Dong, Qianqian, Huang, Zhiying, Tian, Qiao, Xu, Chen, Ko, Tom, Zhao, Yunlong, Feng, Siyuan, Li, Tang, Wang, Kexin, Cheng, Xuxin, Yue, Fengpeng, Bai, Ye, Chen, Xi, Lu, Lu, Ma, Zejun, Wang, Yuping, Wang, Mingxuan, Wang, Yuxuan
We propose PolyVoice, a language-model-based framework for speech-to-speech translation (S2ST). Our framework consists of two language models: a translation language model and a speech synthesis language model. We use discretized speech units, which are generated in a fully unsupervised way, so our framework can be used for unwritten languages. For the speech synthesis part, we adopt the existing VALL-E X approach and build a unit-based audio language model. This grants our framework the ability to preserve the voice characteristics and the speaking style of the original speech. We examine our system on Chinese $\rightarrow$ English and English $\rightarrow$ Spanish language pairs. Experimental results show that our system can generate speech with high translation quality and audio quality. Speech samples are available at https://speechtranslation.github.io/polyvoice.
Progression Cognition Reinforcement Learning with Prioritized Experience for Multi-Vehicle Pursuit
Li, Xinhang, Yang, Yiying, Yuan, Zheng, Wang, Zhe, Wang, Qinwen, Xu, Chen, Li, Lei, He, Jianhua, Zhang, Lin
Multi-vehicle pursuit (MVP), such as autonomous police vehicles pursuing suspects, is important but very challenging due to its mission- and safety-critical nature. While multi-agent reinforcement learning (MARL) algorithms have been proposed for the MVP problem in structured grid-pattern roads, the existing algorithms use randomly selected training samples in centralized learning, which leads to homogeneous agents with low collaboration performance. For the more challenging problem of pursuing multiple evading vehicles, these algorithms typically select a fixed target evading vehicle for the pursuing vehicles without considering the dynamic traffic situation, which significantly reduces the pursuing success rate. To address the above problems, this paper proposes Progression Cognition Reinforcement Learning with Prioritized Experience for MVP (PEPCRL-MVP) in urban multi-intersection dynamic traffic scenes. PEPCRL-MVP uses a prioritization network to assess the transitions in the global experience replay buffer according to the parameters of each MARL agent. With the personalized and prioritized experience set selected via the prioritization network, diversity is introduced into the learning process of MARL, which can improve collaboration and task-related performance. Furthermore, PEPCRL-MVP employs an attention module to extract critical features from complex urban traffic environments. These features are used by a progression cognition method to adaptively group pursuing vehicles, and each group efficiently targets one evading vehicle in dynamic driving environments. Extensive experiments conducted with a simulator over unstructured roads of an urban area show that PEPCRL-MVP is superior to other state-of-the-art methods. Specifically, PEPCRL-MVP improves pursuing efficiency by 3.95% over TD3-DMAP, and its success rate is 34.78% higher than that of MADDPG. The code is open-sourced.
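The paper's prioritization network scores transitions per agent; as a generic illustration of prioritized sampling from a shared replay buffer (the priority scores and the alpha hyperparameter here are our own conventional choices, not the paper's):

```python
import numpy as np

class PrioritizedReplay:
    """Minimal prioritized replay buffer: transitions are sampled with
    probability proportional to priority**alpha, so each agent can draw
    a personalized batch by supplying its own priority scores."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer, self.priorities = [], []

    def add(self, transition, priority):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(float(priority))

    def sample(self, batch_size, alpha=0.6):
        p = np.asarray(self.priorities) ** alpha
        p /= p.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=p)
        return [self.buffer[i] for i in idx]
```

Sampling by priority rather than uniformly is what breaks the homogeneity of centrally trained agents: agents whose parameters assign different scores to the same global buffer end up learning from different experience subsets.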