Chen, Yidong
Representation Purification for End-to-End Speech Translation
Zhang, Chengwei, Zhou, Yue, Zhao, Rui, Chen, Yidong, Shi, Xiaodong
Speech-to-text translation (ST) is a cross-modal task that involves converting spoken language into text in a different language. Previous research primarily focused on enhancing speech translation by facilitating knowledge transfer from machine translation, exploring various methods to bridge the gap between the speech and text modalities. Despite substantial progress, factors in speech that are not relevant to the translation content, such as timbre and rhythm, often limit the efficiency of knowledge transfer. In this paper, we conceptualize the speech representation as a combination of content-agnostic and content-relevant factors. We examine the impact of content-agnostic factors on translation performance through preliminary experiments and observe significant performance deterioration when content-agnostic perturbations are introduced into speech signals. To address this issue, we propose a \textbf{S}peech \textbf{R}epresentation \textbf{P}urification with \textbf{S}upervision \textbf{E}nhancement (SRPSE) framework, which excludes the content-agnostic components from speech representations to mitigate their negative impact on ST. Experiments on the MuST-C and CoVoST-2 datasets demonstrate that SRPSE significantly improves translation performance in all translation directions across three settings and achieves outstanding performance under a \textit{transcript-free} setting.
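To make the preliminary perturbation experiment concrete, a minimal sketch of content-agnostic perturbations (pitch and tempo, which alter timbre and rhythm but leave the lexical content intact) could look as follows; the specific perturbations and parameter values are illustrative assumptions, not the paper's exact setup.

```python
# Minimal sketch: content-agnostic perturbations of a speech signal.
# The perturbation types and parameters are illustrative assumptions.
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical input file

# Shift pitch by 4 semitones: alters timbre-related cues, not the words.
y_pitch = librosa.effects.pitch_shift(y, sr=sr, n_steps=4)

# Stretch tempo by 10%: alters rhythm, not the lexical content.
y_tempo = librosa.effects.time_stretch(y, rate=1.1)
```

Under the paper's decomposition, such perturbations change only the content-agnostic factors, so a purified representation should be (nearly) invariant to them.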
Conditional Variational Autoencoder for Sign Language Translation with Cross-Modal Alignment
Zhao, Rui, Zhang, Liang, Fu, Biao, Hu, Cong, Su, Jinsong, Chen, Yidong
Sign language translation (SLT) aims to convert continuous sign language videos into textual sentences. As a typical multi-modal task, it faces an inherent modality gap between sign language videos and spoken language text, which makes cross-modal alignment between the visual and textual modalities crucial. However, previous studies tend to rely on an intermediate sign gloss representation to help alleviate this cross-modal gap, thereby neglecting direct alignment across modalities, which may lead to compromised results. To address this issue, we propose a novel framework based on a Conditional Variational autoencoder for SLT (CV-SLT) that facilitates direct and sufficient cross-modal alignment between sign language videos and spoken language text. Specifically, our CV-SLT consists of two paths with two Kullback-Leibler (KL) divergences to regularize the outputs of the encoder and decoder, respectively. In the prior path, the model relies solely on visual information to predict the target text, whereas in the posterior path, it simultaneously encodes visual information and textual knowledge to reconstruct the target text. The first KL divergence optimizes the conditional variational autoencoder and regularizes the encoder outputs, while the second performs self-distillation from the posterior path to the prior path, ensuring the consistency of the decoder outputs. We further enhance the integration of textual information into the posterior path by employing a shared Attention Residual Gaussian Distribution (ARGD), which treats the textual information in the posterior path as a residual component relative to the prior path. Extensive experiments conducted on public datasets (PHOENIX14T and CSL-daily) demonstrate the effectiveness of our framework, achieving new state-of-the-art results while significantly alleviating the cross-modal representation discrepancy.
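In schematic form (notation ours; the paper's exact formulation may differ), the CV-SLT objective combines a reconstruction term with the two KL regularizers described above:
\[
\mathcal{L} = \mathbb{E}_{q_\phi(z \mid v, t)}\big[-\log p_\theta(t \mid z, v)\big] + \mathrm{KL}\big(q_\phi(z \mid v, t) \,\|\, p_\theta(z \mid v)\big) + \mathrm{KL}\big(p_{\mathrm{dec}}(t \mid z_{\mathrm{post}}, v) \,\|\, p_{\mathrm{dec}}(t \mid z_{\mathrm{prior}}, v)\big),
\]
where $v$ is the sign video and $t$ the target text: the first KL term regularizes the encoder outputs against the visual-only prior, and the second performs self-distillation from the posterior path to the prior path at the decoder outputs.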
Layer-wise Representation Fusion for Compositional Generalization
Zheng, Yafang, Lin, Lei, Li, Shuangtao, Yuan, Yuxuan, Lai, Zhaohong, Liu, Shan, Fu, Biao, Chen, Yidong, Shi, Xiaodong
Existing neural models have been shown to struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. A key reason for this failure is that the syntactic and semantic representations of sequences are entangled in the uppermost layers of both the encoder and decoder. However, previous work concentrates on separating the learning of syntax and semantics rather than exploring the causes of the representation entanglement (RE) problem in order to solve it. We explain why the problem arises by analyzing how representations evolve from the bottom to the top of the Transformer layers. We find that the ``shallow'' residual connections within each layer fail to fuse previous layers' information effectively, leading to information forgetting across layers and, in turn, to the RE problem. Inspired by this, we propose LRF, a novel \textbf{L}ayer-wise \textbf{R}epresentation \textbf{F}usion framework for CG, which learns to fuse previous layers' information back into the encoding and decoding process effectively by introducing a \emph{fuse-attention module} at each encoder and decoder layer. LRF achieves promising results on two realistic benchmarks, empirically demonstrating the effectiveness of our proposal.
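As an illustration of the idea (a sketch under our own assumptions, not the authors' exact design), a fuse-attention module can let each layer's states attend over all previous layers' outputs:

```python
# Hypothetical sketch of a fuse-attention module: the current layer's
# states (queries) attend over all previous layers' outputs (keys/values).
# Dimensions and wiring are illustrative assumptions.
import torch
import torch.nn as nn

class FuseAttention(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, h_cur, prev_layers):
        # h_cur: (batch, seq, d_model); prev_layers: list of same-shaped tensors.
        # Concatenate previous layers along the sequence axis as memory.
        memory = torch.cat(prev_layers, dim=1)
        fused, _ = self.attn(h_cur, memory, memory)
        return self.norm(h_cur + fused)  # residual fusion of past information

# Usage: h3 = FuseAttention()(h3, [h0, h1, h2])
```

In contrast to a plain residual connection, which only adds the immediately preceding representation, this lets later layers selectively recover information from any earlier layer.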
MOPRD: A multidisciplinary open peer review dataset
Lin, Jialiang, Song, Jiaxin, Zhou, Zhangping, Chen, Yidong, Shi, Xiaodong
Open peer review is a growing trend in academic publishing. Public access to peer review data can benefit both the academic and publishing communities. It also supports studies on review comment generation and, further, the realization of automated scholarly paper review. However, most existing peer review datasets do not provide data covering the whole peer review process. Moreover, their data are not diverse enough, as they are mainly collected from the field of computer science. These two drawbacks of the currently available peer review datasets need to be addressed to unlock more opportunities for related studies. In response, we construct MOPRD, a multidisciplinary open peer review dataset. This dataset consists of paper metadata, multiple manuscript versions, review comments, meta-reviews, authors' rebuttal letters, and editorial decisions. Moreover, we propose a modular guided review comment generation method based on MOPRD. Experiments show that our method delivers better performance, as indicated by both automatic metrics and human evaluation. We also explore other potential applications of MOPRD, including meta-review generation, editorial decision prediction, author rebuttal generation, and scientometric analysis. MOPRD provides strong support for further studies in peer review-related research and other applications.
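The abstract enumerates the components of each record; a hypothetical schema assembled from that list (field names are ours, not the dataset's actual keys) might look like:

```python
# Hypothetical MOPRD record schema assembled from the components listed in
# the abstract; field names are illustrative, not the dataset's actual keys.
from dataclasses import dataclass
from typing import List

@dataclass
class MOPRDRecord:
    metadata: dict                    # title, venue, discipline, ...
    manuscript_versions: List[str]    # full text of each submitted version
    review_comments: List[str]        # individual reviewer reports
    meta_reviews: List[str]           # editor/meta-reviewer summaries
    rebuttal_letters: List[str]       # authors' responses to the reviews
    editorial_decision: str           # e.g., "accept", "revise", "reject"
```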
Adapting Offline Speech Translation Models for Streaming with Future-Aware Distillation and Inference
Fu, Biao, Liao, Minpeng, Fan, Kai, Huang, Zhongqiang, Chen, Boxing, Chen, Yidong, Shi, Xiaodong
A popular approach to streaming speech translation is to employ a single offline model with a wait-k policy to support different latency requirements, which is simpler than training multiple online models with different latency constraints. However, there is a mismatch problem in using a model trained with complete utterances for streaming inference with partial input. We demonstrate that speech representations extracted at the end of a streaming input are significantly different from those extracted from a complete utterance. To address this issue, we propose a new approach called Future-Aware Streaming Translation (FAST) that adapts an offline ST model for streaming input. FAST includes a Future-Aware Inference (FAI) strategy that incorporates future context through a trainable masked embedding, and a Future-Aware Distillation (FAD) framework that transfers future context from an approximation of full speech to streaming input. Our experiments on the MuST-C En-De, En-Es, and En-Fr benchmarks show that FAST achieves better trade-offs between translation quality and latency than strong baselines. Extensive analyses suggest that our methods effectively alleviate the aforementioned mismatch problem between offline training and online inference.
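As an illustration of the FAI idea (a sketch under our own assumptions, not the authors' code), a trainable embedding can stand in for the unseen future frames of a streaming input before the offline encoder runs:

```python
# Sketch of Future-Aware Inference: pad a partial feature sequence with a
# trainable "future" embedding before feeding the offline ST encoder.
# Sizes and the number of padded positions are illustrative assumptions.
import torch
import torch.nn as nn

class FutureAwarePad(nn.Module):
    def __init__(self, d_model=512, n_future=8):
        super().__init__()
        # One learned vector reused for every simulated future position.
        self.future_emb = nn.Parameter(torch.zeros(1, 1, d_model))
        self.n_future = n_future

    def forward(self, partial_feats):
        # partial_feats: (batch, frames_so_far, d_model)
        b = partial_feats.size(0)
        future = self.future_emb.expand(b, self.n_future, -1)
        return torch.cat([partial_feats, future], dim=1)
```

The padded positions give the encoder a stand-in for future context, reducing the gap between representations of partial and complete utterances.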
Learning to Compose Representations of Different Encoder Layers towards Improving Compositional Generalization
Lin, Lei, Li, Shuangtao, Zheng, Yafang, Fu, Biao, Liu, Shan, Chen, Yidong, Shi, Xiaodong
Recent studies have shown that sequence-to-sequence (seq2seq) models struggle with compositional generalization (CG), i.e., the ability to systematically generalize to unseen compositions of seen components. There is mounting evidence that one of the reasons hindering CG is that the representation in the encoder's uppermost layer is entangled, i.e., the syntactic and semantic representations of sequences are entangled. However, we argue that this previously identified representation entanglement problem is not comprehensive enough. In addition, we hypothesize that the source key and value representations passed into different decoder layers are also entangled. Starting from this intuition, we propose \textsc{CompoSition} (\textbf{Compo}se \textbf{S}yntactic and Semant\textbf{i}c Representa\textbf{tion}s), an extension to seq2seq models that learns to compose representations of different encoder layers dynamically for different tasks, motivated by recent findings that the bottom layers of the Transformer encoder contain more syntactic information while the top ones contain more semantic information. Specifically, we introduce a \textit{composed layer} between the encoder and decoder that composes different encoder layers' representations to generate specific keys and values for each decoder layer. \textsc{CompoSition} achieves competitive results on two comprehensive and realistic benchmarks, empirically demonstrating the effectiveness of our proposal. Code is available at~\url{https://github.com/thinkaboutzero/COMPOSITION}.
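In schematic form (our notation; the precise parameterization in the paper may differ), the composed layer can be viewed as a learned convex combination of encoder layer outputs, with separate mixing weights for the keys and values fed to each decoder layer $j$:
\[
K^{(j)} = \sum_{i=1}^{L} \alpha_i^{(j)} H_i, \qquad V^{(j)} = \sum_{i=1}^{L} \beta_i^{(j)} H_i, \qquad \alpha^{(j)} = \mathrm{softmax}\big(w_K^{(j)}\big), \quad \beta^{(j)} = \mathrm{softmax}\big(w_V^{(j)}\big),
\]
where $H_i$ is the output of the $i$-th of $L$ encoder layers, so syntax-rich lower layers and semantics-rich upper layers can be mixed differently for each decoder layer.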
CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment
Zheng, Jiangbin, Wang, Yile, Tan, Cheng, Li, Siyuan, Wang, Ge, Xia, Jun, Chen, Yidong, Li, Stan Z.
Sign language recognition (SLR) is a weakly supervised task that annotates sign videos with textual glosses. Recent studies show that insufficient training caused by the lack of large-scale available sign datasets has become the main bottleneck for SLR. Most SLR works therefore adopt pretrained visual modules and follow one of two mainstream solutions. Multi-stream architectures extend multi-cue visual features, yielding the current SOTA performance, but they require complex designs and may introduce noise. Alternatively, advanced single-cue SLR frameworks that use explicit cross-modal alignment between the visual and textual modalities are simple and effective, and potentially competitive with multi-cue frameworks. In this work, we propose a novel contrastive visual-textual transformation for SLR, CVT-SLR, to fully explore the pretrained knowledge of both the visual and language modalities. Building on the single-cue cross-modal alignment framework, we introduce a variational autoencoder (VAE) to exploit pretrained contextual knowledge while incorporating a complete pretrained language module. The VAE implicitly aligns the visual and textual modalities while benefiting from pretrained contextual knowledge, playing the role of the traditional contextual module. Meanwhile, a contrastive cross-modal alignment algorithm is designed to explicitly enforce consistency constraints. Extensive experiments on public datasets (PHOENIX-2014 and PHOENIX-2014T) demonstrate that our proposed CVT-SLR consistently outperforms existing single-cue methods and even outperforms SOTA multi-cue methods.
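As a sketch of what the explicit contrastive alignment term could look like (our formulation, assuming a standard InfoNCE-style objective; the paper's algorithm may differ in detail):
\[
\mathcal{L}_{\mathrm{con}} = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp\big(\mathrm{sim}(v_i, t_i)/\tau\big)}{\sum_{j=1}^{N} \exp\big(\mathrm{sim}(v_i, t_j)/\tau\big)},
\]
where $v_i$ and $t_i$ are paired visual and textual features, $\mathrm{sim}(\cdot,\cdot)$ is cosine similarity, and $\tau$ is a temperature: matched visual-textual pairs are pulled together while mismatched pairs are pushed apart.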
A Token-level Contrastive Framework for Sign Language Translation
Fu, Biao, Ye, Peigen, Zhang, Liang, Yu, Pei, Hu, Cong, Chen, Yidong, Shi, Xiaodong
Sign Language Translation (SLT) is a promising technology to bridge the communication gap between deaf and hearing people. Recently, researchers have adopted Neural Machine Translation (NMT) methods, which usually require large-scale corpora for training, to achieve SLT. However, publicly available SLT corpora are very limited, which causes token representations to collapse and the generated tokens to be inaccurate. To alleviate this issue, we propose ConSLT, a novel token-level \textbf{Con}trastive learning framework for \textbf{S}ign \textbf{L}anguage \textbf{T}ranslation, which learns effective token representations by incorporating token-level contrastive learning into the SLT decoding process. Concretely, ConSLT treats each token and its counterpart generated under different dropout masks as a positive pair during decoding, and then randomly samples $K$ tokens from the vocabulary that are not in the current sentence to construct negative examples. We conduct comprehensive experiments on two benchmarks (PHOENIX14T and CSL-Daily) in both end-to-end and cascaded settings. The experimental results demonstrate that ConSLT achieves better translation quality than strong baselines.
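A minimal sketch of the token-level contrastive idea follows; the shapes, similarity function, and sampling details are our assumptions, not the authors' implementation:

```python
# Sketch of a ConSLT-style token-level contrastive loss: two decoder passes
# with different dropout masks give positive pairs; K vocabulary embeddings
# outside the current sentence serve as negatives. Details are assumptions.
import torch
import torch.nn.functional as F

def token_contrastive_loss(h1, h2, vocab_emb, sent_ids, K=500, tau=0.1):
    # h1, h2: (tokens, d) token states from two dropout-perturbed passes.
    # vocab_emb: (V, d) output embedding table; sent_ids: ids in the sentence.
    mask = torch.ones(vocab_emb.size(0), dtype=torch.bool, device=vocab_emb.device)
    mask[sent_ids] = False                       # exclude in-sentence tokens
    cand = torch.nonzero(mask).squeeze(1)
    neg_ids = cand[torch.randperm(cand.numel())[:K]]
    negs = vocab_emb[neg_ids]                    # (K, d) negative examples

    pos = F.cosine_similarity(h1, h2, dim=-1) / tau                  # (tokens,)
    neg = F.cosine_similarity(h1.unsqueeze(1), negs.unsqueeze(0), dim=-1) / tau
    logits = torch.cat([pos.unsqueeze(1), neg], dim=1)               # (tokens, 1+K)
    labels = torch.zeros(h1.size(0), dtype=torch.long, device=h1.device)
    return F.cross_entropy(logits, labels)
```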
Towards Better Document-level Relation Extraction via Iterative Inference
Zhang, Liang, Su, Jinsong, Chen, Yidong, Miao, Zhongjian, Min, Zijun, Hu, Qingguo, Shi, Xiaodong
Document-level relation extraction (RE) aims to extract the relations between entities in an input document, which usually contains many hard-to-predict entity pairs whose relations can only be obtained through relational inference. Existing methods usually predict the relations of all entity pairs in the document directly, in a one-pass manner, ignoring the fact that the predictions for some entity pairs heavily depend on the predicted results of other pairs. To deal with this issue, in this paper we propose a novel document-level RE model with iterative inference. Our model is mainly composed of two modules: 1) a base module expected to provide preliminary relation predictions for entity pairs; 2) an inference module introduced to refine these preliminary predictions by iteratively handling hard-to-predict entity pairs that depend on other pairs, in an easy-to-hard manner. Unlike previous methods that only consider the feature information of entity pairs, our inference module is equipped with two Extended Cross Attention units, allowing it to exploit both the feature information and the previous predictions of entity pairs during relational inference. Furthermore, we adopt a two-stage strategy to train our model. In the first stage, we train only the base module. In the second stage, we train the whole model, where contrastive learning is introduced to enhance the training of the inference module. Experimental results on three commonly used datasets show that our model consistently outperforms other competitive baselines.
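Schematically, the two-module design runs as follows (module names and the number of iterations are our assumptions):

```python
# Schematic of the iterative inference loop: a base module makes first-pass
# predictions, then an inference module repeatedly refines them using both
# pair features and the previous round's predictions. Names are assumptions.
def predict_relations(doc, base_module, inference_module, n_iters=3):
    feats = base_module.encode(doc)          # features for all entity pairs
    preds = base_module.classify(feats)      # preliminary relation scores
    for _ in range(n_iters):
        # Easy pairs stabilize first; harder pairs are revised using the
        # predictions already made for the pairs they depend on.
        preds = inference_module(feats, preds)
    return preds
```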
Predicting drug response of tumors from integrated genomic profiles by deep neural networks
Chiu, Yu-Chiao, Chen, Hung-I Harry, Zhang, Tinghe, Zhang, Songyao, Gorthi, Aparna, Wang, Li-Ju, Huang, Yufei, Chen, Yidong
The study of high-throughput genomic profiles from a pharmacogenomics viewpoint has provided unprecedented insights into the oncogenic features modulating drug response. A recent screening of ~1,000 cancer cell lines against a collection of anti-cancer drugs illuminated the link between genotypes and drug vulnerability. However, due to essential differences between cell lines and tumors, translating these findings into predictions of drug response in tumors remains challenging. Here we propose a deep neural network (DNN) model to predict drug response based on the mutation and expression profiles of a cancer cell line or a tumor. The model contains a mutation encoder and an expression encoder, pre-trained on a large pan-cancer dataset to abstract core representations of high-dimensional data, followed by a drug response predictor network. Given a pair of mutation and expression profiles, the model predicts IC50 values for 265 drugs. We trained and tested the model on a dataset of 622 cancer cell lines and achieved an overall mean squared error of 1.96 (on log-scale IC50 values). The model was superior in prediction error or stability to two classical methods and four analog DNN variants of our model. We then applied the model to predict drug response in 9,059 tumors of 33 cancer types. The model predicted both known drug responses, including EGFR inhibitors in non-small cell lung cancer and tamoxifen in ER+ breast cancer, and novel drug targets. Comprehensive analysis further revealed the molecular mechanisms underlying resistance to the chemotherapeutic drug docetaxel in a pan-cancer setting, as well as the anti-cancer potential of a novel agent, CX-5461, for treating gliomas and hematopoietic malignancies. Overall, our model and findings improve the prediction of drug response and the identification of novel therapeutic options.
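A minimal sketch of the described architecture follows (layer widths are illustrative assumptions; per the abstract, the two encoders are pre-trained on pan-cancer data before the predictor is attached):

```python
# Sketch of the two-encoder drug response model: a mutation encoder and an
# expression encoder feed a predictor that outputs IC50 values for 265 drugs.
# Layer widths are illustrative assumptions, not the published configuration.
import torch
import torch.nn as nn

class DrugResponseDNN(nn.Module):
    def __init__(self, n_mut, n_expr, d=256, n_drugs=265):
        super().__init__()
        # In the paper these two encoders are pre-trained on a large
        # pan-cancer dataset before joint training with the predictor.
        self.mut_enc = nn.Sequential(nn.Linear(n_mut, d), nn.ReLU())
        self.expr_enc = nn.Sequential(nn.Linear(n_expr, d), nn.ReLU())
        self.predictor = nn.Sequential(
            nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, n_drugs)
        )

    def forward(self, mut, expr):
        z = torch.cat([self.mut_enc(mut), self.expr_enc(expr)], dim=-1)
        return self.predictor(z)  # predicted log-scale IC50 per drug

# Training would minimize MSE against log-scale IC50 values, e.g. nn.MSELoss().
```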