AITopics | Chen, Junjie

Collaborating Authors

Chen, Junjie

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

iEnhancer-ELM: improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models

Li, Jiahao, Wu, Zhourun, Lin, Wenhao, Luo, Jiawei, Zhang, Jun, Chen, Qingcai, Chen, Junjie

arXiv.org Artificial IntelligenceJul-16-2023

Motivation: Enhancers are important cis-regulatory elements that regulate a wide range of biological functions and enhance the transcription of target genes. Although many feature extraction methods have been proposed to improve the performance of enhancer identification, they cannot learn position-related multiscale contextual information from raw DNA sequences. Results: In this article, we propose a novel enhancer identification method (iEnhancer-ELM) based on BERT-like enhancer language models. iEnhancer-ELM tokenizes DNA sequences with multi-scale k-mers and extracts contextual information of different scale k-mers related with their positions via an multi-head attention mechanism. We first evaluate the performance of different scale k-mers, then ensemble them to improve the performance of enhancer identification. The experimental results on two popular benchmark datasets show that our model outperforms stateof-the-art methods. We further illustrate the interpretability of iEnhancer-ELM. For a case study, we discover 30 enhancer motifs via a 3-mer-based model, where 12 of motifs are verified by STREME and JASPAR, demonstrating our model has a potential ability to unveil the biological mechanism of enhancer. Availability and implementation: The models and associated code are available at https://github.com/chen-bioinfo/iEnhancer-ELM Contact: junjiechen@hit.edu.cn Supplementary information: Supplementary data are available at Bioinformatics Advances online.

bioinformatics, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1093/bioadv/vbad043

2212.01495

Country:

Asia > China (0.46)
North America > United States > Minnesota (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback

Generic and Robust Root Cause Localization for Multi-Dimensional Data in Online Service Systems

Li, Zeyan, Chen, Junjie, Chen, Yihao, Luo, Chengyang, Zhao, Yiwei, Sun, Yongqian, Sui, Kaixin, Wang, Xiping, Liu, Dapeng, Jin, Xing, Wang, Qi, Pei, Dan

arXiv.org Artificial IntelligenceMay-5-2023

Localizing root causes for multi-dimensional data is critical to ensure online service systems' reliability. When a fault occurs, only the measure values within specific attribute combinations are abnormal. Such attribute combinations are substantial clues to the underlying root causes and thus are called root causes of multidimensional data. This paper proposes a generic and robust root cause localization approach for multi-dimensional data, PSqueeze. We propose a generic property of root cause for multi-dimensional data, generalized ripple effect (GRE). Based on it, we propose a novel probabilistic cluster method and a robust heuristic search method. Moreover, we identify the importance of determining external root causes and propose an effective method for the first time in literature. Our experiments on two real-world datasets with 5400 faults show that the F1-score of PSqueeze outperforms baselines by 32.89%, while the localization time is around 10 seconds across all cases. The F1-score in determining external root causes of PSqueeze achieves 0.90. Furthermore, case studies in several production systems demonstrate that PSqueeze is helpful to fault diagnosis in the real world.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2305.03331

Country: Asia > China (1.00)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.87)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Add feedback

An Improved StarGAN for Emotional Voice Conversion: Enhancing Voice Quality and Data Augmentation

He, Xiangheng, Chen, Junjie, Rizos, Georgios, Schuller, Björn W.

arXiv.org Artificial IntelligenceJul-18-2021

Emotional Voice Conversion (EVC) aims to convert the emotional style of a source speech signal to a target style while preserving its content and speaker identity information. Previous emotional conversion studies do not disentangle emotional information from emotion-independent information that should be preserved, thus transforming it all in a monolithic manner and generating audio of low quality, with linguistic distortions. To address this distortion problem, we propose a novel StarGAN framework along with a two-stage training process that separates emotional features from those independent of emotion by using an autoencoder with two encoders as the generator of the Generative Adversarial Network (GAN). The proposed model achieves favourable results in both the objective evaluation and the subjective evaluation in terms of distortion, which reveals that the proposed model can effectively reduce distortion. Furthermore, in data augmentation experiments for end-to-end speech emotion recognition, the proposed StarGAN model achieves an increase of 2% in Micro-F1 and 5% in Macro-F1 compared to the baseline StarGAN model, which indicates that the proposed model is more valuable for data augmentation.

artificial intelligence, conversion, neural network, (18 more...)

arXiv.org Artificial Intelligence

2107.08361

Country: Asia > Japan (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.57)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.35)

Add feedback

Neural Code Summarization: How Far Are We?

Shi, Ensheng, Wang, Yanlin, Du, Lun, Chen, Junjie, Han, Shi, Zhang, Hongyu, Zhang, Dongmei, Sun, Hongbin

arXiv.org Artificial IntelligenceJul-15-2021

Source code summaries are important for the comprehension and maintenance of programs. However, there are plenty of programs with missing, outdated, or mismatched summaries. Recently, deep learning techniques have been exploited to automatically generate summaries for given code snippets. To achieve a profound understanding of how far we are from solving this problem, in this paper, we conduct a systematic and in-depth analysis of five state-of-the-art neural source code summarization models on three widely used datasets. Our evaluation results suggest that: (1) The BLEU metric, which is widely used by existing work for evaluating the performance of the summarization models, has many variants. Ignoring the differences among the BLEU variants could affect the validity of the claimed results. Furthermore, we discover an important, previously unknown bug about BLEU calculation in a commonly-used software package. (2) Code pre-processing choices can have a large impact on the summarization performance, therefore they should not be ignored. (3) Some important characteristics of datasets (corpus size, data splitting method, and duplication ratio) have a significant impact on model evaluation. Based on the experimental results, we give some actionable guidelines on more systematic ways for evaluating code summarization and choosing the best method in different scenarios. We also suggest possible future research directions. We believe that our results can be of great help for practitioners and researchers in this interesting area.

dataset, deep learning, neural network, (21 more...)

arXiv.org Artificial Intelligence

2107.07112

Country: Asia > China (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback