AITopics | Du, Wenjie

Collaborating Authors

Du, Wenjie

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Time-MQA: Time Series Multi-Task Question Answering with Context Enhancement

Kong, Yaxuan, Yang, Yiyuan, Hwang, Yoontae, Du, Wenjie, Zohren, Stefan, Wang, Zhangyang, Jin, Ming, Wen, Qingsong

arXiv.org Artificial IntelligenceFeb-26-2025

Time series data are foundational in finance, healthcare, and energy domains. However, most existing methods and datasets remain focused on a narrow spectrum of tasks, such as forecasting or anomaly detection. To bridge this gap, we introduce Time Series Multi-Task Question Answering (Time-MQA), a unified framework that enables natural language queries across multiple time series tasks - numerical analytical tasks and open-ended question answering with reasoning. Central to Time-MQA is the TSQA dataset, a large-scale dataset containing $\sim$200k question-answer pairs derived from diverse time series spanning environment, traffic, etc. This comprehensive resource covers various time series lengths and promotes robust model development. We further demonstrate how continually pre-training large language models (Mistral 7B, Llama-3 8B, and Qwen-2.5 7B) on the TSQA dataset enhanced time series reasoning capabilities, moving beyond mere numeric tasks and enabling more advanced and intuitive interactions with temporal data. The complete TSQA dataset, models, executable codes, user study questionnaires for evaluation, and results have all been open-sourced.

large language model, machine learning, question answering, (20 more...)

arXiv.org Artificial Intelligence

2503.01875

Country: North America > United States > Texas (0.14)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

AutoDroid-V2: Boosting SLM-based GUI Agents via Code Generation

Wen, Hao, Tian, Shizuo, Pavlov, Borislav, Du, Wenjie, Li, Yixuan, Chang, Ge, Zhao, Shanhui, Liu, Jiacheng, Liu, Yunxin, Zhang, Ya-Qin, Li, Yuanchun

arXiv.org Artificial IntelligenceDec-26-2024

Large language models (LLMs) have brought exciting new advances to mobile UI agents, a long-standing research field that aims to complete arbitrary natural language tasks through mobile UI interactions. However, existing UI agents usually demand high reasoning capabilities of powerful large models that are difficult to be deployed locally on end-users' devices, which raises huge concerns about user privacy and centralized serving cost. One way to reduce the required model size is to customize a smaller domain-specific model with high-quality training data, e.g. large-scale human demonstrations of diverse types of apps and tasks, while such datasets are extremely difficult to obtain. Inspired by the remarkable coding abilities of recent small language models (SLMs), we propose to convert the UI task automation problem to a code generation problem, which can be effectively solved by an on-device SLM and efficiently executed with an on-device code interpreter. Unlike normal coding tasks that can be extensively pretrained with public datasets, generating UI automation code is challenging due to the diversity, complexity, and variability of target apps. Therefore, we adopt a document-centered approach that automatically builds fine-grained API documentation for each app and generates diverse task samples based on this documentation. By guiding the agent with the synthetic documents and task samples, it learns to generate precise and efficient scripts to complete unseen tasks. Based on detailed comparisons with state-of-the-art mobile UI agents, our approach effectively improves the mobile task automation with significantly higher success rates and lower latency/token consumption. Code will be open-sourced.

gui element, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.18116

Country: Asia (0.67)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

Add feedback

FlexMol: A Flexible Toolkit for Benchmarking Molecular Relational Learning

Liu, Sizhe, Xia, Jun, Zhang, Lecheng, Liu, Yuchen, Liu, Yue, Du, Wenjie, Gao, Zhangyang, Hu, Bozhen, Tan, Cheng, Xiang, Hongxin, Li, Stan Z.

arXiv.org Artificial IntelligenceOct-19-2024

Molecular relational learning (MRL) is crucial for understanding the interaction behaviors between molecular pairs, a critical aspect of drug discovery and development. However, the large feasible model space of MRL poses significant challenges to benchmarking, and existing MRL frameworks face limitations in flexibility and scope. To address these challenges, avoid repetitive coding efforts, and ensure fair comparison of models, we introduce FlexMol, a comprehensive toolkit designed to facilitate the construction and evaluation of diverse model architectures across various datasets and performance metrics. FlexMol offers a robust suite of preset model components, including 16 drug encoders, 13 protein sequence encoders, 9 protein structure encoders, and 7 interaction layers. With its easy-to-use API and flexibility, FlexMol supports the dynamic construction of over 70, 000 distinct combinations of model architectures. Additionally, we provide detailed benchmark results and code examples to demonstrate FlexMol's effectiveness in simplifying and standardizing MRL model development and comparison.

artificial intelligence, encoder, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2410.1501

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Text-guided Diffusion Model for 3D Molecule Generation

Luo, Yanchen, Fang, Junfeng, Li, Sihang, Liu, Zhiyuan, Wu, Jiancan, Zhang, An, Du, Wenjie, Wang, Xiang

arXiv.org Artificial IntelligenceOct-4-2024

De novo molecule design, the process of generating molecules with specific, chemically viable structures for target properties, is a cornerstone in the fields of biology, chemistry, and drug discovery (Hajduk and Greer, 2007; Mandal et al., 2009; Pyzer-Knapp et al., 2015; Barakat et al., 2014). It not only allows for the creation of subject molecules but also provides insights into the relationship between molecular structure and function, enabling the prediction and manipulation of biological activity. Constrained by the immense diversity of chemical space, manually generating property-specific molecules remains a daunting challenge (Gaudelet et al., 2021). However, the generation of molecules that precisely meet specific requirements, including the creation of tailor-made molecules, is a complex task due to the vastness of the chemical space and the intricate relationship between molecular structure and function. Overcoming this challenge is crucial for advancing our understanding of biological systems and for the development of new therapeutic agents.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2410.03803

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

TSI-Bench: Benchmarking Time Series Imputation

Du, Wenjie, Wang, Jun, Qian, Linglong, Yang, Yiyuan, Liu, Fanxing, Wang, Zepu, Ibrahim, Zina, Liu, Haoxin, Zhao, Zhiyuan, Zhou, Yingjie, Wang, Wenjia, Ding, Kaize, Liang, Yuxuan, Prakash, B. Aditya, Wen, Qingsong

arXiv.org Artificial IntelligenceJun-18-2024

Effective imputation is a crucial preprocessing step for time series analysis. Despite the development of numerous deep learning algorithms for time series imputation, the community lacks standardized and comprehensive benchmark platforms to effectively evaluate imputation performance across different settings. Moreover, although many deep learning forecasting algorithms have demonstrated excellent performance, whether their modeling achievements can be transferred to time series imputation tasks remains unexplored. To bridge these gaps, we develop TSI-Bench, the first (to our knowledge) comprehensive benchmark suite for time series imputation utilizing deep learning techniques. The TSI-Bench pipeline standardizes experimental settings to enable fair evaluation of imputation algorithms and identification of meaningful insights into the influence of domain-appropriate missingness ratios and patterns on model performance. Furthermore, TSI-Bench innovatively provides a systematic paradigm to tailor time series forecasting algorithms for imputation purposes. Our extensive study across 34,804 experiments, 28 algorithms, and 8 datasets with diverse missingness scenarios demonstrates TSI-Bench's effectiveness in diverse downstream tasks and potential to unlock future directions in time series imputation research and analysis.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2406.12747

Country: North America > United States > California (0.27)

Genre: Research Report (1.00)

Industry:

Information Technology (0.92)
Health & Medicine > Health Care Technology (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

NovoBench: Benchmarking Deep Learning-based De Novo Peptide Sequencing Methods in Proteomics

Zhou, Jingbo, Chen, Shaorong, Xia, Jun, Liu, Sizhe, Ling, Tianze, Du, Wenjie, Liu, Yue, Yin, Jianwei, Li, Stan Z.

arXiv.org Artificial IntelligenceJun-16-2024

Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the high-throughput analysis of protein composition in biological tissues. Many deep learning methods have been developed for \emph{de novo} peptide sequencing task, i.e., predicting the peptide sequence for the observed mass spectrum. However, two key challenges seriously hinder the further advancement of this important task. Firstly, since there is no consensus for the evaluation datasets, the empirical results in different research papers are often not comparable, leading to unfair comparison. Secondly, the current methods are usually limited to amino acid-level or peptide-level precision and recall metrics. In this work, we present the first unified benchmark NovoBench for \emph{de novo} peptide sequencing, which comprises diverse mass spectrum data, integrated models, and comprehensive evaluation metrics. Recent impressive methods, including DeepNovo, PointNovo, Casanovo, InstaNovo, AdaNovo and $\pi$-HelixNovo are integrated into our framework. In addition to amino acid-level and peptide-level precision and recall, we evaluate the models' performance in terms of identifying post-tranlational modifications (PTMs), efficiency and robustness to peptide length, noise peaks and missing fragment ratio, which are important influencing factors while seldom be considered. Leveraging this benchmark, we conduct a large-scale study of current methods, report many insightful findings that open up new possibilities for future development. The benchmark will be open-sourced to facilitate future research and application.

artificial intelligence, machine learning, peptide, (11 more...)

arXiv.org Artificial Intelligence

2406.11906

Country: North America > United States > California (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Unveiling the Secrets: How Masking Strategies Shape Time Series Imputation

Qian, Linglong, Ibrahim, Zina, Du, Wenjie, Yang, Yiyuan, Dobson, Richard JB

arXiv.org Machine LearningMay-26-2024

In this study, we explore the impact of different masking strategies on time series imputation models. We evaluate the effects of pre-masking versus in-mini-batch masking, normalization timing, and the choice between augmenting and overlaying artificial missingness. Using three diverse datasets, we benchmark eleven imputation models with different missing rates. Our results demonstrate that masking strategies significantly influence imputation accuracy, revealing that more sophisticated and data-driven masking designs are essential for robust model evaluation. We advocate for refined experimental designs and comprehensive disclosureto better simulate real-world patterns, enhancing the practical applicability of imputation models.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Machine Learning

2405.17508

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

AdaNovo: Adaptive \emph{De Novo} Peptide Sequencing with Conditional Mutual Information

Xia, Jun, Chen, Shaorong, Zhou, Jingbo, Ling, Tianze, Du, Wenjie, Liu, Sizhe, Li, Stan Z.

arXiv.org Artificial IntelligenceMar-15-2024

Tandem mass spectrometry has played a pivotal role in advancing proteomics, enabling the analysis of protein composition in biological samples. Despite the development of various deep learning methods for identifying amino acid sequences (peptides) responsible for observed spectra, challenges persist in \emph{de novo} peptide sequencing. Firstly, prior methods struggle to identify amino acids with post-translational modifications (PTMs) due to their lower frequency in training data compared to canonical amino acids, further resulting in decreased peptide-level identification precision. Secondly, diverse types of noise and missing peaks in mass spectra reduce the reliability of training data (peptide-spectrum matches, PSMs). To address these challenges, we propose AdaNovo, a novel framework that calculates conditional mutual information (CMI) between the spectrum and each amino acid/peptide, using CMI for adaptive model training. Extensive experiments demonstrate AdaNovo's state-of-the-art performance on a 9-species benchmark, where the peptides in the training set are almost completely disjoint from the peptides of the test sets. Moreover, AdaNovo excels in identifying amino acids with PTMs and exhibits robustness against data noise. The supplementary materials contain the official code.

amino acid, artificial intelligence, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2403.07013

Country: North America > United States > California (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

MolTC: Towards Molecular Relational Modeling In Language Models

Fang, Junfeng, Zhang, Shuai, Wu, Chang, Liu, Zhiyuan, Li, Sihang, Wang, Kun, Du, Wenjie, Wang, Xiang

arXiv.org Artificial IntelligenceFeb-8-2024

Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Recently, the adoption of large language models (LLMs), known for their vast knowledge repositories and advanced logical inference capabilities, has emerged as a promising way for efficient and effective MRL. Despite their potential, these methods predominantly rely on the textual data, thus not fully harnessing the wealth of structural information inherent in molecular graphs. Moreover, the absence of a unified framework exacerbates the issue of information underutilization, as it hinders the sharing of interaction mechanism learned across diverse datasets. To address these challenges, this work proposes a novel LLM-based multi-modal framework for Molecular inTeraction prediction following Chain-of-Thought (CoT) theory, termed MolTC, which effectively integrate graphical information of two molecules in pair. For achieving a unified MRL, MolTC innovatively develops a dynamic parameter-sharing strategy for cross-dataset information sharing. Moreover, to train MolTC efficiently, we introduce a Multi-hierarchical CoT concept to refine its training paradigm, and conduct a comprehensive Molecular Interactive Instructions dataset for the development of biochemical LLMs involving MRL. Our experiments, conducted across various datasets involving over 4,000,000 molecular pairs, exhibit the superiority of our method over current GNN and LLM-based baselines. Code is available at https://github.com/MangoKiller/MolTC.

artificial intelligence, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2402.03781

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Deep Learning for Multivariate Time Series Imputation: A Survey

Wang, Jun, Du, Wenjie, Cao, Wei, Zhang, Keli, Wang, Wenjia, Liang, Yuxuan, Wen, Qingsong

arXiv.org Artificial IntelligenceFeb-6-2024

The ubiquitous missing values cause the multivariate time series data to be partially observed, destroying the integrity of time series and hindering the effective time series data analysis. Recently deep learning imputation methods have demonstrated remarkable success in elevating the quality of corrupted time series data, subsequently enhancing performance in downstream tasks. In this paper, we conduct a comprehensive survey on the recently proposed deep learning imputation methods. First, we propose a taxonomy for the reviewed methods, and then provide a structured review of these methods by highlighting their strengths and limitations. We also conduct empirical experiments to study different methods and compare their enhancement for downstream tasks. Finally, the open issues for future research on multivariate time series imputation are pointed out. All code and configurations of this work, including a regularly maintained multivariate time series imputation paper list, can be found in the GitHub repository~\url{https://github.com/WenjieDu/Awesome\_Imputation}.

artificial intelligence, imputation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2402.04059

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback