AITopics | Wang, Tao

Collaborating Authors

Wang, Tao

DiffAD: A Unified Diffusion Modeling Approach for Autonomous Driving

Wang, Tao, Zhang, Cong, Qu, Xingguang, Li, Kun, Liu, Weiwei, Huang, Chang

arXiv.org Artificial Intelligence Mar-15-2025

End-to-end autonomous driving (E2E-AD) has rapidly emerged as a promising approach toward achieving full autonomy. However, existing E2E-AD systems typically adopt a traditional multi-task framework, addressing perception, prediction, and planning tasks through separate task-specific heads. Despite being trained in a fully differentiable manner, they still encounter issues with task coordination, and the system complexity remains high. In this work, we introduce DiffAD, a novel diffusion probabilistic model that redefines autonomous driving as a conditional image generation task. By rasterizing heterogeneous targets onto a unified bird's-eye view (BEV) and modeling their latent distribution, DiffAD unifies various driving objectives and jointly optimizes all driving tasks in a single framework, significantly reducing system complexity and harmonizing task coordination. The reverse process iteratively refines the generated BEV image, resulting in more robust and realistic driving behaviors. Closed-loop evaluations in Carla demonstrate the superiority of the proposed method, achieving a new state-of-the-art Success Rate and Driving Score. The code will be made publicly available.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2503.1217

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report > Promising Solution (0.54)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (0.84)
Automobiles & Trucks (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
(2 more...)

The Box is in the Pen: Evaluating Commonsense Reasoning in Neural Machine Translation

He, Jie, Wang, Tao, Xiong, Deyi, Liu, Qun

arXiv.org Artificial Intelligence Mar-5-2025

Does neural machine translation yield translations that are congenial with common sense? In this paper, we present a test suite to evaluate the commonsense reasoning capability of neural machine translation. The test suite consists of three test sets, covering lexical and contextless/contextual syntactic ambiguity that requires commonsense knowledge to resolve. We manually create 1,200 triples, each of which contains a source sentence and two contrastive translations, involving 7 different common sense types. Language models pretrained on large-scale corpora, such as BERT and GPT-2, achieve a commonsense reasoning accuracy of lower than 72% on target translations of this test suite. We conduct extensive experiments on the test suite to evaluate commonsense reasoning in neural machine translation and investigate factors that have an impact on this capability. Our experiments and analyses demonstrate that neural machine translation performs poorly on commonsense reasoning of the three ambiguity types in terms of both reasoning accuracy (60.1%) and reasoning consistency (31%). The built commonsense test suite is available at https://github.com/tjunlp-lab/CommonMT.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2503.03308

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.64)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

StickMotion: Generating 3D Human Motions by Drawing a Stickman

Wang, Tao, Wu, Zhihua, He, Qiaozhi, Chu, Jiaming, Qian, Ling, Cheng, Yu, Xing, Junliang, Zhao, Jian, Jin, Lei

arXiv.org Artificial Intelligence Mar-5-2025

Text-to-motion generation, which translates textual descriptions into human motions, has been challenging in accurately capturing detailed user-imagined motions from simple text inputs. This paper introduces StickMotion, an efficient diffusion-based network designed for multi-condition scenarios, which generates desired motions based on traditional text and our proposed stickman conditions for global and local control of these motions, respectively. We address the challenges introduced by the user-friendly stickman from three perspectives: 1) Data generation. We develop an algorithm to generate hand-drawn stickmen automatically across different dataset formats. 2) Multi-condition fusion. We propose a multi-condition module that integrates into the diffusion process and obtains outputs of all possible condition combinations, reducing computational complexity and enhancing StickMotion's performance compared to conventional approaches with the self-attention module. 3) Dynamic supervision. We empower StickMotion to make minor adjustments to the stickman's position within the output sequences, generating more natural movements through our proposed dynamic supervision strategy. Quantitative experiments and user studies show that sketching stickmen saves users about 51.5% of the time needed to generate motions consistent with their imagination. Our codes, demos, and relevant data will be released to facilitate further research and validation within the scientific community.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.04829

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Human Computer Interaction (0.88)
Information Technology > Artificial Intelligence > Natural Language (0.68)
(3 more...)

Rapid morphology characterization of two-dimensional TMDs and lateral heterostructures based on deep learning

He, Junqi, Zhang, Yujie, Wang, Jialu, Wang, Tao, Zhang, Pan, Cai, Chengjie, Yang, Jinxing, Lin, Xiao, Yang, Xiaohui

arXiv.org Artificial Intelligence Mar-1-2025

Leveraging advancements in artificial intelligence, we introduce a deep learning-based method for efficiently characterizing heterostructures and 2D materials, specifically MoS₂-MoSe₂ lateral heterostructures and MoS₂ flakes with varying shapes and thicknesses. By utilizing YOLO models, we achieve an accuracy rate of over 94.67% in identifying these materials. Additionally, we explore the application of transfer learning across different materials, which further enhances model performance. This model exhibits robust generalization and anti-interference ability, ensuring reliable results in diverse scenarios. To facilitate practical use, we have developed an application that enables real-time analysis directly from optical microscope images, making the process significantly faster and more cost-effective than traditional methods. This deep learning-driven approach represents a promising tool for the rapid and accurate characterization of 2D materials, opening new avenues for research and development in material science.

Keywords: 2D material, TMDs, lateral heterostructure, deep learning, instance segmentation, morphology characterization

Introduction: Two-dimensional (2D) materials have attracted significant attention due to their excellent mechanical, electrical, thermal, and optical properties, making them ideal candidates for next-generation technologies.

artificial intelligence, heterostructure, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.0047

Country:

Asia > China (0.49)
North America > United States > Texas (0.28)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Pseudo-Knowledge Graph: Meta-Path Guided Retrieval and In-Graph Text for RAG-Equipped LLM

Yang, Yuxin, Wu, Haoyang, Wang, Tao, Yang, Jia, Ma, Hao, Luo, Guojie

arXiv.org Artificial Intelligence Feb-28-2025

The advent of Large Language Models (LLMs) has revolutionized natural language processing. However, these models face challenges in retrieving precise information from vast datasets. Retrieval-Augmented Generation (RAG) was developed to combine LLMs with external information retrieval systems to enhance the accuracy and context of responses. Despite improvements, RAG still struggles with comprehensive retrieval in high-volume, low-information-density databases and lacks relational awareness, leading to fragmented answers. To address this, this paper introduces the Pseudo-Knowledge Graph (PKG) framework, designed to overcome these limitations by integrating Meta-path Retrieval, In-graph Text and Vector Retrieval into LLMs. By preserving natural language text and leveraging various retrieval techniques, the PKG offers a richer knowledge representation and improves accuracy in information retrieval. Extensive evaluations using Open Compass and MultiHop-RAG benchmarks demonstrate the framework's effectiveness in managing large volumes of data and complex relationships.
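
The abstract above mentions meta-path retrieval without giving details. As a rough illustration only, the sketch below shows the general idea of following a meta-path (a fixed sequence of node types) through a small typed graph. It is not the PKG implementation: the function name `follow_meta_path`, the node types, and the toy graph are all hypothetical.

```python
from collections import defaultdict


def follow_meta_path(edges, node_types, start, meta_path):
    """Return every path from `start` whose node types match `meta_path`.

    edges      -- list of directed (source, destination) pairs (toy data)
    node_types -- dict mapping each node to its type label
    meta_path  -- list of type labels, e.g. ["Author", "Paper", "Venue"]
    """
    adjacency = defaultdict(list)
    for src, dst in edges:
        adjacency[src].append(dst)

    # The start node itself must match the first type in the meta-path.
    if node_types[start] != meta_path[0]:
        return []

    paths = [[start]]
    for wanted_type in meta_path[1:]:
        # Extend each partial path by neighbours of the required type.
        paths = [
            path + [neighbour]
            for path in paths
            for neighbour in adjacency[path[-1]]
            if node_types[neighbour] == wanted_type
        ]
    return paths


# Hypothetical toy graph used only for illustration.
node_types = {
    "alice": "Author",
    "p1": "Paper",
    "p2": "Paper",
    "kdd": "Venue",
    "t1": "Topic",
}
edges = [("alice", "p1"), ("alice", "p2"), ("p1", "kdd"), ("p2", "t1")]

venue_paths = follow_meta_path(edges, node_types, "alice", ["Author", "Paper", "Venue"])
```

In a retrieval setting, the text attached to the nodes on each matching path could then be passed to the language model as context; how PKG actually does this is not stated in the abstract.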

information retrieval, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.00309

Country:

Asia (1.00)
Europe > United Kingdom > England (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Food & Agriculture > Agriculture (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

IMM-MOT: A Novel 3D Multi-object Tracking Framework with Interacting Multiple Model Filter

Liu, Xiaohong, Zhao, Xulong, Liu, Gang, Wu, Zili, Wang, Tao, Meng, Lei, Wang, Yuhan

arXiv.org Artificial Intelligence Feb-12-2025

3D Multi-Object Tracking (MOT) provides the trajectories of surrounding objects, assisting robots or vehicles in smarter path planning and obstacle avoidance. Existing 3D MOT methods based on the Tracking-by-Detection framework typically use a single motion model to track an object throughout its entire tracking process. However, objects may change their motion patterns due to variations in the surrounding environment. In this paper, we introduce the Interacting Multiple Model filter in IMM-MOT, which accurately fits the complex motion patterns of individual objects, overcoming the limitation of single-model tracking in existing approaches. In addition, we incorporate a Damping Window mechanism into the trajectory lifecycle management, leveraging the continuous association status of trajectories to control their creation and termination, reducing the occurrence of overlooked low-confidence true targets. Furthermore, we propose the Distance-Based Score Enhancement module, which enhances the differentiation between false positives and true positives by adjusting detection scores, thereby improving the effectiveness of the Score Filter. On the NuScenes Val dataset, IMM-MOT outperforms most other single-modal models using 3D point clouds, achieving an AMOTA of 73.8%. Our project is available at https://github.com/Ap01lo/IMM-MOT.
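
The abstract above names the Interacting Multiple Model (IMM) filter but does not spell out its steps. As a rough illustration, the sketch below shows the textbook model-probability bookkeeping of a generic IMM filter: predicting the model probabilities through a Markov transition matrix, computing the mixing weights, and updating the probabilities from per-model measurement likelihoods. It is not the IMM-MOT code; the function names, the two-model setup, and all numbers are illustrative assumptions, and the per-model Kalman filters that would supply the likelihoods are omitted.

```python
def predict_model_probs(mu, transition):
    """c_j = sum_i p_ij * mu_i: model probabilities after the Markov switch."""
    n = len(mu)
    return [sum(transition[i][j] * mu[i] for i in range(n)) for j in range(n)]


def mixing_weights(mu, transition):
    """weights[i][j] = p_ij * mu_i / c_j: share of model i in model j's mixed prior.

    Assumes every predicted probability c_j is non-zero.
    """
    c = predict_model_probs(mu, transition)
    n = len(mu)
    return [[transition[i][j] * mu[i] / c[j] for j in range(n)] for i in range(n)]


def update_model_probs(mu, transition, likelihoods):
    """mu_j is proportional to likelihood_j * c_j, normalised to sum to one."""
    c = predict_model_probs(mu, transition)
    unnormalised = [lik * cj for lik, cj in zip(likelihoods, c)]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


# Hypothetical two-model example (e.g. constant velocity vs. turning).
mu = [0.5, 0.5]                        # current model probabilities
transition = [[0.9, 0.1], [0.1, 0.9]]  # p_ij: probability of switching i -> j
likelihoods = [0.2, 0.8]               # measurement likelihood under each model

weights = mixing_weights(mu, transition)
new_mu = update_model_probs(mu, transition, likelihoods)
```

In a full filter, the mixing weights blend the per-model states and covariances before each model's prediction step, and the updated probabilities weight the combined output estimate.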

artificial intelligence, machine learning, trajectory, (18 more...)

arXiv.org Artificial Intelligence

2502.09672

Country: Asia > China > Shaanxi Province (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.54)

KPIs 2024 Challenge: Advancing Glomerular Segmentation from Patch- to Slide-Level

Deng, Ruining, Yao, Tianyuan, Tang, Yucheng, Guo, Junlin, Lu, Siqi, Xiong, Juming, Yu, Lining, Cap, Quan Huu, Cai, Pengzhou, Lan, Libin, Zhao, Ze, Galdran, Adrian, Kumar, Amit, Deotale, Gunjan, Das, Dev Kumar, Paik, Inyoung, Lee, Joonho, Lee, Geongyu, Chen, Yujia, Li, Wangkai, Li, Zhaoyang, Hou, Xuege, Wu, Zeyuan, Wang, Shengjin, Fischer, Maximilian, Kramer, Lars, Du, Anghong, Zhang, Le, Sanchez, Maria Sanchez, Ulloa, Helena Sanchez, Heredia, David Ribalta, Garcia, Carlos Perez de Arenaza, Xu, Shuoyu, He, Bingdou, Cheng, Xinping, Wang, Tao, Moreau, Noemie, Bozek, Katarzyna, Innani, Shubham, Baid, Ujjwal, Kefas, Kaura Solomon, Landman, Bennett A., Wang, Yu, Zhao, Shilin, Yin, Mengmeng, Yang, Haichun, Huo, Yuankai

arXiv.org Artificial Intelligence Feb-11-2025

Chronic kidney disease (CKD) is a major global health issue, affecting over 10% of the population and causing significant mortality. While kidney biopsy remains the gold standard for CKD diagnosis and treatment, the lack of comprehensive benchmarks for kidney pathology segmentation hinders progress in the field. To address this, we organized the Kidney Pathology Image Segmentation (KPIs) Challenge, introducing a dataset that incorporates preclinical rodent models of CKD with over 10,000 annotated glomeruli from 60+ Periodic Acid Schiff (PAS)-stained whole slide images. The challenge includes two tasks, patch-level segmentation and whole slide image segmentation and detection, evaluated using the Dice Similarity Coefficient (DSC) and F1-score. By encouraging innovative segmentation methods that adapt to diverse CKD models and tissue conditions, the KPIs Challenge aims to advance kidney pathology analysis, establish new benchmarks, and enable precise, large-scale quantification for disease research and diagnosis.
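
The two metrics named in the abstract above have standard definitions, sketched below for reference: the Dice Similarity Coefficient on a pair of binary masks and the F1-score from detection counts. This is a minimal sketch and not the challenge's official evaluation code. The function names, the flat-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for two flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def f1_score(true_pos, false_pos, false_neg):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    denominator = 2 * true_pos + false_pos + false_neg
    return 0.0 if denominator == 0 else 2.0 * true_pos / denominator


# Toy 4-pixel masks: one overlapping pixel, two predicted, one in the target.
example_dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])

# Toy detection counts: 8 matched glomeruli, 2 spurious, 2 missed.
example_f1 = f1_score(8, 2, 2)
```

For a single binary mask the two formulas coincide when pixels are counted as true positives, false positives, and false negatives; the challenge reports them separately because DSC is applied to segmentation masks and F1 to detection.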

artificial intelligence, machine learning, segmentation, (17 more...)

arXiv.org Artificial Intelligence

2502.07288

Country:

Asia > China (0.68)
Europe (0.68)
North America > United States > Indiana (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.89)

Multi-Physics Simulations via Coupled Fourier Neural Operator

Li, Shibo, Wang, Tao, Sun, Yifei, Tang, Hewei

arXiv.org Artificial Intelligence Jan-29-2025

Physical simulations are essential tools across critical fields such as mechanical and aerospace engineering, chemistry, meteorology, etc. While neural operators, particularly the Fourier Neural Operator (FNO), have shown promise in predicting simulation results with impressive performance and efficiency, they face limitations when handling real-world scenarios involving coupled multiphysics outputs. Current neural operator methods either overlook the correlations between multiple physical processes or employ simplistic architectures that inadequately capture these relationships. To overcome these challenges, we introduce a novel coupled multi-physics neural operator learning (COMPOL) framework that extends the capabilities of Fourier operator layers to model interactions among multiple physical processes. Our approach implements feature aggregation through recurrent and attention mechanisms, enabling comprehensive modeling of coupled interactions. Our method's core is an innovative system for aggregating latent features from multi-physics processes. These aggregated features serve as enriched information sources for neural operator layers, allowing our framework to capture complex physical relationships accurately. We evaluated our coupled multi-physics neural operator across diverse physical simulation tasks, including biological systems, fluid mechanics, and multiphase flow in porous media. Our proposed model demonstrates a two- to three-fold improvement in predictive performance compared to existing approaches.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.17296

Country: North America > United States > Texas (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(2 more...)

Large Language Models for Bioinformatics

Ruan, Wei, Lyu, Yanjun, Zhang, Jing, Cai, Jiazhang, Shu, Peng, Ge, Yang, Lu, Yao, Gao, Shang, Wang, Yue, Wang, Peilong, Zhao, Lin, Wang, Tao, Liu, Yufang, Fang, Luyang, Liu, Ziyu, Liu, Zhengliang, Li, Yiwei, Wu, Zihao, Chen, Junhao, Jiang, Hanqi, Pan, Yi, Yang, Zhenyuan, Chen, Jingyuan, Liang, Shizhe, Zhang, Wei, Ma, Terry, Dou, Yuan, Zhang, Jianli, Gong, Xinyu, Gan, Qi, Zou, Yusong, Chen, Zebang, Qian, Yuanxin, Yu, Shuo, Lu, Jin, Song, Kenan, Wang, Xianqiao, Sikora, Andrea, Li, Gang, Li, Xiang, Li, Quanzheng, Wang, Yingfeng, Zhang, Lu, Abate, Yohannes, He, Lifang, Zhong, Wenxuan, Liu, Rongjie, Huang, Chao, Liu, Wei, Shen, Ye, Ma, Ping, Zhu, Hongtu, Yan, Yajun, Zhu, Dajiang, Liu, Tianming

arXiv.org Artificial Intelligence Jan-9-2025

With the rapid advancements in large language model (LLM) technology and the emergence of bioinformatics-specific language models (BioLMs), there is a growing need for a comprehensive analysis of the current landscape, computational characteristics, and diverse applications. This survey aims to address this need by providing a thorough review of BioLMs, focusing on their evolution, classification, and distinguishing features, alongside a detailed examination of training methodologies, datasets, and evaluation frameworks. We explore the wide-ranging applications of BioLMs in critical areas such as disease diagnosis, drug discovery, and vaccine development, highlighting their impact and transformative potential in bioinformatics. We identify key challenges and limitations inherent in BioLMs, including data privacy and security concerns, interpretability issues, biases in training data and model outputs, and domain adaptation complexities. Finally, we highlight emerging trends and future directions, offering valuable insights to guide researchers and clinicians toward advancing BioLMs for increasingly sophisticated biological and clinical applications.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2501.06271

Country: North America > United States > Minnesota (0.27)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Therapeutic Area > Vaccines (1.00)
Health & Medicine > Therapeutic Area > Oncology (1.00)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Leveraging Cross-Attention Transformer and Multi-Feature Fusion for Cross-Linguistic Speech Emotion Recognition

Zhao, Ruoyu, Jiang, Xiantao, Yu, F. Richard, Leung, Victor C. M., Wang, Tao, Zhang, Shaohu

arXiv.org Artificial Intelligence Jan-6-2025

Speech Emotion Recognition (SER) plays a crucial role in enhancing human-computer interaction. Cross-Linguistic SER (CLSER) has been a challenging research problem due to significant variability in linguistic and acoustic features of different languages. In this study, we propose a novel approach, HuMP-CAT, which combines HuBERT, MFCC, and prosodic characteristics. These features are fused using a cross-attention transformer (CAT) mechanism during feature extraction. Transfer learning is applied to carry knowledge from a source emotional speech dataset over to the target corpus for emotion recognition. We use IEMOCAP as the source dataset to train the source model and evaluate the proposed method on seven datasets in five languages (English, German, Spanish, Italian, and Chinese). We show that, by fine-tuning the source model with a small portion of speech from the target datasets, HuMP-CAT achieves an average accuracy of 78.75% across the seven datasets, with notable performance of 88.69% on EMODB (German language) and 79.48% on EMOVO (Italian language). Our extensive evaluation demonstrates that HuMP-CAT outperforms existing methods across multiple target languages.

data mining, emotion recognition, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2501.10408

Country:

North America > United States > California > Santa Clara County (0.14)
North America > Canada > Ontario > National Capital Region > Ottawa (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(5 more...)
