AITopics | Li, Jun

Collaborating Authors

Li, Jun

Invertible Koopman neural operator for data-driven modeling of partial differential equations

Jin, Yuhong, Cong, Andong, Hou, Lei, Gao, Qiang, Ge, Xiangdong, Zhu, Chonglong, Feng, Yongzhi, Li, Jun

arXiv.org Artificial Intelligence, Mar-25-2025

An Invertible Neural Network (INN) is introduced to eliminate the dependency on the reconstruction loss. The Koopman operator is parameterized in frequency space to ensure resolution invariance. With preprocessing such as interpolation, IKNO can be applied to non-Cartesian domains. In various numerical and real-world examples, IKNO outperforms FNO and KNO.

Keywords: Deep learning, Invertible neural network, Koopman operator, Data-driven modeling, Neural operator, Partial differential equations

Abstract: Koopman operator theory is a popular candidate for data-driven modeling because it provides a global linearization representation for nonlinear dynamical systems. However, existing Koopman operator-based methods struggle to construct a well-behaved observable function and its inverse, and they are not efficient enough when dealing with partial differential equations (PDEs). To address these issues, this paper proposes the Invertible Koopman Neural Operator (IKNO), a novel data-driven modeling approach inspired by Koopman operator theory and neural operators. IKNO leverages an Invertible Neural Network to parameterize the observable function and its inverse simultaneously under the same learnable parameters. This explicitly guarantees the reconstruction relation and thus eliminates the dependency on the reconstruction loss, which is an essential improvement over the original Koopman Neural Operator (KNO). The structured linear matrix inspired by Koopman operator theory is parameterized to learn the evolution of the observables' low-frequency modes in frequency space rather than directly in the observable space, ensuring that IKNO is resolution-invariant like other neural operators. Moreover, with preprocessing such as interpolation and dimension expansion, IKNO can be extended to operator learning tasks defined on non-Cartesian domains. We support the above claims with extensive numerical and real-world examples and demonstrate the effectiveness of IKNO and its superiority over other neural operators.

1. Introduction: Complex nonlinear dynamical systems are ubiquitous in many engineering fields, such as aerospace and vibration control, and modeling these systems is an important research topic [1-3]. Traditional knowledge-driven modeling approaches, which have achieved relative maturity, usually use a priori expertise to build a set of differential or algebraic equations that describe or explain the phenomena of interest. However, in many scenarios, some key parameters, or even the governing expressions of the systems of concern, may be difficult to measure or specify accurately, which makes it challenging to establish a physical model that accurately characterizes the systems' evolution. In recent years, as big data technology and computer performance have improved, data-driven modeling approaches have gained extensive attention from researchers, providing a feasible route to solving these problems [4-6].

artificial intelligence, data mining, machine learning, (16 more...)

2503.19717

Country: Asia > China (0.46)

Genre: Research Report (1.00)

Industry:

Energy > Oil & Gas (0.46)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.45)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Visual and Text Prompt Segmentation: A Novel Multi-Model Framework for Remote Sensing

Zi, Xing, Jin, Kairui, Tao, Xian, Li, Jun, Braytee, Ali, Shah, Rajiv Ratn, Prasad, Mukesh

arXiv.org Artificial Intelligence, Mar-10-2025

Pixel-level segmentation is essential in remote sensing, where foundational vision models like CLIP and the Segment Anything Model (SAM) have demonstrated significant capabilities in zero-shot segmentation tasks. Despite their advances, challenges specific to remote sensing remain substantial. First, without clear prompt constraints, SAM often generates redundant masks, making post-processing more complex. Second, the CLIP model, mainly designed for global feature alignment in foundational models, often overlooks local objects crucial to remote sensing. This oversight leads to inaccurate recognition or misplaced focus in multi-target remote sensing imagery. Third, neither model has been pre-trained on multi-scale aerial views, increasing the likelihood of detection failures. To tackle these challenges, we introduce the VTPSeg pipeline, which combines the strengths of Grounding DINO, CLIP, and SAM for enhanced open-vocabulary image segmentation. The Grounding DINO+ (GD+) module generates initial candidate bounding boxes, while the CLIP Filter++ (CLIP++) module uses a combination of visual and textual prompts to refine the candidates and filter out irrelevant object bounding boxes, ensuring that only pertinent objects are considered. Subsequently, these refined bounding boxes serve as specific prompts for the FastSAM model, which executes precise segmentation. VTPSeg is validated by experimental and ablation study results on five popular remote sensing image segmentation datasets.

machine learning, natural language, segmentation, (16 more...)

2503.07911

Country:

Oceania > Australia (0.14)
Europe > Netherlands (0.14)
Asia > China (0.14)

Genre: Research Report (0.51)

Industry: Energy > Renewable > Geothermal > Geothermal Energy Exploration and Development > Geophysical Analysis & Survey (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Enhancing Abnormality Grounding for Vision Language Models with Knowledge Descriptions

Li, Jun, Liu, Che, Bai, Wenjia, Arcucci, Rossella, Bercea, Cosmin I., Schnabel, Julia A.

arXiv.org Artificial Intelligence, Mar-5-2025

Visual Language Models (VLMs) have demonstrated impressive capabilities in visual grounding tasks. However, their effectiveness in the medical domain, particularly for abnormality detection and localization within medical images, remains underexplored. A major challenge is the complex and abstract nature of medical terminology, which makes it difficult to directly associate pathological anomaly terms with their corresponding visual features. In this work, we introduce a novel approach to enhance VLM performance in medical abnormality detection and localization by leveraging decomposed medical knowledge. Instead of directly prompting models to recognize specific abnormalities, we focus on breaking down medical concepts into fundamental attributes and common visual patterns. This strategy promotes a stronger alignment between textual descriptions and visual features, improving both the recognition and localization of abnormalities in medical images. We evaluate our method on the 0.23B Florence-2 base model and demonstrate that it achieves comparable performance in abnormality grounding to significantly larger 7B LLaVA-based medical VLMs, despite being trained on only 1.5% of the data used for such models. Experimental results also demonstrate the effectiveness of our approach in both known and previously unseen abnormalities, suggesting its strong generalization capabilities.

large language model, machine learning, natural language, (19 more...)

2503.03278

Country:

Europe > Germany (0.15)
North America > United States (0.14)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Reading Your Heart: Learning ECG Words and Sentences via Pre-training ECG Language Model

Jin, Jiarui, Wang, Haoyu, Li, Hongyan, Li, Jun, Pan, Jiahui, Hong, Shenda

arXiv.org Artificial Intelligence, Feb-15-2025

Electrocardiogram (ECG) is essential for the clinical diagnosis of arrhythmias and other heart diseases, but deep learning methods based on ECG often face limitations due to the need for high-quality annotations. Although previous ECG self-supervised learning (eSSL) methods have made significant progress in representation learning from unannotated ECG data, they typically treat ECG signals as ordinary time-series data, segmenting the signals using fixed-size and fixed-step time windows, which often ignore the form and rhythm characteristics and latent semantic relationships in ECG signals. In this work, we introduce a novel perspective on ECG signals, treating heartbeats as words and rhythms as sentences. Based on this perspective, we first designed the QRS-Tokenizer, which generates semantically meaningful ECG sentences from the raw ECG signals. Building on these, we then propose HeartLang, a novel self-supervised learning framework for ECG language processing, learning general representations at form and rhythm levels. Additionally, we construct the largest heartbeat-based ECG vocabulary to date, which will further advance the development of ECG language processing. We evaluated HeartLang across six public ECG datasets, where it demonstrated robust competitiveness against other eSSL methods. Our data and code are publicly available at https://github.com/PKUDigitalHealth/HeartLang.

Electrocardiogram (ECG) is a common type of clinical data used to monitor cardiac activity, and is frequently employed in diagnosing cardiac diseases or conditions impairing myocardial function (Hong et al., 2020; Liu et al., 2021). A primary limitation of using supervised deep learning methods for ECG signal analysis is their dependency on large-scale, expert-reviewed, annotated high-quality data.
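The abstract does not say how the QRS-Tokenizer segments heartbeats. As a rough illustration of the "heartbeats as words, rhythms as sentences" idea only, the sketch below assumes R-peak indices are already available and cuts a fixed-width window around each one. The function name, the `half_width` parameter, and the windowing rule are hypothetical and are not taken from HeartLang.

```python
from typing import List, Sequence


def heartbeats_to_sentence(
    signal: Sequence[float],
    r_peaks: Sequence[int],
    half_width: int,
) -> List[List[float]]:
    """Cut one window ("word") around each R-peak; the ordered list is a "sentence".

    Assumption: R-peak sample indices come from some external detector.
    Beats whose window would run past either end of the signal are skipped.
    """
    words: List[List[float]] = []
    for peak in r_peaks:
        start = peak - half_width
        end = peak + half_width + 1  # exclusive end index
        if start < 0 or end > len(signal):
            continue  # incomplete beat at the boundary
        words.append(list(signal[start:end]))
    return words
```

Each inner list plays the role of a word, and their order in the outer list preserves the rhythm of the recording.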

ecg sentence, machine learning, natural language, (18 more...)

2502.10707

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SFADNet: Spatio-temporal Fused Graph based on Attention Decoupling Network for Traffic Prediction

Wu, Mei, Weng, Wenchao, Li, Jun, Lin, Yiqian, Chen, Jing, Seng, Dewen

arXiv.org Artificial Intelligence, Jan-7-2025

In recent years, traffic flow prediction has played a crucial role in the management of intelligent transportation systems. However, traditional prediction methods are often limited by static spatial modeling, making it difficult to accurately capture the dynamic and complex relationships between time and space, thereby affecting prediction accuracy. This paper proposes an innovative traffic flow prediction network, SFADNet, which categorizes traffic flow into multiple traffic patterns based on temporal and spatial feature matrices. For each pattern, we construct an independent adaptive spatio-temporal fusion graph based on a cross-attention mechanism, employing residual graph convolution modules and time series modules to better capture dynamic spatio-temporal relationships under different fine-grained traffic patterns. Extensive experimental results demonstrate that SFADNet outperforms current state-of-the-art baselines across four large-scale datasets.

artificial intelligence, machine learning, proceedings, (17 more...)

2501.0406

Country: Asia > China (0.32)

Genre: Research Report (0.84)

Industry:

Telecommunications (0.58)
Transportation > Infrastructure & Services (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Data Science (0.95)

Pre-training a Density-Aware Pose Transformer for Robust LiDAR-based 3D Human Pose Estimation

An, Xiaoqi, Zhao, Lin, Gong, Chen, Li, Jun, Yang, Jian

arXiv.org Artificial Intelligence, Dec-17-2024

With the rapid development of autonomous driving, LiDAR-based 3D Human Pose Estimation (3D HPE) is becoming a research focus. However, due to the noise and sparsity of LiDAR-captured point clouds, robust human pose estimation remains challenging. Most of the existing methods use temporal information, multi-modal fusion, or SMPL optimization to correct biased results. In this work, we try to obtain sufficient information for 3D HPE only by modeling the intrinsic properties of low-quality point clouds. Hence, a simple yet powerful method is proposed, which provides insights both on modeling and augmentation of point clouds. Specifically, we first propose a concise and effective density-aware pose transformer (DAPT) to get stable keypoint representations. By using a set of joint anchors and a carefully designed exchange module, valid information is extracted from point clouds with different densities. Then 1D heatmaps are utilized to represent the precise locations of the keypoints. Secondly, a comprehensive LiDAR human synthesis and augmentation method is proposed to pre-train the model, enabling it to acquire a better human body prior. We increase the diversity of point clouds by randomly sampling human positions and orientations and by simulating occlusions through the addition of laser-level masks. Extensive experiments have been conducted on multiple datasets, including IMU-annotated LidarHuman26M, SLOPER4D, and manually annotated Waymo Open Dataset v2.0 (Waymo), HumanM3. Our method demonstrates SOTA performance in all scenarios. In particular, compared with LPFormer on Waymo, we reduce the average MPJPE by 10.0 mm. Compared with PRN on SLOPER4D, we notably reduce the average MPJPE by 20.7 mm.
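The results above are reported in MPJPE (mean per-joint position error), which is commonly defined as the average Euclidean distance between predicted and ground-truth joint positions. A minimal sketch for a single pose, assuming joints are given as 3D coordinates in millimetres; the function name and input layout are illustrative, and the paper's exact evaluation protocol (for example, joint visibility handling) is not stated in the abstract.

```python
import math
from typing import Sequence

Point3D = Sequence[float]


def mpjpe(pred: Sequence[Point3D], gt: Sequence[Point3D]) -> float:
    """Mean Euclidean distance between matching predicted and true joints."""
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and the same length")
    # math.dist computes the Euclidean distance between two points.
    total = sum(math.dist(p, g) for p, g in zip(pred, gt))
    return total / len(pred)


# Two joints: one predicted exactly, one off by a 3-4-5 triangle (5 mm).
error_mm = mpjpe([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
                 [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)])
```

Here `error_mm` averages a 0 mm error and a 5 mm error over the two joints.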

artificial intelligence, machine learning, point cloud, (12 more...)

2412.13454

Genre: Research Report (1.00)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Learning Based MPC for Autonomous Driving Using a Low Dimensional Residual Model

Li, Yaoyu, Huang, Chaosheng, Yang, Dongsheng, Liu, Wenbo, Li, Jun

arXiv.org Artificial Intelligence, Dec-5-2024

In this paper, a learning-based Model Predictive Control (MPC) using a low-dimensional residual model is proposed for autonomous driving. One of the critical challenges in autonomous driving is the complexity of vehicle dynamics, which impedes the formulation of an accurate vehicle model. An inaccurate vehicle model can significantly impact the performance of the MPC controller. To address this issue, this paper decomposes the nominal vehicle model into invariable and variable elements. The accuracy of the invariable component is ensured by calibration, while the deviations in the variable elements are learned by a low-dimensional residual model. The features of the residual model are selected as the physical variables most correlated with nominal model errors. Physical constraints among these features are formulated to explicitly define the valid region within the feature space. The formulated model and constraints are incorporated into the MPC framework and validated through both simulation and real vehicle experiments. The results indicate that the proposed method significantly enhances the model accuracy and controller performance.

artificial intelligence, machine learning, residual model, (15 more...)

2412.03874

Country:

Asia > China (0.14)
North America > United States (0.14)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Energy > Oil & Gas > Downstream (1.00)
Automobiles & Trucks (1.00)
Information Technology > Robotics & Automation (0.91)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Citywide Electric Vehicle Charging Demand Prediction Approach Considering Urban Region and Dynamic Influences

Kuang, Haoxuan, Deng, Kunxiang, You, Linlin, Li, Jun

arXiv.org Artificial Intelligence, Nov-27-2024

Electric vehicle charging demand prediction is important for vacant charging pile recommendation and charging infrastructure planning, thus facilitating vehicle electrification and green energy development. The performance of previous spatio-temporal studies remains far from satisfactory because urban region attributes and multivariate temporal influences are not adequately taken into account. To tackle these issues, we propose a learning approach for citywide electric vehicle charging demand prediction, named CityEVCP. To learn non-pairwise relationships in urban areas, we cluster service areas by the types and numbers of points of interest in the areas and develop attentive hypergraph networks accordingly. Graph attention mechanisms are employed for information propagation between neighboring areas. Additionally, we propose a variable selection network to adaptively learn dynamic auxiliary information and improve the Transformer encoder utilizing gated mechanisms for fluctuating charging time-series data. Experiments on a citywide electric vehicle charging dataset demonstrate the performance of our proposed approach compared with a broad range of competing baselines. Furthermore, we demonstrate the impact of dynamic influences on prediction results in different areas of the city and the effectiveness of our area clustering method.

artificial intelligence, machine learning, prediction, (16 more...)

2410.18766

Country: Asia > China > Guangdong Province (0.28)

Genre: Research Report (1.00)

Industry:

Transportation > Ground > Road (1.00)
Transportation > Electric Vehicle (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Kernel Orthogonality does not necessarily imply a Decrease in Feature Map Redundancy in CNNs: Convolutional Similarity Minimization

Belmekki, Zakariae, Li, Jun, Reuter, Patrick, Jáuregui, David Antonio Gómez, Jenkins, Karl

arXiv.org Artificial Intelligence, Nov-5-2024

Convolutional Neural Networks (CNNs) have been heavily used in Deep Learning due to their success in various tasks. Nonetheless, it has been observed that CNNs suffer from redundancy in feature maps, leading to inefficient capacity utilization. Efforts to mitigate and solve this problem led to the emergence of multiple methods, among which is kernel orthogonality, enforced through various means. In this work, we challenge the common belief that kernel orthogonality leads to a decrease in feature map redundancy, which is, supposedly, the ultimate objective behind kernel orthogonality. We prove, theoretically and empirically, that kernel orthogonality has an unpredictable effect on feature map similarity and does not necessarily decrease it. Based on our theoretical result, we propose an effective method to reduce feature map similarity independently of the input of the CNN. This is done by minimizing a novel loss function we call Convolutional Similarity. Empirical results show that minimizing the Convolutional Similarity increases the performance of classification models and can accelerate their convergence. Furthermore, our proposed method pushes towards a more efficient use of model capacity, allowing significantly smaller models to achieve the same levels of performance.

artificial intelligence, deep learning, machine learning, (13 more...)

2411.03226

Country:

North America > United States (0.28)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Adversarial Federated Consensus Learning for Surface Defect Classification Under Data Heterogeneity in IIoT

Cui, Jixuan, Li, Jun, Mei, Zhen, Ni, Yiyang, Chen, Wen, Li, Zengxiang

arXiv.org Artificial Intelligence, Oct-31-2024

The challenge of data scarcity hinders the application of deep learning in industrial surface defect classification (SDC), as it is difficult to collect and centralize sufficient training data from various entities in the Industrial Internet of Things (IIoT) due to privacy concerns. Federated learning (FL) provides a solution by enabling collaborative global model training across clients while maintaining privacy. However, performance may suffer due to data heterogeneity, that is, discrepancies in data distributions among clients. In this paper, we propose a novel personalized FL (PFL) approach, named Adversarial Federated Consensus Learning (AFedCL), to address data heterogeneity across different clients in SDC. First, we develop a dynamic consensus construction strategy to mitigate the performance degradation caused by data heterogeneity. Through adversarial training, local models from different clients utilize the global model as a bridge to achieve distribution alignment, alleviating the problem of global knowledge forgetting. Complementing this strategy, we propose a consensus-aware aggregation mechanism. It assigns aggregation weights to different clients based on their efficacy in global knowledge learning, thereby enhancing the global model's generalization capabilities. Finally, we design an adaptive feature fusion module to further enhance global knowledge utilization efficiency. Personalized fusion weights are gradually adjusted for each client to optimally balance global and local features. Compared with state-of-the-art FL methods like FedALA, the proposed AFedCL method achieves an accuracy increase of up to 5.67% on three SDC datasets.
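The abstract does not define how a client's "efficacy in global knowledge learning" is scored. The sketch below only illustrates the general shape of score-weighted aggregation: given one non-negative score per client (a placeholder assumption here), the scores are normalized into weights and the client parameters are averaged with them. The function name and the flat-list parameter layout are hypothetical and do not reproduce AFedCL.

```python
from typing import List


def weighted_aggregate(
    client_params: List[List[float]],
    scores: List[float],
) -> List[float]:
    """Average client parameter vectors using normalized per-client scores.

    Assumption: `scores` are non-negative and stand in for whatever
    consensus-aware measure the server computes; they are not from the paper.
    """
    if len(client_params) != len(scores) or not client_params:
        raise ValueError("need one score per client and at least one client")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must sum to a positive value")
    weights = [s / total for s in scores]
    dim = len(client_params[0])
    return [
        sum(w * params[i] for w, params in zip(weights, client_params))
        for i in range(dim)
    ]


# Client 2 gets three times the weight of client 1.
global_params = weighted_aggregate([[1.0, 2.0], [3.0, 4.0]], [1.0, 3.0])
```

With equal scores this reduces to a plain unweighted average of the client models.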

artificial intelligence, deep learning, machine learning, (17 more...)

2409.15711

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
