AITopics | Wang, Ziming

Collaborating Authors

Wang, Ziming

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games

Chen, Peng, Bu, Pi, Wang, Yingyao, Wang, Xinyi, Wang, Ziming, Guo, Jie, Zhao, Yingxiu, Zhu, Qi, Song, Jun, Yang, Siran, Wang, Jiamang, Zheng, Bo

arXiv.org Artificial IntelligenceMar-12-2025

Recent advances in Vision-Language-Action models (VLAs) have expanded the capabilities of embodied intelligence. However, significant challenges remain in real-time decision-making in complex 3D environments, which demand second-level responses, high-resolution perception, and tactical reasoning under dynamic conditions. To advance the field, we introduce CombatVLA, an efficient VLA model optimized for combat tasks in 3D action role-playing games(ARPGs). Specifically, our CombatVLA is a 3B model trained on video-action pairs collected by an action tracker, where the data is formatted as action-of-thought (AoT) sequences. Thereafter, CombatVLA seamlessly integrates into an action execution framework, allowing efficient inference through our truncated AoT strategy. Experimental results demonstrate that CombatVLA not only outperforms all existing models on the combat understanding benchmark but also achieves a 50-fold acceleration in game combat. Moreover, it has a higher task success rate than human players. We will open-source all resources, including the action tracker, dataset, benchmark, model weights, training code, and the implementation of the framework at https://combatvla.github.io/.

large language model, machine learning, real time system, (23 more...)

arXiv.org Artificial Intelligence

2503.09527

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
(4 more...)

Add feedback

FanChuan: A Multilingual and Graph-Structured Benchmark For Parody Detection and Analysis

Zheng, Yilun, Li, Sha, Wu, Fangkun, Ziyi, Yang, Hongchao, Lin, Hu, Zhichao, Xinjun, Cai, Wang, Ziming, Chen, Jinxuan, Luan, Sitao, Xu, Jiahao, Chen, Lihui

arXiv.org Artificial IntelligenceFeb-23-2025

Parody is an emerging phenomenon on social media, where individuals imitate a role or position opposite to their own, often for humor, provocation, or controversy. Detecting and analyzing parody can be challenging and is often reliant on context, yet it plays a crucial role in understanding cultural values, promoting subcultures, and enhancing self-expression. However, the study of parody is hindered by limited available data and deficient diversity in current datasets. To bridge this gap, we built seven parody datasets from both English and Chinese corpora, with 14,755 annotated users and 21,210 annotated comments in total. To provide sufficient context information, we also collect replies and construct user-interaction graphs to provide richer contextual information, which is lacking in existing datasets. With these datasets, we test traditional methods and Large Language Models (LLMs) on three key tasks: (1) parody detection, (2) comment sentiment analysis with parody, and (3) user sentiment analysis with parody. Our extensive experiments reveal that parody-related tasks still remain challenging for all models, and contextual information plays a critical role. Interestingly, we find that, in certain scenarios, traditional sentence embedding methods combined with simple classifiers can outperform advanced LLMs, i.e. DeepSeek-R1 and GPT-o3, highlighting parody as a significant challenge for LLMs.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2502.16503

Country:

Asia (0.28)
North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

"See the World, Discover Knowledge": A Chinese Factuality Evaluation for Large Vision Language Models

Gu, Jihao, Wang, Yingyao, Bu, Pi, Wang, Chen, Wang, Ziming, Song, Tengtao, Wei, Donglai, Yuan, Jiale, Zhao, Yingxiu, He, Yancheng, Li, Shilong, Liu, Jiaheng, Cao, Meng, Song, Jun, Tan, Yingshui, Li, Xiang, Su, Wenbo, Zheng, Zhicheng, Zhu, Xiaoyong, Zheng, Bo

arXiv.org Artificial IntelligenceFeb-17-2025

The evaluation of factual accuracy in large vision language models (LVLMs) has lagged behind their rapid development, making it challenging to fully reflect these models' knowledge capacity and reliability. In this paper, we introduce the first factuality-based visual question-answering benchmark in Chinese, named ChineseSimpleVQA, aimed at assessing the visual factuality of LVLMs across 8 major topics and 56 subtopics. The key features of this benchmark include a focus on the Chinese language, diverse knowledge types, a multi-hop question construction, high-quality data, static consistency, and easy-to-evaluate through short answers. Moreover, we contribute a rigorous data construction pipeline and decouple the visual factuality into two parts: seeing the world (i.e., object recognition) and discovering knowledge. This decoupling allows us to analyze the capability boundaries and execution mechanisms of LVLMs. Subsequently, we evaluate 34 advanced open-source and closed-source models, revealing critical performance gaps within this field.

artificial intelligence, chinese factuality evaluation, natural language, (13 more...)

arXiv.org Artificial Intelligence

2502.11718

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (0.89)
Information Technology > Artificial Intelligence > Natural Language (0.89)
Information Technology > Data Science > Data Mining > Knowledge Discovery (0.40)

Add feedback

FAST-LIVO2 on Resource-Constrained Platforms: LiDAR-Inertial-Visual Odometry with Efficient Memory and Computation

Zhou, Bingyang, Zheng, Chunran, Wang, Ziming, Zhu, Fangcheng, Cai, Yixi, Zhang, Fu

arXiv.org Artificial IntelligenceJan-23-2025

This paper presents a lightweight LiDAR-inertial-visual odometry system optimized for resource-constrained platforms. It integrates a degeneration-aware adaptive visual frame selector into error-state iterated Kalman filter (ESIKF) with sequential updates, improving computation efficiency significantly while maintaining a similar level of robustness. Additionally, a memory-efficient mapping structure combining a locally unified visual-LiDAR map and a long-term visual map achieves a good trade-off between performance and memory usage. Extensive experiments on x86 and ARM platforms demonstrate the system's robustness and efficiency. On the Hilti dataset, our system achieves a 33% reduction in per-frame runtime and 47% lower memory usage compared to FAST-LIVO2, with only a 3 cm increase in RMSE. Despite this slight accuracy trade-off, our system remains competitive, outperforming state-of-the-art (SOTA) LIO methods such as FAST-LIO2 and most existing LIVO systems. These results validate the system's capability for scalable deployment on resource-constrained edge computing platforms.

artificial intelligence, fast-livo2, platform, (15 more...)

arXiv.org Artificial Intelligence

2501.13876

Country:

Asia > China (0.14)
Europe > Sweden (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (0.98)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Sensing and Signal Processing (0.66)

Add feedback

Procedural Fairness and Its Relationship with Distributive Fairness in Machine Learning

Wang, Ziming, Huang, Changwu, Tang, Ke, Yao, Xin

arXiv.org Artificial IntelligenceJan-12-2025

Fairness in machine learning (ML) has garnered significant attention in recent years. While existing research has predominantly focused on the distributive fairness of ML models, there has been limited exploration of procedural fairness. This paper proposes a novel method to achieve procedural fairness during the model training phase. The effectiveness of the proposed method is validated through experiments conducted on one synthetic and six real-world datasets. Additionally, this work studies the relationship between procedural fairness and distributive fairness in ML models. On one hand, the impact of dataset bias and the procedural fairness of ML model on its distributive fairness is examined. The results highlight a significant influence of both dataset bias and procedural fairness on distributive fairness. On the other hand, the distinctions between optimizing procedural and distributive fairness metrics are analyzed. Experimental results demonstrate that optimizing procedural fairness metrics mitigates biases introduced or amplified by the decision-making process, thereby ensuring fairness in the decision-making process itself, as well as improving distributive fairness. In contrast, optimizing distributive fairness metrics encourages the ML model's decision-making process to favor disadvantaged groups, counterbalancing the inherent preferences for advantaged groups present in the dataset and ultimately achieving distributive fairness.

artificial intelligence, machine learning, ml model, (15 more...)

arXiv.org Artificial Intelligence

2501.06753

Country: Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Enhancing Autonomous Driving Safety through World Model-Based Predictive Navigation and Adaptive Learning Algorithms for 5G Wireless Applications

Ding, Hong, Wang, Ziming, Ding, Yi, Lin, Hongjie, Xi, SuYang, Kang, Chia Chao

arXiv.org Artificial IntelligenceNov-25-2024

Addressing the challenge of ensuring safety in ever-changing and unpredictable environments, particularly in the swiftly advancing realm of autonomous driving in today's 5G wireless communication world, we present Navigation Secure (NavSecure). This vision-based navigation framework merges the strengths of world models with crucial safety-focused decision-making capabilities, enabling autonomous vehicles to navigate real-world complexities securely. Our approach anticipates potential threats and formulates safer routes by harnessing the predictive capabilities of world models, thus significantly reducing the need for extensive real-world trial-and-error learning. Additionally, our method empowers vehicles to autonomously learn and develop through continuous practice, ensuring the system evolves and adapts to new challenges. Incorporating radio frequency technology, NavSecure leverages 5G networks to enhance real-time data exchange, improving communication and responsiveness. Validated through rigorous experiments under simulation-to-real driving conditions, NavSecure has shown exceptional performance in safety-critical scenarios, such as sudden obstacle avoidance. Results indicate that NavSecure excels in key safety metrics, including collision prevention and risk reduction, surpassing other end-to-end methodologies. This framework not only advances autonomous driving safety but also demonstrates how world models can enhance decision-making in critical applications. NavSecure sets a new standard for developing more robust and trustworthy autonomous driving systems, capable of handling the inherent dynamics and uncertainties of real-world environments.

artificial intelligence, machine learning, scenario, (18 more...)

arXiv.org Artificial Intelligence

2411.15042

Country: Asia > Malaysia (0.16)

Genre: Research Report (0.64)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Robotics & Automation (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Rethinking Structure Learning For Graph Neural Networks

Zheng, Yilun, Zhang, Zhuofan, Wang, Ziming, Li, Xiang, Luan, Sitao, Peng, Xiaojiang, Chen, Lihui

arXiv.org Artificial IntelligenceNov-12-2024

To improve the performance of Graph Neural Networks (GNNs), Graph Structure Learning (GSL) has been extensively applied to reconstruct or refine original graph structures, effectively addressing issues like heterophily, over-squashing, and noisy structures. While GSL is generally thought to improve GNN performance, it often leads to longer training times and more hyperparameter tuning. Besides, the distinctions among current GSL methods remain ambiguous from the perspective of GNN training, and there is a lack of theoretical analysis to quantify their effectiveness. Recent studies further suggest that, under fair comparisons with the same hyperparameter tuning, GSL does not consistently outperform baseline GNNs. This motivates us to ask a critical question: is GSL really useful for GNNs? To address this question, this paper makes two key contributions. First, we propose a new GSL framework, which includes three steps: GSL base (the representation used for GSL) construction, new structure construction, and view fusion, to better understand the effectiveness of GSL in GNNs. Second, after graph convolution, we analyze the differences in mutual information (MI) between node representations derived from the original topology and those from the newly constructed topology. Surprisingly, our empirical observations and theoretical analysis show that no matter which type of graph structure construction methods are used, after feeding the same GSL bases to the newly constructed graph, there is no MI gain compared to the original GSL bases. To fairly reassess the effectiveness of GSL, we conduct ablation experiments and find that it is the pretrained GSL bases that enhance GNN performance, and in most cases, GSL cannot improve GNN performance. This finding encourages us to rethink the essential components in GNNs, such as self-training and structural encoding, in GNN design rather than GSL.

artificial intelligence, graph, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2411.07672

Country: North America > United States (0.70)

Genre: Research Report > New Finding (0.93)

Industry: Information Technology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Disentangling Heterogeneous Knowledge Concept Embedding for Cognitive Diagnosis on Untested Knowledge

Xiao, Kui, Xing, Runtian, Zhang, Miao, Tan, Shunfeng, Wang, Ziming, Zhu, Xiaolian

arXiv.org Artificial IntelligenceMay-24-2024

Cognitive diagnosis is a fundamental and critical task in learning assessment, which aims to infer students' proficiency on knowledge concepts from their response logs. Current works assume each knowledge concept will certainly be tested and covered by multiple exercises. However, whether online or offline courses, it's hardly feasible to completely cover all knowledge concepts in several exercises. Restricted tests lead to undiscovered knowledge deficits, especially untested knowledge concepts(UKCs). In this paper, we propose a novel \underline{Dis}entangling Heterogeneous \underline{K}nowledge \underline{C}ognitive \underline{D}iagnosis framework on untested knowledge(DisKCD). Specifically, we leverage course grades, exercise questions, and resources to learn the potential representations of students, exercises, and knowledge concepts. In particular, knowledge concepts are disentangled into tested and untested based on the limiting actual exercises. We construct a heterogeneous relation graph network via students, exercises, tested knowledge concepts(TKCs), and UKCs. Then, through a hierarchical heterogeneous message-passing mechanism, the fine-grained relations are incorporated into the embeddings of the entities. Finally, the embeddings will be applied to multiple existing cognitive diagnosis models to infer students' proficiency on UKCs. Experimental results on real-world datasets show that the proposed model can effectively improve the performance of the task of diagnosing students' proficiency on UKCs. Our anonymous code is available at https://anonymous.4open.science/r/DisKCD.

data mining, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2405.16003

Country:

Europe > United Kingdom > British North Sea (0.92)
Atlantic Ocean > North Sea > British North Sea (0.92)
North America > United States > New York > New York County > New York City (0.14)

Genre:

Research Report (1.00)
Instructional Material (1.00)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.68)
Education > Assessment & Standards > Student Performance (0.47)
Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Communications (0.93)
(2 more...)

Add feedback

Potential and Limitations of LLMs in Capturing Structured Semantics: A Case Study on SRL

Cheng, Ning, Yan, Zhaohui, Wang, Ziming, Li, Zhijie, Yu, Jiaming, Zheng, Zilong, Tu, Kewei, Xu, Jinan, Han, Wenjuan

arXiv.org Artificial IntelligenceMay-10-2024

Large Language Models (LLMs) play a crucial role in capturing structured semantics to enhance language understanding, improve interpretability, and reduce bias. Nevertheless, an ongoing controversy exists over the extent to which LLMs can grasp structured semantics. To assess this, we propose using Semantic Role Labeling (SRL) as a fundamental task to explore LLMs' ability to extract structured semantics. In our assessment, we employ the prompting approach, which leads to the creation of our few-shot SRL parser, called PromptSRL. PromptSRL enables LLMs to map natural languages to explicit semantic structures, which provides an interpretable window into the properties of LLMs. We find interesting potential: LLMs can indeed capture semantic structures, and scaling-up doesn't always mirror potential. Additionally, limitations of LLMs are observed in C-arguments, etc. Lastly, we are surprised to discover that significant overlap in the errors is made by both LLMs and untrained humans, accounting for almost 30% of all errors.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2405.0641

Country:

Asia > China (0.29)
North America > United States > Colorado (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Add feedback

GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL

Qin, Lang, Wang, Ziming, Jiang, Runhao, Yan, Rui, Tang, Huajin

arXiv.org Artificial IntelligenceApr-23-2024

Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms, the simulation results of multiple time steps can only correspond to a single-step decision in RL. This is quite different from the real temporal dynamics in the brain and also fails to fully exploit the capacity of SNNs to process temporal data. In order to address this temporal mismatch issue and further take advantage of the inherent temporal dynamics of spiking neurons, we propose a novel temporal alignment paradigm (T AP) that leverages the single-step update of spiking neurons to accumulate historical state information in RL and introduces gated units to enhance the memory capacity of spiking neurons. Experimental results show that our method can solve partially observable Markov decision processes (POMDPs) and multi-agent cooperation problems with similar performance as recurrent neural networks (RNNs) but with about 50% power consumption.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2404.15597

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback