AITopics | Zhang, Zhi

Collaborating Authors

Zhang, Zhi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Life-Cycle Routing Vulnerabilities of LLM Router

Lin, Qiqi, Ji, Xiaoyang, Zhai, Shengfang, Shen, Qingni, Zhang, Zhi, Fang, Yuejian, Gao, Yansong

arXiv.org Artificial IntelligenceMar-9-2025

Large language models (LLMs) have achieved remarkable success in natural language processing, yet their performance and computational costs vary significantly. LLM routers play a crucial role in dynamically balancing these tradeoffs. While previous studies have primarily focused on routing efficiency, security vulnerabilities throughout the entire LLM router life cycle, from training to inference, remain largely unexplored. In this paper, we present a comprehensive investigation into the life-cycle routing vulnerabilities of LLM routers. We evaluate both white-box and black-box adversarial robustness, as well as backdoor robustness, across several representative routing models under extensive experimental settings. Our experiments uncover several key findings: 1) Mainstream DNN-based routers tend to exhibit the weakest adversarial and backdoor robustness, largely due to their strong feature extraction capabilities that amplify vulnerabilities during both training and inference; 2) Training-free routers demonstrate the strongest robustness across different attack types, benefiting from the absence of learnable parameters that can be manipulated. These findings highlight critical security risks spanning the entire life cycle of LLM routers and provide insights for developing more robust models. In recent years, large language models (LLMs) such as GPT-3.5 (Brown et al., 2020), GPT-4 (Achiam et al., 2023), and PaLM 2 (Anil et al., 2023) have achieved significant progress in natural language processing tasks, finding widespread applications in open-domain dialogue, question answering, code generation, and other tasks (Gu, 2023; Zhuang et al., 2023; Ghosh et al., 2024). However, different LLMs vary in terms of training data, model size, and computational cost, leading to differences in their strengths, weaknesses, and overall capabilities. Generally, larger models tend to exhibit stronger performance but come with higher inference costs, whereas smaller models are more computationally efficient but have limited capability in handling complex tasks. LLM Routing (Ding et al., 2024; Ong et al., 2024; Hu et al., 2024) is a state-of-the-art optimization strategy designed to mitigate this trade-off and achieve a balance between response quality and computational cost.

large language model, machine learning, router, (17 more...)

arXiv.org Artificial Intelligence

2503.08704

Country:

Oceania > Australia (0.14)
North America > United States (0.14)
North America > Canada (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Never too Prim to Swim: An LLM-Enhanced RL-based Adaptive S-Surface Controller for AUVs under Extreme Sea Conditions

Xie, Guanwen, Xu, Jingzehua, Ding, Yimian, Zhang, Zhi, Zhang, Shuai, Li, Yi

arXiv.org Artificial IntelligenceMar-1-2025

The adaptivity and maneuvering capabilities of Autonomous Underwater Vehicles (AUVs) have drawn significant attention in oceanic research, due to the unpredictable disturbances and strong coupling among the AUV's degrees of freedom. In this paper, we developed large language model (LLM)-enhanced reinforcement learning (RL)-based adaptive S-surface controller for AUVs. Specifically, LLMs are introduced for the joint optimization of controller parameters and reward functions in RL training. Using multi-modal and structured explicit task feedback, LLMs enable joint adjustments, balance multiple objectives, and enhance task-oriented performance and adaptability. In the proposed controller, the RL policy focuses on upper-level tasks, outputting task-oriented high-level commands that the S-surface controller then converts into control signals, ensuring cancellation of nonlinear effects and unpredictable external disturbances in extreme sea conditions. Under extreme sea conditions involving complex terrain, waves, and currents, the proposed controller demonstrates superior performance and adaptability in high-level tasks such as underwater target tracking and data collection, outperforming traditional PID and SMC controllers.

controller, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.00527

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.46)

Add feedback

Beyond Words: Exploring Cultural Value Sensitivity in Multimodal Models

Yadav, Srishti, Zhang, Zhi, Hershcovich, Daniel, Shutova, Ekaterina

arXiv.org Artificial IntelligenceFeb-18-2025

Investigating value alignment in Large Language Models (LLMs) based on cultural context has become a critical area of research. However, similar biases have not been extensively explored in large vision-language models (VLMs). As the scale of multimodal models continues to grow, it becomes increasingly important to assess whether images can serve as reliable proxies for culture and how these values are embedded through the integration of both visual and textual data. In this paper, we conduct a thorough evaluation of multimodal model at different scales, focusing on their alignment with cultural values. Our findings reveal that, much like LLMs, VLMs exhibit sensitivity to cultural values, but their performance in aligning with these values is highly context-dependent. While VLMs show potential in improving value understanding through the use of images, this alignment varies significantly across contexts highlighting the complexities and underexplored challenges in the alignment of multimodal models.

arxiv preprint arxiv, large language model, natural language, (15 more...)

arXiv.org Artificial Intelligence

2502.14906

Country:

Asia (1.00)
Europe > Austria > Vienna (0.14)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.88)

Industry: Government (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)

Add feedback

Data-Efficient Model for Psychological Resilience Prediction based on Neurological Data

Zhang, Zhi, Liu, Yan, Gao, Mengxia, Yang, Yu, Cao, Jiannong, Hou, Wai Kai, Li, Shirley, Yau, Sonata, Wing, Yun Kwok, Lee, Tatia M. C.

arXiv.org Artificial IntelligenceFeb-3-2025

Psychological resilience, defined as the ability to rebound from adversity, is crucial for mental health. Compared with traditional resilience assessments through self-reported questionnaires, resilience assessments based on neurological data offer more objective results with biological markers, hence significantly enhancing credibility. This paper proposes a novel data-efficient model to address the scarcity of neurological data. We employ Neuro Kolmogorov-Arnold Networks as the structure of the prediction model. In the training stage, a new trait-informed multimodal representation algorithm with a smart chunk technique is proposed to learn the shared latent space with limited data. In the test stage, a new noise-informed inference algorithm is proposed to address the low signal-to-noise ratio of the neurological data. The proposed model not only shows impressive performance on both public datasets and self-constructed datasets but also provides some valuable psychological hypotheses for future research.

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Artificial Intelligence

2502.01377

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.95)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)

Add feedback

Mixture of Knowledge Minigraph Agents for Literature Review Generation

Zhang, Zhi, Liu, Yan, Zhong, Sheng-hua, Chen, Gong, Yang, Yu, Cao, Jiannong

arXiv.org Artificial IntelligenceJan-8-2025

Literature reviews play a crucial role in scientific research for understanding the current state of research, identifying gaps, and guiding future studies on specific topics. However, the process of conducting a comprehensive literature review is yet time-consuming. This paper proposes a novel framework, collaborative knowledge minigraph agents (CKMAs), to automate scholarly literature reviews. A novel prompt-based algorithm, the knowledge minigraph construction agent (KMCA), is designed to identify relations between concepts from academic literature and automatically constructs knowledge minigraphs. By leveraging the capabilities of large language models on constructed knowledge minigraphs, the multiple path summarization agent (MPSA) efficiently organizes concepts and relations from different viewpoints to generate literature review paragraphs. We evaluate CKMAs on three benchmark datasets. Experimental results show the effectiveness of the proposed method, further revealing promising applications of LLMs in scientific research.

large language model, natural language, relation, (16 more...)

arXiv.org Artificial Intelligence

2411.06159

Country: Asia > China > Guangdong Province (0.14)

Genre:

Overview (1.00)
Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Dense ReLU Neural Networks for Temporal-spatial Model

Zhang, Zhi, Padilla, Carlos Misael Madrid, Luo, Xiaokai, Wang, Daren, Padilla, Oscar Hernan Madrid

arXiv.org Machine LearningJan-4-2025

In this paper, we focus on fully connected deep neural networks utilizing the Rectified Linear Unit (ReLU) activation function for nonparametric estimation. We derive non-asymptotic bounds that lead to convergence rates, addressing both temporal and spatial dependence in the observed measurements. By accounting for dependencies across time and space, our models better reflect the complexities of real-world data, enhancing both predictive performance and theoretical robustness. We also tackle the curse of dimensionality by modeling the data on a manifold, exploring the intrinsic dimensionality of high-dimensional data. We broaden existing theoretical findings of temporal-spatial analysis by applying them to neural networks in more general contexts and demonstrate that our proof techniques are effective for models with short-range dependence. Our empirical simulations across various synthetic response functions underscore the superior performance of our method, outperforming established approaches in the existing literature. These findings provide valuable insights into the strong capabilities of dense neural networks (Dense NN) for temporal-spatial modeling across a broad range of function classes.

artificial intelligence, machine learning, survey article, (19 more...)

arXiv.org Machine Learning

2411.09961

Country: North America > United States > California (0.14)

Genre:

Research Report (0.81)
Overview (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Confidence Interval Construction and Conditional Variance Estimation with Dense ReLU Networks

Padilla, Carlos Misael Madrid, Padilla, Oscar Hernan Madrid, Kei, Yik Lun, Zhang, Zhi, Chen, Yanzhen

arXiv.org Machine LearningDec-31-2024

This paper addresses the problems of conditional variance estimation and confidence interval construction in nonparametric regression using dense networks with the Rectified Linear Unit (ReLU) activation function. We present a residual-based framework for conditional variance estimation, deriving nonasymptotic bounds for variance estimation under both heteroscedastic and homoscedastic settings. We relax the sub-Gaussian noise assumption, allowing the proposed bounds to accommodate sub-Exponential noise and beyond. Building on this, for a ReLU neural network estimator, we derive non-asymptotic bounds for both its conditional mean and variance estimation, representing the first result for variance estimation using ReLU networks. Furthermore, we develop a ReLU network based robust bootstrap procedure (Efron, 1992) for constructing confidence intervals for the true mean that comes with a theoretical guarantee on the coverage, providing a significant advancement in uncertainty quantification and the construction of reliable confidence intervals in deep learning settings.

artificial intelligence, estimation, machine learning, (17 more...)

arXiv.org Machine Learning

2412.20355

Country:

Europe (0.28)
North America > United States > California (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Add feedback

Cross-modal Information Flow in Multimodal Large Language Models

Zhang, Zhi, Yadav, Srishti, Han, Fengze, Shutova, Ekaterina

arXiv.org Artificial IntelligenceNov-27-2024

The recent advancements in auto-regressive multimodal large language models (MLLMs) have demonstrated promising progress for vision-language tasks. While there exists a variety of studies investigating the processing of linguistic information within large language models, little is currently known about the inner working mechanism of MLLMs and how linguistic and visual information interact within these models. In this study, we aim to fill this gap by examining the information flow between different modalities -- language and vision -- in MLLMs, focusing on visual question answering. Specifically, given an image-question pair as input, we investigate where in the model and how the visual and linguistic information are combined to generate the final prediction. Conducting experiments with a series of models from the LLaVA series, we find that there are two distinct stages in the process of integration of the two modalities. In the lower layers, the model first transfers the more general visual features of the whole image into the representations of (linguistic) question tokens. In the middle layers, it once again transfers visual information about specific objects relevant to the question to the respective token positions of the question. Finally, in the higher layers, the resulting multimodal representation is propagated to the last position of the input sequence for the final prediction. Overall, our findings provide a new and comprehensive perspective on the spatial and functional aspects of image and language processing in the MLLMs, thereby facilitating future research into multimodal information localization and editing.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.1862

Country:

Europe > Spain (0.14)
Europe > Netherlands (0.14)
Europe > Germany (0.14)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Proactive Gradient Conflict Mitigation in Multi-Task Learning: A Sparse Training Perspective

Zhang, Zhi, Shen, Jiayi, Cao, Congfeng, Dai, Gaole, Zhou, Shiji, Zhang, Qizhe, Zhang, Shanghang, Shutova, Ekaterina

arXiv.org Artificial IntelligenceNov-27-2024

Advancing towards generalist agents necessitates the concurrent processing of multiple tasks using a unified model, thereby underscoring the growing significance of simultaneous model training on multiple downstream tasks. A common issue in multi-task learning is the occurrence of gradient conflict, which leads to potential competition among different tasks during joint training. This competition often results in improvements in one task at the expense of deterioration in another. Although several optimization methods have been developed to address this issue by manipulating task gradients for better task balancing, they cannot decrease the incidence of gradient conflict. In this paper, we systematically investigate the occurrence of gradient conflict across different methods and propose a strategy to reduce such conflicts through sparse training (ST), wherein only a portion of the model's parameters are updated during training while keeping the rest unchanged. Our extensive experiments demonstrate that ST effectively mitigates conflicting gradients and leads to superior performance. Furthermore, ST can be easily integrated with gradient manipulation techniques, thus enhancing their effectiveness.

artificial intelligence, gradient conflict, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2411.18615

Country: Europe > Netherlands (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Distributed Sign Momentum with Local Steps for Training Transformers

Yu, Shuhua, Zhou, Ding, Xie, Cong, Xu, An, Zhang, Zhi, Liu, Xin, Kar, Soummya

arXiv.org Artificial IntelligenceNov-26-2024

Pre-training Transformer models is resource-intensive, and recent studies have shown that sign momentum is an efficient technique for training large-scale deep learning models, particularly Transformers. However, its application in distributed training or federated learning remains underexplored. This paper investigates a novel communication-efficient distributed sign momentum method with local updates. Our proposed method allows for a broad class of base optimizers for local updates, and uses sign momentum in global updates, where momentum is generated from differences accumulated during local steps. We evaluate our method on the pre-training of various GPT-2 models, and the empirical results show significant improvement compared to other distributed methods with local updates. Furthermore, by approximating the sign operator with a randomized version that acts as a continuous analog in expectation, we present an $O(1/\sqrt{T})$ convergence for one instance of the proposed method for nonconvex smooth functions.

artificial intelligence, machine learning, optimization problem, (19 more...)

arXiv.org Artificial Intelligence

2411.17866

Country: Asia (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback