AITopics | Sun, Jun

AgentSpec: Customizable Runtime Enforcement for Safe and Reliable LLM Agents

Wang, Haoyu, Poskitt, Christopher M., Sun, Jun

arXiv.org Artificial Intelligence, Mar-24-2025

Agents built on LLMs are increasingly deployed across diverse domains, automating complex decision-making and task execution. However, their autonomy introduces safety risks, including security vulnerabilities, legal violations, and unintended harmful actions. Existing mitigation methods, such as model-based safeguards and early enforcement strategies, fall short in robustness, interpretability, and adaptability. To address these challenges, we propose AgentSpec, a lightweight domain-specific language for specifying and enforcing runtime constraints on LLM agents. With AgentSpec, users define structured rules that incorporate triggers, predicates, and enforcement mechanisms, ensuring agents operate within predefined safety boundaries. We implement AgentSpec across multiple domains, including code execution, embodied agents, and autonomous driving, demonstrating its adaptability and effectiveness. Our evaluation shows that AgentSpec successfully prevents unsafe executions in over 90% of code agent cases, eliminates all hazardous actions in embodied agent tasks, and enforces 100% compliance by autonomous vehicles (AVs). Despite its strong safety guarantees, AgentSpec remains computationally lightweight, with overheads in milliseconds. By combining interpretability, modularity, and efficiency, AgentSpec provides a practical and scalable solution for enforcing LLM agent safety across diverse applications. We also automate the generation of rules using LLMs and assess their effectiveness. Our evaluation shows that the rules generated by OpenAI o1 achieve a precision of 95.56% and recall of 70.96% for embodied agents, successfully identify 87.26% of the risky code, and prevent AVs from breaking laws in 5 out of 8 scenarios.
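
The rule structure named in the abstract (a trigger, a set of predicates, and an enforcement mechanism) can be illustrated with a minimal Python sketch. This is not AgentSpec's DSL syntax or API, which the abstract does not show; the `Rule` class, the `check` function, the action dictionary layout, and the "block"/"allow" decisions are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical action record, e.g. {"tool": "shell", "input": "ls"}.
Action = Dict[str, str]


@dataclass
class Rule:
    name: str
    trigger: str                                # tool name that activates the rule
    predicates: List[Callable[[Action], bool]]  # all must hold for the rule to fire
    enforce: Callable[[Action], str]            # returns a decision string


def check(action: Action, rules: List[Rule]) -> str:
    """Return the decision of the first rule that fires, else 'allow'."""
    for rule in rules:
        triggered = rule.trigger == action["tool"]
        if triggered and all(pred(action) for pred in rule.predicates):
            return rule.enforce(action)
    return "allow"


# Illustrative rule: block recursive deletes issued through a shell tool.
no_recursive_delete = Rule(
    name="no_recursive_delete",
    trigger="shell",
    predicates=[lambda a: "rm -rf" in a["input"]],
    enforce=lambda a: "block",
)

print(check({"tool": "shell", "input": "rm -rf /tmp/x"}, [no_recursive_delete]))
print(check({"tool": "shell", "input": "ls"}, [no_recursive_delete]))
```

Checking each proposed action before it runs is what makes the enforcement a runtime check rather than a property of the model; the sketch stops at the first rule that fires, which is a simplification assumed here.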

artificial intelligence, large language model, natural language, (16 more...)

2503.18666

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Transportation > Ground > Road (1.00)
Information Technology > Security & Privacy (1.00)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

PALo: Learning Posture-Aware Locomotion for Quadruped Robots

Miao, Xiangyu, Sun, Jun, Lai, Hang, Di, Xinpeng, Cao, Jiahang, Yu, Yong, Zhang, Weinan

arXiv.org Artificial Intelligence, Mar-6-2025

With the rapid development of embodied intelligence, locomotion control of quadruped robots on complex terrains has become a research hotspot. Unlike traditional locomotion control approaches focusing solely on velocity tracking, we aim to balance the agility and robustness of quadruped robots on diverse and complex terrains. To this end, we propose an end-to-end deep reinforcement learning framework for posture-aware locomotion named PALo, which handles simultaneous linear and angular velocity tracking and real-time adjustments of body height, pitch, and roll angles. In PALo, the locomotion control problem is formulated as a partially observable Markov decision process, and an asymmetric actor-critic architecture is adopted to overcome the sim-to-real challenge. Further, by incorporating customized training curricula, PALo achieves agile posture-aware locomotion control in simulated environments and successfully transfers to real-world settings without fine-tuning, allowing real-time control of the quadruped robot's locomotion and body posture across challenging terrains. Through in-depth experimental analysis, we identify the key components of PALo that contribute to its performance, further validating the effectiveness of the proposed method. The results of this study provide new possibilities for the low-level locomotion control of quadruped robots in higher-dimensional command spaces and lay the foundation for future research on upper-level modules for embodied intelligence.

artificial intelligence, machine learning, robot, (16 more...)

2503.04462

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Zero-Shot Defense Against Toxic Images via Inherent Multimodal Alignment in LVLMs

Zhao, Wei, Li, Zhe, Li, Yige, Sun, Jun

arXiv.org Artificial Intelligence, Feb-25-2025

Large Vision-Language Models (LVLMs) have made significant strides in multimodal comprehension, thanks to extensive pre-training and fine-tuning on large-scale visual datasets. However, despite their robust textual safety mechanisms, they remain vulnerable to harmful visual inputs. Existing safeguards, which typically rely on pre-filtering or fine-tuning, incur high costs and diminish overall utility. To address this critical vulnerability, we introduce SafeCLIP, a lightweight method that leverages LVLMs' inherent multimodal alignment for zero-shot toxic image detection. By projecting CLIP's discarded CLS token into its text space and matching it with toxic descriptors, SafeCLIP detects harmful content without any architectural changes, adding minimal latency and enabling dynamic safety corrections during inference and fine-tuning. Experiments show that SafeCLIP achieves a 66.9% defense success rate with only a 3.2% false positive rate and 7.2% overhead. In contrast, state-of-the-art methods achieve 52.9% success but have a 10.7% false positive rate and 210% overhead. Our work demonstrates that leveraging inherent multimodal alignment can yield efficient, low-cost LVLM safety. Code is available at anonymous.4open.science/r/safeclip-2C01.
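
The matching step described above (comparing an image representation, already projected into text space, against toxic descriptors) can be sketched as a nearest-descriptor lookup. This is a toy illustration rather than the SafeCLIP code: the three-dimensional vectors stand in for real CLIP embeddings, and the use of cosine similarity, the `detect_toxic` name, the descriptor labels, and the 0.8 threshold are all assumptions made for the example.

```python
import math
from typing import Dict, Optional, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def detect_toxic(
    image_vec: Sequence[float],
    descriptors: Dict[str, Sequence[float]],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> Optional[str]:
    """Return the best-matching toxic label at or above threshold, else None."""
    best_label, best_score = None, threshold
    for label, text_vec in descriptors.items():
        score = cosine(image_vec, text_vec)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Toy stand-ins for text embeddings of toxic descriptors.
DESCRIPTORS = {
    "violence": [1.0, 0.0, 0.0],
    "self-harm": [0.0, 1.0, 0.0],
}

print(detect_toxic([0.9, 0.1, 0.0], DESCRIPTORS))
print(detect_toxic([0.0, 0.0, 1.0], DESCRIPTORS))
```

Because the check is a lookup against fixed descriptor vectors, it needs no extra training, which is consistent with the zero-shot framing in the abstract.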

arxiv preprint arxiv, large language model, machine learning, (19 more...)

2503.00037

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Verification of Bit-Flip Attacks against Quantized Neural Networks

Zhang, Yedi, Huang, Lei, Gao, Pengfei, Song, Fu, Sun, Jun, Dong, Jin Song

arXiv.org Artificial Intelligence, Feb-22-2025

In the rapidly evolving landscape of neural network security, the resilience of neural networks against bit-flip attacks (i.e., attacks in which an attacker maliciously flips an extremely small number of bits within the parameter storage memory system to induce harmful behavior) has emerged as a relevant area of research. Existing studies suggest that quantization may serve as a viable defense against such attacks. Recognizing the documented susceptibility of real-valued neural networks to such attacks and the comparative robustness of quantized neural networks (QNNs), in this work, we introduce BFAVerifier, the first verification framework designed to formally verify the absence of bit-flip attacks or to identify all vulnerable parameters in a sound and rigorous manner. BFAVerifier comprises two integral components: an abstraction-based method and an MILP-based method. Specifically, we first conduct a reachability analysis with respect to symbolic parameters that represent the potential bit-flip attacks, based on a novel abstract domain with a sound guarantee. If the reachability analysis fails to prove resilience against such attacks, then we encode this verification problem into an equivalent MILP problem which can be solved by off-the-shelf solvers. Therefore, BFAVerifier is sound, complete, and reasonably efficient. We conduct extensive experiments, which demonstrate its effectiveness and efficiency across various network architectures, quantization bit-widths, and adversary capabilities.
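
To make the threat model concrete, the sketch below brute-forces every single-bit flip on the weights of one toy neuron and reports which flips change the sign of its output. This is only an illustration of what a vulnerable parameter is, not BFAVerifier's method, which the abstract describes as abstraction-based reachability analysis followed by an MILP encoding. The 8-bit signed two's-complement weight format and the names `flip_bit` and `vulnerable_flips` are assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def flip_bit(weight: int, bit: int, width: int = 8) -> int:
    """Flip one bit of a signed two's-complement integer of the given width."""
    mask = (1 << width) - 1
    raw = (weight & mask) ^ (1 << bit)  # unsigned bit pattern after the flip
    # Reinterpret the pattern as a signed value.
    return raw - (1 << width) if raw >= (1 << (width - 1)) else raw


def vulnerable_flips(
    weights: Sequence[int], inputs: Sequence[int], width: int = 8
) -> List[Tuple[int, int]]:
    """List (weight index, bit index) pairs whose flip changes the output sign."""
    def is_non_negative(ws: Sequence[int]) -> bool:
        return sum(w * x for w, x in zip(ws, inputs)) >= 0

    baseline = is_non_negative(weights)
    found = []
    for i, w in enumerate(weights):
        for bit in range(width):
            attacked = list(weights)
            attacked[i] = flip_bit(w, bit, width)
            if is_non_negative(attacked) != baseline:
                found.append((i, bit))
    return found


print(flip_bit(3, 7))  # flipping the sign bit of 3 gives -125
print(vulnerable_flips([3, -2], [1, 1]))
```

Exhaustive enumeration like this grows with the number of parameters, bits, and simultaneous flips, which is one reason a symbolic analysis is attractive for real networks.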

artificial intelligence, machine learning, verification, (13 more...)

2502.16286

Country:

Asia > China (0.28)
North America > United States (0.28)

Genre:

Research Report > New Finding (0.92)
Research Report > Experimental Study (0.65)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Universal Semantic Embeddings of Chemical Elements for Enhanced Materials Inference and Discovery

Jia, Yunze, Xian, Yuehui, Xu, Yangyang, Dang, Pengfei, Ding, Xiangdong, Sun, Jun, Zhou, Yumei, Xue, Dezhen

arXiv.org Artificial Intelligence, Feb-19-2025

We present a framework for generating universal semantic embeddings of chemical elements to advance materials inference and discovery. This framework leverages ElementBERT, a domain-specific BERT-based natural language processing model trained on 1.29 million abstracts of alloy-related scientific papers, to capture latent knowledge and contextual relationships specific to alloys. These semantic embeddings serve as robust elemental descriptors, consistently outperforming traditional empirical descriptors with significant improvements across multiple downstream tasks. These include predicting mechanical and transformation properties, classifying phase structures, and optimizing materials properties via Bayesian optimization. Applications to titanium alloys, high-entropy alloys, and shape memory alloys demonstrate up to 23% gains in prediction accuracy. Our results show that ElementBERT surpasses general-purpose BERT variants by encoding specialized alloy knowledge. By bridging contextual insights from scientific literature with quantitative inference, our framework accelerates the discovery and optimization of advanced materials, with potential applications extending beyond alloys to other material classes.

elementbert, machine learning, natural language, (19 more...)

2502.14912

Country: Asia > China (0.15)

Genre: Research Report > New Finding (1.00)

Industry:

Materials > Chemicals (0.72)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Safety at Scale: A Comprehensive Survey of Large Model Safety

Ma, Xingjun, Gao, Yifeng, Wang, Yixu, Wang, Ruofan, Wang, Xin, Sun, Ye, Ding, Yifan, Xu, Hengyuan, Chen, Yunhao, Zhao, Yunhan, Huang, Hanxun, Li, Yige, Zhang, Jiaming, Zheng, Xiang, Bai, Yang, Wu, Zuxuan, Qiu, Xipeng, Zhang, Jingfeng, Li, Yiming, Sun, Jun, Wang, Cong, Gu, Jindong, Wu, Baoyuan, Chen, Siheng, Zhang, Tianwei, Liu, Yang, Gong, Mingming, Liu, Tongliang, Pan, Shirui, Xie, Cihang, Pang, Tianyu, Dong, Yinpeng, Jia, Ruoxi, Zhang, Yang, Ma, Shiqing, Zhang, Xiangyu, Gong, Neil, Xiao, Chaowei, Erfani, Sarah, Li, Bo, Sugiyama, Masashi, Tao, Dacheng, Bailey, James, Jiang, Yu-Gang

arXiv.org Artificial Intelligence, Feb-12-2025

The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI). These models are now foundational to a wide range of applications, including conversational AI, recommendation systems, autonomous driving, content generation, medical diagnostics, and scientific discovery. However, their widespread deployment also exposes them to significant safety risks, raising concerns about robustness, reliability, and ethical implications. This survey provides a systematic review of current safety research on large models, covering Vision Foundation Models (VFMs), Large Language Models (LLMs), Vision-Language Pre-training (VLP) models, Vision-Language Models (VLMs), Diffusion Models (DMs), and large-model-based Agents. Our contributions are summarized as follows: (1) We present a comprehensive taxonomy of safety threats to these models, including adversarial attacks, data poisoning, backdoor attacks, jailbreak and prompt injection attacks, energy-latency attacks, data and model extraction attacks, and emerging agent-specific threats. (2) We review defense strategies proposed for each type of attack, where available, and summarize the commonly used datasets and benchmarks for safety research. (3) Building on this, we identify and discuss the open challenges in large model safety, emphasizing the need for comprehensive safety evaluations, scalable and effective defense mechanisms, and sustainable data practices. More importantly, we highlight the necessity of collective efforts from the research community and international collaboration. Our work can serve as a useful reference for researchers and practitioners, fostering the ongoing development of comprehensive defense systems and platforms to safeguard AI models.

adversarial example, large language model, machine learning, (23 more...)

2502.05206

Country:

Asia (1.00)
North America > United States > Wisconsin (0.13)
North America > United States > Massachusetts (0.13)
(2 more...)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.45)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(3 more...)

Democratic Training Against Universal Adversarial Perturbations

Sun, Bing, Sun, Jun, Zhao, Wei

arXiv.org Artificial Intelligence, Feb-8-2025

Despite their advances and success, real-world deep neural networks are known to be vulnerable to adversarial attacks. Universal adversarial perturbation, an input-agnostic attack, poses a serious threat to their deployment in security-sensitive systems. In this case, a single universal adversarial perturbation deceives the model on a range of clean inputs without requiring input-specific optimization, which makes it particularly threatening. In this work, we observe that universal adversarial perturbations (UAPs) usually lead to an abnormal entropy spectrum in hidden layers, which suggests that the prediction is dominated by a small number of "features" in such cases (rather than democratically by many features). Inspired by this, we propose an efficient yet effective defense method for mitigating UAPs called Democratic Training, which performs entropy-based model enhancement to suppress the effect of the universal adversarial perturbations in a given model. Democratic Training is evaluated with 7 neural networks trained on 5 benchmark datasets and 5 types of state-of-the-art universal adversarial attack methods. The results show that it effectively reduces the attack success rate, improves model robustness, and preserves the model accuracy on clean samples.
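
The intuition that a prediction can be dominated by a few features can be made concrete with an entropy measure over a hidden layer's activations. The abstract does not define its entropy spectrum, so the sketch below is an assumption: it normalizes activation magnitudes into a distribution and computes Shannon entropy in nats, under the hypothetical name `activation_entropy`. Low entropy means a few units carry most of the activation mass; high entropy means the mass is spread across many units.

```python
import math
from typing import Sequence


def activation_entropy(activations: Sequence[float]) -> float:
    """Shannon entropy (nats) of activation magnitudes normalized to sum to 1."""
    magnitudes = [abs(a) for a in activations]
    total = sum(magnitudes)
    if total == 0:
        return 0.0  # no activation mass; treat as zero entropy by convention
    entropy = 0.0
    for m in magnitudes:
        if m > 0:  # terms with zero probability contribute nothing
            p = m / total
            entropy -= p * math.log(p)
    return entropy


spread_out = [1.0, 1.0, 1.0, 1.0]  # many features contribute equally
dominated = [5.0, 0.0, 0.0, 0.0]   # one feature carries everything

print(activation_entropy(spread_out))
print(activation_entropy(dominated))
```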

artificial intelligence, democratic training, machine learning, (18 more...)

2502.05542

Country:

Europe (1.00)
Asia (1.00)
North America > Canada > Ontario > Toronto (0.14)
(2 more...)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.88)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Training Verification-Friendly Neural Networks via Neuron Behavior Consistency

Liu, Zongxin, Zhao, Zhe, Song, Fu, Sun, Jun, Yang, Pengfei, Huang, Xiaowei, Zhang, Lijun

arXiv.org Artificial Intelligence, Dec-29-2024

Formal verification provides critical security assurances for neural networks, yet its practical application suffers from long verification times. This work introduces a novel method for training verification-friendly neural networks, which are robust, easy to verify, and relatively accurate. Our method integrates neuron behavior consistency into the training process, making neuron activation states remain consistent across different inputs within a local neighborhood. This reduces the number of unstable neurons and tightens the bounds of neurons, thereby enhancing the network's verifiability. We evaluated our method using the MNIST, Fashion-MNIST, and CIFAR-10 datasets with various network architectures. The experimental results demonstrate that networks trained using our method are verification-friendly across different radii and architectures, whereas other tools fail to maintain verifiability as the radius increases. Additionally, we show that our method can be combined with existing approaches to further improve the verifiability of networks.

artificial intelligence, machine learning, neuron, (16 more...)

2412.13229

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.88)

Industry: Information Technology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Do Influence Functions Work on Large Language Models?

Li, Zhe, Zhao, Wei, Li, Yige, Sun, Jun

arXiv.org Artificial Intelligence, Dec-19-2024

Influence functions are important for quantifying the impact of individual training data points on a model's predictions. Although extensive research has been conducted on influence functions in traditional machine learning models, their application to large language models (LLMs) has been limited. In this work, we conduct a systematic study to address a key question: do influence functions work on LLMs? Specifically, we evaluate influence functions across multiple tasks and find that they consistently perform poorly in most settings. Our further investigation reveals that their poor performance can be attributed to: (1) inevitable approximation errors when estimating the iHVP component due to the scale of LLMs, (2) uncertain convergence during fine-tuning, and, more fundamentally, (3) the definition itself, as changes in model parameters do not necessarily correlate with changes in LLM behavior. Thus, our study suggests the need for alternative approaches for identifying influential samples.
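
For context on the iHVP term mentioned above: the abstract does not state the formula, but in the classical formulation of influence functions (an assumption about which definition is meant here), the influence of a training point $z$ on the loss at a test point $z_{\text{test}}$ is written as below. The symbols $\hat\theta$ (trained parameters), $L$ (loss), and $n$ (number of training points) are notation introduced for this sketch.

```latex
\mathcal{I}(z, z_{\text{test}})
  = -\,\nabla_\theta L(z_{\text{test}}, \hat\theta)^{\top}
     \, H_{\hat\theta}^{-1} \,
     \nabla_\theta L(z, \hat\theta),
\qquad
H_{\hat\theta} = \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat\theta)
```

The inverse-Hessian-vector product (iHVP) is the factor $H_{\hat\theta}^{-1} \nabla_\theta L(z_{\text{test}}, \hat\theta)$. Since $H_{\hat\theta}$ has one row and one column per model parameter, it cannot be formed or inverted exactly at LLM scale, so the product has to be approximated.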

large language model, machine learning, natural language, (19 more...)

2409.19998

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.34)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Unleashing the Unseen: Harnessing Benign Datasets for Jailbreaking Large Language Models

Zhao, Wei, Li, Zhe, Li, Yige, Sun, Jun

arXiv.org Artificial Intelligence, Dec-19-2024

Despite significant ongoing efforts in safety alignment, large language models (LLMs) such as GPT-4 and LLaMA 3 remain vulnerable to jailbreak attacks that can induce harmful behaviors, including through the use of adversarial suffixes. Building on prior research, we hypothesize that these adversarial suffixes are not mere bugs but may represent features that can dominate the LLM's behavior. To evaluate this hypothesis, we conduct several experiments. First, we demonstrate that benign features can be effectively made to function as adversarial suffixes, i.e., we develop a feature extraction method to extract sample-agnostic features from a benign dataset in the form of suffixes and show that these suffixes may effectively compromise safety alignment. Second, we show that adversarial suffixes generated from jailbreak attacks may contain meaningful features, i.e., appending the same suffix to different prompts results in responses exhibiting specific characteristics. Third, we show that such benign-yet-safety-compromising features can be easily introduced through fine-tuning using only benign datasets. As a result, we are able to completely eliminate GPT's safety alignment in a blackbox setting through fine-tuning with only benign data. Our code and data are available at https://github.com/suffix-maybe-feature/adver-suffix-maybe-features.

large language model, machine learning, natural language, (19 more...)

2410.00451

Genre: Research Report > New Finding (0.93)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
