
Collaborating Authors

Pin-Yu Chen


Computational Safety for Generative AI: A Signal Processing Perspective

arXiv.org Machine Learning

AI safety is a rapidly growing area of research that seeks to prevent the harm and misuse of frontier AI technology, particularly with respect to generative AI (GenAI) tools that are capable of creating realistic and high-quality content through text prompts. Examples of such tools include large language models (LLMs) and text-to-image (T2I) diffusion models. As the performance of various leading GenAI models approaches saturation due to similar training data sources and neural network architecture designs, the development of reliable safety guardrails has become a key differentiator for responsibility and sustainability. This paper presents a formalization of the concept of computational safety, which is a mathematical framework that enables the quantitative assessment, formulation, and study of safety challenges in GenAI through the lens of signal processing theory and methods. In particular, we explore two exemplary categories of computational safety challenges in GenAI that can be formulated as hypothesis testing problems. For the safety of model input, we show how sensitivity analysis and loss landscape analysis can be used to detect malicious prompts with jailbreak attempts. For the safety of model output, we elucidate how statistical signal processing and adversarial learning can be used to detect AI-generated content. Finally, we discuss key open research challenges, opportunities, and the essential role of signal processing in computational AI safety. Signal processing has played a pivotal role in ensuring the stability, security, and efficiency of numerous engineering systems and information technologies, including, but not limited to, telecommunications, information forensics and security, machine learning, data science, and control systems. 
With the recent advances, wide accessibility, and deep integration of generative AI (GenAI) tools into our society and technology, such as ChatGPT and the emerging agentic AI applications, understanding and mitigating the associated risks of the so-called "frontier AI technology" is essential to ensure a responsible and sustainable use of GenAI. In addition, as the performance of state-of-the-art GenAI models surpasses that of an average human in certain tasks, but reaches a plateau in standardized capability evaluation benchmarks due to similar training data sources and neural network architecture design (e.g., the use of decoder-only transformers), improving and ensuring safety is becoming the new arms race among GenAI stakeholders. (... EU AI Act, AI safety institutes, etc.), there are growing concerns about the broader socio-technical impacts [1].
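The abstract frames input and output safety as hypothesis testing problems. A minimal sketch of that framing follows, assuming a scalar detection score per input. The score itself, the Gaussian stand-in data, and the helper names `threshold_at_fpr` and `detect` are illustrative assumptions, not the paper's method. The threshold is calibrated on benign (H0) scores to cap the false-positive rate, then applied to flag suspected unsafe (H1) inputs.

```python
import random


def threshold_at_fpr(null_scores, fpr):
    """Pick a threshold so that at most a fraction `fpr` of H0 scores exceed it."""
    ordered = sorted(null_scores)
    # Index of the empirical (1 - fpr) quantile, clamped to a valid position.
    k = min(len(ordered) - 1, int((1.0 - fpr) * len(ordered)))
    return ordered[k]


def detect(score, threshold):
    """Decide H1 (flag the input) when the score is strictly above the threshold."""
    return score > threshold


# Synthetic stand-in scores (assumption): benign inputs score low, unsafe ones high.
rng = random.Random(0)
benign = [rng.gauss(0.0, 1.0) for _ in range(2000)]
malicious = [rng.gauss(3.0, 1.0) for _ in range(2000)]

tau = threshold_at_fpr(benign, fpr=0.05)
false_positive_rate = sum(detect(s, tau) for s in benign) / len(benign)
true_positive_rate = sum(detect(s, tau) for s in malicious) / len(malicious)
print(f"threshold={tau:.3f} FPR={false_positive_rate:.3f} TPR={true_positive_rate:.3f}")
```

In practice the score would come from whatever statistic the detector computes, and the threshold would be calibrated on held-out benign data rather than on synthetic samples.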


When Does Visual Prompting Outperform Linear Probing for Vision-Language Models? A Likelihood Perspective

arXiv.org Artificial Intelligence

When applying transfer learning to downstream tasks, specific modifications to the pre-trained model are required. For instance, linear probing (LP) involves adjusting the linear layer in the model's penultimate layer, while full fine-tuning involves modifying all parameters in the model. However, in the emerging field of fine-tuning for transfer learning, visual prompting (VP) (Bahng et al., 2022; Chen, 2024) offers a method that does not necessitate changes to the pre-trained model. Specifically, studies such as CLIP-VP (Bahng et al., 2022) and AutoVP (Tsao et al., 2024) indicate that visual prompting is particularly suitable for out-of-distribution (OOD) datasets. In AutoVP, the authors observed that datasets with lower confidence scores, indicative of being more OOD, tend to achieve greater accuracy gains (i.e., the performance difference between VP and LP).
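To make the contrast concrete, here is a rough sketch of one common form of visual prompting: a trainable border placed around a fixed input image, with the pre-trained model left untouched. The function name `apply_visual_prompt`, the border-style prompt, and the toy values are assumptions for illustration and do not reproduce CLIP-VP or AutoVP.

```python
def apply_visual_prompt(image, prompt, pad):
    """Place a fixed image inside a trainable frame of width `pad`.

    `image` is an H x W grid of floats that is never modified.
    `prompt` is an (H + 2*pad) x (W + 2*pad) grid of trainable values;
    its interior entries are overwritten by the image pixels, so only the
    border acts as the prompt.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in prompt]  # copy so the prompt itself is untouched
    for i in range(h):
        for j in range(w):
            out[i + pad][j + pad] = image[i][j]
    return out


image = [[1.0, 2.0], [3.0, 4.0]]
prompt = [[0.5] * 4 for _ in range(4)]  # hypothetical prompt values
prompted = apply_visual_prompt(image, prompt, pad=1)
for row in prompted:
    print(row)
```

Only the border values would be updated during training, whereas linear probing would instead update the weights of a linear layer on top of frozen features.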


Convex Bounds on the Softmax Function with Applications to Robustness Verification

arXiv.org Artificial Intelligence

The softmax function is a ubiquitous component at the output of neural networks and increasingly in intermediate layers as well. This paper provides convex lower bounds and concave upper bounds on the softmax function, which are compatible with convex optimization formulations for characterizing neural networks and other ML models. We derive bounds using both a natural exponential-reciprocal decomposition of the softmax as well as an alternative decomposition in terms of the log-sum-exp function. The new bounds are provably and/or numerically tighter than linear bounds obtained in previous work on robustness verification of transformers. As illustrations of the utility of the bounds, we apply them to verification of transformers as well as of the robustness of predictive uncertainty estimates of deep ensembles.
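The two decompositions named in the abstract can be checked numerically. The sketch below computes the softmax once as an exponential times a reciprocal, softmax_i(x) = exp(x_i) * 1 / sum_j exp(x_j), and once through the log-sum-exp function, softmax_i(x) = exp(x_i - LSE(x)). It only illustrates the decompositions, not the paper's bounds; the function names are chosen here for illustration.

```python
import math


def log_sum_exp(x):
    """Numerically stable LSE(x) = log(sum_j exp(x_j))."""
    m = max(x)
    return m + math.log(sum(math.exp(v - m) for v in x))


def softmax_exp_reciprocal(x):
    """softmax_i(x) = exp(x_i) * (1 / sum_j exp(x_j))."""
    m = max(x)  # subtracting the max leaves the result unchanged and avoids overflow
    exps = [math.exp(v - m) for v in x]
    reciprocal = 1.0 / sum(exps)
    return [e * reciprocal for e in exps]


def softmax_lse(x):
    """softmax_i(x) = exp(x_i - LSE(x))."""
    lse = log_sum_exp(x)
    return [math.exp(v - lse) for v in x]


logits = [2.0, -1.0, 0.5, 3.0]
p_a = softmax_exp_reciprocal(logits)
p_b = softmax_lse(logits)
print(p_a)
print(p_b)
```

Both routes give the same probabilities up to floating-point error; they differ in which intermediate functions (exponential and reciprocal, or log-sum-exp) a bounding argument would have to handle.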


Holistic Adversarial Robustness of Deep Learning Models

arXiv.org Artificial Intelligence

Adversarial robustness studies the worst-case performance of a machine learning model to ensure safety and reliability. With the proliferation of deep-learning-based technology, the potential risks associated with model development and deployment can be amplified and become dreadful vulnerabilities. This paper provides a comprehensive overview of research topics and foundational principles of research methods for adversarial robustness of deep learning models, including attacks, defenses, verification, and novel applications.
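As a small worked example of "worst-case performance", consider a linear classifier, which is an assumption made here for illustration and is not taken from the paper. For a perturbation bounded by eps in the L-infinity norm, the smallest achievable margin is y(w.x + b) - eps * ||w||_1, attained by shifting each coordinate by eps against the sign of y * w_i. The sketch verifies this on toy numbers.

```python
def margin(w, b, x, y):
    """Signed margin y * (w . x + b); positive means correctly classified."""
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)


def worst_case_margin(w, b, x, y, eps):
    """Closed-form minimum margin over all perturbations with max-norm <= eps."""
    return margin(w, b, x, y) - eps * sum(abs(wi) for wi in w)


def worst_case_perturbation(w, y, eps):
    """Perturbation attaining the minimum: move each coordinate against y * w_i."""
    def sign(v):
        return (v > 0) - (v < 0)
    return [-y * eps * sign(wi) for wi in w]


# Toy values (hypothetical), chosen so the arithmetic is exact in floating point.
w, b = [2.0, -1.0], 0.5
x, y, eps = [1.0, 1.0], 1, 0.25

delta = worst_case_perturbation(w, y, eps)
x_adv = [xi + di for xi, di in zip(x, delta)]
print(margin(w, b, x, y))                  # clean margin
print(worst_case_margin(w, b, x, y, eps))  # certified worst case
print(margin(w, b, x_adv, y))              # margin at the attaining perturbation
```

Deep networks admit no such closed form, which is why attacks (searching for a bad perturbation) and verification (bounding the worst case) are treated as separate research topics.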