


Global Convergence of Four-Layer Matrix Factorization under Random Initialization

Luo, Minrui, Xu, Weihang, Gao, Xiang, Fazel, Maryam, Du, Simon Shaolei

arXiv.org Artificial Intelligence

Gradient descent dynamics on the deep matrix factorization problem are extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-layer matrix factorization is well established, no global convergence guarantee for general deep matrix factorization under random initialization has been established to date. To address this gap, we provide a polynomial-time global convergence guarantee for randomly initialized gradient descent on four-layer matrix factorization, given certain conditions on the target matrix and a standard balanced regularization term. Our analysis employs new techniques to show saddle-avoidance properties of gradient descent dynamics, and extends previous theories to characterize the change in eigenvalues of layer weights. Here F ∈ {C, R}, as we consider both real and complex matrices in this paper. Following a long line of works (Arora et al., 2019a; Jiang et al., 2023; Ye & Du, 2021; Chou et al., 2024), we aim to understand the dynamics of gradient descent (GD) on this problem. While the model's representation power is independent of the depth N, the deep matrix factorization problem is naturally motivated by the goal of understanding the benefits of depth in deep learning (see, e.g., Arora et al. (2019b)). (Work done while Minrui Luo was visiting the University of Washington.)
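As a concrete illustration of the setting, below is a minimal NumPy sketch of randomly initialized gradient descent on a four-layer factorization with a balanced regularizer (real-valued case only; the step size, regularization weight, initialization scale, and target construction are illustrative choices, not the paper's):

```python
import numpy as np

def four_layer_gd(M, lr=0.005, lam=0.25, steps=6000, init_scale=0.3, seed=0):
    """Gradient descent on
        f(W1..W4) = 0.5 * ||W4 W3 W2 W1 - M||_F^2
                    + lam * sum_{j=1}^{3} ||W_{j+1}^T W_{j+1} - W_j W_j^T||_F^2
    from a random initialization (real case; hyperparameters are illustrative)."""
    rng = np.random.default_rng(seed)
    n = M.shape[0]
    W = [init_scale * rng.standard_normal((n, n)) for _ in range(4)]
    for _ in range(steps):
        E = W[3] @ W[2] @ W[1] @ W[0] - M        # residual of the end-to-end product
        # Gradients of the data-fit term 0.5 * ||E||_F^2.
        g = [
            (W[3] @ W[2] @ W[1]).T @ E,          # w.r.t. W1
            (W[3] @ W[2]).T @ E @ W[0].T,        # w.r.t. W2
            W[3].T @ E @ (W[1] @ W[0]).T,        # w.r.t. W3
            E @ (W[2] @ W[1] @ W[0]).T,          # w.r.t. W4
        ]
        # Gradients of the balanced regularizer; each D_j below is symmetric.
        for j in range(3):
            D = W[j + 1].T @ W[j + 1] - W[j] @ W[j].T
            g[j] -= 4 * lam * D @ W[j]
            g[j + 1] += 4 * lam * W[j + 1] @ D
        for j in range(4):
            W[j] -= lr * g[j]
    return W

# Toy run: a well-conditioned symmetric 5x5 target.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
M = Q @ np.diag([3.0, 2.5, 2.0, 1.5, 1.0]) @ Q.T
W = four_layer_gd(M)
err = np.linalg.norm(W[3] @ W[2] @ W[1] @ W[0] - M)
```

The balanced regularizer drives the Gram matrices of adjacent layers together, which is the standard device that keeps the layer spectra comparable along the trajectory.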


Supplementary Material A Proofs

Neural Information Processing Systems

Let T be a finite set and L be a collection of subsets of T. Then (T, L) is k-extendible (here, k = 2) if the following two conditions hold: (1) for all C ⊆ D, if D ∈ L then C ∈ L; (2) whenever C ⊆ D ∈ L and x is such that x ∉ C and C ∪ {x} ∈ L, there exists Y ⊆ D \ C with |Y| ≤ k and (D \ Y) ∪ {x} ∈ L. To check the second condition, let x be an edge (i, j). If x ∈ D, then the condition holds trivially with Y = ∅. Each column then contains at most two nonzero entries.
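For small ground sets, both conditions can be verified by brute force. A sketch (the example set systems in the test are illustrative, not taken from the proof):

```python
from itertools import combinations

def is_k_extendible(T, L, k):
    """Brute-force check of the two k-extendibility conditions for a set
    system (T, L), where L is a collection of subsets of the ground set T."""
    L = {frozenset(s) for s in L}
    # Condition 1: L is downward closed (C subset of D in L implies C in L).
    for D in L:
        for r in range(len(D) + 1):
            for C in combinations(D, r):
                if frozenset(C) not in L:
                    return False
    # Condition 2: for C subset of D in L and x not in C with C + {x} in L,
    # some Y subset of D \ C with |Y| <= k has (D \ Y) + {x} in L.
    for D in L:
        for r in range(len(D) + 1):
            for C0 in combinations(D, r):
                C = frozenset(C0)
                for x in set(T) - C:
                    if C | {x} not in L:
                        continue
                    rest = D - C
                    ok = any(
                        (D - frozenset(Y)) | {x} in L
                        for m in range(min(k, len(rest)) + 1)
                        for Y in combinations(rest, m)
                    )
                    if not ok:
                        return False
    return True
```

For instance, the subsets of {1, 2, 3} of size at most 2 (a uniform matroid's independent sets) pass the check with k = 1, while a system missing some two-element sets may only pass for larger k.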






Concept-based Adversarial Attack: a Probabilistic Perspective

Zhang, Andi, Ding, Xuan, McDonagh, Steven, Kaski, Samuel

arXiv.org Artificial Intelligence

We propose a concept-based adversarial attack framework that extends beyond single-image perturbations by adopting a probabilistic perspective. Rather than modifying a single image, our method operates on an entire concept -- represented by a probabilistic generative model or a set of images -- to generate diverse adversarial examples. Preserving the concept is essential, as it ensures that the resulting adversarial images remain identifiable as instances of the original underlying category or identity. By sampling from this concept-based adversarial distribution, we generate images that maintain the original concept but vary in pose, viewpoint, or background, thereby misleading the classifier. Mathematically, this framework remains consistent with traditional adversarial attacks in a principled manner. Our theoretical and empirical results demonstrate that concept-based adversarial attacks yield more diverse adversarial examples and effectively preserve the underlying concept, while achieving higher attack efficiency.
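As a rough illustration of attacking a set of images rather than a single one, the sketch below runs signed-gradient ascent on every image in a concept set against a toy linear softmax classifier, projecting each image back into an L-infinity ball around its original so the samples stay close to genuine instances of the concept (the classifier, radius, and step schedule are stand-ins, not the paper's method):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def concept_attack(images, W, b, y_true, eps=0.1, lr=0.02, steps=40):
    """Perturb every image in a concept set (rows of `images`) to raise a
    linear softmax classifier's cross-entropy, while keeping each image
    within an L-inf ball of radius eps around its original."""
    X = images.copy()
    idx = np.arange(len(X))
    for _ in range(steps):
        p = softmax(X @ W + b)                  # class probabilities, (n, c)
        g = p.copy()
        g[idx, y_true] -= 1.0                   # d(cross-entropy)/d(logits)
        gx = g @ W.T                            # chain rule back to the inputs
        X = X + lr * np.sign(gx)                # signed-gradient ascent step
        X = np.clip(X, images - eps, images + eps)  # stay near the concept
    return X
```

Because each row gets its own perturbation, the attacked set inherits the diversity of the original concept set rather than collapsing to a single adversarial image.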


from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors

Yan, Yu, Sun, Sheng, Duan, Zenghao, Liu, Teli, Liu, Min, Yin, Zhiyi, Li, Qi, Lei, Jiangyu

arXiv.org Artificial Intelligence

Current studies have exposed the risk that Large Language Models (LLMs) generate harmful content under jailbreak attacks. However, they overlook that directly generating harmful content from scratch is harder for an attacker than inducing an LLM to calibrate benign content into harmful forms. In our study, we introduce a novel attack framework that exploits AdVersArial meTAphoR (AVATAR) to induce the LLM to calibrate malicious metaphors for jailbreaking. Specifically, to answer harmful queries, AVATAR adaptively identifies a set of benign but logically related metaphors as the initial seed. Then, driven by these metaphors, the target LLM is induced to reason about and calibrate the metaphorical content, and is thus jailbroken, either by directly outputting harmful responses or by calibrating the residuals between metaphorical and professional harmful content. Experimental results demonstrate that AVATAR can effectively and transferably jailbreak LLMs, achieving a state-of-the-art attack success rate across multiple advanced LLMs.


ORI: O Routing Intelligence

Shadid, Ahmad, Kumar, Rahul, Mayank, Mohit

arXiv.org Artificial Intelligence

Single large language models (LLMs) often fall short when faced with the ever-growing range of tasks, making a single-model approach insufficient. We address this challenge by proposing ORI (O Routing Intelligence), a dynamic framework that leverages a set of LLMs. By intelligently routing incoming queries to the most suitable model, ORI not only improves task-specific accuracy but also maintains efficiency. Comprehensive evaluations across diverse benchmarks demonstrate consistent accuracy gains while controlling computational overhead. By intelligently routing queries, ORI outperforms the strongest individual models by up to 2.7 points on MMLU and 1.8 points on MuSR, and ties the top performance on ARC and BBH. These results underscore the benefits of a multi-model strategy and demonstrate how ORI's adaptive architecture can more effectively handle diverse tasks, offering a scalable, high-performance solution for a system of multiple large language models.
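The abstract does not specify ORI's routing policy; as a generic illustration of routing queries among several models, here is a minimal nearest-centroid router over toy hash embeddings (the model names, example queries, and embedding scheme are all hypothetical):

```python
import zlib
import numpy as np

def embed(text, dim=256):
    """Toy deterministic bag-of-words hash embedding (illustration only)."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

class Router:
    """Nearest-centroid router: each backend model is summarized by the
    centroid of example queries it is assumed to handle well, and an
    incoming query is sent to the model with the most similar centroid."""
    def __init__(self, exemplars):
        # exemplars: dict mapping model name -> list of example queries
        self.centroids = {
            name: np.mean([embed(q) for q in qs], axis=0)
            for name, qs in exemplars.items()
        }

    def route(self, query):
        q = embed(query)
        return max(self.centroids, key=lambda name: float(q @ self.centroids[name]))
```

A production router would use learned embeddings and calibrated per-model quality estimates; the point of the sketch is only the route-then-dispatch shape of the system.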