 fragility


Hannah Fry: 'AI can do some superhuman things – but so can forklifts'

New Scientist

Mathematician Hannah Fry travels to the front lines of AI in her new BBC documentary AI Confidential With Hannah Fry. The chances are that you think about artificial intelligence far more today than you did five years ago. Since ChatGPT was launched in November 2022, we have become accustomed to interacting with AIs in most spheres of life, from chatbots and smart home tech to banking and healthcare. But such rapid change brings unexpected problems - as mathematician and broadcaster Hannah Fry shows in AI Confidential With Hannah Fry, a new three-part BBC documentary in which she talks to people whose lives have been transformed by the technology. She spoke to New Scientist about how we should view AI, its role in modern mathematics - and why it will upend the global economy.


Democratizing LLM Efficiency: From Hyperscale Optimizations to Universal Deployability

arXiv.org Artificial Intelligence

Large language models (LLMs) have become indispensable, but the most celebrated efficiency methods -- mixture-of-experts (MoE), speculative decoding, and complex retrieval-augmented generation (RAG) -- were built for hyperscale providers with vast infrastructure and elite teams. Outside that context, their benefits collapse into overhead, fragility, and wasted carbon. The result is that a handful of Big Tech companies benefit, while thousands of hospitals, schools, governments, and enterprises are left without viable options. We argue that the next frontier is not greater sophistication at scale, but robust simplicity: efficiency that thrives under modest resources and minimal expertise. We propose a new research agenda: retrofitting pretrained models with more efficient architectures without retraining, inventing lightweight fine-tuning that preserves alignment, making reasoning economical despite long chains of thought, enabling dynamic knowledge management without heavy RAG pipelines, and adopting Overhead-Aware Efficiency (OAE) as a standard benchmark. By redefining efficiency to include adoption cost, sustainability, and fairness, we can democratize LLM deployment -- ensuring that optimization reduces inequality and carbon waste rather than amplifying them.


DeepForgeSeal: Latent Space-Driven Semi-Fragile Watermarking for Deepfake Detection Using Multi-Agent Adversarial Reinforcement Learning

arXiv.org Artificial Intelligence

Rapid advances in generative AI have led to increasingly realistic deepfakes, posing growing challenges for law enforcement and public trust. Existing passive deepfake detectors struggle to keep pace, largely due to their dependence on specific forgery artifacts, which limits their ability to generalize to new deepfake types. Proactive deepfake detection using watermarks has emerged to address the challenge of identifying high-quality synthetic media. However, these methods often struggle to balance robustness against benign distortions with sensitivity to malicious tampering. This paper introduces a novel deep learning framework that harnesses high-dimensional latent space representations and the Multi-Agent Adversarial Reinforcement Learning (MAARL) paradigm to develop a robust and adaptive watermarking approach. Specifically, we develop a learnable watermark embedder that operates in the latent space, capturing high-level image semantics, while offering precise control over message encoding and extraction. The MAARL paradigm empowers the learnable watermarking agent to pursue an optimal balance between robustness and fragility by interacting with a dynamic curriculum of benign and malicious image manipulations simulated by an adversarial attacker agent. Comprehensive evaluations on the CelebA and CelebA-HQ benchmarks reveal that our method consistently outperforms state-of-the-art approaches, achieving improvements of over 4.5% on CelebA and more than 5.3% on CelebA-HQ under challenging manipulation scenarios.


Rational Adversaries and the Maintenance of Fragility: A Game-Theoretic Theory of Rational Stagnation

arXiv.org Artificial Intelligence

Cooperative systems often remain in persistently suboptimal yet stable states. This paper explains such "rational stagnation" as an equilibrium sustained by a rational adversary whose utility follows the principle of potential loss, $u_{D} = U_{ideal} - U_{actual}$. Starting from the Prisoner's Dilemma, we show that the transformation $u_{i}' = a\,u_{i} + b\,u_{j}$ and the ratio of mutual recognition $w = b/a$ generate a fragile cooperation band $[w_{\min},\,w_{\max}]$ where both (C,C) and (D,D) are equilibria. Extending to a dynamic model with stochastic cooperative payoffs $R_{t}$ and intervention costs $(C_{c},\,C_{m})$, a Bellman-style analysis yields three strategic regimes: immediate destruction, rational stagnation, and intervention abandonment. The appendix further generalizes the utility to a reference-dependent nonlinear form and proves its stability under reference shifts, ensuring robustness of the framework. Applications to social-media algorithms and political trust illustrate how adversarial rationality can deliberately preserve fragility.
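The cooperation band mentioned in the abstract above can be illustrated with a small worked example. The sketch below is not taken from the paper: it assumes a symmetric Prisoner's Dilemma with integer payoffs T > R > P > S, sets a = 1 so that w = b, and derives the band endpoints from the two no-deviation conditions (w >= (T - R)/(R - S) for (C,C) and w <= (P - S)/(T - P) for (D,D)). The function names and the example payoffs are hypothetical.

```python
from fractions import Fraction

def cooperation_band(T, R, P, S):
    """Return (w_min, w_max) where both (C,C) and (D,D) are equilibria, or None.

    Assumes integer payoffs with T > R > P > S and a = 1, so w = b.
    The endpoints come from this sketch's own derivation, not from the paper.
    """
    w_min = Fraction(T - R, R - S)  # (C,C) holds when R + w*R >= T + w*S
    w_max = Fraction(P - S, T - P)  # (D,D) holds when P + w*P >= S + w*T
    return (w_min, w_max) if w_min <= w_max else None

def is_symmetric_equilibrium(action, w, T, R, P, S):
    """Brute-force check that neither player gains by deviating from (action, action)."""
    base = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}

    def transformed(mine, other):
        # u_i' = u_i + w * u_j, i.e. the transformation with a = 1 and b = w
        return base[(mine, other)] + w * base[(other, mine)]

    deviation = "D" if action == "C" else "C"
    return transformed(action, action) >= transformed(deviation, action)

# Hypothetical payoffs chosen so that the band is non-empty.
T, R, P, S = 5, 4, 2, 0
band = cooperation_band(T, R, P, S)
print("band:", band)
for w in (Fraction(0), Fraction(1, 2), Fraction(1)):
    print(w,
          is_symmetric_equilibrium("C", w, T, R, P, S),
          is_symmetric_equilibrium("D", w, T, R, P, S))
```

Under these assumptions the band is non-empty only when (T - R)(T - P) <= (P - S)(R - S); the commonly used payoffs (5, 3, 1, 0) do not satisfy this, which is why the example uses (5, 4, 2, 0).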


Position: Many generalization measures for deep learning are fragile

arXiv.org Artificial Intelligence

A wide variety of generalization measures have been applied to deep neural networks (DNNs). Although obtaining tight bounds remains challenging, such measures are often assumed to reproduce qualitative generalization trends. In this position paper, we argue that many post-mortem generalization measures -- those computed on trained networks -- are fragile: small training modifications that barely affect the underlying DNN can substantially change a measure's value, trend, or scaling behavior. For example, minor hyperparameter changes, such as learning rate adjustments or switching between SGD variants, can reverse the slope of a learning curve in widely used generalization measures like the path norm. We also identify subtler forms of fragility. For instance, the PAC-Bayes origin measure is regarded as one of the most reliable, and is indeed less sensitive to hyperparameter tweaks than many other measures. However, it completely fails to capture differences in data complexity across learning curves. This data fragility contrasts with the function-based marginal-likelihood PAC-Bayes bound, which does capture differences in data complexity, including scaling behavior, in learning curves, but which is not a post-mortem measure. Beyond demonstrating that many bounds -- such as path, spectral and Frobenius norms, flatness proxies, and deterministic PAC-Bayes surrogates -- are fragile, this position paper also argues that developers of new measures should explicitly audit them for fragility.


Understanding Sensitivity of Differential Attention through the Lens of Adversarial Robustness

arXiv.org Artificial Intelligence

Differential Attention (DA) has been proposed as a refinement to standard attention, suppressing redundant or noisy context through a subtractive structure and thereby reducing contextual hallucination. While this design sharpens task-relevant focus, we show that it also introduces a structural fragility under adversarial perturbations. Our theoretical analysis identifies negative gradient alignment, a configuration encouraged by DA's subtraction, as the key driver of sensitivity amplification, leading to increased gradient norms and elevated local Lipschitz constants. We empirically validate this Fragile Principle through systematic experiments on ViT/DiffViT and evaluations of pretrained CLIP/DiffCLIP, spanning five datasets in total. These results demonstrate higher attack success rates, frequent gradient opposition, and stronger local sensitivity compared to standard attention. Furthermore, depth-dependent experiments reveal a robustness crossover: stacking DA layers attenuates small perturbations via depth-dependent noise cancellation, though this protection fades under larger attack budgets. Overall, our findings uncover a fundamental trade-off: DA improves discriminative focus on clean inputs but increases adversarial vulnerability, underscoring the need to jointly design for selectivity and robustness in future attention mechanisms.
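To make the subtractive structure concrete, here is a minimal pure-Python sketch. It assumes the commonly described form of differential attention, in which one softmax attention map is subtracted from another with a scalar weight `lam`; normalization layers, multi-head handling, and the learned parameterization of `lam` are omitted. All function names and inputs are hypothetical, and this is not the code evaluated in the paper.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(queries, keys):
    """Scaled dot-product attention weights, one softmax row per query."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys])
        for q in queries
    ]

def differential_attention(q1, k1, q2, k2, values, lam):
    """Return (weights, output) for softmax(q1 k1^T) - lam * softmax(q2 k2^T).

    Assumed simplified form: a single head with no normalization layer.
    """
    a1 = attention_weights(q1, k1)
    a2 = attention_weights(q2, k2)
    weights = [[x - lam * y for x, y in zip(r1, r2)] for r1, r2 in zip(a1, a2)]
    width = len(values[0])
    output = [
        [sum(w * v[j] for w, v in zip(row, values)) for j in range(width)]
        for row in weights
    ]
    return weights, output

# Hypothetical toy inputs: one query attending over two positions.
q1, k1 = [[0.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]   # first map is uniform
q2, k2 = [[10.0, 0.0]], [[1.0, 0.0], [0.0, 0.0]]  # second map is sharply peaked
values = [[1.0, 0.0], [0.0, 1.0]]
weights, output = differential_attention(q1, k1, q2, k2, values, lam=0.8)
print(weights, output)
```

In this form each row of weights sums to 1 - lam instead of 1, and individual weights can be negative, so the output is no longer a convex combination of the values. That is one informal way to see why a subtractive design may respond differently to perturbations than standard attention.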


The Shape of Deceit: Behavioral Consistency and Fragility in Money Laundering Patterns

arXiv.org Artificial Intelligence

Conventional anti-money laundering (AML) systems predominantly focus on identifying anomalous entities or transactions, flagging them for manual investigation based on statistical deviation or suspicious behavior. This paradigm, however, misconstrues the true nature of money laundering, which is rarely anomalous but often deliberate, repeated, and concealed within consistent behavioral routines. In this paper, we challenge the entity-centric approach and propose a network-theoretic perspective that emphasizes detecting predefined laundering patterns across directed transaction networks. We introduce the notion of behavioral consistency as the core trait of laundering activity, and argue that such patterns are better captured through subgraph structures expressing semantic and functional roles - not solely geometry. Crucially, we explore the concept of pattern fragility: the sensitivity of laundering patterns to small attribute changes and, conversely, their semantic robustness even under drastic topological transformations. We claim that laundering detection should not hinge on statistical outliers, but on preservation of behavioral essence, and propose a reconceptualization of pattern similarity grounded in this insight. This philosophical and practical shift has implications for how AML systems model, scan, and interpret networks in the fight against financial crime.


The Fragility of Fairness: Causal Sensitivity Analysis for Fair Machine Learning

Neural Information Processing Systems

Fairness metrics are a core tool in the fair machine learning literature (FairML), used to determine that ML models are, in some sense, "fair." Real-world data, however, are typically plagued by various measurement biases and other violated assumptions, which can render fairness assessments meaningless. We adapt tools from causal sensitivity analysis to the FairML context, providing a general framework which (1) accommodates effectively any combination of fairness metric and bias that can be posed in the "oblivious setting"; (2) allows researchers to investigate combinations of biases, resulting in non-linear sensitivity; and (3) enables flexible encoding of domain-specific constraints and assumptions. Employing this framework, we analyze the sensitivity of the most common parity metrics under 3 varieties of classifier across 14 canonical fairness datasets. Our analysis reveals the striking fragility of fairness assessments to even minor dataset biases.


On the Fragility of Active Learners

arXiv.org Artificial Intelligence

Active learning (AL) techniques aim to maximally utilize a labeling budget by iteratively selecting instances that are most likely to improve prediction accuracy. However, their benefit compared to random sampling has not been consistent across various setups, e.g., different datasets and classifiers. In this empirical study, we examine how a combination of different factors might obscure any gains from an AL technique. Focusing on text classification, we rigorously evaluate AL techniques over around 1000 experiments that vary with respect to the dataset, batch size, text representation and the classifier. We show that AL is only effective in a narrow set of circumstances. We also address the problem of using metrics that are better aligned with real world expectations. The impact of this study is in its insights for a practitioner: (a) the choice of text representation and classifier is as important as that of an AL technique, (b) choice of the right metric is critical in assessment of the latter, and, finally, (c) reported AL results must be holistically interpreted, accounting for variables other than just the query strategy.


PhyGrasp: Generalizing Robotic Grasping with Physics-informed Large Multimodal Models

arXiv.org Artificial Intelligence

Robotic grasping is a fundamental aspect of robot functionality, defining how robots interact with objects. Despite substantial progress, its generalizability to counter-intuitive or long-tailed scenarios, such as objects with uncommon materials or shapes, remains a challenge. In contrast, humans can easily apply their intuitive physics to grasp skillfully and change grasps efficiently, even for objects they have never seen before. This work delves into infusing such physical commonsense reasoning into robotic manipulation. We introduce PhyGrasp, a multimodal large model that leverages inputs from two modalities: natural language and 3D point clouds, seamlessly integrated through a bridge module. The language modality exhibits robust reasoning capabilities concerning the impacts of diverse physical properties on grasping, while the 3D modality comprehends object shapes and parts. With these two capabilities, PhyGrasp is able to accurately assess the physical properties of object parts and determine optimal grasping poses. Additionally, the model's language comprehension enables human instruction interpretation, generating grasping poses that align with human preferences. To train PhyGrasp, we construct a dataset, PhyPartNet, with 195K object instances with varying physical properties and human preferences, alongside their corresponding language descriptions. Extensive experiments conducted in simulation and on real robots demonstrate that PhyGrasp achieves state-of-the-art performance, particularly in long-tailed cases, e.g., about 10% improvement in success rate over GraspNet. Project page: https://sites.google.com/view/phygrasp