 bsa


Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models. Ziyi Yin, Muchao Ye

Neural Information Processing Systems

Vision-Language (VL) pre-trained models have shown their superiority on many multimodal tasks. However, the adversarial robustness of such models has not been fully explored. Existing approaches mainly focus on exploring the adversarial robustness under the white-box setting, which is unrealistic. In this paper, we aim to investigate a new yet practical task to craft image and text perturbations using pre-trained VL models to attack black-box fine-tuned models on different downstream tasks.


AdRo-FL: Informed and Secure Client Selection for Federated Learning in the Presence of Adversarial Aggregator

arXiv.org Artificial Intelligence

Federated Learning (FL) enables collaborative learning without exposing clients' data. While clients only share model updates with the aggregator, studies reveal that aggregators can infer sensitive information from these updates. Secure Aggregation (SA) protects individual updates during transmission; however, recent work demonstrates a critical vulnerability where adversarial aggregators manipulate client selection to bypass SA protections, constituting a Biased Selection Attack (BSA). Although verifiable random selection prevents BSA, it precludes informed client selection essential for FL performance. We propose Adversarial Robust Federated Learning (AdRo-FL), which simultaneously enables: informed client selection based on client utility, and robust defense against BSA maintaining privacy-preserving aggregation. AdRo-FL implements two client selection frameworks tailored for distinct settings. The first framework assumes clients are grouped into clusters based on mutual trust, such as different branches of an organization. The second framework handles distributed clients where no trust relationships exist between them. For the cluster-oriented setting, we propose a novel defense against BSA by (1) enforcing a minimum client selection quota from each cluster, supervised by a cluster-head in every round, and (2) introducing a client utility function to prioritize efficient clients. For the distributed setting, we design a two-phase selection protocol: first, the aggregator selects the top clients based on our utility-driven ranking; then, a verifiable random function (VRF) ensures a BSA-resistant final selection. AdRo-FL also applies quantization to reduce communication overhead and sets strict transmission deadlines to improve energy efficiency. AdRo-FL achieves up to $1.85\times$ faster time-to-accuracy and up to $1.06\times$ higher final accuracy compared to insecure baselines.
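
The two-phase protocol for the distributed setting can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function name `select_clients`, the parameters `top_k` and `final_m`, the utility values, and above all the use of SHA-256 as a stand-in for a verifiable random function. A hash of a shared seed is deterministic and re-computable, but unlike a real VRF it carries no cryptographic proof, so this is not the AdRo-FL implementation.

```python
import hashlib

def select_clients(utilities, round_seed, top_k, final_m):
    """Two-phase selection sketch: utility ranking, then a seeded pseudo-random draw."""
    # Phase 1: keep the top_k clients by utility (ties broken by client id).
    ranked = sorted(utilities, key=lambda cid: (-utilities[cid], cid))
    shortlist = ranked[:top_k]

    # Phase 2: order the shortlist by a hash of (seed, client id) and keep final_m.
    # SHA-256 is only a stand-in for a VRF (assumption): anyone who knows the seed
    # can recompute the draw, but no proof of correct evaluation is produced.
    def ticket(cid):
        return hashlib.sha256(f"{round_seed}:{cid}".encode()).hexdigest()

    return sorted(shortlist, key=ticket)[:final_m]

# Hypothetical utilities for five clients.
utilities = {"c1": 0.9, "c2": 0.4, "c3": 0.7, "c4": 0.8, "c5": 0.1}
chosen = select_clients(utilities, round_seed="round-7", top_k=4, final_m=2)
print(chosen)
```

The point of the split is that the aggregator influences only the shortlist, while the final draw depends on a value it cannot freely pick.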


Barycentric subspace analysis of network-valued data

arXiv.org Machine Learning

Certain data are naturally modeled by networks or weighted graphs, be they arterial networks or mobility networks. When there is no canonical labeling of the nodes across the dataset, we talk about unlabeled networks. In this paper, we focus on the question of dimensionality reduction for this type of data. More specifically, we address the issue of interpreting the feature subspace constructed by dimensionality reduction methods. Most existing methods for network-valued data are derived from principal component analysis (PCA) and therefore rely on subspaces generated by a set of vectors, which we identify as a major limitation in terms of interpretability. Instead, we propose to implement the method called barycentric subspace analysis (BSA), which relies on subspaces generated by a set of points. In order to provide a computationally feasible framework for BSA, we introduce a novel embedding for unlabeled networks where we replace their usual representation by equivalence classes of isomorphic networks with that by equivalence classes of cospectral networks. We then illustrate BSA on simulated and real-world datasets, and compare it to tangent PCA.
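
To see why a spectral representation suits unlabeled networks, the sketch below checks that relabeling the nodes of a weighted graph leaves its spectral invariants unchanged. It is an illustration under stated assumptions, not the paper's embedding: instead of computing eigenvalues it compares the traces of the first n powers of the adjacency matrix, which by Newton's identities determine the spectrum of an n-node network. The helper names `power_traces` and `relabel` and the example graph are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def power_traces(adj):
    """Return [tr(A), tr(A^2), ..., tr(A^n)] for an n x n adjacency matrix A."""
    n = len(adj)
    traces = []
    power = [row[:] for row in adj]
    for _ in range(n):
        traces.append(sum(power[i][i] for i in range(n)))
        power = matmul(power, adj)
    return traces

def relabel(adj, perm):
    """Apply a node permutation: new node i is old node perm[i]."""
    n = len(adj)
    return [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

# Hypothetical weighted path graph on three nodes, with integer weights.
path = [[0, 2, 0],
        [2, 0, 5],
        [0, 5, 0]]
shuffled = relabel(path, [2, 0, 1])

# The matrices differ, but their power traces (hence their spectra) agree.
print(shuffled != path, power_traces(path) == power_traces(shuffled))
```

Isomorphic networks are always cospectral, but the converse does not hold, so working with cospectral classes merges some non-isomorphic networks in exchange for a representation that is much cheaper to compare.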


BSA: Ball Sparse Attention for Large-scale Geometries

arXiv.org Artificial Intelligence

Self-attention scales quadratically with input size, limiting its use for large-scale physical systems. Although sparse attention mechanisms provide a viable alternative, they are primarily designed for regular structures such as text or images, making them inapplicable for irregular geometries. In this work, we present Ball Sparse Attention (BSA), which adapts Native Sparse Attention (NSA) (Yuan et al., 2025) to unordered point sets by imposing regularity using the Ball Tree structure from the Erwin Transformer (Zhdanov et al., 2025). We modify NSA's components to work with ball-based neighborhoods, yielding a global receptive field at sub-quadratic cost. On an airflow pressure prediction task, we achieve accuracy comparable to Full Attention while significantly reducing the theoretical computational complexity. Our implementation is available at https://github.com/britacatalin/bsa.


A PyTorch-Compatible Spike Encoding Framework for Energy-Efficient Neuromorphic Applications

arXiv.org Artificial Intelligence

Spiking neural networks (SNNs) are incompatible with traditional datasets, which consist of batches of input vectors rather than spike trains, and this necessitates the development of efficient encoding methods. This paper introduces a novel, open-source, PyTorch-compatible Python framework for spike encoding, designed for neuromorphic applications in machine learning and reinforcement learning. The framework supports a range of encoding algorithms, including Leaky Integrate-and-Fire (LIF), Step Forward (SF), Pulse Width Modulation (PWM), and Ben's Spiker Algorithm (BSA), as well as specialized encoding strategies covering population coding and reinforcement learning scenarios. Furthermore, we investigate the performance trade-offs of each method on embedded hardware using C/C++ implementations, considering energy consumption, computation time, spike sparsity, and reconstruction accuracy. Our findings indicate that SF typically achieves the lowest reconstruction error and offers the highest energy efficiency and fastest encoding speed, while achieving the second-best spike sparsity. At the same time, other methods demonstrate particular strengths depending on the signal characteristics. This framework and the accompanying empirical analysis provide valuable resources for selecting optimal encoding strategies for energy-efficient SNN applications.
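
As a rough illustration of one of the listed encoders, the sketch below follows a common textbook formulation of Step Forward (SF) encoding: a moving baseline emits a +1 or -1 spike whenever the signal leaves a threshold band around it. The function names `sf_encode` and `sf_decode`, the threshold, and the sample signal are assumptions for this example and are not taken from the framework's API; conventions for the first sample also vary between implementations.

```python
def sf_encode(signal, threshold):
    """Encode a signal as +1 / -1 / 0 spikes using a moving baseline."""
    base = signal[0]
    spikes = [0]  # by convention here, the first sample emits no spike
    for value in signal[1:]:
        if value > base + threshold:
            spikes.append(1)      # signal rose above the band: positive spike
            base += threshold
        elif value < base - threshold:
            spikes.append(-1)     # signal fell below the band: negative spike
            base -= threshold
        else:
            spikes.append(0)      # signal stayed inside the band: no spike
    return spikes

def sf_decode(spikes, start, threshold):
    """Reconstruct a staircase approximation by accumulating spikes."""
    level = start
    reconstruction = []
    for spike in spikes:
        level += spike * threshold
        reconstruction.append(level)
    return reconstruction

signal = [0.0, 0.3, 0.7, 1.2, 1.0, 0.4, 0.1]
spikes = sf_encode(signal, threshold=0.25)
recon = sf_decode(spikes, start=signal[0], threshold=0.25)
print(spikes)  # → [0, 1, 1, 1, 0, -1, -1]
print(recon)   # → [0.0, 0.25, 0.5, 0.75, 0.75, 0.5, 0.25]
```

The reconstruction lags the fast rise to 1.2 because each step moves the baseline by only one threshold, which is the kind of trade-off between sparsity and reconstruction accuracy the abstract evaluates.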


Defining and Evaluating Visual Language Models' Basic Spatial Abilities: A Perspective from Psychometrics

arXiv.org Artificial Intelligence

The Theory of Multiple Intelligences underscores the hierarchical nature of cognitive capabilities. To advance Spatial Artificial Intelligence, we pioneer a psychometric framework defining five Basic Spatial Abilities (BSAs) in Visual Language Models (VLMs): Spatial Perception, Spatial Relation, Spatial Orientation, Mental Rotation, and Spatial Visualization. Benchmarking 13 mainstream VLMs through nine validated psychometric experiments reveals significant gaps versus humans (average score 24.95 vs. 68.38), with three key findings: 1) VLMs mirror human hierarchies (strongest in 2D orientation, weakest in 3D rotation) with independent BSAs (Pearson's r<0.4); 2) Smaller models such as Qwen2-VL-7B surpass larger counterparts, with Qwen leading (30.82) and InternVL2 lagging (19.6); 3) Interventions like chain-of-thought (0.100 accuracy gain) and 5-shot training (0.259 improvement) show limits from architectural constraints. Identified barriers include weak geometry encoding and missing dynamic simulation. By linking psychometric BSAs to VLM capabilities, we provide a diagnostic toolkit for spatial intelligence evaluation, methodological foundations for embodied AI development, and a cognitive science-informed roadmap for achieving human-like spatial intelligence.


Federated Sketching LoRA: On-Device Collaborative Fine-Tuning of Large Language Models

arXiv.org Artificial Intelligence

Fine-tuning large language models (LLMs) on devices is attracting increasing interest. Recent works have fused low-rank adaptation (LoRA) techniques with federated fine-tuning to mitigate challenges associated with device model sizes and data scarcity. Still, the heterogeneity of computational resources remains a critical bottleneck: while higher-rank modules generally enhance performance, varying device capabilities constrain LoRA's feasible rank range. Existing approaches attempting to resolve this issue either lack analytical justification or impose additional computational overhead, leaving a wide gap for an efficient and theoretically-grounded solution. To address these challenges, we propose federated sketching LoRA (FSLoRA), which leverages a sketching mechanism to enable devices to selectively update submatrices of global LoRA modules maintained by the server. By adjusting the sketching ratios, which determine the ranks of the submatrices on the devices, FSLoRA flexibly adapts to device-specific communication and computational constraints. We provide a rigorous convergence analysis of FSLoRA that characterizes how the sketching ratios affect the convergence rate. Through comprehensive experiments on multiple datasets and LLM models, we demonstrate FSLoRA's superior performance compared to various baselines.


Optimally Controlling the Timing of Energy Transfer in Elastic Joints: Experimental Validation of the Bi-Stiffness Actuation Concept

arXiv.org Artificial Intelligence

Elastic actuation taps into elastic elements' energy storage for dynamic motions beyond rigid actuation. While Series Elastic Actuators (SEA) and Variable Stiffness Actuators (VSA) are highly sophisticated, they do not fully provide control over energy transfer timing. To overcome this problem on the basic system level, the Bi-Stiffness Actuation (BSA) concept was recently proposed. Theoretically, it allows for full link decoupling, while simultaneously being able to lock the spring in the drive train via a switch-and-hold mechanism. Thus, the user would be in full control of the potential energy storage and release timing. In this work, we introduce an initial proof-of-concept of Bi-Stiffness-Actuation in the form of a 1-DoF physical prototype, which is implemented using a modular testbed. We present a hybrid system model, as well as the mechatronic implementation of the actuator. We corroborate the feasibility of the concept by conducting a series of hardware experiments using an open-loop control signal obtained by trajectory optimization. Here, we compare the performance of the prototype with a comparable SEA implementation. We show that BSA outperforms SEA 1) in terms of maximum velocity at low final times and 2) in terms of the movement strategy itself: The clutch mechanism allows the BSA to generate consistent launch sequences while the SEA has to rely on lengthy and possibly dangerous oscillatory swing-up motions. Furthermore, we demonstrate that providing full control authority over the energy transfer timing and link decoupling allows the user to synchronously release both elastic joint and gravitational energy. This facilitates the optimal exploitation of elastic and gravitational potentials in a synergistic manner.


BSA -- Bi-Stiffness Actuation for optimally exploiting intrinsic compliance and inertial coupling effects in elastic joint robots

arXiv.org Artificial Intelligence

Compliance in actuation has been exploited to generate highly dynamic maneuvers such as throwing that take advantage of the potential energy stored in joint springs. However, the timing of energy storage and release could not yet be controlled well. Moreover, for multi-link systems, the natural system dynamics might even work against the actual goal. With the introduction of variable stiffness actuators, this problem has been partially addressed. With a suitable optimal control strategy, the approximate decoupling of the motor from the link can be achieved to maximize the energy transfer into the distal link prior to launch. However, such continuous stiffness variation is complex and typically leads to oscillatory swing-up motions instead of clear launch sequences. To circumvent this issue, we investigate decoupling for speed maximization with a dedicated novel actuator concept denoted Bi-Stiffness Actuation. With this, it is possible to fully decouple the link from the joint mechanism by a switch-and-hold clutch and simultaneously keep the elastic energy stored. We show that with this novel paradigm, it is not only possible to reach the same optimal performance as with power-equivalent variable stiffness actuation, but even to directly control the energy transfer timing. This is a major step forward compared to previous optimal control approaches, which rely on optimizing the full time-series control input.