bilinear
Generalizable Multi-Linear Attention Network
Most existing multimodal sequential learning methods focus on obtaining powerful individual representations and neglect to capture the multimodal joint representation effectively. The bilinear attention network (BAN) is a commonly used integration method that leverages tensor operations to associate the features of different modalities. However, BAN scales poorly to more modalities, since the computational complexity of the attention map increases exponentially with the number of modalities. To address this, we propose a new method called the generalizable multi-linear attention network (MAN), which can associate more modalities at acceptable complexity through hierarchical approximation decomposition. Specifically, because softmax attention kernels cannot be decomposed directly into linear operations, we adopt the addition random features (ARF) mechanism to approximate the non-linear softmax functions, supported by theoretical analysis. Furthermore, we introduce local sequential constraints, which can be combined conveniently with ARF, as positional information. We conduct extensive experiments on several datasets for the corresponding tasks; the results show that MAN achieves competitive results compared with baseline methods, showcasing the effectiveness of our contributions.
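To illustrate why a random-feature approximation makes softmax attention decomposable into linear operations, here is a minimal sketch using standard positive random features for the exponential kernel. It is not the paper's ARF mechanism; the function name, the feature count `m`, and the toy shapes are assumptions made for this example.

```python
import numpy as np

def positive_random_features(x, w):
    """Map rows of x to features phi(x) such that E[phi(q) . phi(k)] = exp(q . k).

    w holds m random directions drawn from a standard normal distribution.
    """
    m = w.shape[0]
    half_sq_norm = 0.5 * np.sum(x**2, axis=-1, keepdims=True)
    return np.exp(x @ w.T - half_sq_norm) / np.sqrt(m)

rng = np.random.default_rng(0)
d, m = 8, 20000                      # feature dimension, number of random features (assumed)
q = 0.1 * rng.normal(size=(4, d))    # 4 toy queries
k = 0.1 * rng.normal(size=(5, d))    # 5 toy keys
v = rng.normal(size=(5, 3))          # 5 toy values
w = rng.normal(size=(m, d))

phi_q = positive_random_features(q, w)
phi_k = positive_random_features(k, w)

# The unnormalized softmax kernel and its random-feature estimate.
exact_kernel = np.exp(q @ k.T)
approx_kernel = phi_q @ phi_k.T
max_kernel_error = np.max(np.abs(approx_kernel - exact_kernel))

# Because the kernel is now a plain inner product, the matrix products can be
# reordered: phi_k.T @ v is computed once, and the query-by-key map is never formed.
numerator = phi_q @ (phi_k.T @ v)
denominator = phi_q @ phi_k.sum(axis=0)
linear_attention = numerator / denominator[:, None]

exact_attention = (exact_kernel / exact_kernel.sum(axis=1, keepdims=True)) @ v
max_attention_error = np.max(np.abs(linear_attention - exact_attention))
```

The reordering in the last step is what keeps the cost linear in sequence length for two inputs. How MAN extends this to more than two modalities is described in the paper and is not reproduced here.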
959ef477884b6ac2241b19ee4fb776ae-AuthorFeedback.pdf
The proposed group bilinear requires the intra-group channels to be highly correlated (refer to the definition in Q3.1), and the proposed semantic grouping can better satisfy such requirements than MA-CNN [9]. Specifically, [9] adopts the idea of k-means, which optimizes each channel to its cluster center. Note that the notations above are the same as in Eqn. (3), and the pairwise correlation is dij = Thanks for your comments. A is an approximate index mapping matrix, whose rows are constrained to be (approximate) one-hot vectors via a softmax with small "temperature". Q2.2 Inconsistent notations. Thanks for your comments, and we will correct the notation "stage 3,4" into "Stage IV,V" respectively. Designing suitable grouping methods plays a key role.
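The remark about constraining rows to be approximately one-hot via a softmax with a small temperature can be illustrated with a short sketch. The logits and the temperature values below are made up for the example and are not taken from the feedback document.

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Row-wise softmax of logits / temperature, computed stably."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # avoid overflow in exp
    weights = np.exp(scaled)
    return weights / weights.sum(axis=-1, keepdims=True)

# Hypothetical assignment logits: 2 rows, 3 candidate indices each.
logits = np.array([[1.0, 2.0, 0.5],
                   [0.1, 0.0, 3.0]])

soft_rows = softmax_with_temperature(logits, temperature=1.0)    # spread-out rows
sharp_rows = softmax_with_temperature(logits, temperature=0.05)  # near one-hot rows
```

Lowering the temperature pushes each row toward a one-hot vector at its largest logit while keeping the mapping differentiable.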
S$^3$F-Net: A Multi-Modal Approach to Medical Image Classification via Spatial-Spectral Summarizer Fusion Network
Siddiqui, Md. Saiful Bari, Bhuiyan, Mohammed Imamul Hassan
Convolutional Neural Networks (CNNs) have become a cornerstone of medical image analysis due to their proficiency in learning hierarchical spatial features. However, this focus on a single domain is inefficient at capturing global, holistic patterns and fails to explicitly model an image's frequency-domain characteristics. To address these challenges, we propose the Spatial-Spectral Summarizer Fusion Network (S$^3$F-Net), a dual-branch framework that learns from both spatial and spectral representations simultaneously. S$^3$F-Net fuses a deep spatial CNN with our proposed shallow spectral encoder, SpectraNet. SpectraNet features the proposed SpectralFilter layer, which leverages the Convolution Theorem by applying a bank of learnable filters directly to an image's full Fourier spectrum via a computation-efficient element-wise multiplication. This allows the SpectralFilter layer to attain a global receptive field instantaneously, with its output being distilled by a lightweight summarizer network. We evaluate S$^3$F-Net across four diverse medical imaging datasets spanning different scales and modalities: HAM10000 (dermoscopy), BUSI (ultrasound), BRISC2025 (MRI), and Chest X-Ray Pneumonia (radiography), to validate its efficacy and generalizability, and reveal the task-dependent nature of the optimal fusion strategy. Our framework consistently and significantly outperforms its strong spatial-only baseline in all cases, with accuracy improvements of up to 5.13%. With a powerful Bilinear Fusion, S$^3$F-Net achieves an accuracy of 98.76% on the BRISC2025 dataset, competitive with the state of the art. A simpler Concatenation Fusion performs better on the texture-dominant Chest X-Ray Pneumonia dataset, achieving 93.11% accuracy and surpassing many top-performing, much deeper models. Our explainability analysis also reveals that S$^3$F-Net learns to dynamically adjust its reliance on each branch based on the input pathology. These results verify that our dual-domain approach is a powerful and generalizable paradigm for medical image analysis.
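The Convolution Theorem that the SpectralFilter layer relies on can be checked numerically. The sketch below is a generic NumPy illustration, not the authors' layer: it derives the frequency response from a small spatial kernel so the result can be compared with a direct circular convolution, whereas the abstract describes filters learned directly on the spectrum. The function names and sizes are assumptions.

```python
import numpy as np

def spectral_filter(image, kernel):
    """Filter an image by element-wise multiplication in the Fourier domain."""
    spectrum = np.fft.fft2(image)
    # Zero-pad the kernel to the image size before transforming it.
    kernel_spectrum = np.fft.fft2(kernel, s=image.shape)
    return np.real(np.fft.ifft2(spectrum * kernel_spectrum))

def circular_convolution(image, kernel):
    """Reference: direct 2-D circular convolution in the spatial domain."""
    out = np.zeros_like(image, dtype=float)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            # np.roll shifts the image so that out[n] accumulates kernel[m] * image[n - m].
            out += kernel[i, j] * np.roll(image, shift=(i, j), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.normal(size=(16, 16))
kernel = rng.normal(size=(3, 3))

filtered = spectral_filter(image, kernel)
reference = circular_convolution(image, kernel)
```

Every output pixel of `filtered` depends on every input pixel through the transform, which is the sense in which a single spectral multiplication has a global receptive field.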
Offline Meteorology-Pollution Coupling Global Air Pollution Forecasting Model with Bilinear Pooling
Fan, Xu, Lin, Yuetan, Gong, Bing, Li, Hao
Air pollution has become a major threat to human health, making accurate forecasting crucial for pollution control. Traditional physics-based models forecast global air pollution by coupling meteorology and pollution processes, using either online or offline methods, depending on whether the pollution model is fully integrated with the meteorological model and run simultaneously. However, the high computational demands of both methods severely limit real-time prediction efficiency. Existing deep learning (DL) solutions employ online coupling strategies for global air pollution forecasting, which fine-tune pollution forecasting based on pretrained atmospheric models and require substantial training resources. This study pioneers a DL-based offline coupling framework that uses bilinear pooling to couple meteorological fields and pollutants offline. The proposed model requires only 13% of the parameters of DL-based online coupling models while achieving competitive performance. Compared with the state-of-the-art global air pollution forecasting model CAMS, our approach is superior on 63% of variables across all forecast time steps and on 85% of variables in predictions exceeding 48 hours. This work pioneers experimental validation of the effectiveness of meteorological fields in DL-based global air pollution forecasting, demonstrating that offline coupling of meteorological fields with pollutants can achieve a 15% relative reduction in RMSE across all pollution variables. The research establishes a new paradigm for real-time global air pollution warning systems and delivers critical technical support for developing more efficient and comprehensive AI-powered global atmospheric forecasting frameworks.
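For readers unfamiliar with bilinear pooling, the following is a minimal, generic sketch of the operation: an outer product of two feature vectors, flattened, with the signed square root and L2 normalization that are common in the bilinear pooling literature. The variable names `meteo` and `pollutant`, the feature sizes, and the normalization step are illustrative assumptions, not the architecture of the model described above.

```python
import numpy as np

def bilinear_outer(a, b):
    """Flattened outer product of paired feature vectors: (n, p), (n, q) -> (n, p*q)."""
    outer = np.einsum("ni,nj->nij", a, b)
    return outer.reshape(a.shape[0], -1)

def signed_sqrt_l2(features, eps=1e-12):
    """Signed square root followed by row-wise L2 normalization."""
    rooted = np.sign(features) * np.sqrt(np.abs(features))
    norms = np.linalg.norm(rooted, axis=1, keepdims=True)
    return rooted / np.maximum(norms, eps)

rng = np.random.default_rng(0)
meteo = rng.normal(size=(2, 3))      # hypothetical meteorological features
pollutant = rng.normal(size=(2, 4))  # hypothetical pollutant features

raw = bilinear_outer(meteo, pollutant)  # every pairwise product meteo_i * pollutant_j
pooled = signed_sqrt_l2(raw)
```

Each entry of `raw` multiplies one feature from each input, so the fused vector captures all pairwise interactions between the two sources.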
BiCert: A Bilinear Mixed Integer Programming Formulation for Precise Certified Bounds Against Data Poisoning Attacks
Lorenz, Tobias, Kwiatkowska, Marta, Fritz, Mario
Data poisoning attacks pose one of the biggest threats to modern AI systems, necessitating robust defenses. While extensive efforts have been made to develop empirical defenses, attackers continue to evolve, creating sophisticated methods to circumvent these measures. To address this, we must move beyond empirical defenses and establish provable certification methods that guarantee robustness. This paper introduces a novel certification approach, BiCert, using Bilinear Mixed Integer Programming (BMIP) to compute sound deterministic bounds that provide such provable robustness. Using BMIP, we compute the reachable set of parameters that could result from training with potentially manipulated data. A key element to make this computation feasible is to relax the reachable parameter set to a convex set between training iterations. At test time, this parameter set allows us to predict all possible outcomes, guaranteeing robustness. BiCert is more precise than previous methods, which rely solely on interval and polyhedral bounds. Crucially, our approach overcomes the fundamental limitation of prior approaches where parameter bounds could only grow, often uncontrollably. We show that BiCert's tighter bounds eliminate a key source of divergence issues, resulting in more stable training and higher certified accuracy.
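As background on the interval bounds that the abstract contrasts against, the sketch below bounds a single bilinear term, the product of a parameter and an input that each lie in an interval. This is plain interval arithmetic and an assumption about where such a term can arise; it is not BiCert's BMIP formulation, and the function name and numbers are hypothetical.

```python
import numpy as np

def interval_product(w_lo, w_hi, x_lo, x_hi):
    """Tightest interval containing w * x for w in [w_lo, w_hi] and x in [x_lo, x_hi].

    The extremes of a product over a box are always attained at a corner.
    """
    corners = np.array([w_lo * x_lo, w_lo * x_hi, w_hi * x_lo, w_hi * x_hi])
    return corners.min(), corners.max()

# Hypothetical ranges for an uncertain parameter and an uncertain input.
lo, hi = interval_product(-1.0, 2.0, -3.0, 0.5)

# Empirical check: sampled products stay inside the computed interval.
rng = np.random.default_rng(0)
w_samples = rng.uniform(-1.0, 2.0, size=1000)
x_samples = rng.uniform(-3.0, 0.5, size=1000)
products = w_samples * x_samples
```

Interval bounds of this kind treat each term independently and discard the relationship between variables, which is one reason repeated propagation can loosen them.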