attention mask
Unifying and Enhancing Graph Transformers via a Hierarchical Mask Framework
Graph Transformers (GTs) have emerged as a powerful paradigm for graph representation learning due to their ability to model diverse node interactions. However, existing GTs often rely on intricate architectural designs tailored to specific interactions, limiting their flexibly. To address this, we propose a unified hierarchical mask framework that reveals an underlying equivalence between model architecture and attention mask construction. This framework enables a consistent modeling paradigm by capturing diverse interactions through carefully designed attention masks. Theoretical analysis under this framework demonstrates that the probability of correct classification positively correlates with the receptive field size and label consistency, leading to a fundamental design principle: An effective attention mask should ensure both a sufficiently large receptive field and a high level of label consistency.
Flow-based Feature Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection - Appendix Haibao Y u 1, 2, Yingjuan T ang
Mean A verage Precision (mAP). For VIC3D object detection, we focus on the obstacles around the ego vehicle. There are two metrics used for evaluation: BEV@mAP and 3D@mAP . BEV@mAP evaluates the 3D boxes in the bird's-eye view and ignores the In our implementation, we ignore the transmission cost of calibration files and timestamps. For early fusion, we calculate the transmission cost of transmitting raw data.
Enhancing low energy reconstruction and classification in KM3NeT/ORCA with transformers
The current KM3NeT/ORCA neutrino telescope, still under construction, has not yet reached its full potential in neutrino reconstruction capability . When training any deep learning model, no explicit information about the physics or the detector is provided, thus they remain unknown to the model. This study leverages the strengths of transformers by incorporating attention masks inspired by the physics and detector design, making the model understand both the telescope design and the neutrino physics measured on it. The study also shows the efficacy of transformers on retaining valuable information between detectors when doing fine-tuning from one configurations to another .
Functional Localization Enforced Deep Anomaly Detection Using Fundus Images
Ruhland, Jan Benedikt, Papenbrock, Thorsten, Sowa, Jan-Peter, Canbay, Ali, Eter, Nicole, Freisleben, Bernd, Heider, Dominik
Reliable detection of retinal diseases from fundus images is challenged by the variability in imaging quality, subtle early-stage manifestations, and domain shift across datasets. In this study, we systematically evaluated a Vision Transformer (ViT) classifier under multiple augmentation and enhancement strategies across several heterogeneous public datasets, as well as the AEyeDB dataset, a high-quality fundus dataset created in-house and made available for the research community. The ViT demonstrated consistently strong performance, with accuracies ranging from 0.789 to 0.843 across datasets and diseases. Diabetic retinopathy and age-related macular degeneration were detected reliably, whereas glaucoma remained the most frequently misclassified disease. Geometric and color augmentations provided the most stable improvements, while histogram equalization benefited datasets dominated by structural subtlety. Laplacian enhancement reduced performance across different settings. On the Papila dataset, the ViT with geometric augmentation achieved an AUC of 0.91, outperforming previously reported convolutional ensemble baselines (AUC of 0.87), underscoring the advantages of transformer architectures and multi-dataset training. To complement the classifier, we developed a GANomaly-based anomaly detector, achieving an AUC of 0.76 while providing inherent reconstruction-based explainability and robust generalization to unseen data. Probabilistic calibration using GUESS enabled threshold-independent decision support for future clinical implementation.