Performance Analysis
NegRefine: Refining Negative Label-Based Zero-Shot OOD Detection
Ansari, Amirhossein, Wang, Ke, Xiong, Pulei
Recent advancements in Vision-Language Models like CLIP have enabled zero-shot OOD detection by leveraging both image and textual label information. Among these, negative label-based methods such as NegLabel and CSP have shown promising results by utilizing a lexicon of words to define negative labels for distinguishing OOD samples. However, these methods suffer from detecting in-distribution samples as OOD due to negative labels that are subcategories of in-distribution labels or proper nouns. They also face limitations in handling images that match multiple in-distribution and negative labels. We propose NegRefine, a novel negative label refinement framework for zero-shot OOD detection. By introducing a filtering mechanism to exclude subcategory labels and proper nouns from the negative label set and incorporating a multi-matching-aware scoring function that dynamically adjusts the contributions of multiple labels matching an image, NegRefine ensures a more robust separation between in-distribution and OOD samples. We evaluate NegRefine on large-scale benchmarks, including ImageNet-1K.
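The general idea of negative label-based scoring can be sketched as follows. This is a minimal illustration in the style of a softmax-mass score over in-distribution and negative labels, not NegRefine's multi-matching-aware function, which the abstract does not specify. The function names, the temperature value, and the `flagged` set (standing in for whatever filter marks subcategories and proper nouns) are assumptions made for the example.

```python
import math

def negative_label_score(id_sims, neg_sims, temperature=0.01):
    """Fraction of softmax mass that falls on in-distribution (ID) labels.

    id_sims / neg_sims: lists of image-text similarities for ID and negative labels.
    A higher score suggests the image is in-distribution.
    """
    # Subtract the max logit for numerical stability.
    m = max(s / temperature for s in id_sims + neg_sims)
    id_mass = sum(math.exp(s / temperature - m) for s in id_sims)
    neg_mass = sum(math.exp(s / temperature - m) for s in neg_sims)
    return id_mass / (id_mass + neg_mass)

def drop_flagged_negatives(neg_labels, flagged):
    """Remove negative labels that a (hypothetical) filter has flagged,
    e.g. as subcategories of ID labels or as proper nouns."""
    return [label for label in neg_labels if label not in flagged]
```

If a negative label such as a subcategory of an ID class stays in the set, it can absorb similarity mass from genuine ID images and pull the score down, which is the failure mode the filtering step targets.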
Faster Low-Rank Approximation and Kernel Ridge Regression via the Block-Nyström Method
Garg, Sachin, Dereziński, Michał
The Nyström method is a popular low-rank approximation technique for large matrices that arise in kernel methods and convex optimization. Yet, when the data exhibits heavy-tailed spectral decay, the effective dimension of the problem often becomes so large that even the Nyström method may be outside of our computational budget. To address this, we propose Block-Nyström, an algorithm that injects a block-diagonal structure into the Nyström method, thereby significantly reducing its computational cost while recovering strong approximation guarantees. We show that Block-Nyström can be used to construct improved preconditioners for second-order optimization, as well as to efficiently solve kernel ridge regression for statistical learning over Hilbert spaces. Our key technical insight is that, within the same computational budget, combining several smaller Nyström approximations leads to stronger tail estimates of the input spectrum than using one larger approximation. Along the way, we provide a novel recursive preconditioning scheme for efficiently inverting the Block-Nyström matrix, and provide new statistical learning bounds for a broad class of approximate kernel ridge regression solvers.
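For orientation, the classical Nyström approximation that this work builds on can be sketched as below. This is the standard single-block construction, K ≈ C W⁺ Cᵀ, and not the Block-Nyström algorithm itself, whose details the abstract does not give. The RBF kernel, the `gamma` value, the data, and the landmark count are assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Dense RBF kernel matrix for the rows of X."""
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

def nystrom_approximation(K, landmark_idx):
    """Classical Nystrom approximation K ~= C W^+ C^T.

    C holds the kernel columns at the landmark indices and W is the
    landmark-by-landmark block. For clarity this sketch slices a full K;
    in practice only the landmark columns would be computed.
    """
    idx = np.asarray(landmark_idx)
    C = K[:, idx]
    W = K[np.ix_(idx, idx)]
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))          # synthetic data, for illustration only
K = rbf_kernel(X)
landmarks = rng.choice(200, size=20, replace=False)
K_hat = nystrom_approximation(K, landmarks)
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
print(f"relative Frobenius error with 20 landmarks: {rel_err:.3f}")
```

The cost is dominated by the pseudo-inverse of the landmark block W, which is why a large effective dimension (many landmarks) makes the plain method expensive.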
On Entity Identification in Language Models
Sakata, Masaki, Heinzerling, Benjamin, Yokoi, Sho, Ito, Takumi, Inui, Kentaro
We analyze the extent to which internal representations of language models (LMs) identify and distinguish mentions of named entities, focusing on the many-to-many correspondence between entities and their mentions. We first formulate two problems of entity mentions -- ambiguity and variability -- and propose a framework analogous to clustering quality metrics. Specifically, we quantify through cluster analysis of LM internal representations the extent to which mentions of the same entity cluster together and mentions of different entities remain separated. Our experiments examine five Transformer-based autoregressive models, showing that they effectively identify and distinguish entities with metrics analogous to precision and recall ranging from 0.66 to 0.9. Further analysis reveals that entity-related information is compactly represented in a low-dimensional linear subspace at early LM layers. Additionally, we clarify how the characteristics of entity representations influence word prediction performance. These findings are interpreted through the lens of isomorphism between LM representations and entity-centric knowledge structures in the real world, providing insights into how LMs internally organize and use entity information.
SOC-DGL: Social Interaction Behavior Inspired Dual Graph Learning Framework for Drug-Target Interaction Identification
Zhao, Xiang, Li, Ruijie, Ning, Qiao, Guo, Shikai, Li, Hui, Ma, Qian
The identification of drug-target interactions (DTI) is critical for drug discovery and repositioning, as it reveals potential therapeutic uses of existing drugs, accelerating development and reducing costs. However, most existing models focus only on direct similarity in homogeneous graphs, failing to exploit the rich similarity in heterogeneous graphs. To address this gap, inspired by real-world social interaction behaviors, we propose SOC-DGL, which comprises two specialized modules: the Affinity-Driven Graph Learning (ADGL) module, which learns global similarity through an affinity-enhanced drug-target graph, and the Equilibrium-Driven Graph Learning (EDGL) module, which captures higher-order similarity by amplifying the influence of even-hop neighbors using an even-polynomial graph filter based on balance theory. This dual approach enables SOC-DGL to effectively capture similarity information across multiple interaction scales within affinity and association matrices. To address the issue of imbalance in DTI datasets, we propose an adjustable imbalance loss function that adjusts the weight of negative samples through a tunable parameter. Extensive experiments on four benchmark datasets demonstrate that SOC-DGL consistently outperforms existing state-of-the-art methods across both balanced and imbalanced scenarios. Moreover, SOC-DGL successfully predicts the top 9 drugs known to bind ABL1, and we further analyze the 10th drug, which has not been experimentally confirmed to interact with ABL1, providing supporting evidence for its potential binding.
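An even-polynomial graph filter of the kind mentioned above can be illustrated with a small sketch. It propagates features through even powers of the adjacency matrix only, so information flows between even-hop neighbors. The coefficients, the unnormalized adjacency matrix, and the toy graph are assumptions; the actual filter used in SOC-DGL is not specified in the abstract.

```python
import numpy as np

def even_polynomial_filter(A, X, coeffs):
    """Compute sum_k coeffs[k] * A^(2k) @ X, using only even powers of A."""
    A2 = A @ A
    out = np.zeros(X.shape, dtype=float)
    term = X.astype(float)          # A^0 @ X
    for c in coeffs:
        out += c * term
        term = A2 @ term            # advance by two hops
    return out

# Toy bipartite graph: nodes 0-1 are drugs, nodes 2-3 are targets.
# Edges: drug0-target0, drug1-target0, drug1-target1.
A = np.array([[0, 0, 1, 0],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])
H = even_polynomial_filter(A, np.eye(4), coeffs=[0.5, 0.5])
```

On a bipartite drug-target graph, even powers of the adjacency matrix connect drugs only to drugs and targets only to targets, so in this toy case `H` links drug 0 and drug 1 (they share target 0) while leaving every drug-target entry at zero.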
Towards Harmonized Uncertainty Estimation for Large Language Models
Li, Rui, Long, Jing, Qi, Muge, Xia, Heming, Sha, Lei, Wang, Peiyi, Sui, Zhifang
To facilitate robust and trustworthy deployment of large language models (LLMs), it is essential to quantify the reliability of their generations through uncertainty estimation. While recent efforts have made significant advancements by leveraging the internal logic and linguistic features of LLMs to estimate uncertainty scores, our empirical analysis shows that these methods struggle to achieve a harmonized estimation across indication, balance, and calibration, which hinders their broader capability for accurate uncertainty estimation. To address this challenge, we propose CUE (Corrector for Uncertainty Estimation): a straightforward yet effective method that employs a lightweight model trained on data aligned with the target LLM's performance to adjust uncertainty scores. Comprehensive experiments across diverse models and tasks demonstrate its effectiveness, with consistent improvements of up to 60% over existing methods.
Kolmogorov Arnold Networks (KANs) for Imbalanced Data -- An Empirical Perspective
Kolmogorov Arnold Networks (KANs) are a recent architectural advancement in neural computation that offers a mathematically grounded alternative to standard neural networks. This study presents an empirical evaluation of KANs in the context of class-imbalanced classification, using ten benchmark datasets. We observe that KANs inherently handle raw imbalanced data more effectively than Multi-Layer Perceptrons (MLPs) without any resampling strategy. However, conventional imbalance strategies fundamentally conflict with the mathematical structure of KANs: resampling and focal loss implementations significantly degrade KAN performance while marginally benefiting MLPs. Crucially, KANs suffer from prohibitive computational costs without proportional performance gains. Statistical validation confirms that MLPs with imbalance techniques achieve equivalence with KANs (|d| < 0.08 across metrics) at minimal resource costs. These findings reveal that KANs represent a specialized solution for raw imbalanced data where resources permit, but their severe performance-resource tradeoffs and incompatibility with standard resampling techniques currently limit practical deployment. We identify critical research priorities: developing KAN-specific architectural modifications for imbalance learning, optimizing computational efficiency, and theoretically reconciling their conflict with data augmentation. This work establishes foundational insights for next-generation KAN architectures in imbalanced classification scenarios.
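The reported |d| < 0.08 reads like an effect-size statement. Assuming d denotes Cohen's d with a pooled standard deviation (the abstract does not say which variant is used), it could be computed as in this sketch. The function name is illustrative.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled sample variance."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)
```

By the usual rule of thumb, values of |d| below 0.2 are considered small, so |d| < 0.08 would indicate a negligible difference between the two sets of scores.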
CPC-CMS: Cognitive Pairwise Comparison Classification Model Selection Framework for Document-level Sentiment Analysis
Li, Jianfei, Yuen, Kevin Kam Fung
This study proposes the Cognitive Pairwise Comparison Classification Model Selection (CPC-CMS) framework for document-level sentiment analysis. The CPC, based on expert knowledge judgment, is used to calculate the weights of evaluation criteria, including accuracy, precision, recall, F1-score, specificity, Matthews Correlation Coefficient (MCC), Cohen's Kappa (Kappa), and efficiency. Naive Bayes, Linear Support Vector Classification (LSVC), Random Forest, Logistic Regression, Extreme Gradient Boosting (XGBoost), Long Short-Term Memory (LSTM), and A Lite Bidirectional Encoder Representations from Transformers (ALBERT) are chosen as classification baseline models. A weighted decision matrix, consisting of classification evaluation scores weighted by the criteria weights, is formed to select the best classification model for a classification problem. Three open social media datasets are used to demonstrate the feasibility of the proposed CPC-CMS. Based on our simulation, for evaluation results excluding the time factor, ALBERT is the best for the three datasets; if time consumption is included, no single model always performs better than the other models. The CPC-CMS can be applied to other classification applications in different areas.
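The weighted decision matrix step can be sketched as a weighted sum of criterion scores per model. The sketch assumes the criteria weights are already given (the CPC procedure that derives them is not reproduced here) and that all scores are normalized to [0, 1] with higher meaning better. Model names, weights, and scores are hypothetical.

```python
def rank_models(scores, weights):
    """Rank models by the weighted sum of their criterion scores.

    scores:  {model_name: {criterion: score in [0, 1]}}
    weights: {criterion: weight}, assumed to sum to 1
    """
    totals = {
        model: sum(weights[c] * per_criterion[c] for c in weights)
        for model, per_criterion in scores.items()
    }
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical scores: model_a is more accurate, model_b is much faster.
scores = {
    "model_a": {"accuracy": 0.9, "f1": 0.9, "efficiency": 0.2},
    "model_b": {"accuracy": 0.8, "f1": 0.8, "efficiency": 0.9},
}
with_time = rank_models(scores, {"accuracy": 0.5, "f1": 0.3, "efficiency": 0.2})
without_time = rank_models(scores, {"accuracy": 0.6, "f1": 0.4})
```

In this toy setup the ranking flips once efficiency carries weight, which mirrors the observation that the best model depends on whether time consumption is included.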
RAG-based Architectures for Drug Side Effect Retrieval in LLMs
Nygren, Shad, Avci, Pinar, Daniels, Andre, Rassol, Reza, Beheshti, Afshin, Galeano, Diego
To overcome these significant challenges, we propose two novel architectures designed to integrate domain knowledge about drug side effects into a Llama 3-8B Language Model: Retrieval Augmented Generation (RAG) and GraphRAG. Our first architecture employs RAG, which enhances LLMs by retrieving relevant information from an external Pinecone vector database where drug side effect information is stored as feature vectors. The second architecture utilizes GraphRAG, which leverages a Neo4j graph database to store and efficiently handle more complex relationships of drug side effect associations. Both frameworks incorporate custom split functions and filtering modules to optimize user prompts for accurate retrieval. Through extensive evaluations on 19,520 associations between 976 marketed drugs and 3,851 unique side effect terms, we demonstrate that GraphRAG achieves near-perfect accuracy in drug side effect retrieval, significantly outperforming standalone LLMs and standard RAG approaches.
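The retrieval step of a RAG pipeline can be illustrated with a small in-memory sketch. A plain Python list stands in for the vector database here, and no Pinecone or Neo4j API is used. The record layout, the toy two-dimensional vectors, and the placeholder texts are assumptions made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, store, k=2):
    """Return the texts of the k stored records most similar to the query."""
    ranked = sorted(store, key=lambda rec: cosine(query_vec, rec["vector"]), reverse=True)
    return [rec["text"] for rec in ranked[:k]]

# Hypothetical store: placeholder drug / side-effect statements with toy embeddings.
store = [
    {"text": "drug_a is associated with side_effect_1", "vector": [1.0, 0.1]},
    {"text": "drug_b is associated with side_effect_2", "vector": [0.0, 1.0]},
    {"text": "drug_a is associated with side_effect_3", "vector": [0.9, 0.5]},
]
context = retrieve([1.0, 0.0], store, k=2)
```

The retrieved texts would then be placed into the LLM prompt as context. A graph-based variant would instead answer the same question by following explicit drug-to-side-effect edges.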
Graph-Structured Data Analysis of Component Failure in Autonomous Cargo Ships Based on Feature Fusion
Zhang, Zizhao, Zhao, Tianxiang, Sun, Yu, Sun, Liping, Kang, Jichuan
To address the challenges posed by cascading reactions caused by component failures in autonomous cargo ships (ACS) and the uncertainties in emergency decision-making, this paper proposes a novel hybrid feature fusion framework for constructing a graph-structured dataset of failure modes. By employing an improved cuckoo search algorithm (HN-CSA), the literature retrieval efficiency is significantly enhanced, achieving improvements of 7.1% and 3.4% compared to the NSGA-II and CSA search algorithms, respectively. A hierarchical feature fusion framework is constructed, using Word2Vec encoding to encode subsystem/component features, BERT-KPCA to process failure modes/reasons, and Sentence-BERT to quantify the semantic association between failure impact and emergency decision-making. The dataset covers 12 systems, 1,262 failure modes, and 6,150 propagation paths. Validation results show that the GATE-GNN model achieves a classification accuracy of 0.735, comparable to existing benchmarks. Additionally, a silhouette coefficient of 0.641 indicates that the features are highly distinguishable. In the label prediction results, the Shore-based Meteorological Service System achieved an F1 score of 0.93, demonstrating high prediction accuracy. This paper not only provides a solid foundation for failure analysis in autonomous cargo ships but also offers reliable support for fault diagnosis, risk assessment, and intelligent decision-making systems. The link to the dataset is https://github.com/wojiufukele/Graph-Structured-about-CSA.
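The silhouette coefficient cited above measures how well each point sits within its assigned cluster relative to the nearest other cluster. A minimal sketch of the standard definition follows, assuming Euclidean distance, at least two clusters, and not all points being identical. The paper's own feature space and distance metric are not stated in the abstract.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient: s = (b - a) / max(a, b) per point.

    a: mean distance to the other points in the same cluster
    b: smallest mean distance to the points of any other cluster
    Points in singleton clusters contribute s = 0.
    """
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    total = 0.0
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton cluster: s = 0 by convention
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items() if key != label
        )
        total += (b - a) / max(a, b)
    return total / len(points)
```

Values range from -1 to 1, with values near 1 indicating compact, well-separated clusters.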
Benchmarking of EEG Analysis Techniques for Parkinson's Disease Diagnosis: A Comparison between Traditional ML Methods and Foundation DL Methods
Avola, Danilo, Bernardini, Andrea, Crocetti, Giancarlo, Ladogana, Andrea, Lezoche, Mario, Mancini, Maurizio, Pannone, Daniele, Ranaldi, Amedeo
Parkinson's Disease (PD) is a progressive neurodegenerative disorder that affects motor and cognitive functions, with early diagnosis being critical for effective clinical intervention. Electroencephalography (EEG) offers a noninvasive and cost-effective means of detecting PD-related neural alterations, yet the development of reliable automated diagnostic models remains a challenge. In this study, we conduct a systematic benchmark of traditional machine learning (ML) and deep learning (DL) models for classifying PD using a publicly available oddball task dataset. Our aim is to lay the groundwork for developing an effective learning system and to determine which approach produces the best results. We implement a unified seven-step preprocessing pipeline and apply consistent subject-wise cross-validation and evaluation criteria to ensure comparability across models. Our results demonstrate that while baseline deep learning architectures, particularly CNN-LSTM models, achieve the best performance compared to other deep learning architectures, underlining the importance of capturing long-range temporal dependencies, several traditional classifiers such as XGBoost also offer strong predictive accuracy and calibrated decision boundaries. By rigorously comparing these baselines, our work provides a solid reference framework for future studies aiming to develop and evaluate more complex or specialized architectures. Establishing a reliable set of baseline results is essential to contextualize improvements introduced by novel methods, ensuring scientific rigor and reproducibility in the evolving field of EEG-based neurodiagnostics.
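Subject-wise cross-validation keeps all recordings from one subject in the same fold, so a model is never tested on a person it has already seen during training. The sketch below shows one simple way to build such folds. The function name, the round-robin assignment of subjects to folds, and the subject IDs are assumptions, not the authors' exact protocol.

```python
def subject_wise_folds(subject_ids, n_folds):
    """Yield (train_idx, test_idx) pairs with no subject shared across the split.

    subject_ids: one subject identifier per sample (e.g. per EEG epoch).
    Subjects are assigned to folds round-robin after sorting.
    """
    subjects = sorted(set(subject_ids))
    for fold in range(n_folds):
        held_out = set(subjects[fold::n_folds])
        test_idx = [i for i, s in enumerate(subject_ids) if s in held_out]
        train_idx = [i for i, s in enumerate(subject_ids) if s not in held_out]
        yield train_idx, test_idx

# Hypothetical example: seven epochs from four subjects.
epoch_subjects = ["s1", "s1", "s2", "s2", "s3", "s3", "s4"]
folds = list(subject_wise_folds(epoch_subjects, n_folds=2))
```

Splitting at the epoch level instead would let epochs from the same subject leak into both sides of the split, which tends to inflate reported accuracy.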