Performance Analysis
Scaling Up Efficient Small Language Models Serving and Deployment for Semantic Job Search
Behdin, Kayhan, Song, Qingquan, Vasudevan, Sriram, Sheng, Jian, Ma, Xiaojing, Zhou, Z, Zhu, Chuanrui, Li, Guoyao, Nguyen, Chanh, Ghosh, Sayan, Sang, Hejian, Baarzi, Ata Fatahi, Ramachandran, Sundara Raman, Wang, Xiaoqing, Lan, Qing, S, Vinay Y, Guo, Qi, Johnson, Caleb, Wang, Zhipeng, Borisyuk, Fedor
Large Language Models (LLMs) have demonstrated impressive quality when applied to predictive tasks such as relevance ranking and semantic search. However, deploying such LLMs remains prohibitively expensive for industry applications with strict latency and throughput requirements. In this work, we present lessons and efficiency insights from developing a purely text-based, decoder-only Small Language Model (SLM) for a semantic search application at LinkedIn. In particular, we discuss model compression techniques such as pruning that allow us to reduce the model size by up to $40\%$ while maintaining accuracy. Additionally, we present context compression techniques that allow us to reduce the input context length by up to $10$x with minimal loss of accuracy. Finally, we present practical lessons from optimizing the serving infrastructure for deploying such a system on GPUs at scale, serving millions of requests per second. Taken together, these techniques allow us to increase our system's throughput by $10$x in a real-world deployment while meeting our quality bar.
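The abstract does not say which pruning criterion is used. The sketch below assumes the simplest option, unstructured magnitude pruning. The hypothetical helper `magnitude_prune` zeroes the smallest-magnitude fraction of a weight matrix, and the 40% target mirrors the figure quoted above. The authors' method may instead be structured, for example removing whole layers, heads, or neurons. Structured pruning is what usually yields real memory and latency savings on GPUs, whereas unstructured zeros only help with sparse storage or sparse kernels.

```python
from typing import List

Matrix = List[List[float]]


def magnitude_prune(weights: Matrix, sparsity: float) -> Matrix:
    """Zero out the `sparsity` fraction of entries with the smallest |w|.

    Hypothetical helper illustrating unstructured magnitude pruning.
    Ties in magnitude are broken by position, so exactly
    round(sparsity * n) entries are zeroed.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    # (magnitude, row, col) for every weight; sorting puts the smallest first.
    flat = [(abs(w), i, j) for i, row in enumerate(weights) for j, w in enumerate(row)]
    n_prune = round(sparsity * len(flat))
    to_prune = {(i, j) for _, i, j in sorted(flat)[:n_prune]}
    # Build a new matrix; the input is left unchanged.
    return [
        [0.0 if (i, j) in to_prune else w for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]
```

For example, pruning a 2×5 matrix with `sparsity=0.4` zeroes its four smallest-magnitude entries and leaves the other six untouched.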
Training data membership inference via Gaussian process meta-modeling: a post-hoc analysis approach
Huang, Yongchao, Zhang, Pengfei, Mumtaz, Shahzad
Membership inference attacks (MIAs) test whether a data point was part of a model's training set, posing serious privacy risks. Existing methods often depend on shadow models or heavy query access, which limits their practicality. We propose GP-MIA, an efficient and interpretable approach based on Gaussian process (GP) meta-modeling. Using post-hoc metrics such as accuracy, entropy, dataset statistics, and optional sensitivity features (e.g. gradients, NTK measures) from a single trained model, GP-MIA trains a GP classifier to distinguish members from non-members while providing calibrated uncertainty estimates. Experiments on synthetic data, real-world fraud detection data, CIFAR-10, and WikiText-2 show that GP-MIA achieves high accuracy and generalizability, offering a practical alternative to existing MIAs.
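The sketch below makes the idea of post-hoc metrics concrete by computing per-example features from a single trained model's predicted class probabilities. The helper name `posthoc_features` and the loss feature are assumptions for illustration. The abstract names accuracy and entropy but does not define the exact feature set. A vector of such features per example would then be the input to a GP classifier, for instance scikit-learn's `GaussianProcessClassifier`, which is not shown here.

```python
import math
from typing import Dict, Sequence


def posthoc_features(probs: Sequence[float], label: int) -> Dict[str, float]:
    """Per-example features from one softmax output (hypothetical helper).

    probs: predicted class probabilities for one example (must sum to 1).
    label: the example's true class index.
    """
    if not math.isclose(sum(probs), 1.0, rel_tol=1e-6):
        raise ValueError("probabilities must sum to 1")
    # Shannon entropy in nats; terms with p == 0 contribute 0.
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    predicted = max(range(len(probs)), key=lambda k: probs[k])
    return {
        "correct": float(predicted == label),  # per-example accuracy
        "entropy": entropy,
        "true_class_prob": probs[label],
        # Assumed extra feature: cross-entropy loss on the true label,
        # clipped to avoid log(0). Training members often have lower loss.
        "loss": -math.log(max(probs[label], 1e-12)),
    }
```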
Precise classification of low quality G-banded Chromosome Images by reliability metrics and data pruning classifier
In the last decade, owing to high-resolution cameras and accurate metaphase analyses, the accuracy of chromosome classification has improved substantially. However, current karyotyping systems demand a large amount of high-quality training data to achieve adequate precision for each chromosome. Such high-quality training data, acquired with accurate devices, is not yet available in some remote pathology laboratories. To prevent false-positive detections in low-cost systems and low-quality image settings, this paper improves the classification precision of chromosomes using proposed reliability thresholding metrics and deliberately engineered features. The proposed method has been evaluated using a variant of the deep AlexNet neural network, SVM, K-Nearest Neighbors, and their cascade pipelines for automated filtering of semi-straight chromosomes. Classification results improved substantially, exceeding 90% for chromosomes with the more common defects and translocations. Furthermore, a comparative analysis of the proposed thresholding metrics has been conducted, and the best metric is highlighted along with its salient characteristics. The high precision obtained on a very low-quality G-banding database verifies the suitability of the proposed metrics and pruning method for karyotyping facilities in poor countries and low-budget pathology laboratories.
Keywords: G-banded Karyotyping, Precision, Reliability metrics, Pattern Recognition, Medical Imaging
1 Introduction
Cytogenetics is one way to study and diagnose birth defects and biological disorders. This branch of science analyzes chromosome shapes and patterns to identify common defects. Methods used for such analyses include G-banding, Fluorescent In-Situ Hybridization (FISH), Comparative Genomic Hybridization (CGH), and chromosome-specific unique-sequence probes [27].
While molecular cytogenetics methods are effective for studying biological disorders, they do not necessarily reveal specific chromosome defects. FISH methods, although they achieve more accurate staining results, are costly and cannot identify all chromosome abnormalities. Because the fluorescence signal is only temporary, they also demand more preparation effort and reagent supply, which some countries may not be able to afford. Furthermore, detecting some abnormalities requires the G-banding technique rather than staining alone.
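The excerpt does not define the reliability metrics, so the sketch below assumes the simplest one: the maximum class probability of a classifier's output. Predictions whose reliability falls below a threshold are rejected. This typically trades coverage for accuracy on the predictions that are kept, because rejected samples cannot become false positives. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def predict_with_reject(probs: Sequence[float], threshold: float) -> int:
    """Return the argmax class, or -1 (reject) if its probability < threshold.

    Assumption: reliability is measured by the maximum class probability.
    """
    best = max(range(len(probs)), key=lambda k: probs[k])
    return best if probs[best] >= threshold else -1


def selective_accuracy_and_coverage(
    all_probs: Sequence[Sequence[float]], labels: Sequence[int], threshold: float
) -> Tuple[float, float]:
    """Accuracy over accepted (non-rejected) predictions, and fraction accepted."""
    preds: List[int] = [predict_with_reject(p, threshold) for p in all_probs]
    accepted = [(y_hat, y) for y_hat, y in zip(preds, labels) if y_hat != -1]
    if not accepted:
        return float("nan"), 0.0  # nothing accepted: accuracy undefined
    correct = sum(y_hat == y for y_hat, y in accepted)
    return correct / len(accepted), len(accepted) / len(labels)
```

For example, consider `probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.6, 0.4]]` and `labels = [0, 1, 1, 0]`. A threshold of 0.5 keeps all four predictions at 0.75 selective accuracy. A threshold of 0.7 keeps two predictions, and both are correct.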
Beyond Point Matching: Evaluating Multiscale Dubuc Distance for Time Series Similarity
Ahmadzadeh, Azim, Khazaei, Mahsa, Rohlfing, Elaina
Time series are high-dimensional and complex data objects, making their efficient search and indexing a longstanding challenge in data mining. Building on a recently introduced similarity measure, namely Multiscale Dubuc Distance (MDD), this paper investigates its comparative strengths and limitations relative to the widely used Dynamic Time Warping (DTW). MDD is novel in two key ways: it evaluates time series similarity across multiple temporal scales and avoids point-to-point alignment. We demonstrate that in many scenarios where MDD outperforms DTW, the gains are substantial, and we provide a detailed analysis of the specific performance gaps it addresses. We provide simulations, in addition to the 95 datasets from the UCR archive, to test our hypotheses. Finally, we apply both methods to a challenging real-world classification task and show that MDD yields a significant improvement over DTW, underscoring its practical utility.
Time series, or more generally, ordered high-dimensional data types, have become increasingly prevalent with the rise of powerful computational tools and machine learning techniques. In this study, we adopt the term time series as an umbrella label for all such sequential data. A central challenge in analyzing time series lies in defining and measuring similarity. Similarity is inherently subjective, shaped by the specific goals and nuances of a given application. The existing literature has produced a rich landscape of similarity measures, each tailored to specific assumptions and use cases.
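The abstract does not define MDD, so only the DTW baseline is sketched here. This is the textbook dynamic-programming formulation with an absolute-difference local cost and no warping-window constraint. Both choices are assumptions, because the paper's DTW configuration is not stated. The sketch makes explicit the point-to-point alignment that MDD is said to avoid.

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance (no warping-window constraint).

    cost[i][j] is the minimal cumulative |a_k - b_l| over all monotone
    alignments of a[:i] with b[:j].
    """
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # point-to-point local cost
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b's point is reused
                cost[i][j - 1],      # b advances, a's point is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

For instance, `dtw_distance([0, 1, 2], [0, 1, 1, 2])` is 0.0 because both 1s in the longer series can be aligned to the single 1 in the shorter one. A pointwise Euclidean distance is not even defined for these unequal lengths.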
Wavelet-based GAN Fingerprint Detection using ResNet50
Erukude, Sai Teja, Veluru, Suhasnadh Reddy, Marella, Viswa Chaitanya
Identifying images generated by Generative Adversarial Networks (GANs) has become a significant challenge in digital image forensics. This research presents a wavelet-based detection method that uses discrete wavelet transform (DWT) preprocessing and a ResNet50 classifier to distinguish StyleGAN-generated images from real ones. Haar and Daubechies wavelet filters are applied to convert the input images into multi-resolution representations, which are then fed to a ResNet50 network for classification, capitalizing on subtle artifacts left by the generative process. The wavelet-based models are compared with an identical ResNet50 model trained on spatial-domain data. The Haar- and Daubechies-preprocessed models achieved accuracies of 93.8 percent and 95.1 percent, respectively, much higher than the 81.5 percent achieved by the spatial-domain model. The Daubechies-based model outperforms the Haar-based model, suggesting that richer descriptions of frequency patterns can yield even greater discriminative power. These results indicate that GAN-generated images carry unique wavelet-domain artifacts, or "fingerprints." The proposed method illustrates the effectiveness of wavelet-domain analysis for detecting GAN images and highlights its potential for future deepfake detection systems.
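The DWT preprocessing step is illustrated below with a one-level orthonormal 2-D Haar transform of a single image channel, written in plain Python so the arithmetic is visible. It is a sketch only. The abstract does not say how many decomposition levels are used or how the subbands are arranged as ResNet50 input channels. Subband naming and sign conventions also differ between libraries. For Daubechies filters, a library such as PyWavelets (`pywt.dwt2`) would be the practical choice.

```python
from typing import Dict, List

Image = List[List[float]]


def haar_dwt2(img: Image) -> Dict[str, Image]:
    """One-level orthonormal 2-D Haar DWT of one image channel.

    Requires even height and width; returns four half-size subbands.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")
    names = ("approx", "horizontal", "vertical", "diagonal")
    bands: Dict[str, Image] = {k: [] for k in names}
    for i in range(0, h, 2):
        rows: Dict[str, List[float]] = {k: [] for k in names}
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            rows["approx"].append((a + b + c + d) / 2)      # local average (scaled)
            rows["horizontal"].append((a + b - c - d) / 2)  # top vs bottom row
            rows["vertical"].append((a - b + c - d) / 2)    # left vs right column
            rows["diagonal"].append((a - b - c + d) / 2)    # diagonal contrast
        for k in names:
            bands[k].append(rows[k])
    return bands
```

Because the transform is orthonormal, the total energy (sum of squares) of the four subbands equals that of the input, which is a convenient sanity check.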
Unlocking Biomedical Insights: Hierarchical Attention Networks for High-Dimensional Data Interpretation
Nair, Rekha R, Babu, Tina, Panthakkan, Alavikunhu, Al-Ahmad, Hussain, Balusamy, Balamurugan
The proliferation of high-dimensional datasets in fields such as genomics, healthcare, and finance has created an urgent need for machine learning models that are both highly accurate and inherently interpretable. While traditional deep learning approaches deliver strong predictive performance, their lack of transparency often impedes their deployment in critical, decision-sensitive applications. In this work, we introduce the Hierarchical Attention-based Interpretable Network (HAIN), a novel architecture that unifies multi-level attention mechanisms, dimensionality reduction, and explanation-driven loss functions to deliver interpretable and robust analysis of complex biomedical data. HAIN provides feature-level interpretability via gradient-weighted attention and offers global model explanations through prototype-based representations. Comprehensive evaluation on The Cancer Genome Atlas (TCGA) dataset demonstrates that HAIN achieves a classification accuracy of 94.3%, surpassing conventional post-hoc interpretability approaches such as SHAP and LIME in both transparency and explanatory power. Furthermore, HAIN effectively identifies biologically relevant cancer biomarkers, supporting its utility for clinical and research applications. By harmonizing predictive accuracy with interpretability, HAIN advances the development of transparent AI solutions for precision medicine and regulatory compliance.
Noise Aggregation Analysis Driven by Small-Noise Injection: Efficient Membership Inference for Diffusion Models
Li, Guo, Yu, Yuyang, Xu, Xuemiao
Diffusion models have demonstrated powerful performance in generating high-quality images. A typical example is a text-to-image generator such as Stable Diffusion. However, their widespread use also poses potential privacy risks. A key concern is membership inference attacks, which attempt to determine whether a particular data sample was used in the model training process. We propose an efficient membership inference attack method against diffusion models. This method is based on injecting slight noise and evaluating the aggregation degree of the predicted noise distribution. The intuition is that the noise prediction patterns of diffusion models for training-set samples and non-training-set samples exhibit distinguishable differences. Specifically, we hypothesize that member images exhibit higher aggregation of predicted noise around a certain time step of the diffusion process, whereas the predicted noises of non-member images are more dispersed around that time step. Compared with other existing methods, our proposed method requires fewer queries to the target diffusion model. We inject slight noise into the image under test and then determine its membership by analyzing the aggregation degree of the noise distribution predicted by the model. Empirical findings indicate that our method achieves superior performance across multiple datasets. Our method also achieves better attack success rate (ASR) and AUC against large-scale text-to-image diffusion models, demonstrating its scalability.
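The abstract does not give the exact aggregation measure. This minimal sketch assumes that the model has already been queried several times on the same image, each time with small injected noise at one time step. It scores how tightly the resulting predicted-noise vectors cluster, using the mean distance to their centroid. The function names, the dispersion metric, and the threshold are all hypothetical, and in practice the threshold would have to be calibrated.

```python
import math
from typing import Sequence


def dispersion(predicted_noises: Sequence[Sequence[float]]) -> float:
    """Mean Euclidean distance of predicted-noise vectors to their centroid.

    Lower values mean the predictions are more tightly aggregated.
    """
    if not predicted_noises:
        raise ValueError("need at least one predicted-noise vector")
    k = len(predicted_noises)
    dim = len(predicted_noises[0])
    centroid = [sum(v[d] for v in predicted_noises) / k for d in range(dim)]
    return sum(math.dist(v, centroid) for v in predicted_noises) / k


def is_member(predicted_noises: Sequence[Sequence[float]], threshold: float) -> bool:
    """Hypothetical decision rule: aggregated (low dispersion) -> member."""
    return dispersion(predicted_noises) < threshold
```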
Beyond IVR Touch-Tones: Customer Intent Routing using LLMs
Widespread frustration with rigid touch-tone Interactive Voice Response (IVR) systems for customer service underscores the need for more direct and intuitive language interaction. While speech technologies are necessary, the key challenge lies in routing intents from user phrasings to IVR menu paths, a task where Large Language Models (LLMs) show strong potential. Progress, however, is limited by data scarcity, as real IVR structures and interactions are often proprietary. We present a novel LLM-based methodology to address this gap. Using three distinct models, we synthesized a realistic 23-node IVR structure, generated 920 user intents (230 base and 690 augmented), and performed the routing task. We evaluate two prompt designs, descriptive hierarchical menus and flattened path representations, across both base and augmented datasets. Results show that flattened paths consistently yield higher accuracy, reaching 89.13% on the base dataset compared to 81.30% with the descriptive format, while augmentation introduces linguistic noise that slightly reduces performance. Confusion matrix analysis further suggests that low-performing routes may reflect not only model limitations but also redundancies in menu design. Overall, our findings provide a proof of concept that LLMs can enable IVR routing with a smoother, more seamless user experience, moving customer service one step beyond touch-tone menus.
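The flattened-path prompt format can be illustrated by turning a nested menu into one root-to-leaf path string per destination. The menu used below is a made-up example, not the paper's 23-node IVR. The `" > "` separator is also an assumption about formatting.

```python
from typing import Dict, List

Menu = Dict[str, "Menu"]  # each option maps to its sub-menu ({} for a leaf)


def flatten_paths(menu: Menu, prefix: str = "", sep: str = " > ") -> List[str]:
    """Return one string per leaf option, listing the choices from root to leaf."""
    paths: List[str] = []
    for option, submenu in menu.items():
        path = f"{prefix}{sep}{option}" if prefix else option
        if submenu:
            paths.extend(flatten_paths(submenu, path, sep))  # descend into sub-menu
        else:
            paths.append(path)  # leaf: a routable destination
    return paths
```

Consider the hypothetical menu `{"Billing": {"Pay a bill": {}, "Dispute a charge": {}}, "Speak to an agent": {}}`. For this input the function yields `["Billing > Pay a bill", "Billing > Dispute a charge", "Speak to an agent"]`, and those strings could then be listed in the routing prompt.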
Automated HIV Screening on Dutch Electronic Health Records with Large Language Models
Zhou, Lang, Jhingoer, Amrish, Luo, Yinghao, Vliegenthart-Jongbloed, Klaske, Jordans, Carlijn, Werkhoven, Ben, Seinen, Tom, van Mulligen, Erik, Rokx, Casper, Li, Yunlei
Efficient screening and early diagnosis of HIV are critical for reducing onward transmission. Although large-scale laboratory testing is not feasible, the widespread adoption of Electronic Health Records (EHRs) offers new opportunities to address this challenge. Existing research primarily focuses on applying machine learning methods to structured data, such as patient demographics, to improve HIV diagnosis. However, these approaches often overlook unstructured text data such as clinical notes, which potentially contain valuable information relevant to HIV risk. In this study, we propose a novel pipeline that leverages a Large Language Model (LLM) to analyze unstructured EHR text and determine a patient's eligibility for further HIV testing. Experimental results on clinical data from Erasmus University Medical Center Rotterdam demonstrate that our pipeline achieved high accuracy while maintaining a low false negative rate.
GOOD: Training-Free Guided Diffusion Sampling for Out-of-Distribution Detection
Gao, Xin, Liu, Jiyao, Li, Guanghao, Lyu, Yueming, Gao, Jianxiong, Yu, Weichen, Xu, Ningsheng, Wang, Liang, Shan, Caifeng, Liu, Ziwei, Si, Chenyang
Recent advancements have explored text-to-image diffusion models for synthesizing out-of-distribution (OOD) samples, substantially enhancing the performance of OOD detection. However, existing approaches typically rely on perturbing text-conditioned embeddings, resulting in semantic instability and insufficient shift diversity, which limit generalization to realistic OOD data. To address these challenges, we propose GOOD, a novel and flexible framework that directly guides diffusion sampling trajectories toward OOD regions using off-the-shelf in-distribution (ID) classifiers. GOOD incorporates dual-level guidance: (1) image-level guidance, based on the gradient of the log partition function, reduces input likelihood and drives samples toward low-density regions in pixel space; (2) feature-level guidance, derived from the k-NN distance in the classifier's latent space, promotes sampling in feature-sparse regions. This dual-guidance design enables more controllable and diverse OOD sample generation. Additionally, we introduce a unified OOD score that adaptively combines image and feature discrepancies, enhancing detection robustness. We perform thorough quantitative and qualitative analyses to evaluate the effectiveness of GOOD, demonstrating that training with samples generated by GOOD can notably enhance OOD detection performance.
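The feature-level signal can be illustrated with a k-NN distance score, defined as the distance from a sample's feature vector to its k-th nearest neighbor among in-distribution features. This sketch only computes the score and does not show how GOOD turns it into sampling guidance. L2-normalizing features first is an assumption borrowed from common k-NN OOD detection practice, and all names are hypothetical.

```python
import math
from typing import List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def knn_distance(query: Sequence[float], bank: Sequence[Sequence[float]], k: int) -> float:
    """Distance from `query` to its k-th nearest neighbor in `bank`.

    `bank` holds in-distribution feature vectors. Larger values mean the
    query lies in a sparser region of the ID feature space.
    """
    if not 1 <= k <= len(bank):
        raise ValueError("k must be between 1 and len(bank)")
    q = _normalize(query)
    dists = sorted(math.dist(q, _normalize(f)) for f in bank)
    return dists[k - 1]
```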