Performance Analysis
Stochastic Weight Sharing for Bayesian Neural Networks
Lin, Moule, Guan, Shuhao, Jing, Weipeng, Botterweck, Goetz, Patane, Andrea
While offering a principled framework for uncertainty quantification in deep learning, the employment of Bayesian Neural Networks (BNNs) is still constrained by their increased computational requirements and the convergence difficulties when training very deep, state-of-the-art architectures. In this work, we reinterpret weight-sharing quantization techniques from a stochastic perspective in the context of training and inference with BNNs. Specifically, we leverage 2D adaptive Gaussian distributions, Wasserstein distance estimations, and alpha blending to encode the stochastic behaviour of a BNN in a lower-dimensional, soft Gaussian representation. Through extensive empirical investigation, we demonstrate that our approach reduces the computational overhead inherent in Bayesian learning by several orders of magnitude, enabling the efficient Bayesian training of large-scale models, such as ResNet-101 and Vision Transformer (ViT). On various computer vision benchmarks, including CIFAR10, CIFAR100, and ImageNet1k, our approach compresses model parameters by approximately 50x and reduces model size by 75%, while achieving accuracy and uncertainty estimations comparable to the state-of-the-art.
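The abstract mentions Wasserstein distance estimations between Gaussian distributions without giving details. As a minimal sketch, assuming each weight is modelled as a univariate Gaussian N(mu, sigma^2) (an assumption; the paper's exact 2D formulation is not stated here), the 2-Wasserstein distance has the closed form sqrt((mu1 - mu2)^2 + (sigma1 - sigma2)^2). The function name and the example values are hypothetical.

```python
import math


def w2_gaussian(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """2-Wasserstein distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Closed form for univariate Gaussians:
    sqrt((mu1 - mu2)^2 + (sigma1 - sigma2)^2).
    """
    if sigma1 < 0 or sigma2 < 0:
        raise ValueError("standard deviations must be non-negative")
    return math.hypot(mu1 - mu2, sigma1 - sigma2)


# Illustrative values only: a weight posterior and a candidate shared Gaussian.
distance = w2_gaussian(0.10, 0.05, 0.13, 0.01)
```

A small distance would suggest that a weight's distribution can be represented by the shared Gaussian with little loss; how the paper thresholds or blends such distances is not specified in the abstract.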
Bootstrapping Imitation Learning for Long-horizon Manipulation via Hierarchical Data Collection Space
Yang, Jinrong, Chen, Kexun, Li, Zhuoling, Wu, Shengkai, Zhao, Yong, Ren, Liangliang, Luo, Wenqiu, Shang, Chaohui, Zhi, Meiyu, Gao, Linfeng, Sun, Mingshan, Cheng, Hui
Imitation learning (IL) with human demonstrations is a promising method for robotic manipulation tasks. While minimal demonstrations enable robotic action execution, achieving high success rates and generalization comes at a high cost, e.g., continuously adding data or incrementally conducting human-in-loop processes with complex hardware/software systems. In this paper, we rethink the state/action space of the data collection pipeline as well as the underlying factors responsible for the prediction of non-robust actions. To this end, we introduce a Hierarchical Data Collection Space (HD-Space) for robotic imitation learning, a simple data collection scheme that enables the model to train on proactive, high-quality data. Specifically, we segment the fine manipulation task into multiple key atomic tasks from a high-level perspective and design atomic state/action spaces for human demonstrations, aiming to generate robust IL data. We conduct empirical evaluations across two simulated and five real-world long-horizon manipulation tasks and demonstrate that IL policy training with HD-Space-based data can achieve significantly enhanced policy performance. HD-Space allows the use of a small amount of demonstration data to train a more powerful policy, particularly for long-horizon manipulation tasks. We aim for HD-Space to offer insights into optimizing data quality and guiding data scaling. Project page: https://hd-space-robotics.github.io.
Where You Go is Who You Are: Behavioral Theory-Guided LLMs for Inverse Reinforcement Learning
Sun, Yuran, Xu, Susu, Wang, Chenguang, Zhao, Xilei
Big trajectory data hold great promise for human mobility analysis, but their utility is often constrained by the absence of critical traveler attributes, particularly sociodemographic information. While prior studies have explored predicting such attributes from mobility patterns, they often overlooked underlying cognitive mechanisms and exhibited low predictive accuracy. This study introduces SILIC, short for Sociodemographic Inference with LLM-guided Inverse Reinforcement Learning (IRL) and Cognitive Chain Reasoning (CCR), a theoretically grounded framework that leverages LLMs to infer sociodemographic attributes from observed mobility patterns by capturing latent behavioral intentions and reasoning through psychological constructs. In particular, our approach explicitly follows the Theory of Planned Behavior (TPB), a foundational behavioral framework in transportation research, to model individuals' latent cognitive processes underlying travel decision-making. The LLMs further provide heuristic guidance to improve IRL reward function initialization and updates by addressing the ill-posedness and optimization challenges arising from the vast and unstructured reward space. Evaluated on the 2017 Puget Sound Regional Council Household Travel Survey, our method substantially outperforms state-of-the-art baselines and shows great promise for enriching big trajectory data to support more behaviorally grounded applications in transportation planning and beyond.
CRG Score: A Distribution-Aware Clinical Metric for Radiology Report Generation
Hamamci, Ibrahim Ethem, Er, Sezgin, Shit, Suprosanna, Reynaud, Hadrien, Kainz, Bernhard, Menze, Bjoern
Evaluating long-context radiology report generation is challenging. NLG metrics fail to capture clinical correctness, while LLM-based metrics often lack generalizability. Clinical accuracy metrics are more relevant but are sensitive to class imbalance, frequently favoring trivial predictions. We propose the CRG Score, a distribution-aware and adaptable metric that evaluates only clinically relevant abnormalities explicitly described in reference reports. CRG supports both binary and structured labels (e.g., type, location) and can be paired with any LLM for feature extraction. By balancing penalties based on label distribution, it enables fairer, more robust evaluation and serves as a clinically aligned reward function.
Informatics for Food Processing
Ispirova, Gordana, Sebek, Michael, Menichetti, Giulia
This chapter explores the evolution, classification, and health implications of food processing, while emphasizing the transformative role of machine learning, artificial intelligence (AI), and data science in advancing food informatics. It begins with a historical overview and a critical review of traditional classification frameworks such as NOVA, Nutri-Score, and SIGA, highlighting their strengths and limitations, particularly the subjectivity and reproducibility challenges that hinder epidemiological research and public policy. To address these issues, the chapter presents novel computational approaches, including FoodProX, a random forest model trained on nutrient composition data to infer processing levels and generate a continuous FPro score. It also explores how large language models like BERT and BioBERT can semantically embed food descriptions and ingredient lists for predictive tasks, even in the presence of missing data. A key contribution of the chapter is a novel case study using the Open Food Facts database, showcasing how multimodal AI models can integrate structured and unstructured data to classify foods at scale, offering a new paradigm for food processing assessment in public health and research.
Streamlining HTTP Flooding Attack Detection through Incremental Feature Selection
Sarmah, Upasana, Borah, Parthajit, Bhattacharyya, D. K.
Applications over the Web primarily rely on HTTP to transmit web pages to and from systems. There are a variety of application layer protocols, but among all, HTTP is the most targeted because of its versatility and ease of integration with online services. Attackers exploit the fact that, by default, no detection system blocks HTTP traffic, and use this characteristic of the protocol to launch attacks against web applications. HTTP flooding attacks are one such attack in the application layer of the OSI model. In this paper, a method for the detection of such attacks is proposed. The heart of the detection method is INFS-MICC, an incremental feature subset selection method based on mutual information and correlation. INFS-MICC helps identify a subset of highly relevant and independent features so as to detect HTTP flooding attacks with the best possible classification performance in near-real time.
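The abstract does not spell out how mutual information and correlation are combined, so the following is only a generic greedy sketch, not the authors' INFS-MICC. It assumes discrete features, scores relevance as mutual information with the label, and scores redundancy as the mean absolute Pearson correlation with already selected features. All function names, feature names, and data are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def abs_pearson(xs, ys):
    """Absolute Pearson correlation; 0.0 if either sequence is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / math.sqrt(vx * vy)


def select_features(features, labels, k):
    """Greedily add one feature at a time: high relevance, low redundancy."""
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            relevance = mutual_information(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(
                abs_pearson(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up): "rate_copy" duplicates "rate" and is fully redundant.
labels = [0, 0, 1, 1, 0, 1]
features = {
    "rate": [0, 0, 1, 1, 0, 1],
    "rate_copy": [0, 0, 1, 1, 0, 1],
    "size": [0, 1, 0, 1, 1, 0],
}
chosen = select_features(features, labels, k=2)
```

On this toy data the second pick is `size` rather than the duplicate `rate_copy`, because the redundancy penalty outweighs the duplicate's higher relevance.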
Improving endpoint detection in end-to-end streaming ASR for conversational speech
C, Anandh, Durai, Karthik Pandia, Prakash, Jeena, Arumugam, Manickavela, Hacioglu, Kadri, Dubagunta, S. Pavankumar, Stolcke, Andreas, Venkatesan, Shankar, Ganapathiraju, Aravind
ASR endpointing (EP) plays a major role in delivering a good user experience in products supporting human or artificial agents in human-human/machine conversations. Transducer-based ASR (T-ASR) is an end-to-end (E2E) ASR modelling technique preferred for streaming. A major limitation of T-ASR is delayed emission of ASR outputs, which could lead to errors or delays in EP. Inaccurate EP will cut the user off while speaking, returning an incomplete transcript, while delays in EP will increase the perceived latency, degrading the user experience. We propose methods to improve EP by addressing delayed emission along with EP mistakes. To address the delayed emission problem, we introduce an end-of-word token at the end of each word, along with a delay penalty. The EP delay is addressed by obtaining reliable frame-level speech activity detection using an auxiliary network. We apply the proposed methods to the Switchboard conversational speech corpus and evaluate them against a delay penalty method.
Predictively Combatting Toxicity in Health-related Online Discussions through Machine Learning
Paz-Ruza, Jorge, Alonso-Betanzos, Amparo, Guijarro-Berdiñas, Bertha, Eiras-Franco, Carlos
In health-related topics, user toxicity in online discussions frequently becomes a source of social conflict or promotes dangerous, unscientific behaviour. Common approaches for battling it include different forms of detection, flagging and/or removal of existing toxic comments, which is often counterproductive for platforms and users alike. In this work, we propose the alternative of combatting user toxicity predictively, anticipating where a user could interact toxically in health-related online discussions. The hierarchical and decentralised structure of Reddit made it a hub of heated debate during the onset of the COVID pandemic, with over 200,000 related posts per day. Conversely, volunteer-based moderation is generally more susceptible to bias and under-moderation, depending on the platform's audience. One contribution is the design of an adapted Leave Out Last Item data partitioning method suitable for binary classification-oriented Collaborative Filtering tasks. We remove "generic comments" from the set, labelling comments as "generic" if they do not contain any words from [...]. The majority of users do not post toxic comments when discussing health on Reddit, with 9.96% of toxic comments in the aggregate, similar to previous work. Furthermore, as Figure 2 shows (note the logarithmic scale on the y-axis), a user's toxicity on a subreddit tends to be consistent: toxic or non-toxic, as indicated by the peaks in the distribution at toxicities 0 [...]. To tag the toxicity of comments we use Detoxify-original [7], a pre-trained language model. Instead of only detecting and punishing the toxicity of existing interactions like common content moderation methods, which is ineffective and counterproductive in the long term, this work's proposal is to predict the toxicity of an unobserved interaction. Figure 5. Topology of the Machine Learning model proposed to predict the toxicity of health-related conversations in unobserved user-subreddit interactions on the Reddit platform. We assessed the predictive ability of our model and baselines using classical binary classification metrics: sensitivity, specificity, and geometric mean (G.Mean) of the class-wise [...]. We identify different avenues of future work.
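The three metrics named above can be sketched as follows. This assumes the common definition of G-Mean as the square root of sensitivity times specificity (the source text is truncated at that point, so this definition is an assumption), with label 1 standing for a toxic interaction. The function name and data are hypothetical.

```python
import math


def sensitivity_specificity_gmean(y_true, y_pred):
    """Return (sensitivity, specificity, G-Mean) for binary labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Guard against classes that are absent from y_true.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity, math.sqrt(sensitivity * specificity)


# Illustrative imbalanced data: 1 toxic interaction out of 10.
y_true = [1] + [0] * 9
always_non_toxic = [0] * 10
result = sensitivity_specificity_gmean(y_true, always_non_toxic)
```

A trivial predictor that always answers "non-toxic" is 90% accurate on this data, yet its sensitivity and G-Mean are both 0, which is why G-Mean suits imbalanced toxicity data better than plain accuracy.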
SpectralGap: Graph-Level Out-of-Distribution Detection via Laplacian Eigenvalue Gaps
Gu, Jiawei, Qiao, Ziyue, Li, Zechao
The task of graph-level out-of-distribution (OOD) detection is crucial for deploying graph neural networks in real-world settings. In this paper, we observe a significant difference in the relationship between the largest and second-largest eigenvalues of the Laplacian matrix for in-distribution (ID) and OOD graph samples: \textit{OOD samples often exhibit anomalous spectral gaps (the difference between the largest and second-largest eigenvalues)}. This observation motivates us to propose SpecGap, an effective post-hoc approach for OOD detection on graphs. SpecGap adjusts features by subtracting the component associated with the second-largest eigenvalue, scaled by the spectral gap, from the high-level features (i.e., $\mathbf{X}-\left(\lambda_n-\lambda_{n-1}\right) \mathbf{u}_{n-1} \mathbf{v}_{n-1}^T$). SpecGap achieves state-of-the-art performance across multiple benchmark datasets. We present extensive ablation studies and comprehensive theoretical analyses to support our empirical results. As a parameter-free post-hoc method, SpecGap can be easily integrated into existing graph neural network models without requiring any additional training or model modification.
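As a small worked illustration of the spectral gap described above (not taken from the paper), consider the 3-node path graph. This assumes the unnormalized Laplacian $L = D - A$ and eigenvalues sorted in ascending order; the paper may use a different normalization. The abstract does not define $\mathbf{u}_{n-1}$ and $\mathbf{v}_{n-1}$, so they are left symbolic, and $\mathbf{X}'$ is a hypothetical name for the adjusted features.

```latex
L = \begin{pmatrix} 1 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 1 \end{pmatrix},
\qquad
\lambda_1 = 0,\ \lambda_2 = 1,\ \lambda_3 = 3,
\qquad
\lambda_3 - \lambda_2 = 2
\quad\Longrightarrow\quad
\mathbf{X}' = \mathbf{X} - 2\,\mathbf{u}_2 \mathbf{v}_2^T .
```

Here $n = 3$, so the spectral gap $\lambda_n - \lambda_{n-1}$ equals 2 and directly sets the size of the rank-one correction.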
A Clinician-Friendly Platform for Ophthalmic Image Analysis Without Technical Barriers
Wang, Meng, Lin, Tian, Hou, Qingshan, Lin, Aidi, Wang, Jingcheng, Peng, Qingsheng, Nguyen, Truong X., Fang, Danqi, Zou, Ke, Xu, Ting, Xue, Cancan, Quek, Ten Cheer, Yu, Qinkai, Liu, Minxin, Zhou, Hui, Xiao, Zixuan, He, Guiqin, Liang, Huiyu, Shi, Tingkun, Chen, Man, Liu, Linna, Peng, Yuanyuan, Wang, Lianyu, Hu, Qiuming, Chen, Junhong, Zhang, Zhenhua, Chen, Cheng, Zhao, Yitian, Liu, Dianbo, Wu, Jianhua, Chen, Xinjian, Zhang, Changqing, Nguyen, Triet Thanh, Meng, Yanda, Zheng, Yalin, Tham, Yih Chung, Cheung, Carol Y., Fu, Huazhu, Chen, Haoyu, Cheng, Ching-Yu
Artificial intelligence (AI) shows remarkable potential in medical imaging diagnostics, yet most current models require retraining when applied across different clinical settings, limiting their scalability. We introduce GlobeReady, a clinician-friendly AI platform that enables fundus disease diagnosis without retraining, fine-tuning, or the need for technical expertise. GlobeReady demonstrates high accuracy across imaging modalities: 93.9-98.5% for 11 fundus diseases using color fundus photographs (CFPs) and 87.2-92.7% for 15 fundus diseases using optical coherence tomography (OCT) scans. By leveraging training-free local feature augmentation, the GlobeReady platform effectively mitigates domain shifts across centers and populations, achieving accuracies of 88.9-97.4% across five centers on average in China, 86.3-96.9% in Vietnam, 73.4-91.0% in Singapore, and 90.2-98.9% in the UK. Incorporating a built-in confidence-quantifiable diagnostic mechanism further enhances the platform's accuracy to 94.9-99.4% with CFPs and 88.2-96.2% with OCT, while enabling identification of out-of-distribution cases with 86.3% accuracy across 49 common and rare fundus diseases using CFPs, and 90.6% accuracy across 13 diseases using OCT. Clinicians from multiple countries rated GlobeReady highly for usability and clinical relevance (average score 4.6/5). These findings demonstrate GlobeReady's robustness, generalizability and potential to support global ophthalmic care without technical barriers.