 Inductive Learning


Diagnostic Uncertainty in Pneumonia Detection using CNN MobileNetV2 and CNN from Scratch

arXiv.org Artificial Intelligence

Although accurate pneumonia diagnosis is crucial for effective treatment, it can be hampered by uncertainty. This uncertainty arises from factors such as atypical presentations, the limitations of diagnostic tools such as chest X-rays, and the presence of co-existing respiratory conditions. This research applies a supervised learning method, the convolutional neural network (CNN), to identify lung diseases, especially pneumonia. It uses MobileNetV2 as the pre-trained model with the ResNet101V2 architecture, and a model built from scratch with the Keras API. The datasets used in this research were obtained from Kaggle. The results of implementing CNN MobileNetV2 and the CNN from scratch are promising. During validation, MobileNetV2 performed with stability and minimal overfitting: its training accuracy increased to 84.87% and later decreased slightly to 78.95%, while its validation loss increased from 0.499 to 0.6345. Nonetheless, MobileNetV2 is more stable, although it takes more time to train each epoch. Meanwhile, after the 10th epoch, the Scratch model displayed more instability and overfitting despite having higher validation accuracy: its training accuracy decreased significantly to 78.12% and its validation loss increased from 0.5698 to 1.1809. With these results, ResNet101V2 offers stability, and the Scratch model offers high accuracy.


TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action

arXiv.org Artificial Intelligence

Understanding causal event relationships and achieving fine-grained temporal grounding in videos remain challenging for vision-language models. Existing methods either compress video tokens to reduce temporal resolution, or treat videos as unsegmented streams, which obscures fine-grained event boundaries and limits the modeling of causal dependencies. We propose TEMPURA (Temporal Event Masked Prediction and Understanding for Reasoning in Action), a two-stage training framework that enhances video temporal understanding. TEMPURA first applies masked event prediction reasoning to reconstruct missing events and generate step-by-step causal explanations from dense event annotations, drawing inspiration from effective infilling techniques. TEMPURA then learns to perform video segmentation and dense captioning to decompose videos into non-overlapping events with detailed, timestamp-aligned descriptions. We train TEMPURA on VER, a large-scale dataset curated by us that comprises 1M training instances and 500K videos with temporally aligned event descriptions and structured reasoning steps. Experiments on temporal grounding and highlight detection benchmarks demonstrate that TEMPURA outperforms strong baseline models, confirming that integrating causal reasoning with fine-grained temporal segmentation leads to improved video understanding.


Subset Selection for Fine-Tuning: A Utility-Diversity Balanced Approach for Mathematical Domain Adaptation

arXiv.org Artificial Intelligence

We propose a refined approach to efficiently fine-tune large language models (LLMs) on specific domains, such as mathematics, by employing a budgeted subset selection method. Our approach combines utility and diversity metrics to select the most informative and representative training examples. The goal is to achieve near-full-dataset performance with a carefully selected subset of the data while significantly reducing computational cost and training time. The utility metric incorporates both perplexity and Chain-of-Thought (CoT) loss to identify challenging examples that contribute most to model learning, while the diversity metric ensures broad coverage across mathematical subdomains. We evaluate our method on LLaMA-3 8B and Phi-3 models, comparing against several baseline approaches, including random selection, diversity-based sampling, and existing state-of-the-art subset selection techniques.
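
As a rough illustration of how a utility score and a diversity score can be balanced under a budget, the sketch below runs a greedy selection. It is not the authors' algorithm. The function names, the `trade_off` weight, the use of cosine similarity as the diversity measure, and the assumption that utilities are already computed and scaled to [0, 1] are all illustrative choices.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_subset(utilities, embeddings, budget, trade_off=0.5):
    """Greedily pick `budget` examples.

    Each step picks the example that maximizes
        trade_off * utility + (1 - trade_off) * novelty,
    where novelty = 1 - (max cosine similarity to anything already picked).
    This scoring rule is an assumption made for illustration.
    """
    selected = []
    remaining = set(range(len(utilities)))
    while remaining and len(selected) < budget:

        def score(i):
            if selected:
                closest = max(cosine(embeddings[i], embeddings[j]) for j in selected)
                novelty = 1.0 - closest
            else:
                novelty = 1.0
            return trade_off * utilities[i] + (1.0 - trade_off) * novelty

        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: examples 0 and 1 are near-duplicates with high utility.
utilities = [0.9, 0.85, 0.2]
embeddings = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0]]
print(select_subset(utilities, embeddings, budget=2))
```

With equal weighting, the second pick skips the near-duplicate example in favor of the dissimilar one. Setting `trade_off=1.0` reduces the rule to pure utility ranking.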


Self-Supervision Enhances Instance-based Multiple Instance Learning Methods in Digital Pathology: A Benchmark Study

arXiv.org Artificial Intelligence

Multiple Instance Learning (MIL) has emerged as the best solution for Whole Slide Image (WSI) classification. It consists of dividing each slide into patches, which are treated as a bag of instances labeled with a global label. MIL includes two main approaches: instance-based and embedding-based. In the former, each patch is classified independently, and then the patch scores are aggregated to predict the bag label. In the latter, bag classification is performed after aggregating patch embeddings. Even if instance-based methods are naturally more interpretable, embedding-based MILs have usually been preferred in the past due to their robustness to poor feature extractors. However, recently, the quality of feature embeddings has drastically increased using self-supervised learning (SSL). Nevertheless, many authors continue to endorse the superiority of embedding-based MIL. To investigate this further, we conduct 710 experiments across 4 datasets, comparing 10 MIL strategies, 6 self-supervised methods with 4 backbones, 4 foundation models, and various pathology-adapted techniques. Furthermore, we introduce 4 instance-based MIL methods never used before in the pathology domain. Through these extensive experiments, we show that with a good SSL feature extractor, simple instance-based MILs, with very few parameters, obtain similar or better performance than complex, state-of-the-art (SOTA) embedding-based MIL methods, setting new SOTA results on the BRACS and Camelyon16 datasets. Since simple instance-based MIL methods are naturally more interpretable and explainable to clinicians, our results suggest that more effort should be put into well-adapted SSL methods for WSI rather than into complex embedding-based MIL methods. 
Keywords: Whole Slide Image Classification, Self-Supervised Learning, Multiple Instance Learning, Digital Pathology

1 Introduction

Whole Slide histopathology Image (WSI) analysis has become an increasingly common tool for disease diagnosis in digital pathology [1]. However, the gigapixel size of WSIs makes manual analysis very time-consuming and presents significant challenges for conventional Deep Learning (DL) methods [2, 3], as they are not designed to support such large images. To address that, a simple approach involves dividing the WSI into smaller patches that DL methods can easily handle. Then, features or predictions from a patch-level encoder/classifier are aggregated to get the slide-level prediction [4, 5]. Nonetheless, this method requires very expensive patch-level annotations, which are not always available. Please note that a naive assignment of the slide label to all patches might be clinically incorrect, since the tissue section characterizing a disease might only occupy a small fraction of the slide, while all other patches should be considered as healthy.

arXiv:2505.01109v1
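
The difference between the two MIL families described in the abstract can be shown with a minimal sketch. Instance-based methods classify each patch and then pool the scores. Embedding-based methods pool the patch embeddings and then classify the bag. The function names, the choice of max and mean pooling, and the toy classifier are assumptions for illustration and do not correspond to any of the benchmarked methods.

```python
def instance_based_predict(patch_scores, pooling="max"):
    """Aggregate per-patch scores into one bag (slide) score."""
    if pooling == "max":
        return max(patch_scores)
    if pooling == "mean":
        return sum(patch_scores) / len(patch_scores)
    raise ValueError(f"unknown pooling: {pooling}")


def embedding_based_predict(patch_embeddings, bag_classifier):
    """Mean-pool patch embeddings, then classify the pooled vector."""
    n = len(patch_embeddings)
    dim = len(patch_embeddings[0])
    pooled = [sum(e[d] for e in patch_embeddings) / n for d in range(dim)]
    return bag_classifier(pooled)


# Hypothetical slide with three patches; one patch looks strongly positive.
scores = [0.1, 0.9, 0.2]
print(instance_based_predict(scores, pooling="max"))

# Hypothetical bag classifier: a fixed linear score on the pooled embedding.
embeddings = [[1.0, 3.0], [3.0, 5.0]]
print(embedding_based_predict(embeddings, lambda v: v[0] + v[1]))
```

In the instance-based path the per-patch scores remain available after prediction, which is what makes that family easier to inspect.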


Federated Adapter on Foundation Models: An Out-Of-Distribution Approach

arXiv.org Artificial Intelligence

As foundation models gain prominence, Federated Foundation Models (FedFM) have emerged as a privacy-preserving approach to collaboratively fine-tune models in federated learning (FL) frameworks using distributed datasets across clients. A key challenge for FedFM, given the versatile nature of foundation models, is addressing out-of-distribution (OOD) generalization, where unseen tasks or clients may exhibit distribution shifts leading to suboptimal performance. Although numerous studies have explored OOD generalization in conventional FL, these methods are inadequate for FedFM due to the challenges posed by large parameter scales and increased data heterogeneity. To address these, we propose FedOA, which employs adapter-based parameter-efficient fine-tuning methods for efficacy and introduces personalized adapters with feature distance-based regularization to align distributions and guarantee OOD generalization for each client. Theoretically, we demonstrate that the conventional aggregated global model in FedFM inherently retains OOD generalization capabilities, and our proposed method enhances the personalized model's OOD generalization through regularization informed by the global model, with proven convergence under general non-convex settings. Empirically, the effectiveness of the proposed method is validated on benchmark datasets across various NLP tasks.
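
The abstract does not give the form of the feature distance-based regularizer. One common shape for such a term, assumed here purely for illustration, adds a squared Euclidean distance between the personalized model's features and the global model's features to the task loss. The function name and `reg_weight` are hypothetical.

```python
def regularized_loss(task_loss, personal_features, global_features, reg_weight=0.1):
    """Task loss plus a feature-distance penalty.

    Assumption: the penalty is the squared Euclidean distance between the
    personalized adapter's features and the global model's features for the
    same input. The actual FedOA regularizer may differ.
    """
    squared_distance = sum(
        (p - g) ** 2 for p, g in zip(personal_features, global_features)
    )
    return task_loss + reg_weight * squared_distance


# Hypothetical feature vectors for a single input.
print(regularized_loss(1.0, [1.0, 2.0], [1.0, 0.0], reg_weight=0.5))
```

A larger `reg_weight` pulls the personalized features closer to the global ones, trading local fit for alignment with the shared representation.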


Enhancing User Sequence Modeling through Barlow Twins-based Self-Supervised Learning

arXiv.org Artificial Intelligence

User sequence modeling is crucial for modern large-scale recommendation systems, as it enables the extraction of informative representations of users and items from their historical interactions. These user representations are widely used for a variety of downstream tasks to enhance users' online experience. A key challenge for learning these representations is the lack of labeled training data. While self-supervised learning (SSL) methods have emerged as a promising solution for learning representations from unlabeled data, many existing approaches rely on extensive negative sampling, which can be computationally expensive and may not always be feasible in real-world scenarios. In this work, we propose an adaptation of Barlow Twins, a state-of-the-art SSL method, to user sequence modeling by incorporating suitable augmentation methods. Our approach aims to mitigate the need for large negative sample batches, enabling effective representation learning with smaller batch sizes and limited labeled data. We evaluate our method on the MovieLens-1M, MovieLens-20M, and Yelp datasets, demonstrating that our method consistently outperforms the widely-used dual encoder model across three downstream tasks, achieving an 8%-20% improvement in accuracy. Our findings underscore the effectiveness of our approach in extracting valuable sequence-level information for user modeling, particularly in scenarios where labeled data is scarce and negative examples are limited.
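
For readers unfamiliar with Barlow Twins, the sketch below shows the general form of its loss in plain Python. The embeddings of two augmented views are standardized per feature, their cross-correlation matrix is computed, and the loss pushes that matrix toward the identity. No negative samples are involved. This is a generic illustration, not the authors' sequence-modeling implementation. The function name and the `off_diag_weight` default are illustrative assumptions.

```python
from math import sqrt


def barlow_twins_loss(z_a, z_b, off_diag_weight=0.005):
    """Barlow Twins-style loss for two batches of embeddings.

    z_a, z_b: lists of n embedding vectors (each of length d), one per view.
    The loss is sum_i (1 - C_ii)^2 + off_diag_weight * sum_{i != j} C_ij^2,
    where C is the cross-correlation matrix of the standardized features.
    """
    n, d = len(z_a), len(z_a[0])

    def standardize(z):
        # Zero mean and unit variance per feature, computed over the batch.
        out = [[0.0] * d for _ in range(n)]
        for k in range(d):
            column = [row[k] for row in z]
            mean = sum(column) / n
            std = sqrt(sum((x - mean) ** 2 for x in column) / n) or 1.0
            for s in range(n):
                out[s][k] = (z[s][k] - mean) / std
        return out

    a, b = standardize(z_a), standardize(z_b)
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a[s][i] * b[s][j] for s in range(n)) / n
            if i == j:
                loss += (1.0 - c_ij) ** 2  # invariance term
            else:
                loss += off_diag_weight * c_ij ** 2  # redundancy-reduction term
    return loss


# Hypothetical batch of four 2-d embeddings with uncorrelated features.
view = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
print(barlow_twins_loss(view, view))
```

Identical views with decorrelated features give a loss of zero. Any mismatch between the two views, or any correlation between different feature dimensions, raises it.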


Per-Domain Generalizing Policies: On Validation Instances and Scaling Behavior

arXiv.org Artificial Intelligence

Recent work has shown that successful per-domain generalizing action policies can be learned. Scaling behavior, from small training instances to large test instances, is the key objective, and the use of validation instances larger than training instances is one key to achieving it. Prior work has used fixed validation sets. Here, we introduce a method that generates the validation set dynamically, on the fly, increasing instance size for as long as this is informative and feasible. We also introduce a refined methodology for evaluating scaling behavior, generating test instances systematically to guarantee a given confidence in coverage performance for each instance size. In experiments, dynamic validation improves the scaling behavior of GNN policies in all 9 domains used.


Recursive KL Divergence Optimization: A Dynamic Framework for Representation Learning

arXiv.org Artificial Intelligence

Contrastive methods, dimensionality reduction algorithms like t-SNE, and clustering objectives such as k-Means all implicitly or explicitly define distributions over neighborhoods and minimize some divergence between them. The Information Contrastive Learning (I-Con) framework recently unified many such approaches by expressing them as the minimization of KL divergence between a fixed supervisory distribution p(j | i) and a learned distribution q(j | i) over data neighborhoods [1]. However, I-Con treats this KL alignment statically, as if each point-wise loss were independent. In this paper, we propose a deeper view: that representation learning is fundamentally a process of recursive divergence minimization across a structured field of conditional distributions. Each neighborhood distribution depends on prior learned representations, forming a dynamic system that we call Recursive KL Divergence Optimization (RKDO). While the exponential moving average (EMA) recursion we employ has been used in several well-known self-supervised and semi-supervised methods such as Temporal Ensembling [2], Mean Teacher [3], and momentum-based frameworks like MoCo [4], BYOL [5], and DINO [6], our novel contribution lies in applying this recursive structure to the entire response field (the joint conditional distribution over representation pairs), rather than to individual weights or per-sample predictions. RKDO captures the temporal dynamics of representation learning that are absent in static frameworks, with significant implications for optimization efficiency.
Our contributions include:

- A new theoretical framework that generalizes representation learning as recursive alignment of conditional distributions across the entire response field
- Mathematical formulations showing how RKDO captures temporal dynamics absent in static frameworks, with a formal proof of linear-rate convergence under this recursion
- Empirical evidence that RKDO's recursive approach results in significantly lower loss values (approximately 30% reduction across all tested datasets)
- Demonstration that RKDO requires 60-80% fewer computational resources (training epochs) to achieve results comparable to longer I-Con training
- Analysis of the trade-offs between optimization efficiency and generalization in recursive versus static approaches

Our experiments suggest that while I-Con effectively represents a unified view of many typical representation learning approaches, RKDO can provide substantial efficiency improvements: achieving comparable optimization objectives with approximately 30% lower loss values, while potentially reducing computational requirements by 60-80% in the specific scenarios we studied.

2 Background and Related Work

The KL divergence [7] is a foundational object in representation learning.
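
To make the two ingredients named above concrete, the sketch below computes the KL divergence between a supervisory neighborhood distribution p(j | i) and a learned distribution q(j | i) for a single point i, and applies one EMA step to the supervisory distribution. The excerpt does not state the exact RKDO recursion. The update rule shown here (mixing the previous target with the previous learned distribution using a rate `alpha`) is an assumption for illustration only, as are the function names.

```python
from math import log


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same neighbors."""
    return sum(pi * log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def ema_update(p_prev, q_prev, alpha=0.1):
    """One EMA step on the target distribution.

    Assumed form: p_t = (1 - alpha) * p_{t-1} + alpha * q_{t-1}.
    A convex combination of two distributions is again a distribution.
    """
    return [(1.0 - alpha) * p + alpha * q for p, q in zip(p_prev, q_prev)]


# Hypothetical neighborhood distributions for one point over two neighbors.
p = [1.0, 0.0]
q = [0.5, 0.5]
print(kl_divergence(p, q))  # static objective for this point
p_next = ema_update(p, q, alpha=0.25)
print(p_next, kl_divergence(p_next, q))  # target has moved toward q
```

In a static scheme the target p stays fixed across training. In a recursive scheme of this kind, the target at each step depends on what the model has already learned.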


Variational Self-Supervised Learning

arXiv.org Artificial Intelligence

We present Variational Self-Supervised Learning (VSSL), a novel framework that combines variational inference with self-supervised learning to enable efficient, decoder-free representation learning. Unlike traditional VAEs that rely on input reconstruction via a decoder, VSSL symmetrically couples two encoders with Gaussian outputs. A momentum-updated teacher network defines a dynamic, data-dependent prior, while the student encoder produces an approximate posterior from augmented views. The reconstruction term in the ELBO is replaced with a cross-view denoising objective, preserving the analytical tractability of Gaussian KL divergence. We further introduce cosine-based formulations of KL and log-likelihood terms to enhance semantic alignment in high-dimensional latent spaces. Experiments on CIFAR-10, CIFAR-100, and ImageNet-100 show that VSSL achieves competitive or superior performance to leading self-supervised methods, including BYOL and MoCo V3. VSSL offers a scalable, probabilistically grounded approach to learning transferable representations without generative reconstruction, bridging the gap between variational modeling and modern self-supervised techniques.
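
The "analytical tractability of Gaussian KL divergence" mentioned above refers to the closed-form KL between two Gaussians. The sketch below implements the standard formula for diagonal Gaussians. Treating the student output as the approximate posterior and the teacher output as the prior follows the abstract. The function and argument names are illustrative, and the cosine-based variants the paper introduces are not shown.

```python
from math import log


def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, diag(var_q)) || N(mu_p, diag(var_p)) ) in closed form.

    Per dimension: 0.5 * ( log(var_p / var_q)
                           + (var_q + (mu_q - mu_p)^2) / var_p - 1 ).
    """
    return 0.5 * sum(
        log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0
        for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p)
    )


# Hypothetical 1-d student posterior and teacher prior with unit variance.
print(gaussian_kl([1.0], [1.0], [0.0], [1.0]))
```

Because the expression needs only means and variances, it can be evaluated without sampling and without a decoder.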


Severity Classification of Chronic Obstructive Pulmonary Disease in Intensive Care Units: A Semi-Supervised Approach Using MIMIC-III Dataset

arXiv.org Artificial Intelligence

Chronic obstructive pulmonary disease (COPD) is a major global health concern, with accurate severity assessment crucial for effective management, especially in intensive care units (ICUs). This study presents a novel approach to COPD severity classification using machine learning algorithms applied to the MIMIC-III dataset. Our work presents a new application of the MIMIC-III dataset and contributes to the growing field of artificial intelligence in critical care medicine. We developed a model to classify COPD severity based on available ICU parameters, including blood gas measurements and vital signs. Our methodology incorporated semi-supervised learning techniques to leverage unlabeled data, enhancing model robustness. A random forest classifier demonstrated superior performance, achieving 92.51% accuracy and 0.98 ROC AUC distinguishing between mild-to-moderate and severe COPD cases. This approach offers a practical, accurate, and accessible tool for rapid COPD severity assessment in ICU settings, potentially improving clinical decision-making and patient outcomes. Future research should focus on external validation and integration into clinical decision support systems to enhance COPD management in the ICUs.