Vellore
KAN-AFT: An Interpretable Nonlinear Survival Model Integrating Kolmogorov-Arnold Networks with Accelerated Failure Time Analysis
Jose, Mebin, Francis, Jisha, Kattumannil, Sudheesh Kumar
Survival analysis relies fundamentally on the semi-parametric Cox Proportional Hazards (CoxPH) model and the parametric Accelerated Failure Time (AFT) model. CoxPH assumes constant hazard ratios, often failing to capture real-world dynamics, while traditional AFT models are limited by rigid distributional assumptions. Although deep learning models like DeepAFT address these constraints by improving predictive accuracy and handling censoring, they inherit the significant challenge of black-box interpretability. The recent introduction of CoxKAN demonstrated the successful integration of Kolmogorov-Arnold Networks (KANs), a novel architecture that yields highly accurate and interpretable symbolic representations, within the CoxPH framework. Motivated by the interpretability gains of CoxKAN, we introduce KAN-AFT (Kolmogorov Arnold Network-based AFT), the first framework to apply KANs to the AFT model. Our primary contributions include: (i) a principled AFT-KAN formulation, (ii) robust optimization strategies for right-censored observations (e.g., Buckley-James and IPCW), and (iii) an interpretability pipeline that converts the learned spline functions into closed-form symbolic equations for survival time. Empirical results on multiple datasets confirm that KAN-AFT achieves performance comparable to or better than DeepAFT, while uniquely providing transparent, symbolic models of the survival process.
ART: Adaptive Resampling-based Training for Imbalanced Classification
Basandrai, Arjun, Jain, Shourya, Ilanthenral, K.
Traditional resampling methods for handling class imbalance typically uses fixed distributions, undersampling the majority or oversampling the minority. These static strategies ignore changes in class-wise learning difficulty, which can limit the overall performance of the model. This paper proposes an Adaptive Resampling-based Training (ART) method that periodically updates the distribution of the training data based on the class-wise performance of the model. Specifically, ART uses class-wise macro F1 scores, computed at fixed intervals, to determine the degree of resampling to be performed. Unlike instance-level difficulty modeling, which is noisy and outlier-sensitive, ART adapts at the class level. This allows the model to incrementally shift its attention towards underperforming classes in a way that better aligns with the optimization objective. Results on diverse benchmarks, including Pima Indians Diabetes and Yeast dataset demonstrate that ART consistently outperforms both resampling-based and algorithm-level methods, including Synthetic Minority Oversampling Technique (SMOTE), NearMiss Undersampling, and Cost-sensitive Learning on binary as well as multi-class classification tasks with varying degrees of imbalance. In most settings, these improvements are statistically significant. On tabular datasets, gains are significant under paired t-tests and Wilcoxon tests (p < 0.05), while results on text and image tasks remain favorable. Compared to training on the original imbalanced data, ART improves macro F1 by an average of 2.64 percentage points across all tested tabular datasets. Unlike existing methods, whose performance varies by task, ART consistently delivers the strongest macro F1, making it a reliable choice for imbalanced classification.
VocalEyes: Enhancing Environmental Perception for the Visually Impaired through Vision-Language Models and Distance-Aware Object Detection
Chavan, Kunal, Balaji, Keertan, Barigidad, Spoorti, Chiluveru, Samba Raju
With an increasing demand for assistive technologies that promote the independence and mobility of visually impaired people, this study suggests an innovative real-time system that gives audio descriptions of a user's surroundings to improve situational awareness. The system acquires live video input and processes it with a quantized and fine-tuned Florence-2 big model, adjusted to 4-bit accuracy for efficient operation on low-power edge devices such as the NVIDIA Jetson Orin Nano. By transforming the video signal into frames with a 5-frame latency, the model provides rapid and contextually pertinent descriptions of objects, pedestrians, and barriers, together with their estimated distances. The system employs Parler TTS Mini, a lightweight and adaptable Text-to-Speech (TTS) solution, for efficient audio feedback. It accommodates 34 distinct speaker types and enables customization of speech tone, pace, and style to suit user requirements. This study examines the quantization and fine-tuning techniques utilized to modify the Florence-2 model for this application, illustrating how the integration of a compact model architecture with a versatile TTS component improves real-time performance and user experience. The proposed system is assessed based on its accuracy, efficiency, and usefulness, providing a viable option to aid vision-impaired users in navigating their surroundings securely and successfully.
An Improved Rapidly Exploring Random Tree Algorithm for Path Planning in Configuration Spaces with Narrow Channels
Noel, Mathew Mithra, Chawla, Akshay
Rapidly-exploring Random Tree (RRT) algorithms have been applied successfully to challenging robot motion planning and under-actuated nonlinear control problems. However a fundamental limitation of the RRT approach is the slow convergence in configuration spaces with narrow channels because of the small probability of generating test points inside narrow channels. This paper presents an improved RRT algorithm that takes advantage of narrow channels between the initial and goal states to find shorter paths by improving the exploration of narrow regions in the configuration space. The proposed algorithm detects the presence of narrow channel by checking for collision of neighborhood points with the infeasible set and attempts to add points within narrow channels with a predetermined bias. This approach is compared with the classical RRT and its variants on a variety of benchmark planning problems. Simulation results indicate that the algorithm presented in this paper computes a significantly shorter path in spaces with narrow channels.
Self-DenseMobileNet: A Robust Framework for Lung Nodule Classification using Self-ONN and Stacking-based Meta-Classifier
Rahman, Md. Sohanur, Chowdhury, Muhammad E. H., Rahman, Hasib Ryan, Ahmed, Mosabber Uddin, Kabir, Muhammad Ashad, Roy, Sanjiban Sekhar, Sarmun, Rusab
In this study, we propose a novel and robust framework, Self-DenseMobileNet, designed to enhance the classification of nodules and non-nodules in chest radiographs (CXRs). Our approach integrates advanced image standardization and enhancement techniques to optimize the input quality, thereby improving classification accuracy. To enhance predictive accuracy and leverage the strengths of multiple models, the prediction probabilities from Self-DenseMobileNet were transformed into tabular data and used to train eight classical machine learning (ML) models; the top three performers were then combined via a stacking algorithm, creating a robust meta-classifier that integrates their collective insights for superior classification performance. To enhance the interpretability of our results, we employed class activation mapping (CAM) to visualize the decision-making process of the best-performing model. Our proposed framework demonstrated remarkable performance on internal validation data, achieving an accuracy of 99.28\% using a Meta-Random Forest Classifier. When tested on an external dataset, the framework maintained strong generalizability with an accuracy of 89.40\%. These results highlight a significant improvement in the classification of CXRs with lung nodules.
Hazardous Asteroids Classification
Quy, Thai Duy, Buana, Alvin, Lee, Josh, Asyrofi, Rakha
Hazardous asteroid has been one of the concerns for humankind as fallen asteroid on earth could cost a huge impact on the society.Monitoring these objects could help predict future impact events, but such efforts are hindered by the large numbers of objects that pass in the Earth's vicinity. The aim of this project is to use machine learning and deep learning to accurately classify hazardous asteroids. A total of ten methods which consist of five machine learning algorithms and five deep learning models are trained and evaluated to find the suitable model that solves the issue. We experiment on two datasets, one from Kaggle and one we extracted from a web service called NeoWS which is a RESTful web service from NASA that provides information about near earth asteroids, it updates every day. In overall, the model is tested on two datasets with different features to find the most accurate model to perform the classification.
A Novel Self-Attention-Enabled Weighted Ensemble-Based Convolutional Neural Network Framework for Distributed Denial of Service Attack Classification
S, Kanthimathi, Venkatraman, Shravan, S, Jayasankar K, T, Pranay Jiljith, R, Jashwanth
Distributed Denial of Service (DDoS) attacks are a major concern in network security, as they overwhelm systems with excessive traffic, compromise sensitive data, and disrupt network services. Accurately detecting these attacks is crucial to protecting network infrastructure. Traditional approaches, such as single Convolutional Neural Networks (CNNs) or conventional Machine Learning (ML) algorithms like Decision Trees (DTs) and Support Vector Machines (SVMs), struggle to extract the diverse features needed for precise classification, resulting in suboptimal performance. This research addresses this gap by introducing a novel approach for DDoS attack detection. The proposed method combines three distinct CNN architectures: SA-Enabled CNN with XGBoost, SA-Enabled CNN with LSTM, and SA-Enabled CNN with Random Forest. Each model extracts features at multiple scales, while self-attention mechanisms enhance feature integration and relevance. The weighted ensemble approach ensures that both prominent and subtle features contribute to the final classification, improving adaptability to evolving attack patterns and novel threats. The proposed method achieves a precision of 98.71%, an F1-score of 98.66%, a recall of 98.63%, and an accuracy of 98.69%, outperforming traditional methods and setting a new benchmark in DDoS attack detection. This innovative approach addresses critical limitations in current models and advances the state of the art in network security.
Abstracted Gaussian Prototypes for One-Shot Concept Learning
Zou, Chelsea, Kurtz, Kenneth J.
We introduce a cluster-based generative image segmentation framework to encode higher-level representations of visual concepts based on one-shot learning inspired by the Omniglot Challenge. The inferred parameters of each component of a Gaussian Mixture Model (GMM) represent a distinct topological subpart of a visual concept. Sampling new data from these parameters generates augmented subparts to build a more robust prototype for each concept, i.e., the Abstracted Gaussian Prototype (AGP). This framework addresses one-shot classification tasks using a cognitively-inspired similarity metric and addresses one-shot generative tasks through a novel AGP-VAE pipeline employing variational autoencoders (VAEs) to generate new class variants. Results from human judges reveal that the generative pipeline produces novel examples and classes of visual concepts that are broadly indistinguishable from those made by humans. The proposed framework leads to impressive but not state-of-the-art classification accuracy; thus, the contribution is two-fold: 1) the system is uniquely low in theoretical and computational complexity and operates in a completely standalone manner compared while existing approaches draw heavily on pre-training or knowledge engineering; and 2) in contrast with competing neural network models, the AGP approach addresses the importance of breadth of task capability emphasized in the Omniglot challenge (i.e., successful performance on generative tasks). These two points are critical as we advance toward an understanding of how learning/reasoning systems can produce viable, robust, and flexible concepts based on literally nothing more than a single example.
Multimodal Methods for Analyzing Learning and Training Environments: A Systematic Literature Review
Cohn, Clayton, Davalos, Eduardo, Vatral, Caleb, Fonteles, Joyce Horn, Wang, Hanchen David, Ma, Meiyi, Biswas, Gautam
Recent technological advancements have enhanced our ability to collect and analyze rich multimodal data (e.g., speech, video, and eye gaze) to better inform learning and training experiences. While previous reviews have focused on parts of the multimodal pipeline (e.g., conceptual models and data fusion), a comprehensive literature review on the methods informing multimodal learning and training environments has not been conducted. This literature review provides an in-depth analysis of research methods in these environments, proposing a taxonomy and framework that encapsulates recent methodological advances in this field and characterizes the multimodal domain in terms of five modality groups: Natural Language, Video, Sensors, Human-Centered, and Environment Logs. We introduce a novel data fusion category -- mid fusion -- and a graph-based technique for refining literature reviews, termed citation graph pruning. Our analysis reveals that leveraging multiple modalities offers a more holistic understanding of the behaviors and outcomes of learners and trainees. Even when multimodality does not enhance predictive accuracy, it often uncovers patterns that contextualize and elucidate unimodal data, revealing subtleties that a single modality may miss. However, there remains a need for further research to bridge the divide between multimodal learning and training studies and foundational AI research.
Multitaper mel-spectrograms for keyword spotting
de Souza, Douglas Baptista, Bakri, Khaled Jamal, Ferreira, Fernanda, Inacio, Juliana
Keyword spotting (KWS) is one of the speech recognition tasks most sensitive to the quality of the feature representation. However, the research on KWS has traditionally focused on new model topologies, putting little emphasis on other aspects like feature extraction. This paper investigates the use of the multitaper technique to create improved features for KWS. The experimental study is carried out for different test scenarios, windows and parameters, datasets, and neural networks commonly used in embedded KWS applications. Experiment results confirm the advantages of using the proposed improved features.