Goto

Collaborating Authors

 Performance Analysis


PolyMicros: Bootstrapping a Foundation Model for Polycrystalline Material Structure

arXiv.org Artificial Intelligence

Recent advances in Foundation Models for Materials Science are poised to revolutionize the discovery, manufacture, and design of novel materials with tailored properties and responses. Although great strides have been made, successes have been restricted to materials classes where multi-million sample data repositories can be readily curated (e.g., atomistic structures). Unfortunately, for many structural and functional materials (e.g., mesoscale structured metal alloys), such datasets are too costly or prohibitive to construct; instead, datasets are limited to very few examples. To address this challenge, we introduce a novel machine learning approach for learning from hyper-sparse, complex spatial data in scientific domains. Our core contribution is a physics-driven data augmentation scheme that leverages an ensemble of local generative models, trained on as few as five experimental observations, and coordinates them through a novel diversity curation strategy to generate a large-scale, physically diverse dataset. We utilize this framework to construct PolyMicros, the first Foundation Model for polycrystalline materials (a structural material class important across a broad range of industrial and scientific applications). We demonstrate the utility of PolyMicros by zero-shot solving several long standing challenges related to accelerating 3D experimental microscopy. Finally, we make both our models and datasets openly available to the community.


Bootstrapping your behavior: a new pretraining strategy for user behavior sequence data

arXiv.org Artificial Intelligence

User Behavior Sequence (UBS) modeling is crucial in industrial applications. As data scale and task diversity grow, UBS pretraining methods have become increasingly pivotal. State-of-the-art UBS pretraining methods rely on predicting behavior distributions. The key step in these methods is constructing a selected behavior vocabulary. However, this manual step is labor-intensive and prone to bias. The limitation of vocabulary capacity also directly affects models' generalization ability. In this paper, we introduce Bootstrapping Your Behavior (\model{}), a novel UBS pretraining strategy that predicts an automatically constructed supervision embedding summarizing all behaviors' information within a future time window, eliminating the manual behavior vocabulary selection. In implementation, we incorporate a student-teacher encoder scheme to construct the pretraining supervision effectively. Experiments on two real-world industrial datasets and eight downstream tasks demonstrate that \model{} achieves an average improvement of 3.9\% in AUC and 98.9\% in training throughput. Notably, the model exhibits meaningful attention patterns and cluster representations during pretraining without any label supervision. In our online deployment over two months, the pretrained model improves the KS by about 2.7\% and 7.1\% over the baseline model for two financial overdue risk prediction tasks in the Alipay mobile application, which reduces bad debt risk by millions of dollars for Ant group.


Leveraging Low-rank Factorizations of Conditional Correlation Matrices in Graph Learning

arXiv.org Artificial Intelligence

This paper addresses the problem of learning an undirected graph from data gathered at each nodes. Within the graph signal processing framework, the topology of such graph can be linked to the support of the conditional correlation matrix of the data. The corresponding graph learning problem then scales to the squares of the number of variables (nodes), which is usually problematic at large dimension. To tackle this issue, we propose a graph learning framework that leverages a low-rank factorization of the conditional correlation matrix. In order to solve for the resulting optimization problems, we derive tools required to apply Riemannian optimization techniques for this particular structure. The proposal is then particularized to a low-rank constrained counterpart of the GLasso algorithm, i.e., the penalized maximum likelihood estimation of a Gaussian graphical model. Experiments on synthetic and real data evidence that a very efficient dimension-versus-performance trade-off can be achieved with this approach.


A Comparative Study of Machine Learning Techniques for Early Prediction of Diabetes

arXiv.org Artificial Intelligence

-- In many nations, diabetes is becoming a significant health problem, and early identi - fication and control are crucial. Using machine learning algorithms to predict diabetes has yielded encouraging results. Using the Pima Indians Dia - betes dataset, this study attempts to evaluate the efficacy of several machine - learning methods for diabetes prediction. The collection includes infor - mation on 768 patients, such as their ages, BMIs, and glucose levels. The techniques assessed are Logistic Regression, Decision Tree, Random Forest, k - Nearest Neighbors, Naive Bayes, Support Vector Machine, Gradient Boosting, and Neural Network. The findings indicate that the Neural Network algorithm performed the best, with an accuracy of 78.57 The study implies that machine learning algorithms can aid diabetes prediction and be an efficient early detection tool. Diabetes is a chronic metabolic disease af - fecting millions worldwide and is a significant cause of morbidity and death [1]. High blood glucose levels characterize the disorder and can result in some complications, including cardiovascular disease, stroke, blindness, and amputations. To prevent or postpone com - plications, diabetes must be recognized and treated as soon as feasible; however, this can be challenging because symptoms may be mild or absent [2]. Machine learning (ML) is a subfield of artificial intelligence that comprises the de - velopment of algorithms that can learn from data and generate inferences or predictions without being explicitly programmed. ML algorithms are beneficial in several fields, in - cluding healthcare.


Optimizing Latent Dimension Allocation in Hierarchical VAEs: Balancing Attenuation and Information Retention for OOD Detection

arXiv.org Artificial Intelligence

Out-of-distribution (OOD) detection is a critical task in machine learning, particularly for safety-critical applications where unexpected inputs must be reliably flagged. While hierarchical variational autoencoders (HVAEs) offer improved representational capacity over traditional VAEs, their performance is highly sensitive to how latent dimensions are distributed across layers. Existing approaches often allocate latent capacity arbitrarily, leading to ineffective representations or posterior collapse. In this work, we introduce a theoretically grounded framework for optimizing latent dimension allocation in HVAEs, drawing on principles from information theory to formalize the trade-off between information loss and representational attenuation. We prove the existence of an optimal allocation ratio $r^{\ast}$ under a fixed latent budget, and empirically show that tuning this ratio consistently improves OOD detection performance across datasets and architectures. Our approach outperforms baseline HVAE configurations and provides practical guidance for principled latent structure design, leading to more robust OOD detection with deep generative models.


TED-LaST: Towards Robust Backdoor Defense Against Adaptive Attacks

arXiv.org Artificial Intelligence

--Deep Neural Networks (DNNs) are vulnerable to backdoor attacks, where attackers implant hidden triggers during training to maliciously control model behavior . T opological Evolution Dynamics (TED) has recently emerged as a powerful tool for detecting backdoor attacks in DNNs. However, TED can be vulnerable to backdoor attacks that adaptively distort topological representation distributions across network layers. T o address this limitation, we propose TED-LaST (T opological Evolution Dynamics against La undry, S low release, and T arget mapping attack strategies), a novel defense strategy that enhances TED's robustness against adaptive attacks. TED-LaST introduces two key innovations: label-supervised dynamics tracking and adaptive layer emphasis. These enhancements enable the identification of stealthy threats that evade traditional TED-based defenses, even in cases of inseparability in topological space and subtle topological perturbations. We review and classify data poisoning tricks in state-of-the-art adaptive attacks and propose enhanced adaptive attack with target mapping, which can dynamically shift malicious tasks and fully leverage the stealthiness that adaptive attacks possess. Our comprehensive experiments on multiple datasets (CIF AR-10, GTSRB, and ImageNet100) and model architectures (ResNet20, ResNet101) show that TED-LaST effectively counteracts sophisticated backdoors like Adap-Blend, Adapt-Patch, and the proposed enhanced adaptive attack. TED-LaST sets a new benchmark for robust backdoor detection, substantially enhancing DNN security against evolving threats. EEP Neural Networks (DNN) models have revolutionized fields such as computer vision [1], speech recognition [2], and autonomous driving [3] with their impressive capabilities. Despite these advances, their dependence on expansive datasets and complex training procedures introduces significant vulnerabilities, notably through backdoor attacks. Backdoor attacks implant hidden behaviors in DNN models, which can be activated by specific triggers.


Demonstrating Multi-Suction Item Picking at Scale via Multi-Modal Learning of Pick Success

arXiv.org Artificial Intelligence

This work demonstrates how autonomously learning aspects of robotic operation from sparsely-labeled, real-world data of deployed, engineered solutions at industrial scale can provide with solutions that achieve improved performance. Specifically, it focuses on multi-suction robot picking and performs a comprehensive study on the application of multi-modal visual encoders for predicting the success of candidate robotic picks. Picking diverse items from unstructured piles is an important and challenging task for robot manipulation in real-world settings, such as warehouses. Methods for picking from clutter must work for an open set of items while simultaneously meeting latency constraints to achieve high throughput. The demonstrated approach utilizes multiple input modalities, such as RGB, depth and semantic segmentation, to estimate the quality of candidate multi-suction picks. The strategy is trained from real-world item picking data, with a combination of multimodal pretrain and finetune. The manuscript provides comprehensive experimental evaluation performed over a large item-picking dataset, an item-picking dataset targeted to include partial occlusions, and a package-picking dataset, which focuses on containers, such as boxes and envelopes, instead of unpackaged items. The evaluation measures performance for different item configurations, pick scenes, and object types. Ablations help to understand the effects of in-domain pretraining, the impact of different modalities and the importance of finetuning. These ablations reveal both the importance of training over multiple modalities but also the ability of models to learn during pretraining the relationship between modalities so that during finetuning and inference, only a subset of them can be used as input.


DynaSubVAE: Adaptive Subgrouping for Scalable and Robust OOD Detection

arXiv.org Artificial Intelligence

Real-world observational data often contain existing or emerging heterogeneous subpopulations that deviate from global patterns. The majority of models tend to overlook these underrepresented groups, leading to inaccurate or even harmful predictions. Existing solutions often rely on detecting these samples as Out-of-domain (OOD) rather than adapting the model to new emerging patterns. We introduce DynaSubVAE, a Dynamic Subgrouping Variational Autoencoder framework that jointly performs representation learning and adaptive OOD detection. Unlike conventional approaches, DynaSubVAE evolves with the data by dynamically updating its latent structure to capture new trends. It leverages a novel non-parametric clustering mechanism, inspired by Gaussian Mixture Models, to discover and model latent subgroups based on embedding similarity. Extensive experiments show that DynaSubVAE achieves competitive performance in both near-OOD and far-OOD detection, and excels in class-OOD scenarios where an entire class is missing during training. We further illustrate that our dynamic subgrouping mechanism outperforms standalone clustering methods such as GMM and KMeans++ in terms of both OOD accuracy and regret precision.


Improving Oral Cancer Outcomes Through Machine Learning and Dimensionality Reduction

arXiv.org Artificial Intelligence

Oral cancer presents a formidable challenge in oncology, necessitating early diagnosis and accurate prognosis to enhance patient survival rates. Recent advancements in machine learning and data mining have revolutionized traditional diagnostic methodologies, providing sophisticated and automated tools for differentiating between benign and malignant oral lesions. This study presents a comprehensive review of cutting-edge data mining methodologies, including Neural Networks, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and ensemble learning techniques, specifically applied to the diagnosis and prognosis of oral cancer. Through a rigorous comparative analysis, our findings reveal that Neural Networks surpass other models, achieving an impressive classification accuracy of 93,6 % in predicting oral cancer. Furthermore, we underscore the potential benefits of integrating feature selection and dimensionality reduction techniques to enhance model performance. These insights underscore the significant promise of advanced data mining techniques in bolstering early detection, optimizing treatment strategies, and ultimately improving patient outcomes in the realm of oral oncology.


Analyzing Emotions in Bangla Social Media Comments Using Machine Learning and LIME

arXiv.org Artificial Intelligence

Research on understanding emotions in written language continues to expand, especially for understudied languages with distinctive regional expressions and cultural features, such as Bangla. This study examines emotion analysis using 22,698 social media comments from the EmoNoBa dataset. For language analysis, we employ machine learning models--Linear SVM, KNN, and Random Forest--with n-gram data from a TF-IDF vectorizer. We additionally investigated how PCA affects the reduction of dimensionality. Moreover, we utilized a BiLSTM model and AdaBoost to improve decision trees. To make our machine learning models easier to understand, we used LIME to explain the predictions of the AdaBoost classifier, which uses decision trees. With the goal of advancing sentiment analysis in languages with limited resources, our work examines various techniques to find efficient techniques for emotion identification in Bangla.