Goto

Collaborating Authors

 Support Vector Machines



Downscaling human mobility data based on demographic socioeconomic and commuting characteristics using interpretable machine learning methods

arXiv.org Artificial Intelligence

Understanding urban human mobility patterns at various spatial levels is essential for social science. This study presents a machine learning framework to downscale origin-destination (OD) taxi trips flows in New York City from a larger spatial unit to a smaller spatial unit. First, correlations between OD trips and demographic, socioeconomic, and commuting characteristics are developed using four models: Linear Regression (LR), Random Forest (RF), Support Vector Machine (SVM), and Neural Networks (NN). Second, a perturbation-based sensitivity analysis is applied to interpret variable importance for nonlinear models. The results show that the linear regression model failed to capture the complex variable interactions. While NN performs best with the training and testing datasets, SVM shows the best generalization ability in downscaling performance. The methodology presented in this study provides both analytical advancement and practical applications to improve transportation services and urban development.


Automated and Interpretable Survival Analysis from Multimodal Data

arXiv.org Artificial Intelligence

Accurate and interpretable survival analysis remains a core challenge in oncology. With growing multimodal data and the clinical need for transparent models to support validation and trust, this challenge increases in complexity. We propose an interpretable multimodal AI framework to automate survival analysis by integrating clinical variables and computed tomography imaging. Our MultiFIX-based framework uses deep learning to infer survival-relevant features that are further explained: imaging features are interpreted via Grad-CAM, while clinical variables are modeled as symbolic expressions through genetic programming. Risk estimation employs a transparent Cox regression, enabling stratification into groups with distinct survival outcomes. Using the open-source RADCURE dataset for head and neck cancer, MultiFIX achieves a C-index of 0.838 (prediction) and 0.826 (stratification), outperforming the clinical and academic baseline approaches and aligning with known prognostic markers. These results highlight the promise of interpretable multimodal AI for precision oncology with MultiFIX.


SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization

arXiv.org Artificial Intelligence

Contrastive language-audio pretraining, which aims to unify multimodal representations in a shared embedding space, serves as a cornerstone for building a wide range of applications, from cross-modal retrieval to cutting-edge multimodal large language models. However, we find that the perpendicular component of the pushing force from negative samples in contrastive learning is a double-edged sword: it contains rich supplementary information from negative samples, yet its unconstrained nature causes optimization trajectory drift and training instability. To address this, we propose Support V ector Regularization (SVR), a method that introduces an auxiliary support vector to control this perpendicular component, aiming to harness its rich information while mitigating the associated trajectory drift. The efficacy of SVR is critically governed by its semantic radius, for which we explore two unsupervised modeling strategies: direct parameterization and an adaptive radius predictor module enhanced with constraints to improve its predicting accuracy. Extensive experimental results demonstrate that our method surpasses widely used baselines like InfoNCE and SigLIP loss across classification, monolingual retrieval, and multilingual retrieval on standard audio-text datasets. Contrastive Language-Audio Pretraining (CLAP) Wu et al. (2023); Ghosh et al. (2025) aims to learn a unified audio-text embedding space by pulling corresponding pairs closer and pushing others apart. This paradigm, which powers applications like cross-modal retrieval Xie et al. (2024) and multimodal LLMs Xue et al. (2024); Lam et al. (2025), has achieved great empirical success. However, standard InfoNCE-based CLAP methods still struggle to learn ideal representations, facing limitations such as poor temporal alignment of audio events Y uan et al. (2024) and inconsistent multilingual alignment Yin et al. (2025). Therefore, achieving optimal alignment between the language and audio representation spaces remains an open challenge. In this paper, we uncover a complex yet overlooked dynamic in the optimization process of standard InfoNCE-based contrastive learning Wu et al. (2021): optimization trajectory drift.


Design, Implementation and Evaluation of a Novel Programming Language Topic Classification Workflow

arXiv.org Artificial Intelligence

As software systems grow in scale and complexity, understanding the distribution of programming language topics within source code becomes increasingly important for guiding technical decisions, improving onboarding, and informing tooling and education. This paper presents the design, implementation, and evaluation of a novel programming language topic classification workflow. Our approach combines a multi-label Support Vector Machine (SVM) with a sliding window and voting strategy to enable fine-grained localization of core language concepts such as operator overloading, virtual functions, inheritance, and templates. Trained on the IBM Project CodeNet dataset, our model achieves an average F1 score of 0.90 across topics and 0.75 in code-topic highlight. Our findings contribute empirical insights and a reusable pipeline for researchers and practitioners interested in code analysis and data-driven software engineering.


Online Deterministic Annealing for Classification and Clustering

arXiv.org Artificial Intelligence

--Inherent in virtually every iterative machine learning algorithm is the problem of hyper-parameter tuning which includes three major design parameters: (a) the complexity of the model, e.g., the number of neurons in a neural network, (b) the initial conditions, which heavily affect the behavior of the algorithm, and (c) the dissimilarity measure used to quantify its performance. We introduce an online prototype-based learning algorithm that can be viewed as a progressively growing competitive-learning neural network architecture for classification and clustering. The learning rule of the proposed approach is formulated as an online gradient-free stochastic approximation algorithm that solves a sequence of appropriately defined optimization problems, simulating an annealing process. The annealing nature of the algorithm contributes to avoiding poor local minima, offers robustness with respect to the initial conditions, and provides a means to progressively increase the complexity of the learning model, through an intuitive bifurcation phenomenon. The proposed approach is interpretable, requires minimal hyper-parameter tuning, and allows online control over the performance-complexity trade-off. Finally, we show that Bregman divergences appear naturally as a family of dissimilarity measures that play a central role in both the performance and the computational complexity of the learning algorithm. EARNING from data samples has become an important component of artificial intelligence. While virtually all learning problems can be formulated as constrained stochastic optimization problems, the optimization methods can be intractable, typically dealing with mixed constraints and very large, or even infinite-dimensional spaces [1]. For this reason, feature extraction, model selection and design, and analysis of optimization methods, have been the cornerstone of machine learning algorithms from their genesis until today. Deep learning methods, currently dominating the field of machine learning due to their performance in multiple applications, attempt to learn feature representations from data, using biologically-inspired models in artificial neural networks [2], [3]. Manuscript published in the IEEE Transactions on Neural Networks and Learning Systems (TNNLS).


Constraint-Reduced MILP with Local Outlier Factor Modeling for Plausible Counterfactual Explanations in Credit Approval

arXiv.org Artificial Intelligence

Counterfactual explanation (CE) is a widely used post-hoc method that provides individuals with actionable changes to alter an unfavorable prediction from a machine learning model. Plausible CE methods improve realism by considering data distribution characteristics, but their optimization models introduce a large number of constraints, leading to high computational cost. In this work, we revisit the DACE framework and propose a refined Mixed-Integer Linear Programming (MILP) formulation that significantly reduces the number of constraints in the local outlier factor (LOF) objective component. We also apply the method to a linear SVM classifier with standard scaler. The experimental results show that our approach achieves faster solving times while maintaining explanation quality. These results demonstrate the promise of more efficient LOF modeling in counterfactual explanation and data science applications.


Predicting First Year Dropout from Pre Enrolment Motivation Statements Using Text Mining

arXiv.org Artificial Intelligence

Preventing student dropout is a major challenge in higher education and it is difficult to predict prior to enrolment which students are likely to drop out and which students are likely to succeed. High School GPA is a strong predictor of dropout, but much variance in dropout remains to be explained. This study focused on predicting university dropout by using text mining techniques with the aim of exhuming information contained in motivation statements written by students. By combining text data with classic predictors of dropout in the form of student characteristics, we attempt to enhance the available set of predictive student characteristics. Our dataset consisted of 7,060 motivation statements of students enrolling in a non-selective bachelor at a Dutch university in 2014 and 2015. Support Vector Machines were trained on 75 percent of the data and several models were estimated on the test data. We used various combinations of student characteristics and text, such as TFiDF, topic modelling, LIWC dictionary. Results showed that, although the combination of text and student characteristics did not improve the prediction of dropout, text analysis alone predicted dropout similarly well as a set of student characteristics. Suggestions for future research are provided.


A Study on Stabilizer Rényi Entropy Estimation using Machine Learning

arXiv.org Artificial Intelligence

Nonstabilizerness is a fundamental resource for quantum advantage, as it quantifies the extent to which a quantum state diverges from those states that can be efficiently simulated on a classical computer, the stabilizer states. The stabilizer Rényi entropy (SRE) is one of the most investigated measures of nonstabilizerness because of its computational properties and suitability for experimental measurements on quantum processors. Because computing the SRE for arbitrary quantum states is a computationally hard problem, we propose a supervised machine-learning approach to estimate it. In this work, we frame SRE estimation as a regression task and train a Random Forest Regressor and a Support Vector Regressor (SVR) on a comprehensive dataset, including both unstructured random quantum circuits and structured circuits derived from the physics-motivated one-dimensional transverse Ising model (TIM). We compare the machine-learning models using two different quantum circuit representations: one based on classical shadows and the other on circuit-level features. Furthermore, we assess the generalization capabilities of the models on out-of-distribution instances. Experimental results show that an SVR trained on circuit-level features achieves the best overall performance. On the random circuits dataset, our approach converges to accurate SRE estimations, but struggles to generalize out of distribution. In contrast, it generalizes well on the structured TIM dataset, even to deeper and larger circuits. In line with previous work, our experiments suggest that machine learning offers a viable path for efficient nonstabilizerness estimation.


Geometric Mixture Classifier (GMC): A Discriminative Per-Class Mixture of Hyperplanes

arXiv.org Artificial Intelligence

Many real world categories are multimodal, with single classes occupying disjoint regions in feature space. Classical linear models (logistic regression, linear SVM) use a single global hyperplane and perform poorly on such data, while high-capacity methods (kernel SVMs, deep nets) fit multimodal structure but at the expense of interpretability, heavier tuning, and higher computational cost. We propose the Geometric Mixture Classifier (GMC), a discriminative model that represents each class as a mixture of hyperplanes. Within each class, GMC combines plane scores via a temperature-controlled soft-OR (log-sum-exp), smoothly approximating the max; across classes, standard softmax yields probabilistic posteriors. GMC optionally uses Random Fourier Features (RFF) for nonlinear mappings while keeping inference linear in the number of planes and features. Our practical training recipe: geometry-aware k-means initialization, silhouette-based plane budgeting, alpha annealing, usage-aware L2 regularization, label smoothing, and early stopping, makes GMC plug-and-play. Across synthetic multimodal datasets (moons, circles, blobs, spirals) and tabular/image benchmarks (iris, wine, WDBC, digits), GMC consistently outperforms linear baselines and k-NN, is competitive with RBF-SVM, Random Forests, and small MLPs, and provides geometric introspection via per-plane and class responsibility visualizations. Inference scales linearly in planes and features, making GMC CPU-friendly, with single-digit microsecond latency per example, often faster than RBF-SVM and compact MLPs. Post-hoc temperature scaling reduces ECE from about 0.06 to 0.02. GMC thus strikes a favorable balance of accuracy, interpretability, and efficiency: it is more expressive than linear models and lighter, more transparent, and faster than kernel or deep models.