Goto

Collaborating Authors

 Support Vector Machines


Efficient Quantum One-Class Support Vector Machines for Anomaly Detection Using Randomized Measurements and Variable Subsampling

arXiv.org Artificial Intelligence

However, their quadratic time complexity with respect to data size poses challenges when dealing with large datasets. In recent work, quantum randomized measurements kernels and variable subsampling were proposed, as two independent methods to address this problem. The former achieves higher average precision, but suffers from variance, while the latter achieves linear complexity to data size and has lower variance. The current work focuses instead on combining these two methods, along with rotated feature bagging, to achieve linear time complexity both to data size and to number of features.


The Stochastic Conjugate Subgradient Algorithm For Kernel Support Vector Machines

arXiv.org Artificial Intelligence

Stochastic First-Order (SFO) methods have been a cornerstone in addressing a broad spectrum of modern machine learning (ML) challenges. However, their efficacy is increasingly questioned, especially in large-scale applications where empirical evidence indicates potential performance limitations. In response, this paper proposes an innovative method specifically designed for kernel support vector machines (SVMs). This method not only achieves faster convergence per iteration but also exhibits enhanced scalability when compared to conventional SFO techniques. Diverging from traditional sample average approximation strategies that typically frame kernel SVM as an 'all-in-one' Quadratic Program (QP), our approach adopts adaptive sampling. This strategy incrementally refines approximation accuracy on an 'as-needed' basis. Crucially, this approach also inspires a decomposition-based algorithm, effectively decomposing parameter selection from error estimation, with the latter being independently determined for each data point. To exploit the quadratic nature of the kernel matrix, we introduce a stochastic conjugate subgradient method. This method preserves many benefits of first-order approaches while adeptly handling both nonlinearity and non-smooth aspects of the SVM problem. Thus, it extends beyond the capabilities of standard SFO algorithms for non-smooth convex optimization. The convergence rate of this novel method is thoroughly analyzed within this paper. Our experimental results demonstrate that the proposed algorithm not only maintains but potentially exceeds the scalability of SFO methods. Moreover, it significantly enhances both speed and accuracy of the optimization process.


Exploring Genre and Success Classification through Song Lyrics using DistilBERT: A Fun NLP Venture

arXiv.org Artificial Intelligence

This paper presents a natural language processing (NLP) approach to the problem of thoroughly comprehending song lyrics, with particular attention on genre classification, view-based success prediction, and approximate release year. Our tests provide promising results with 65\% accuracy in genre classification and 79\% accuracy in success prediction, leveraging a DistilBERT model for genre classification and BERT embeddings for release year prediction. Support Vector Machines outperformed other models in predicting the release year, achieving the lowest root mean squared error (RMSE) of 14.18. Our study offers insights that have the potential to revolutionize our relationship with music by addressing the shortcomings of current approaches in properly understanding the emotional intricacies of song lyrics.


Hybrid Heuristic Algorithms for Adiabatic Quantum Machine Learning Models

arXiv.org Artificial Intelligence

The recent developments of adiabatic quantum machine learning (AQML) methods and applications based on the quadratic unconstrained binary optimization (QUBO) model have received attention from academics and practitioners. Traditional machine learning methods such as support vector machines, balanced k-means clustering, linear regression, Decision Tree Splitting, Restricted Boltzmann Machines, and Deep Belief Networks can be transformed into a QUBO model. The training of adiabatic quantum machine learning models is the bottleneck for computation. Heuristics-based quantum annealing solvers such as Simulated Annealing and Multiple Start Tabu Search (MSTS) are implemented to speed up the training of AQML based on the QUBO model. The main purpose of this paper is to present a hybrid heuristic embedding an r-flip strategy to solve large-scale QUBO with an improved solution and shorter computing time compared to the state-of-the-art MSTS method. The results of the substantial computational experiments are reported to compare an r-flip strategy embedded hybrid heuristic and a multiple start tabu search algorithm on a set of benchmark instances and three large-scale QUBO instances. The r-flip strategy embedded algorithm provides very high-quality solutions within the CPU time limits of 60 and 600 seconds.


Mathematical Programming Algorithms for Convex Hull Approximation with a Hyperplane Budget

arXiv.org Artificial Intelligence

We consider the following problem in computational geometry: given, in the d-dimensional real space, a set of points marked as positive and a set of points marked as negative, such that the convex hull of the positive set does not intersect the negative set, find K hyperplanes that separate, if possible, all the positive points from the negative ones. That is, we search for a convex polyhedron with at most K faces, containing all the positive points and no negative point. The problem is known in the literature for pure convex polyhedral approximation; our interest stems from its possible applications in constraint learning, where points are feasible or infeasible solutions of a Mixed Integer Program, and the K hyperplanes are linear constraints to be found. We cast the problem as an optimization one, minimizing the number of negative points inside the convex polyhedron, whenever exact separation cannot be achieved. We introduce models inspired by support vector machines and we design two mathematical programming formulations with binary variables. We exploit Dantzig-Wolfe decomposition to obtain extended formulations, and we devise column generation algorithms with ad-hoc pricing routines. We compare computing time and separation error values obtained by all our approaches on synthetic datasets, with number of points from hundreds up to a few thousands, showing our approaches to perform better than existing ones from the literature. Furthermore, we observe that key computational differences arise, depending on whether the budget K is sufficient to completely separate the positive points from the negative ones or not. On 8-dimensional instances (and over), existing convex hull algorithms become computational inapplicable, while our algorithms allow to identify good convex hull approximations in minutes of computation.


On the Pros and Cons of Active Learning for Moral Preference Elicitation

arXiv.org Artificial Intelligence

Computational preference elicitation methods are tools used to learn people's preferences quantitatively in a given context. Recent works on preference elicitation advocate for active learning as an efficient method to iteratively construct queries (framed as comparisons between context-specific cases) that are likely to be most informative about an agent's underlying preferences. In this work, we argue that the use of active learning for moral preference elicitation relies on certain assumptions about the underlying moral preferences, which can be violated in practice. Specifically, we highlight the following common assumptions (a) preferences are stable over time and not sensitive to the sequence of presented queries, (b) the appropriate hypothesis class is chosen to model moral preferences, and (c) noise in the agent's responses is limited. While these assumptions can be appropriate for preference elicitation in certain domains, prior research on moral psychology suggests they may not be valid for moral judgments. Through a synthetic simulation of preferences that violate the above assumptions, we observe that active learning can have similar or worse performance than a basic random query selection method in certain settings. Yet, simulation results also demonstrate that active learning can still be viable if the degree of instability or noise is relatively small and when the agent's preferences can be approximately represented with the hypothesis class used for learning. Our study highlights the nuances associated with effective moral preference elicitation in practice and advocates for the cautious use of active learning as a methodology to learn moral preferences.


Cross-Vendor Reproducibility of Radiomics-based Machine Learning Models for Computer-aided Diagnosis

arXiv.org Artificial Intelligence

Background: The reproducibility of machine-learning models in prostate cancer detection across different MRI vendors remains a significant challenge. Methods: This study investigates Support Vector Machines (SVM) and Random Forest (RF) models trained on radiomic features extracted from T2-weighted MRI images using Pyradiomics and MRCradiomics libraries. Feature selection was performed using the maximum relevance minimum redundancy (MRMR) technique. We aimed to enhance clinical decision support through multimodal learning and feature fusion. Results: Our SVM model, utilizing combined features from Pyradiomics and MRCradiomics, achieved an AUC of 0.74 on the Multi-Improd dataset (Siemens scanner) but decreased to 0.60 on the Philips test set. The RF model showed similar trends, with notable robustness for models using Pyradiomics features alone (AUC of 0.78 on Philips). Conclusions: These findings demonstrate the potential of multimodal feature integration to improve the robustness and generalizability of machine-learning models for clinical decision support in prostate cancer detection. This study marks a significant step towards developing reliable AI-driven diagnostic tools that maintain efficacy across various imaging platforms. Keywords: Machine Learning Inter-Vendor Reproducibility Radiomics Prostate Cancer Diagnostic tools Model Reproducibility 1 Introduction Thinking about the transformative era of medical diagnostics with the integration of machine learning (ML) into cancer detection opens up a new oil reserve of possibilities.


Automatic Real-word Error Correction in Persian Text

arXiv.org Artificial Intelligence

Automatic spelling correction stands as a pivotal challenge within the ambit of natural language processing (NLP), demanding nuanced solutions. Traditional spelling correction techniques are typically only capable of detecting and correcting non-word errors, such as typos and misspellings. However, context-sensitive errors, also known as real-word errors, are more challenging to detect because they are valid words that are used incorrectly in a given context. The Persian language, characterized by its rich morphology and complex syntax, presents formidable challenges to automatic spelling correction systems. Furthermore, the limited availability of Persian language resources makes it difficult to train effective spelling correction models. This paper introduces a cutting-edge approach for precise and efficient real-word error correction in Persian text. Our methodology adopts a structured, multi-tiered approach, employing semantic analysis, feature selection, and advanced classifiers to enhance error detection and correction efficacy. The innovative architecture discovers and stores semantic similarities between words and phrases in Persian text. The classifiers accurately identify real-word errors, while the semantic ranking algorithm determines the most probable corrections for real-word errors, taking into account specific spelling correction and context properties such as context, semantic similarity, and edit-distance measures. Evaluations have demonstrated that our proposed method surpasses previous Persian real-word error correction models. Our method achieves an impressive F-measure of 96.6% in the detection phase and an accuracy of 99.1% in the correction phase. These results clearly indicate that our approach is a highly promising solution for automatic real-word error correction in Persian text.


Non-Contact Breath Rate Classification Using SVM Model and mmWave Radar Sensor Data

arXiv.org Artificial Intelligence

This work presents the use of frequency modulated continuous wave (FMCW) radar technology combined with a machine learning model to differentiate between normal and abnormal breath rates. The proposed system non-contactly collects data using FMCW radar, which depends on breath rates. Various support vector machine kernels are used to classify the observed data into normal and abnormal states. Prolonged experiments show good accuracy in breath rate classification, confirming the model's efficacy. The best accuracy is 95 percent with the smallest number of support vectors in the case of the quadratic polynomial kernel.


From 2015 to 2023: How Machine Learning Aids Natural Product Analysis

arXiv.org Artificial Intelligence

In recent years, conventional chemistry techniques have faced significant challenges due to their inherent limitations, struggling to cope with the increasing complexity and volume of data generated in contemporary research endeavors. Computational methodologies represent robust tools in the field of chemistry, offering the capacity to harness potent machine-learning models to yield insightful analytical outcomes. This review delves into the spectrum of computational strategies available for natural product analysis and constructs a research framework for investigating both qualitative and quantitative chemistry problems. Our objective is to present a novel perspective on the symbiosis of machine learning and chemistry, with the potential to catalyze a transformation in the field of natural product analysis.