 Support Vector Machines


Honey Adulteration Detection using Hyperspectral Imaging and Machine Learning

arXiv.org Artificial Intelligence

This paper aims to develop a machine learning-based system for automatically detecting honey adulteration with sugar syrup, based on honey hyperspectral imaging data. First, the floral source of a honey sample is classified by a botanical origin identification subsystem. Then, the sugar syrup adulteration is identified, and its concentration is quantified by an adulteration detection subsystem. Both subsystems consist of two steps. The first step involves extracting relevant features from the honey sample using Linear Discriminant Analysis (LDA). In the second step, we utilize the K-Nearest Neighbors (KNN) model to classify the honey botanical origin in the first subsystem and identify the adulteration level in the second subsystem. We assess the proposed system performance on a public honey hyperspectral image dataset. The result indicates that the proposed system can detect adulteration in honey with an overall cross-validation accuracy of 96.39%, making it an appropriate alternative to the current chemical-based detection methods.
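The classification step described above can be illustrated with a minimal K-Nearest Neighbors sketch. It assumes the features have already been reduced (for example by LDA, which is omitted here); the toy 2-D values, the labels `clover` and `manuka`, and the function name `knn_predict` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training points closest to query."""
    # Euclidean distance from the query to every training sample, nearest first
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Majority vote over the k nearest neighbours
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


# Made-up 2-D features standing in for LDA-projected spectra (assumption)
train_X = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1), (2.0, 2.1), (2.2, 1.9), (1.9, 2.0)]
train_y = ["clover", "clover", "clover", "manuka", "manuka", "manuka"]

print(knn_predict(train_X, train_y, (0.15, 0.15)))
print(knn_predict(train_X, train_y, (2.1, 2.0)))
```

In the two-stage design the abstract describes, the same kind of classifier would be applied twice: once to predict botanical origin and once to predict the adulteration level.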


Enhanced Prediction of CAR T-Cell Cytotoxicity with Quantum-Kernel Methods

arXiv.org Artificial Intelligence

Chimeric antigen receptor (CAR) T-cells are T-cells engineered to recognize and kill specific tumor cells. Through their extracellular domains, CAR T-cells bind tumor cell antigens, which triggers CAR T activation and proliferation. These processes are regulated by co-stimulatory domains present in the intracellular region of the CAR T-cell. By integrating novel signaling components into the co-stimulatory domains, it is possible to modify the CAR T-cell phenotype. Identifying and experimentally testing new CAR constructs based on libraries of co-stimulatory domains is nontrivial given the vast combinatorial space defined by such libraries. This leads to a highly data-constrained, poorly explored combinatorial problem, where the experiments undersample all possible combinations. We propose a quantum approach using a Projected Quantum Kernel (PQK) to address this challenge. PQK operates by embedding classical data into a high-dimensional Hilbert space and employs a kernel method to measure sample similarity. Using 61 qubits on a gate-based quantum computer, we demonstrate the largest PQK application to date and an enhancement in classification performance over purely classical machine learning methods for CAR T cytotoxicity prediction. Importantly, we show improved learning for specific signaling domains and domain positions, particularly where there was less information, highlighting the potential of quantum computing in data-constrained problems.


Categorical Classification of Book Summaries Using Word Embedding Techniques

arXiv.org Artificial Intelligence

In this study, book summaries and categories taken from book sites were classified using word embedding methods, natural language processing techniques, and machine learning algorithms. One-hot encoding, Word2Vec, and Term Frequency-Inverse Document Frequency (TF-IDF), which are frequently used word embedding methods, were applied and their performance was compared. A table of the combinations of pre-processing methods used is also presented. The results show that Support Vector Machine, Naive Bayes, and Logistic Regression models, together with the TF-IDF and one-hot encoding techniques, gave more successful results for Turkish texts. Using Word2Vec to process big text data.
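As a rough illustration of the TF-IDF weighting mentioned above, the sketch below uses the plain textbook form, tf × log(N / df). The function name `tfidf` and the toy token lists are assumptions for illustration; the study's own pre-processing and weighting variant are not specified here, and library implementations often use smoothed variants.

```python
import math
from collections import Counter


def tfidf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Uses the unsmoothed form: tf = count / document length,
    idf = log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights


# Made-up tokenized summaries (assumption)
docs = [["detective", "murder", "city"], ["dragon", "quest", "city"]]
for w in tfidf(docs):
    print(w)
```

A term that occurs in every document (here `city`) receives weight zero, which is why TF-IDF tends to emphasize category-specific vocabulary before the vectors are passed to a classifier such as an SVM.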


Segmentation-free Goodness of Pronunciation

arXiv.org Artificial Intelligence

Mispronunciation detection and diagnosis (MDD) is a significant part of modern computer-aided language learning (CALL) systems. Within MDD, phoneme-level pronunciation assessment is key to helping L2 learners improve their pronunciation. However, most systems are based on a form of goodness of pronunciation (GOP) which requires pre-segmentation of speech into phonetic units. This limits the accuracy of these methods and the possibility of using modern CTC-based acoustic models for their evaluation. In this study, we first propose self-alignment GOP (GOP-SA), which enables the use of CTC-trained ASR models for MDD. Next, we define a more general alignment-free method that takes all possible alignments of the target phoneme into account (GOP-AF). We give a theoretical account of our definition of GOP-AF, an implementation that solves potential numerical issues, and a proper normalization which makes the method applicable to acoustic models with different peakiness over time. We provide extensive experimental results on the CMU Kids and Speechocean762 datasets, comparing the different definitions of our methods and estimating the dependency of GOP-AF on the peakiness of the acoustic models and on the amount of context around the target phoneme. Finally, we compare our methods with recent studies on the Speechocean762 data, showing that the feature vectors derived from the proposed method achieve state-of-the-art results on phoneme-level pronunciation assessment.


Automatic Cough Analysis for Non-Small Cell Lung Cancer Detection

arXiv.org Artificial Intelligence

Early detection of non-small cell lung cancer (NSCLC) is critical for improving patient outcomes, and novel approaches are needed to facilitate early diagnosis. In this study, we explore the use of automatic cough analysis as a pre-screening tool for distinguishing between NSCLC patients and healthy controls. Cough audio recordings were prospectively acquired from a total of 227 subjects, divided into NSCLC patients and healthy controls. The recordings were analyzed using machine learning techniques, such as support vector machine (SVM) and XGBoost, as well as deep learning approaches, specifically convolutional neural networks (CNN) and transfer learning with VGG16. To enhance the interpretability of the machine learning model, we utilized Shapley Additive Explanations (SHAP). The fairness of the models across demographic groups was assessed by comparing the performance of the best model across age groups (58 years or younger versus older than 58 years) and gender using the equalized odds difference on the test set. The results demonstrate that CNN achieves the best performance, with an accuracy of 0.83 on the test set. SVM performs only slightly worse (accuracy of 0.76 on the validation set and 0.78 on the test set), making it suitable for contexts with low computational power. The use of SHAP for SVM interpretation further enhances model transparency, making it more trustworthy for clinical applications. Fairness analysis shows slightly higher disparity across age (0.15) than gender (0.09) on the test set. Therefore, to strengthen our findings' reliability, a larger, more diverse, and unbiased dataset is needed -- particularly including individuals at risk of NSCLC and those in early disease stages.
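The fairness metric named above can be sketched as follows, assuming the common definition of equalized odds difference: the larger of the between-group gaps in true positive rate and false positive rate. The abstract does not state the exact computation used, so this is an illustration only; the function names and the toy labels are hypothetical.

```python
def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


def equalized_odds_difference(y_true, y_pred, groups):
    """Largest between-group gap in TPR or FPR (assumed definition)."""
    per_group = []
    for g in set(groups):
        idx = [i for i, v in enumerate(groups) if v == g]
        per_group.append(
            rates([y_true[i] for i in idx], [y_pred[i] for i in idx])
        )
    tprs = [r[0] for r in per_group]
    fprs = [r[1] for r in per_group]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Made-up labels for two hypothetical age groups (assumption)
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["younger"] * 4 + ["older"] * 4
print(equalized_odds_difference(y_true, y_pred, groups))
```

A value of 0 means the model's error rates are identical across groups; larger values, such as the 0.15 reported for age, indicate a wider gap in at least one of the two rates.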


Differentiated Thyroid Cancer Recurrence Classification Using Machine Learning Models and Bayesian Neural Networks with Varying Priors: A SHAP-Based Interpretation of the Best Performing Model

arXiv.org Artificial Intelligence

Differentiated thyroid cancer (DTC) recurrence is a major public health concern, requiring classification and predictive models that are not only accurate but also interpretable and uncertainty-aware. This study introduces a comprehensive framework for DTC recurrence classification using a dataset containing 383 patients and 16 clinical and pathological variables. Initially, 11 machine learning (ML) models were employed using the complete dataset, where the Support Vector Machines (SVM) model achieved the highest accuracy of 0.9481. To reduce complexity and redundancy, feature selection was carried out using the Boruta algorithm, and the same ML models were applied to the reduced dataset, where the Logistic Regression (LR) model obtained the maximum accuracy of 0.9611. However, these ML models often lack uncertainty quantification, which is critical in clinical decision making. Therefore, to address this limitation, Bayesian Neural Networks (BNN) with six varying prior distributions, including Normal(0, 1), Normal(0, 10), Laplace(0, 1), Cauchy(0, 1), Cauchy(0, 2.5), and Horseshoe(1), were implemented on both the complete and reduced datasets. The BNN model with the Normal(0, 10) prior distribution exhibited maximum accuracies of 0.9740 and 0.9870 before and after feature selection, respectively.


Asymmetric Lesion Detection with Geometric Patterns and CNN-SVM Classification

arXiv.org Artificial Intelligence

Accepted Manuscript: This is the peer-reviewed version of the article accepted for publication in Computers in Biology and Medicine. This manuscript version is made available under the CC BY-NC-ND license.

Abstract: In dermoscopic images, which allow visualization of surface skin structures not visible to the naked eye, lesion shape offers vital insights into skin diseases. In clinically practiced methods, asymmetric lesion shape is one of the criteria for diagnosing Melanoma. Initially, we labeled data for a non-annotated dataset with symmetry information based on clinical assessments. Subsequently, we propose a supporting technique -- a supervised learning image processing algorithm -- to analyze the geometrical pattern of lesion shape, aiding non-experts in understanding the criteria of an asymmetric lesion. We then utilize a pre-trained convolutional neural network (CNN) to extract shape, color, and texture features from dermoscopic images for training a multiclass support vector machine (SVM) classifier, outperforming state-of-the-art methods from the literature. In the geometry-based experiment, we achieved a 99.00% detection rate for dermatological asymmetric lesions. In the CNN-based experiment, the best performance was a 94% Kappa score, a 95% macro F1-score, and a 97% weighted F1-score for classifying lesion shapes (Asymmetric, Half-Symmetric, and Symmetric).

Introduction: Dermatological asymmetry, a cornerstone in skin lesion assessment, refers to disparities observed in the shape, size, or color of moles or lesions [1, 2, 3]. In dermatology, careful examination of the lesion shape is critical, especially when it comes to the possibility that lesions are cancerous, such as Melanoma. The dermatological three-point checklist for early skin cancer detection has shown remarkable sensitivity in identifying Melanoma [2]. The presence of "asymmetry of color and structure in one or two perpendicular axes" stands as the initial criterion of this checklist [2]. In this method, asymmetry evaluation entails scrutinizing lesions within a plane bisected by two axes set at 90°, assigning a score ranging from 0 to 2 based on the number of axes exhibiting asymmetry in shape, color, or structure.


Quantum Cognition Machine Learning for Forecasting Chromosomal Instability

arXiv.org Artificial Intelligence

Unlike traditional tissue tests [1, 2], cell-based liquid biopsy assays enable selection of individual CTCs for the analysis of chromosomal instability using next-generation sequencing by quantification of large-scale state transitions (LST) [3-9]. Chromosomal instability is a genomic characteristic of cancer cells that drives tumor evolution and metastatic potential [10-19]. However, whole genome sequencing assays are laborious, requiring a complex workflow that invariably results in a considerable turnaround time that is sometimes incompatible with clinical practice [20]. A previous study has shown that we can partially predict chromosomal instability in individual cells by developing algorithms that analyze a range of features, including cell shape, size, morphology, and protein levels, from images of CTCs using an automated digital pathology pipeline [3]. Predicting chromosomal instability through morphology offers significant advantages; it can substantially reduce turnaround times compared to whole-genome assays, providing crucial information about the genomic characteristics of CTCs in a patient in a shorter timeframe [3]. Timely information on the presence of CTCs with the highest metastatic potential may be critical for making optimal clinical decisions. A key challenge in predicting chromosomal instability through morphology is finding a machine-learning method that accurately classifies morphology patterns from all CTC features and provides generalization and reproducibility compatible with potential validation for clinical use [21-24]. Key limitations of commonly used machine learning techniques in biology applications, such as support vector machines (SVMs) with Gaussian kernels, include the following [21-24]: 1) The increase in dimensionality that arises from combinations of multiple features exponentially complicates the prediction task, as often seen with cell morphologies.


Surface EMG Profiling in Parkinson's Disease: Advancing Severity Assessment with GCN-SVM

arXiv.org Artificial Intelligence

Parkinson's disease (PD) poses challenges in diagnosis and monitoring due to its progressive nature and complex symptoms. This study introduces a novel approach utilizing surface electromyography (sEMG) to objectively assess PD severity, focusing on the biceps brachii muscle. Initial analysis of sEMG data from five PD patients and five healthy controls revealed significant neuromuscular differences. A traditional Support Vector Machine (SVM) model achieved up to 83% accuracy, while enhancements with a Graph Convolutional Network-Support Vector Machine (GCN-SVM) model increased accuracy to 92%. Despite the preliminary nature of these results, the study outlines a detailed experimental methodology for future research with larger cohorts to validate these findings and integrate the approach into clinical practice. The proposed approach holds promise for advancing PD severity assessment and improving patient care in Parkinson's disease management.


Reading Between the Lines: Combining Pause Dynamics and Semantic Coherence for Automated Assessment of Thought Disorder

arXiv.org Artificial Intelligence

Formal thought disorder (FTD), a hallmark of schizophrenia spectrum disorders, manifests as incoherent speech and poses challenges for clinical assessment. Traditional clinical rating scales, though validated, are resource-intensive and lack scalability. Automated speech analysis with automatic speech recognition (ASR) allows for objective quantification of linguistic and temporal features of speech, offering scalable alternatives. The use of utterance timestamps in ASR captures pause dynamics, which are thought to reflect the cognitive processes underlying speech production. However, the utility of integrating these ASR-derived features for assessing FTD severity requires further evaluation. This study integrates pause features with semantic coherence metrics across three datasets: naturalistic self-recorded diaries (AVH, n = 140), structured picture descriptions (TOPSY, n = 72), and dream narratives (PsyCL, n = 43). We evaluated pause-related features alongside established coherence measures, using support vector regression (SVR) to predict clinical FTD scores. Key findings demonstrate that pause features alone robustly predict the severity of FTD. Integrating pause features with semantic coherence metrics enhanced predictive performance compared to semantic-only models, with integration of independent models achieving correlations up to ρ = 0.649 and AUC = 83.71% for severe-case detection (TOPSY, with best ρ = 0.584 and AUC = 79.23% for semantic-only models). The performance gains from integrating semantic and pause features held consistently across all contexts, though the nature of pause patterns was dataset-dependent. These findings suggest that frameworks combining temporal and semantic analyses provide a roadmap for refining the assessment of disorganized speech and advance automated speech analysis in psychosis.