Multilingual Lexical Feature Analysis of Spoken Language for Predicting Major Depression Symptom Severity
Tokareva, Anastasiia, Dineley, Judith, Firth, Zoe, Conde, Pauline, Matcham, Faith, Siddi, Sara, Lamers, Femke, Carr, Ewan, Oetzmann, Carolin, Leightley, Daniel, Zhang, Yuezhou, Folarin, Amos A., Haro, Josep Maria, Penninx, Brenda W. J. H., Bailon, Raquel, Vairavan, Srinivasan, Wykes, Til, Dobson, Richard J. B., Narayan, Vaibhav A., Hotopf, Matthew, Cummins, Nicholas, and the RADAR-CNS Consortium
Background: Captured between clinical appointments using mobile devices, spoken language has potential for objective, more regular assessment of symptom severity and earlier detection of relapse in major depressive disorder. However, research to date has largely been in non-clinical cross-sectional samples of written language using complex machine learning (ML) approaches with limited interpretability. Methods: We describe an initial exploratory analysis of longitudinal speech data and PHQ-8 assessments from 5,836 recordings of 586 participants in the UK, Netherlands, and Spain, collected in the RADAR-MDD study. We sought to identify interpretable lexical features associated with MDD symptom severity using linear mixed-effects modelling. Interpretable features and high-dimensional vector embeddings were also used to test the prediction performance of four ML regression models. Results: In English data, MDD symptom severity was associated with seven features, including lexical diversity measures and absolutist language. In Dutch, associations were observed with words per sentence and positive word frequency; no associations were observed in recordings collected in Spain. The predictive power of lexical features and vector embeddings was near chance level across all languages. Limitations: Smaller samples of non-English speech and methodological choices, such as the elicitation prompt, may also have limited the observable effect sizes. A lack of NLP tools in languages other than English restricted our feature choice. Conclusion: To understand the value of lexical markers in clinical research and practice, further research is needed in larger samples across several languages, using improved protocols and ML models that account for within- and between-individual variations in language.
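The linear mixed-effects setup mentioned in the abstract can be sketched, in a simplified two-level form with a random intercept per participant (the symbols here are illustrative, not the paper's notation):

$$\mathrm{PHQ8}_{ij} = \beta_0 + \beta_1 x_{ij} + u_i + \varepsilon_{ij}, \qquad u_i \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2),$$

where $x_{ij}$ is a lexical feature from participant $i$'s $j$-th recording and $u_i$ absorbs stable between-individual differences in language use, so that $\beta_1$ reflects the within-individual association with symptom severity.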
Deep Fast Machine Learning Utils: A Python Library for Streamlined Machine Learning Prototyping
Machine learning (ML) research and application often involve time-consuming steps such as model architecture prototyping, feature selection, and dataset preparation. To support these tasks, we introduce the Deep Fast Machine Learning Utils (DFMLU) library, which provides tools designed to automate and enhance aspects of these processes. Compatible with frameworks like TensorFlow, Keras, and Scikit-learn, DFMLU offers functionalities that support model development and data handling. The library includes methods for dense neural network search, advanced feature selection, and utilities for data management and visualization of training outcomes. This manuscript presents an overview of DFMLU's functionalities, providing Python examples for each tool.
A novel feature selection method based on quantum support vector machine
Feature selection is critical in machine learning to reduce dimensionality and improve model accuracy and efficiency. The exponential growth in feature space dimensionality for modern datasets directly results in ambiguous samples and redundant features, which can severely degrade classification accuracy. Quantum machine learning offers potential advantages for addressing this challenge. In this paper, we propose a novel method, quantum support vector machine feature selection (QSVMF), integrating quantum support vector machines with a multi-objective genetic algorithm. QSVMF optimizes multiple simultaneous objectives: maximizing classification accuracy, minimizing the number of selected features and quantum circuit costs, and reducing feature covariance. We apply QSVMF for feature selection on a breast cancer dataset, comparing the performance of QSVMF against classical approaches with the selected features. Experimental results show that QSVMF achieves superior performance. Furthermore, the Pareto front solutions of QSVMF enable analysis of accuracy versus feature set size trade-offs, identifying extremely sparse yet accurate feature subsets. We contextualize the biological relevance of the selected features in terms of known breast cancer biomarkers. This work highlights the potential of quantum-based feature selection to enhance machine learning efficiency and performance on complex real-world data.
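As a purely classical illustration of the wrapper idea behind QSVMF (a genetic algorithm searching feature subsets under competing objectives), the sketch below scalarises two of the objectives, accuracy and subset size, into a single fitness, and stands in a simple nearest-centroid classifier on synthetic data for the quantum SVM. Everything here is illustrative, not the authors' implementation:

```python
import random

random.seed(0)  # reproducible run

def make_data(n=60, n_features=8):
    """Synthetic two-class data: features 0 and 1 carry signal, the rest are noise."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [random.gauss(3 * label, 1) if j < 2 else random.gauss(0, 1)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y

def accuracy(X, y, mask):
    """Training accuracy of a nearest-centroid rule on the features the mask keeps."""
    idx = [j for j, keep in enumerate(mask) if keep]
    if not idx:
        return 0.0
    centroids = {}
    for c in set(y):
        rows = [X[i] for i in range(len(y)) if y[i] == c]
        centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in idx]
    hits = 0
    for xi, yi in zip(X, y):
        pred = min(centroids, key=lambda c: sum(
            (xi[j] - cj) ** 2 for j, cj in zip(idx, centroids[c])))
        hits += pred == yi
    return hits / len(y)

def evolve(X, y, pop_size=20, gens=15, penalty=0.02):
    """GA over feature bit-masks: fitness = accuracy minus a penalty per kept feature."""
    n_features = len(X[0])
    fitness = lambda m: accuracy(X, y, m) - penalty * sum(m)
    pop = [[random.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]            # elitist truncation selection
        while len(survivors) < pop_size:
            a, b = random.sample(pop[:pop_size // 2], 2)
            cut = random.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child[random.randrange(n_features)] ^= 1  # one-bit mutation
            survivors.append(child)
        pop = survivors
    return max(pop, key=fitness)

X, y = make_data()
best = evolve(X, y)
print("selected mask:", best)
```

A genuinely multi-objective version would keep the whole Pareto front (e.g. via NSGA-II) rather than scalarising, which is what lets the paper trade classification accuracy against feature-set size and circuit cost.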
Building a Basic Machine Learning Model in Python
By now, all of us have seen the results of various basic machine learning (ML) models. The internet is rife with images, videos, and articles showing off how a computer identifies, correctly or not, various animals. While we have moved towards more intricate machine learning models, such as ones that generate or upscale images, those basic ones still form the foundation of those efforts. Mastering the basics can become a launchpad for much greater future endeavors. So, I decided to revisit the basics myself and build a basic machine learning model with several caveats -- it must be somewhat useful, as simplistic as possible, and return reasonably accurate results. Unlike many other tutorials on the internet, however, I want to present my entire thought process from beginning to end. As such, the coding part will begin quite a bit later, since problem selection, in both the theoretical and practical realms, is just as important. In the end, I believe that understanding the why will take you further than the how-to. Although machine learning can solve a great many challenges, it's not a one-size-fits-all approach. Even if we were to temporarily forget about the financial, temporal, and other resource costs, ML models would still be great at some things and terrible at others. Categorization is a great example of where machine learning may shine.
Chi-Squared For Feature Selection using SelectKBest
In this video, I'll show you how SelectKBest uses the Chi-squared test for feature selection with categorical features & target columns. We compute the Chi-square statistic between each feature & the target, then select the desired number of features with the best Chi-square scores (equivalently, the lowest p-values). The Chi-squared (χ2) test is used in statistics to test the independence of two events; in feature selection, we use it to test whether the occurrence of a specific feature & the target are independent. For each feature-target pair, a high χ2 score (a low p-value) indicates that the target column depends on the feature column.
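The scoring described above can be sketched in pure Python. The computation below mirrors the per-feature statistic used by scikit-learn's `chi2` scorer (observed per-class sums of a non-negative feature versus the sums expected under independence); the tiny dataset is made up for illustration:

```python
def chi2_scores(X, y):
    """Chi-square score of each non-negative feature against the class labels.

    observed: per-class sums of the feature's values
    expected: what those sums would be if feature and class were independent
    """
    classes = sorted(set(y))
    n = len(y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        total = sum(col)
        score = 0.0
        for c in classes:
            observed = sum(v for v, label in zip(col, y) if label == c)
            expected = total * sum(1 for label in y if label == c) / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores

# Toy data: feature 0 tracks the class, feature 1 is constant noise.
X = [[5, 1], [4, 1], [0, 1], [1, 1]]
y = [1, 1, 0, 0]
print(chi2_scores(X, y))  # feature 0 scores far higher than feature 1
```

With scikit-learn installed, the equivalent selection step is `SelectKBest(chi2, k=1).fit_transform(X, y)` from `sklearn.feature_selection`, which keeps the `k` features with the highest scores.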