Overview
The order in speech disorder: a scoping review of state of the art machine learning methods for clinical speech classification
Moell, Birger, Aronsson, Fredrik Sand, Östberg, Per, Beskow, Jonas
Background: Speech patterns have emerged as potential diagnostic markers for conditions with varying etiologies. Machine learning (ML) presents an opportunity to harness these patterns for accurate disease diagnosis. Objective: This review synthesized findings from studies exploring ML's capability in leveraging speech for the diagnosis of neurological, laryngeal and mental disorders. Methods: A systematic examination of 564 articles was conducted, with 91 articles included in the study, which encompassed a wide spectrum of conditions, ranging from voice pathologies to mental and neurological disorders. Methods for speech classification were assessed based on the relevant studies and scored from 0 to 10 based on the reported diagnostic accuracy of their ML models. Results: High diagnostic accuracies were consistently observed for laryngeal disorders, dysarthria, and changes related to speech in Parkinson's disease. These findings indicate the robust potential of speech as a diagnostic tool. Disorders like depression, schizophrenia, mild cognitive impairment and Alzheimer's dementia also demonstrated high accuracies, albeit with some variability across studies. Meanwhile, disorders like OCD and autism highlighted the need for more extensive research to ascertain the relationship between speech patterns and the respective conditions. Conclusion: ML models utilizing speech patterns demonstrate promising potential in diagnosing a range of mental, laryngeal, and neurological disorders. However, the efficacy varies across conditions, and further research is needed. The integration of these models into clinical practice could potentially revolutionize the evaluation and diagnosis of a number of different medical conditions.
Direct Speech to Speech Translation: A Review
Sarim, Mohammad, Shakeel, Saim, Javed, Laeeba, Jamaluddin, Nadeem, Mohammad
Speech to speech translation (S2ST) is a transformative technology that bridges global communication gaps, enabling real time multilingual interactions in diplomacy, tourism, and international trade. Our review examines the evolution of S2ST, comparing traditional cascade models which rely on automatic speech recognition (ASR), machine translation (MT), and text to speech (TTS) components with newer end to end and direct speech translation (DST) models that bypass intermediate text representations. While cascade models offer modularity and optimized components, they suffer from error propagation, increased latency, and loss of prosody. In contrast, direct S2ST models retain speaker identity, reduce latency, and improve translation naturalness by preserving vocal characteristics and prosody. However, they remain limited by data sparsity, high computational costs, and generalization challenges for low-resource languages. The current work critically evaluates these approaches, their tradeoffs, and future directions for improving real time multilingual communication.
Machine Learning Applications to Diffuse Reflectance Spectroscopy in Optical Diagnosis; A Systematic Review
Rossberg, Nicola, Li, Celina L., Innocente, Simone, Andersson-Engels, Stefan, Komolibus, Katarzyna, O'Sullivan, Barry, Visentin, Andrea
The noninvasive nature of diffuse reflectance spectroscopy (DRS) and its sensitivity to absorption related to tissue biomolecular content and to scattering changes associated with subcellular morphology make it an extremely powerful tool to analyse tissue composition, microstructure or oxygenation status, offering promising performance in applications such as cancer diagnostics and surgical guidance [1, 30, 85, 121]. DRS signals are measured by delivering light, typically from a white light source, into the tissue and detecting diffusely reflected signals at a certain distance from the source, where the distance between the emitting and receiving fibres determines the tissue depth probed. Depending on the application and clinical objective, multiple illumination or detection fibres can be used to obtain more quantitative information and probe different depths. The light delivery and collection from tissue are often handled using optical fibres or fibre bundles. When incident on the tissue, the light undergoes scattering and absorption processes, which alter the light intensity across the measured spectrum [75, 121].
ClipGrader: Leveraging Vision-Language Models for Robust Label Quality Assessment in Object Detection
Lu, Hong, Bian, Yali, Shah, Rahul C.
High-quality annotations are essential for object detection models, but ensuring label accuracy -- especially for bounding boxes -- remains both challenging and costly. This paper introduces ClipGrader, a novel approach that leverages vision-language models to automatically assess the accuracy of bounding box annotations. By adapting CLIP (Contrastive Language-Image Pre-training) to evaluate both class label correctness and spatial precision of bounding boxes, ClipGrader offers an effective solution for grading object detection labels. Tested on modified object detection datasets with artificially disturbed bounding boxes, ClipGrader achieves 91% accuracy on COCO with a 1.8% false positive rate. Moreover, it maintains 87% accuracy with a 2.1% false positive rate when trained on just 10% of the COCO data. Our experiments demonstrate ClipGrader's ability to identify errors in existing COCO annotations, highlighting its potential for dataset refinement. When integrated into a semi-supervised object detection (SSOD) model, ClipGrader readily improves the pseudo-label quality, helping achieve higher mAP (mean Average Precision) throughout the training process. ClipGrader thus provides a scalable AI-assisted tool for enhancing annotation quality control and verifying annotations in large-scale object detection datasets. In object detection, a fundamental task in computer vision, the accuracy of annotations, which encompasses both the correctness of the class label and the spatial precision of the bounding box, is crucial. However, curating high-quality object detection datasets is a significant challenge due to the time-consuming and expensive nature of manual annotation processes, not to mention the inevitability of errors creeping in (Kuznetsova et al., 2018; Vondrick et al., 2013).
Existing approaches to data collection and annotation, such as crowdsourcing, web scraping, or AI-generated labels, often introduce noise and inconsistencies, potentially compromising model performance (Papadopoulos et al., 2016; Northcutt et al., 2021; Zare & Yazdi, 2022). With the increasing complexity and scale of datasets, traditional methods of quality control such as manual reviews or simple heuristics struggle to meet the demand.
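The notion of spatial precision for a bounding box can be made concrete with intersection over union (IoU). The sketch below is not ClipGrader's grading procedure; it only illustrates, under assumed conventions, how a box might be artificially disturbed and how far the disturbed box drifts from the original. The `(x_min, y_min, x_max, y_max)` box format and the helper names `iou` and `shift_box` are illustrative assumptions.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def shift_box(box, dx, dy):
    """Hypothetical disturbance: translate the box by (dx, dy) pixels."""
    return (box[0] + dx, box[1] + dy, box[2] + dx, box[3] + dy)


original = (10.0, 10.0, 50.0, 50.0)
disturbed = shift_box(original, 20.0, 0.0)
overlap = iou(original, disturbed)
print(f"IoU between original and disturbed box: {overlap:.3f}")
```

A label-quality grader could treat boxes whose IoU with a trusted reference falls below some threshold as poor annotations; the choice of threshold is left open here.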
AxBERT: An Interpretable Chinese Spelling Correction Method Driven by Associative Knowledge Network
Wang, Fanyu, Zhu, Hangyu, Xie, Zhenping
Deep learning has shown promising performance on various machine learning tasks. Nevertheless, the uninterpretability of deep learning models severely restricts their use in domains that require feature explanations, such as text correction. Therefore, a novel interpretable deep learning model (named AxBERT) is proposed for Chinese spelling correction by aligning with an associative knowledge network (AKN). The AKN is constructed from the co-occurrence relations among Chinese characters and represents interpretable statistical logic, in contrast to the uninterpretable logic of BERT. A translator matrix between BERT and AKN is introduced for the alignment and regulation of the attention component in BERT. In addition, a weight regulator is designed to adjust the attention distributions in BERT to appropriately model the sentence semantics. Experimental results on SIGHAN datasets demonstrate that AxBERT can achieve extraordinary performance, especially in precision, compared with baselines. Our interpretable analysis, together with qualitative reasoning, can effectively illustrate the interpretability of AxBERT.
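Character co-occurrence relations of the kind the AKN is built on can be illustrated with a simple windowed count. This is a minimal sketch, not the authors' construction: the window size, the use of unordered pairs, and the function name `cooccurrence_counts` are all assumptions made for illustration.

```python
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count unordered character pairs that occur within `window` positions."""
    counts = Counter()
    for sentence in sentences:
        chars = list(sentence)
        for i, left in enumerate(chars):
            # Look ahead at most `window` characters to the right.
            for right in chars[i + 1 : i + 1 + window]:
                if left != right:
                    counts[tuple(sorted((left, right)))] += 1
    return counts


# Tiny hypothetical corpus; a real network would use a large text collection.
corpus = ["我爱北京", "我爱上海"]
counts = cooccurrence_counts(corpus, window=1)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Such raw counts would normally be normalized into association strengths before being compared with attention weights; that step is omitted here.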
Network Traffic Classification Using Machine Learning, Transformer, and Large Language Models
Antari, Ahmad, Abo-Aisheh, Yazan, Shamasneh, Jehad, Ashqar, Huthaifa I.
This study uses various models to address network traffic classification, categorizing traffic into web, browsing, IPSec, backup, and email. We collected a comprehensive dataset from Arbor Edge Defender (AED) devices, comprising 30,959 observations and 19 features. Multiple models were evaluated, including Naive Bayes, Decision Tree, Random Forest, Gradient Boosting, XGBoost, Deep Neural Networks (DNN), Transformer, and two Large Language Models (LLMs), GPT-4o and Gemini, with zero- and few-shot learning. Transformer and XGBoost showed the best performance, achieving the highest accuracies of 98.95% and 97.56%, respectively. GPT-4o and Gemini showed promising results with few-shot learning, improving accuracy significantly from initial zero-shot performance. While Gemini Few-Shot and GPT-4o Few-Shot performed well in categories like Web and Email, misclassifications occurred in more complex categories like IPSec and Backup. The study highlights the importance of model selection, fine-tuning, and the balance between training data size and model complexity for achieving reliable classification results.
Biomedical Foundation Model: A Survey
Liu, Xiangrui, Zhang, Yuanyuan, Lu, Yingzhou, Yin, Changchang, Hu, Xiaoling, Liu, Xiaoou, Chen, Lulu, Wang, Sheng, Rodriguez, Alexander, Yao, Huaxiu, Yang, Yezhou, Zhang, Ping, Chen, Jintai, Fu, Tianfan, Wang, Xiao
Foundation models, first introduced in 2021, are large-scale pre-trained models (e.g., large language models (LLMs) and vision-language models (VLMs)) that learn from extensive unlabeled datasets through unsupervised methods, enabling them to excel in diverse downstream tasks. These models, like GPT, can be adapted to various applications such as question answering and visual understanding, outperforming task-specific AI models and earning their name due to broad applicability across fields. The development of biomedical foundation models marks a significant milestone in leveraging artificial intelligence (AI) to understand complex biological phenomena and advance medical research and practice. This survey explores the potential of foundation models across diverse domains within biomedical fields, including computational biology, drug discovery and development, clinical informatics, medical imaging, and public health. The purpose of this survey is to inspire ongoing research in the application of foundation models to health science.
Twenty Years of Personality Computing: Threats, Challenges and Future Directions
Celli, Fabio, Kartelj, Aleksandar, Đorđević, Miljan, Suhartono, Derwin, Filipović, Vladimir, Milutinović, Veljko, Spathoulas, Georgios, Vinciarelli, Alessandro, Kosinski, Michal, Lepri, Bruno
Personality Computing is a field at the intersection of Personality Psychology and Computer Science. Started in 2005, research in the field utilizes computational methods to understand and predict human personality traits. The expansion of the field has been very rapid and, by analyzing digital footprints (text, images, social media, etc.), it helped to develop systems that recognize and even replicate human personality. While offering promising applications in talent recruiting, marketing and healthcare, the ethical implications of Personality Computing are significant. Concerns include data privacy, algorithmic bias, and the potential for manipulation by personality-aware Artificial Intelligence. This paper provides an overview of the field, explores key methodologies, discusses the challenges and threats, and outlines potential future directions for responsible development and deployment of Personality Computing technologies.
Linear Representations of Political Perspective Emerge in Large Language Models
Kim, Junsol, Evans, James, Schein, Aaron
Large language models (LLMs) have demonstrated the ability to generate text that realistically reflects a range of different subjective human perspectives. This paper studies how LLMs are seemingly able to reflect more liberal versus more conservative viewpoints among other political perspectives in American politics. We show that LLMs possess linear representations of political perspectives within activation space, wherein more similar perspectives are represented closer together. To do so, we probe the attention heads across the layers of three open transformer-based LLMs (Llama-2-7b-chat, Mistral-7b-instruct, Vicuna-7b). We first prompt models to generate text from the perspectives of different U.S. lawmakers. We then identify sets of attention heads whose activations linearly predict those lawmakers' DW-NOMINATE scores, a widely-used and validated measure of political ideology. We find that highly predictive heads are primarily located in the middle layers, often speculated to encode high-level concepts and tasks. Using probes only trained to predict lawmakers' ideology, we then show that the same probes can predict measures of news outlets' slant from the activations of models prompted to simulate text from those news outlets. These linear probes allow us to visualize, interpret, and monitor ideological stances implicitly adopted by an LLM as it generates open-ended responses. Finally, we demonstrate that by applying linear interventions to these attention heads, we can steer the model outputs toward a more liberal or conservative stance. Overall, our research suggests that LLMs possess a high-level linear representation of American political ideology and that by leveraging recent advances in mechanistic interpretability, we can identify, monitor, and steer the subjective perspective underlying generated text.
Large language models (LLMs) have demonstrated the ability to generate text that reflects a range of different subjective perspectives (Argyle et al., 2023b; Gao et al., 2024). This paper examines whether LLMs possess general representations of political perspective in activation space, whether such representations are linear, and whether they can be used to steer model outputs. Specifically, we show that LLMs possess a linear representation of the "liberal-conservative" political axis in American politics. It is widely believed for LLMs that "important" concepts are encoded linearly as directions in activation space (Mikolov et al., 2013; Nanda et al., 2023; Elhage et al., 2022; Gurnee & Tegmark, 2024; Park et al., 2024b).
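The idea of a linear probe and a linear intervention can be sketched on synthetic data. Nothing below reproduces the paper's procedure: the activations are random vectors with a planted direction rather than attention-head outputs from a model, the scores are random numbers rather than DW-NOMINATE values, and the ridge probe, the `steer` helper, and all dimensions are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, dim = 200, 16

# Synthetic stand-in for activations: a planted "ideology" direction plus noise.
true_direction = rng.normal(size=dim)
true_direction /= np.linalg.norm(true_direction)
scores = rng.uniform(-1.0, 1.0, size=n_samples)  # hypothetical ideology scores
activations = np.outer(scores, true_direction)
activations += 0.05 * rng.normal(size=(n_samples, dim))


def fit_ridge_probe(x, y, l2=1e-3):
    """Closed-form ridge regression: w = (X^T X + l2 * I)^-1 X^T y."""
    gram = x.T @ x + l2 * np.eye(x.shape[1])
    return np.linalg.solve(gram, x.T @ y)


# Fit on the first 150 samples, evaluate on the held-out remainder.
train, test = slice(0, 150), slice(150, None)
weights = fit_ridge_probe(activations[train], scores[train])
predicted = activations[test] @ weights
correlation = np.corrcoef(predicted, scores[test])[0, 1]
print(f"held-out correlation: {correlation:.3f}")


def steer(activation, direction, alpha):
    """Linear intervention: move an activation along the unit probe direction."""
    unit = direction / np.linalg.norm(direction)
    return activation + alpha * unit


steered = steer(activations[0], weights, alpha=2.0)
shift = steered @ weights - activations[0] @ weights
print(f"change in probe output after steering: {shift:.3f}")
```

In this toy setting a positive `alpha` raises the probe's predicted score and a negative one lowers it, which mirrors the intuition behind steering a model toward one end of the axis.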
CareerBERT: Matching Resumes to ESCO Jobs in a Shared Embedding Space for Generic Job Recommendations
Rosenberger, Julian, Wolfrum, Lukas, Weinzierl, Sven, Kraus, Mathias, Zschech, Patrick
The rapidly evolving labor market, driven by technological advancements and economic shifts, presents significant challenges for traditional job matching and consultation services. In response, we introduce an advanced support tool for career counselors and job seekers based on CareerBERT, a novel approach that leverages the power of unstructured textual data sources, such as resumes, to provide more accurate and comprehensive job recommendations. In contrast to previous approaches that primarily focus on job recommendations based on a fixed set of concrete job advertisements, our approach involves the creation of a corpus that combines data from the European Skills, Competences, and Occupations (ESCO) taxonomy and EURopean Employment Services (EURES) job advertisements, ensuring an up-to-date and well-defined representation of general job titles in the labor market. Our two-step evaluation approach, consisting of an application-grounded evaluation using EURES job advertisements and a human-grounded evaluation using real-world resumes and Human Resources (HR) expert feedback, provides a comprehensive assessment of CareerBERT's performance. Our experimental results demonstrate that CareerBERT outperforms both traditional and state-of-the-art embedding approaches while showing robust effectiveness in human expert evaluations. These results confirm the effectiveness of CareerBERT in supporting career consultants by generating relevant job recommendations based on resumes, ultimately enhancing the efficiency of job consultations and expanding the perspectives of job seekers. This research contributes to the field of NLP and job recommendation systems, offering valuable insights for both researchers and practitioners in the domain of career consulting and job matching.
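Matching in a shared embedding space typically comes down to ranking job vectors by their similarity to a resume vector. The sketch below shows that ranking step with cosine similarity; it is not CareerBERT itself. The three-dimensional vectors, the job titles, and the function `rank_jobs` are hypothetical placeholders for embeddings that a trained text encoder would produce.

```python
import numpy as np


def rank_jobs(resume_vec, job_vecs, job_titles):
    """Rank jobs by cosine similarity between resume and job embeddings."""
    resume_unit = resume_vec / np.linalg.norm(resume_vec)
    job_units = job_vecs / np.linalg.norm(job_vecs, axis=1, keepdims=True)
    sims = job_units @ resume_unit
    order = np.argsort(-sims)  # highest similarity first
    return [(job_titles[i], float(sims[i])) for i in order]


# Hypothetical embeddings; real ones would come from an encoder model.
job_titles = ["data analyst", "nurse", "software developer"]
job_vecs = np.array([
    [0.5, 0.5, 0.1],
    [0.0, 1.0, 0.1],
    [0.8, 0.0, 0.4],
])
resume_vec = np.array([1.0, 0.0, 0.5])

ranking = rank_jobs(resume_vec, job_vecs, job_titles)
for title, score in ranking:
    print(f"{title}: {score:.3f}")
```

Because both resumes and job titles live in the same space, the same function could rank a new resume against the full set of job embeddings without retraining.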