handcrafted feature
Cross-Corpus Validation of Speech Emotion Recognition in Urdu using Domain-Knowledge Acoustic Features
Talpur, Unzela, Syed, Zafi Sherhan, Syed, Muhammad Shehram Shah, Syed, Abbas Shah
Speech Emotion Recognition (SER) is a key affective computing technology that enables emotionally intelligent artificial intelligence. While SER is challenging in general, it is particularly difficult for low-resource languages such as Urdu. This study investigates Urdu SER in a cross-corpus setting, an area that has remained largely unexplored. We employ a cross-corpus evaluation framework across three different Urdu emotional speech datasets to test model generalization. Two standard domain-knowledge based acoustic feature sets, eGeMAPS and ComParE, are used to represent speech signals as feature vectors which are then passed to Logistic Regression and Multilayer Perceptron classifiers. Classification performance is assessed using unweighted average recall (UAR) whilst considering class-label imbalance. Results show that Self-corpus validation often overestimates performance, with UAR exceeding cross-corpus evaluation by up to 13%, underscoring that cross-corpus evaluation offers a more realistic measure of model robustness. Overall, this work emphasizes the importance of cross-corpus validation for Urdu SER and its implications contribute to advancing affective computing research for underrepresented language communities.
- Europe > United Kingdom > England > Greater London > London (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- (2 more...)
"Mm, Wat?" Detecting Other-initiated Repair Requests in Dialogue
Ngo, Anh, Rollet, Nicolas, Pelachaud, Catherine, Clavel, Chloe
Maintaining mutual understanding is a key component in human-human conversation to avoid conversation breakdowns, in which repair, particularly Other-Initiated Repair (OIR, when one speaker signals trouble and prompts the other to resolve), plays a vital role. However, Conversational Agents (CAs) still fail to recognize user repair initiation, leading to breakdowns or disengagement. This work proposes a multimodal model to automatically detect repair initiation in Dutch dialogues by integrating linguistic and prosodic features grounded in Conversation Analysis. The results show that prosodic cues complement linguistic features and significantly improve the results of pretrained text and audio embeddings, offering insights into how different features interact. Future directions include incorporating visual cues, exploring multilingual and cross-context corpora to assess the robustness and generalizability.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > United States > California (0.04)
- (8 more...)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.48)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.34)
Learning and Interpreting Gravitational-Wave Features from CNNs with a Random Forest Approach
Tian, Jun, Wang, He, He, Jibo, Pan, Yu, Cao, Shuo, Jiang, Qingquan
Convolutional neural networks (CNNs) have become widely adopted in gravitational wave (GW) detection pipelines due to their ability to automatically learn hierarchical features from raw strain data. However, the physical meaning of these learned features remains underexplored, limiting the interpretability of such models. In this work, we propose a hybrid architecture that combines a CNN-based feature extractor with a random forest (RF) classifier to improve both detection performance and interpretability. Unlike prior approaches that directly connect classifiers to CNN outputs, our method introduces four physically interpretable metrics - variance, signal-to-noise ratio (SNR), waveform overlap, and peak amplitude - computed from the final convolutional layer. These are jointly used with the CNN output in the RF classifier to enable more informed decision boundaries. Tested on long-duration strain datasets, our hybrid model outperforms a baseline CNN model, achieving a relative improvement of 21\% in sensitivity at a fixed false alarm rate of 10 events per month. Notably, it also shows improved detection of low-SNR signals (SNR $\le$ 10), which are especially vulnerable to misclassification in noisy environments. Feature attribution via the RF model reveals that both CNN-extracted and handcrafted features contribute significantly to classification decisions, with learned variance and CNN outputs ranked among the most informative. These findings suggest that physically motivated post-processing of CNN feature maps can serve as a valuable tool for interpretable and efficient GW detection, bridging the gap between deep learning and domain knowledge.
- North America > United States (0.14)
- Asia > China > Beijing > Beijing (0.05)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- (4 more...)
Enhancing GraphQL Security by Detecting Malicious Queries Using Large Language Models, Sentence Transformers, and Convolutional Neural Networks
Perera, Irash, Abeyrathne, Hiranya, Malalgoda, Sanjeewa, Ifthikar, Arshardh
Abstract--GraphQL's flexibility, while beneficial for efficient data fetching, introduces unique security vulnerabilities that traditional API security mechanisms often fail to address. Malicious GraphQL queries can exploit the language's dynamic nature, leading to denial-of-service attacks, data exfiltration through injection, and other exploits. This paper presents a novel, AI-driven approach for real-time detection of malicious GraphQL queries. Our method combines static analysis with machine learning techniques, including Large Language Models (LLMs) for dynamic schema-based configuration, Sentence Transformers (SBERT and Doc2V ec) for contextual embedding of query payloads, and Convolutional Neural Networks (CNNs), Random Forests, and Multilayer Perceptrons for classification. We detail the system architecture, implementation strategies optimized for production environments (including ONNX Runtime optimization and parallel processing), and evaluate the performance of our detection models and the overall system under load. Results demonstrate high accuracy in detecting various threats, including SQL injection, OS command injection, and XSS exploits, alongside effective mitigation of DoS and SSRF attempts. This research contributes a robust and adaptable solution for enhancing GraphQL API security. The adoption of GraphQL has grown due to its efficiency in allowing clients to request specific data, which optimizes data transfer.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper proposes an algorithm to learn a distance metric for time series alignment. The proposed method falls into the structured output prediction framework, and is solved by a combination of convex optimization and dynamic programming. The method is evaluated on synthetic and realistic audio alignment tasks, and demonstrates significant improvement over baseline methods. Overall, this paper presents an interesting method for a real problem faced by practitioners dealing with time-series alignment tasks. The paper is generally well written and easy to follow, although a few points could be stated more clearly.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
The authors compare results of shallow networks versus deep networks on the task at hand and demonstrate better performance for deep networks. They also use the scores of the shallow versus deep networks to analyse the role of basic vs. handcrafted features. GENERAL OPINION: This is a very straightforward paper that applies an existing technique on a novel, relevant and interesting application. The technical quality and clarity are very sound. Even though there is no real novelty on the side of machine learning research itself, in my opinion the introduction of a new application domain for deep learning in an extremely relevant scientific field in itself warrants acceptance. One caveat is that this reviewer has a background in physics, and is hence quite familiar with (and enthusiastic about) the concepts explained in the introduction. Whereas I found this explanation very clear, I cannot really speak for the rest of the NIPS community. DETAILED COMMENTS: - If possible, it might be informative to provide a depiction that illustrates the raw data one gets from a typical collision event (an illustration of the detected particles emerging from a single collision). This might give a better feel of the kind of data that is dealt with.
Robustness and sex differences in skin cancer detection: logistic regression vs CNNs
Pedersen, Nikolette, Sydendal, Regitze, Wulff, Andreas, Raumanns, Ralf, Petersen, Eike, Cheplygina, Veronika
Deep learning has been reported to achieve high performances in the detection of skin cancer, yet many challenges regarding the reproducibility of results and biases remain. This study is a replication (different data, same analysis) of a previous study on Alzheimer's disease detection, which studied the robustness of logistic regression (LR) and convolutional neural networks (CNN) across patient sexes. We explore sex bias in skin cancer detection, using the PAD-UFES-20 dataset with LR trained on handcrafted features reflecting dermatological guidelines (ABCDE and the 7-point checklist), and a pre-trained ResNet-50 model. We evaluate these models in alignment with the replicated study: across multiple training datasets with varied sex composition to determine their robustness. Our results show that both the LR and the CNN were robust to the sex distribution, but the results also revealed that the CNN had a significantly higher accuracy (ACC) and area under the receiver operating characteristics (AUROC) for male patients compared to female patients. The data and relevant scripts to reproduce our results are publicly available (https://github.com/
- Europe > Netherlands > North Brabant > Eindhoven (0.05)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Europe > Germany (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Health & Medicine > Therapeutic Area > Dermatology (1.00)
- Health & Medicine > Therapeutic Area > Oncology > Skin Cancer (0.95)
An Effective Strategy for Modeling Score Ordinality and Non-uniform Intervals in Automated Speaking Assessment
Lo, Tien-Hong, Chen, Szu-Yu, Sung, Yao-Ting, Chen, Berlin
Abstract--A recent line of research on automated speaking assessment (ASA) has benefited from self-supervised learning (SSL) representations, which capture rich acoustic and linguistic patterns in non-native speech without underlying assumptions of feature curation. However, speech-based SSL models capture acoustic-related traits but overlook linguistic content, while text-based SSL models rely on ASR output and fail to encode prosodic nuances. Moreover, most prior arts treat proficiency levels as nominal classes, ignoring their ordinal structure and non-uniform intervals between proficiency labels. T o address these limitations, we propose an effective ASA approach combining SSL with handcrafted indicator features via a novel modeling paradigm. We further introduce a multi-margin ordinal loss that jointly models both the score ordinality and non-uniform intervals of proficiency labels. Extensive experiments on the TEEMI corpus show that our method consistently outperforms strong baselines and generalizes well to unseen prompts.
- Asia > Taiwan (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Europe > Czechia (0.04)
Classification of 24-hour movement behaviors from wrist-worn accelerometer data: from handcrafted features to deep learning techniques
Sameh, Alireza, Rostami, Mehrdad, Oussalah, Mourad, Farrahi, Vahid
Purpose: We compared the performance of deep learning (DL) and classical machine learning (ML) algorithms for the classification of 24-hour movement behavior into sleep, sedentary, light intensity physical activity (LPA), and moderate-to-vigorous intensity physical activity (MVPA). Methods: Open-access data from 151 adults wearing a wrist-worn accelerometer (Axivity-AX3) was used. Participants were randomly divided into training, validation, and test sets (121, 15, and 15 participants each). Raw acceleration signals were segmented into non-overlapping 10-second windows, and then a total of 104 handcrafted features were extracted. Four DL algorithms-Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), Gated Recurrent Units (GRU), and One-Dimensional Convolutional Neural Network (1D-CNN)-were trained using raw acceleration signals and with handcrafted features extracted from these signals to predict 24-hour movement behavior categories. The handcrafted features were also used to train classical ML algorithms, namely Random Forest (RF), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Logistic Regression (LR), Artificial Neural Network (ANN), and Decision Tree (DT) for classifying 24-hour movement behavior intensities. Results: LSTM, BiLSTM, and GRU showed an overall accuracy of approximately 85% when trained with raw acceleration signals, and 1D-CNN an overall accuracy of approximately 80%. When trained on handcrafted features, the overall accuracy for both DL and classical ML algorithms ranged from 70% to 81%. Overall, there was a higher confusion in classification of MVPA and LPA, compared to sleep and sedentary categories. Conclusion: DL methods with raw acceleration signals had only slightly better performance in predicting 24-hour movement behavior intensities, compared to when DL and classical ML were trained with handcrafted features.
- Europe > Finland > Northern Ostrobothnia > Oulu (0.05)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)
- Health & Medicine > Consumer Health (0.57)
- Health & Medicine > Therapeutic Area (0.46)
DRetNet: A Novel Deep Learning Framework for Diabetic Retinopathy Diagnosis
Okuwobi, Idowu Paul, Liu, Jingyuan, Wan, Jifeng, Jiang, Jiaojiao
Diabetic retinopathy (DR) is a leading cause of blindness worldwide, necessitating early detection to prevent vision loss. Current automated DR detection systems often struggle with poor-quality images, lack interpretability, and insufficient integration of domain-specific knowledge. To address these challenges, we introduce a novel framework that integrates three innovative contributions: (1) Adaptive Retinal Image Enhancement Using Physics-Informed Neural Networks (PINNs): this technique dynamically enhances retinal images by incorporating physical constraints, improving the visibility of critical features such as microaneurysms, hemorrhages, and exudates; (2) Hybrid Feature Fusion Network (HFFN): by combining deep learning embeddings with handcrafted features, HFFN leverages both learned representations and domain-specific knowledge to enhance generalization and accuracy; (3) Multi-Stage Classifier with Uncertainty Quantification: this method breaks down the classification process into logical stages, providing interpretable predictions and confidence scores, thereby improving clinical trust. The proposed framework achieves an accuracy of 92.7%, a precision of 92.5%, a recall of 92.6%, an F1-score of 92.5%, an AUC of 97.8%, a mAP of 0.96, and an MCC of 0.85. Ophthalmologists rated the framework's predictions as highly clinically relevant (4.8/5), highlighting its alignment with real-world diagnostic needs. Qualitative analyses, including Grad-CAM visualizations and uncertainty heatmaps, further enhance the interpretability and trustworthiness of the system. The framework demonstrates robust performance across diverse conditions, including low-quality images, noisy data, and unseen datasets. These features make the proposed framework a promising tool for clinical adoption, enabling more accurate and reliable DR detection in resource-limited settings.
- North America > United States > Wisconsin (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Asia > China > Guangxi Province (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.73)