Goto

Collaborating Authors

 Performance Analysis


3D Brain MRI Classification for Alzheimer Diagnosis Using CNN with Data Augmentation

arXiv.org Artificial Intelligence

A three-dimensional convolutional neural network was developed to classify T1-weighted brain MRI scans as healthy or Alzheimer. The network comprises 3D convolution, pooling, batch normalization, dense ReLU layers, and a sigmoid output. Using stochastic noise injection and five-fold cross-validation, the model achieved test set accuracy of 0.912 and area under the ROC curve of 0.961, an improvement of approximately 0.027 over resizing alone. Sensitivity and specificity both exceeded 0.90. These results align with prior work reporting up to 0.10 gain via synthetic augmentation. The findings demonstrate the effectiveness of simple augmentation for 3D MRI classification and motivate future exploration of advanced augmentation methods and architectures such as 3D U-Net and vision transformers.


From What to Respond to When to Respond: Timely Response Generation for Open-domain Dialogue Agents

arXiv.org Artificial Intelligence

While research on dialogue response generation has primarily focused on generating coherent responses conditioning on textual context, the critical question of when to respond grounded on the temporal context remains underexplored. To bridge this gap, we propose a novel task called timely dialogue response generation and introduce the TimelyChat benchmark, which evaluates the capabilities of language models to predict appropriate time intervals and generate time-conditioned responses. Additionally, we construct a large-scale training dataset by leveraging unlabeled event knowledge from a temporal commonsense knowledge graph and employing a large language model (LLM) to synthesize 55K event-driven dialogues. We then train Timer, a dialogue agent designed to proactively predict time intervals and generate timely responses that align with those intervals. Experimental results show that Timer outperforms prompting-based LLMs and other fine-tuned baselines in both turn-level and dialogue-level evaluations. We publicly release our data, model, and code.


Making deep neural networks work for medical audio: representation, compression and domain adaptation

arXiv.org Artificial Intelligence

This thesis addresses the technical challenges of applying machine learning to understand and interpret medical audio signals. The sounds of our lungs, heart, and voice convey vital information about our health. Yet, in contemporary medicine, these sounds are primarily analyzed through auditory interpretation by experts using devices like stethoscopes. Automated analysis offers the potential to standardize the processing of medical sounds, enable screening in low-resource settings where physicians are scarce, and detect subtle patterns that may elude human perception, thereby facilitating early diagnosis and treatment. Focusing on the analysis of infant cry sounds to predict medical conditions, this thesis contributes on four key fronts. First, in low-data settings, we demonstrate that large databases of adult speech can be harnessed through neural transfer learning to develop more accurate and robust models for infant cry analysis. Second, in cost-effective modeling, we introduce an end-to-end model compression approach for recurrent networks using tensor decomposition. Our method requires no post-hoc processing, achieves compression rates of several hundred-fold, and delivers accurate, portable models suitable for resource-constrained devices. Third, we propose novel domain adaptation techniques tailored for audio models and adapt existing methods from computer vision. These approaches address dataset bias and enhance generalization across domains while maintaining strong performance on the original data. Finally, to advance research in this domain, we release a unique, open-source dataset of infant cry sounds, developed in collaboration with clinicians worldwide. This work lays the foundation for recognizing the infant cry as a vital sign and highlights the transformative potential of AI-driven audio monitoring in shaping the future of accessible and affordable healthcare.


Density-aware Walks for Coordinated Campaign Detection

arXiv.org Artificial Intelligence

Coordinated campaigns frequently exploit social media platforms by artificially amplifying topics, making inauthentic trends appear organic, and misleading users into engagement. Distinguishing these coordinated efforts from genuine public discourse remains a significant challenge due to the sophisticated nature of such attacks. Our work focuses on detecting coordinated campaigns by modeling the problem as a graph classification task. We leverage the recently introduced Large Engagement Networks (LEN) dataset, which contains over 300 networks capturing engagement patterns from both fake and authentic trends on Twitter prior to the 2023 Turkish elections. The graphs in LEN were constructed by collecting interactions related to campaigns that stemmed from ephemeral astroturfing. Established graph neural networks (GNNs) struggle to accurately classify campaign graphs, highlighting the challenges posed by LEN due to the large size of its networks. To address this, we introduce a new graph classification method that leverages the density of local network structures. We propose a random weighted walk (RWW) approach in which node transitions are biased by local density measures such as degree, core number, or truss number. These RWWs are encoded using the Skip-gram model, producing density-aware structural embeddings for the nodes. Training message-passing neural networks (MPNNs) on these density-aware embeddings yields superior results compared to the simpler node features available in the dataset, with nearly a 12\% and 5\% improvement in accuracy for binary and multiclass classification, respectively. Our findings demonstrate that incorporating density-aware structural encoding with MPNNs provides a robust framework for identifying coordinated inauthentic behavior on social media networks such as Twitter.


KGMark: A Diffusion Watermark for Knowledge Graphs

arXiv.org Artificial Intelligence

Knowledge graphs (KGs) are ubiquitous in numerous real-world applications, and watermarking facilitates protecting intellectual property and preventing potential harm from AI-generated content. Existing watermarking methods mainly focus on static plain text or image data, while they can hardly be applied to dynamic graphs due to spatial and temporal variations of structured data. This motivates us to propose KGMARK, the first graph watermarking framework that aims to generate robust, detectable, and transparent diffusion fingerprints for dynamic KG data. Specifically, we propose a novel clustering-based alignment method to adapt the watermark to spatial variations. Meanwhile, we present a redundant embedding strategy to harden the diffusion watermark against various attacks, facilitating the robustness of the watermark to the temporal variations. Additionally, we introduce a novel learnable mask matrix to improve the transparency of diffusion fingerprints. By doing so, our KGMARK properly tackles the variation challenges of structured data. Experiments on various public benchmarks show the effectiveness of our proposed KGMARK. Our code is available at https://github.com/phrara/kgmark.


CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement

arXiv.org Artificial Intelligence

Prompt injection remains a major security risk for large language models. However, the efficacy of existing guardrail models in context-aware settings remains underexplored, as they often rely on static attack benchmarks. Additionally, they have over-defense tendencies. We introduce CAPTURE, a novel context-aware benchmark assessing both attack detection and over-defense tendencies with minimal in-domain examples. Our experiments reveal that current prompt injection guardrail models suffer from high false negatives in adversarial cases and excessive false positives in benign scenarios, highlighting critical limitations. To demonstrate our framework's utility, we train CaptureGuard on our generated data. This new model drastically reduces both false negative and false positive rates on our context-aware datasets while also generalizing effectively to external benchmarks, establishing a path toward more robust and practical prompt injection defenses.


A Model-Mediated Stacked Ensemble Approach for Depression Prediction Among Professionals

arXiv.org Artificial Intelligence

Depression is a significant mental health concern, particularly in professional environments where work-related stress, financial pressure, and lifestyle imbalances contribute to deteriorating well-being. Despite increasing awareness, researchers and practitioners face critical challenges in developing accurate and generalizable predictive models for mental health disorders. Traditional classification approaches often struggle with the complexity of depression, as it is influenced by multifaceted, interdependent factors, including occupational stress, sleep patterns, and job satisfaction. This study addresses these challenges by proposing a stacking-based ensemble learning approach to improve the predictive accuracy of depression classification among professionals. The Depression Professional Dataset has been collected from Kaggle. The dataset comprises demographic, occupational, and lifestyle attributes that influence mental well-being. Our stacking model integrates multiple base learners with a logistic regression-mediated model, effectively capturing diverse learning patterns. The experimental results demonstrate that the proposed model achieves high predictive performance, with an accuracy of 99.64% on training data and 98.75% on testing data, with precision, recall, and F1-score all exceeding 98%. These findings highlight the effectiveness of ensemble learning in mental health analytics and underscore its potential for early detection and intervention strategies.


Fair for a few: Improving Fairness in Doubly Imbalanced Datasets

arXiv.org Artificial Intelligence

With the technological advancements of the last couple of decades, machine learning (ML) and artificial intelligence (AI) play an important part in automated decision-making pipelines [1-3]. Even though these tools are generally created by optimising with respect to their accuracy and performance, there are other important aspects that should be considered, such as their fairness, robustness, and privacy [4]. One of these aspects, fairness, becomes even more crucial when AI-based tools are used for decision-making tasks such as checking whether accepting a credit application is profitable and risk-free, if an applicant is worthy of a job position, or if a defendant has a higher risk of committing a crime again.


Pose State Perception of Interventional Robot for Cardio-cerebrovascular Procedures

arXiv.org Artificial Intelligence

-- In response to the increasing demand for cardio-cerebrovascular interventional surgeries, precise control of in-terventional robots has become increasingly important. Within these complex vascular scenarios, the accurate and reliable perception of the pose state for interventional robots is particularly crucial. This paper presents a novel vision-based approach without the need of additional sensors or markers. The core of this paper's method consists of a three-part framework: firstly, a dual-head multitask U-Net model for simultaneous vessel segment and interventional robot detection; secondly, an advanced algorithm for skeleton extraction and optimization; and finally, a comprehensive pose state perception system based on geometric features is implemented to accurately identify the robot's pose state and provide strategies for subsequent control. The experimental results demonstrate the proposed method's high reliability and accuracy in trajectory tracking and pose state perception. I. INTRODUCTION The cardiovascular system plays a vital role in supplying oxygenated blood and essential nutrients to the heart and brain.


Evaluating Loss Functions for Graph Neural Networks: Towards Pretraining and Generalization

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) became useful for learning on non-Euclidean data. However, their best performance depends on choosing the right model architecture and the training objective, also called the loss function. Researchers have studied these parts separately, but a large-scale evaluation has not looked at how GNN models and many loss functions work together across different tasks. To fix this, we ran a thorough study - it included seven well-known GNN architectures. We also used a large group of 30 single plus mixed loss functions. The study looked at both inductive and transductive settings. Our evaluation spanned three distinct real-world datasets, assessing performance in both inductive and transductive settings using 21 comprehensive evaluation metrics. From these extensive results (detailed in supplementary information 1 \& 2), we meticulously analyzed the top ten model-loss combinations for each metric based on their average rank. Our findings reveal that, especially for the inductive case: 1) Hybrid loss functions generally yield superior and more robust performance compared to single loss functions, indicating the benefit of multi-objective optimization. 2) The GIN architecture always showed the highest-level average performance, especially with Cross-Entropy loss. 3) Although some combinations had overall lower average ranks, models such as GAT, particularly with certain hybrid losses, demonstrated incredible specialized strengths, maximizing the most top-1 results among the individual metrics, emphasizing subtle strengths for particular task demands. 4) On the other hand, the MPNN architecture typically lagged behind the scenarios it was tested against.