Performance Analysis
Artificial Intelligence Based Predictive Maintenance for Electric Buses
Ercevik, Ayse Irmak, Ozbayoglu, Ahmet Murat
Predictive maintenance (PdM) is crucial for optimizing efficiency and minimizing downtime of electric buses. While these vehicles provide environmental benefits, they pose challenges for PdM due to complex electric transmission and battery systems. Traditional maintenance, often based on scheduled inspections, struggles to capture anomalies in multi-dimensional real-time CAN Bus data. This study employs a graph-based feature selection method to analyze relationships among CAN Bus parameters of electric buses and investigates the prediction performance of targeted alarms using artificial intelligence techniques. The raw data collected over two years underwent extensive preprocessing to ensure data quality and consistency. A hybrid graph-based feature selection tool was developed by combining statistical filtering (Pearson correlation, Cramer's V, ANOVA F-test) with optimization-based community detection algorithms (InfoMap, Leiden, Louvain, Fast Greedy). Machine learning models, including SVM, Random Forest, and XGBoost, were optimized through grid and random search with data balancing via SMOTEEN and binary search-based down-sampling. Model interpretability was achieved using LIME to identify the features influencing predictions. The results demonstrate that the developed system effectively predicts vehicle alarms, enhances feature interpretability, and supports proactive maintenance strategies aligned with Industry 4.0 principles.
On the Societal Impact of Machine Learning
This PhD thesis investigates the societal impact of machine learning (ML). ML increasingly informs consequential decisions and recommendations, significantly affecting many aspects of our lives. As these data-driven systems are often developed without explicit fairness considerations, they carry the risk of discriminatory effects. The contributions in this thesis enable more appropriate measurement of fairness in ML systems, systematic decomposition of ML systems to anticipate bias dynamics, and effective interventions that reduce algorithmic discrimination while maintaining system utility. I conclude by discussing ongoing challenges and future research directions as ML systems, including generative artificial intelligence, become increasingly integrated into society. This work offers a foundation for ensuring that ML's societal impact aligns with broader social values.
RoGBot: Relationship-Oblivious Graph-based Neural Network with Contextual Knowledge for Bot Detection
Anshul, Ashutosh, Rehman, Mohammad Zia Ur, Kadali, Sri Akash, Kumar, Nagendra
Abstract--Detecting automated accounts (bots) among genuine users on platforms like Twitter remains a challenging task due to the evolving behaviors and adaptive strategies of such accounts. While recent methods have achieved strong detection performance by combining text, metadata, and user relationship information within graph-based frameworks, many of these models heavily depend on explicit user-user relationship data. This reliance limits their applicability in scenarios where such information is unavailable. T o address this limitation, we propose a novel multimodal framework that integrates detailed textual features with enriched user metadata while employing graph-based reasoning without requiring follower-following data. Our method uses transformer-based models (e.g., BERT) to extract deep semantic embeddings from tweets, which are aggregated using max pooling to form comprehensive user-level representations. These are further combined with auxiliary behavioral features and passed through a GraphSAGE model to capture both local and global patterns in user behavior . Experimental results on the Cresci-15, Cresci-17, and PAN 2019 datasets demonstrate the robustness of our approach, achieving accuracies of 99.8%, 99.1%, and 96.8%, respectively, and highlighting its effectiveness against increasingly sophisticated bot strategies. Social media platforms have become essential tools for communication, sharing news, and fostering public engagement. However, the increasing presence of automated accounts, commonly known as bots, has raised serious concerns.
Structure-Aware Fusion with Progressive Injection for Multimodal Molecular Representation Learning
Jing, Zihao, Sun, Yan, Li, Yan Yi, Janarthanan, Sugitha, Deng, Alana, Hu, Pingzhao
Multimodal molecular models often suffer from 3D conformer unreliability and modality collapse, limiting their robustness and generalization. We propose MuMo, a structured multimodal fusion framework that addresses these challenges in molecular representation through two key strategies. To reduce the instability of conformer-dependent fusion, we design a Structured Fusion Pipeline (SFP) that combines 2D topology and 3D geometry into a unified and stable structural prior. To mitigate modality collapse caused by naive fusion, we introduce a Progressive Injection (PI) mechanism that asymmetrically integrates this prior into the sequence stream, preserving modality-specific modeling while enabling cross-modal enrichment. Built on a state space backbone, MuMo supports long-range dependency modeling and robust information propagation. Across 29 benchmark tasks from Therapeutics Data Commons (TDC) and MoleculeNet, MuMo achieves an average improvement of 2.7% over the best-performing baseline on each task, ranking first on 22 of them, including a 27% improvement on the LD50 task. These results validate its robustness to 3D conformer noise and the effectiveness of multimodal fusion in molecular representation. The code is available at: github.com/selmiss/MuMo.
Adversarially-Aware Architecture Design for Robust Medical AI Systems
Gerhart, Alyssa, Iyangar, Balaji
Adversarial attacks pose a severe risk to AI systems used in healthcare, capable of misleading models into dangerous misclassifications that can delay treatments or cause misdiagnoses. These attacks, often imperceptible to human perception, threaten patient safety, particularly in underserved populations. Our study explores these vulnerabilities through empirical experimentation on a dermatological dataset, where adversarial methods significantly reduce classification accuracy. Through detailed threat modeling, experimental benchmarking, and model evaluation, we demonstrate both the severity of the threat and the partial success of defenses like adversarial training and distillation. Our results show that while defenses reduce attack success rates, they must be balanced against model performance on clean data. We conclude with a call for integrated technical, ethical, and policy-based approaches to build more resilient, equitable AI in healthcare.
Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles
Mehrotra, Siddharth, Huang, Jin, Fu, Xuelong, Dobbe, Roel, Sรกnchez, Clara I., de Rijke, Maarten
Background: Trustworthy AI serves as a foundational pillar for two major AI ethics conferences: AIES and FAccT. However, current research often adopts techno-centric approaches, focusing primarily on technical attributes such as reliability, robustness, and fairness, while overlooking the sociotechnical dimensions critical to understanding AI trustworthiness in real-world contexts. Objectives: This scoping review aims to examine how the AIES and FAccT communities conceptualize, measure, and validate AI trustworthiness, identifying major gaps and opportunities for advancing a holistic understanding of trustworthy AI systems. Methods: We conduct a scoping review of AIES and FAccT conference proceedings to date, systematically analyzing how trustworthiness is defined, operationalized, and applied across different research domains. Our analysis focuses on conceptualization approaches, measurement methods, verification and validation techniques, application areas, and underlying values. Results: While significant progress has been made in defining technical attributes such as transparency, accountability, and robustness, our findings reveal critical gaps. Current research often predominantly emphasizes technical precision at the expense of social and ethical considerations. The sociotechnical nature of AI systems remains less explored and trustworthiness emerges as a contested concept shaped by those with the power to define it. Conclusions: An interdisciplinary approach combining technical rigor with social, cultural, and institutional considerations is essential for advancing trustworthy AI. We propose actionable measures for the AI ethics community to adopt holistic frameworks that genuinely address the complex interplay between AI systems and society, ultimately promoting responsible technological development that benefits all stakeholders.
Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark
Wu, Yu, Shu, Ke, Fischer, Jonas, Pivovarova, Lidia, Rosson, David, Mรคkelรค, Eetu, Tolonen, Mikko
This paper presents a novel task of extracting Latin fragments from mixed-language historical documents with varied layouts. We benchmark and evaluate the performance of large foundation models against a multimodal dataset of 724 annotated pages. The results demonstrate that reliable Latin detection with contemporary models is achievable. Our study provides the first comprehensive analysis of these models' capabilities and limits for this task.
Detecting Sockpuppetry on Wikipedia Using Meta-Learning
Raszewski, Luc, De Kock, Christine
Malicious sockpuppet detection on Wikipedia is critical to preserving access to reliable information on the internet and preventing the spread of disinformation. Prior machine learning approaches rely on stylistic and meta-data features, but do not prioritise adaptability to author-specific behaviours. As a result, they struggle to effectively model the behaviour of specific sockpuppet-groups, especially when text data is limited. To address this, we propose the application of meta-learning, a machine learning technique designed to improve performance in data-scarce settings by training models across multiple tasks. Meta-learning optimises a model for rapid adaptation to the writing style of a new sockpuppet-group. Our results show that meta-learning significantly enhances the precision of predictions compared to pre-trained models, marking an advancement in combating sockpuppetry on open editing platforms. We release a new dataset of sockpuppet investigations to foster future research in both sockpuppetry and meta-learning fields.
Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning
Tang, Liou, Joshi, James, Kundu, Ashish
Machine Unlearning (MU) aims to update Machine Learning (ML) models following requests to remove training samples and their influences on a trained model efficiently without retraining the original ML model from scratch. While MU itself has been employed to provide privacy protection and regulatory compliance, it can also increase the attack surface of the model. Existing privacy inference attacks towards MU that aim to infer properties of the unlearned set rely on the weaker threat model that assumes the attacker has access to both the unlearned model and the original model, limiting their feasibility toward real-life scenarios. We propose a novel privacy attack, A Posteriori Label-Only Membership Inference Attack towards MU, Apollo, that infers whether a data sample has been unlearned, following a strict threat model where an adversary has access to the label-output of the unlearned model only. We demonstrate that our proposed attack, while requiring less access to the target model compared to previous attacks, can achieve relatively high precision on the membership status of the unlearned samples.
Understanding Fairness and Prediction Error through Subspace Decomposition and Influence Analysis
Shi, Enze, Bhagwat, Pankaj, Yang, Zhixian, Kong, Linglong, Jiang, Bei
Machine learning models have achieved widespread success but often inherit and amplify historical biases, resulting in unfair outcomes. Traditional fairness methods typically impose constraints at the prediction level, without addressing underlying biases in data representations. In this work, we propose a principled framework that adjusts data representations to balance predictive utility and fairness. Using sufficient dimension reduction, we decompose the feature space into target-relevant, sensitive, and shared components, and control the fairness-utility trade-off by selectively removing sensitive information. We provide a theoretical analysis of how prediction error and fairness gaps evolve as shared subspaces are added, and employ influence functions to quantify their effects on the asymptotic behavior of parameter estimates. Experiments on both synthetic and real-world datasets validate our theoretical insights and show that the proposed method effectively improves fairness while preserving predictive performance.