Performance Analysis
Decision-centric fairness: Evaluation and optimization for resource allocation problems
De Vos, Simon, Van Belle, Jente, Algaba, Andres, Verbeke, Wouter, Verboven, Sam
Data-driven decision support tools play an increasingly central role in decision-making across various domains. In this work, we focus on binary classification models for predicting positive-outcome scores and deciding on resource allocation, e.g., credit scores for granting loans or churn propensity scores for targeting customers with a retention campaign. Such models may exhibit discriminatory behavior toward specific demographic groups through their predicted scores, potentially leading to unfair resource allocation. We focus on demographic parity as a fairness metric to compare the proportions of instances that are selected based on their positive outcome scores across groups. In this work, we propose a decision-centric fairness methodology that induces fairness only within the decision-making region -- the range of relevant decision thresholds on the score that may be used to decide on resource allocation -- as an alternative to a global fairness approach that seeks to enforce parity across the entire score distribution. By restricting the induction of fairness to the decision-making region, the proposed decision-centric approach avoids imposing overly restrictive constraints on the model, which may unnecessarily degrade the quality of the predicted scores. We empirically compare our approach to a global fairness approach on multiple (semi-synthetic) datasets to identify scenarios in which focusing on fairness where it truly matters, i.e., decision-centric fairness, proves beneficial.
RV-Syn: Rational and Verifiable Mathematical Reasoning Data Synthesis based on Structured Function Library
Wang, Jiapeng, Jiang, Jinhao, Zhang, Zhiqiang, Zhou, Jun, Zhao, Wayne Xin
The advancement of reasoning capabilities in Large Language Models (LLMs) requires substantial amounts of high-quality reasoning data, particularly in mathematics. Existing data synthesis methods, such as data augmentation from annotated training sets or direct question generation based on relevant knowledge points and documents, have expanded datasets but face challenges in mastering the inner logic of the problem during generation and ensuring the verifiability of the solutions. To address these issues, we propose RV-Syn, a novel Rational and Verifiable mathematical Synthesis approach. RV-Syn constructs a structured mathematical operation function library based on initial seed problems and generates computational graphs as solutions by combining Python-formatted functions from this library. These graphs are then back-translated into complex problems. Based on the constructed computation graph, we achieve solution-guided logic-aware problem generation. Furthermore, the executability of the computational graph ensures the verifiability of the solving process. Experimental results show that RV-Syn surpasses existing synthesis methods, including those involving human-generated problems, achieving greater efficient data scaling. This approach provides a scalable framework for generating high-quality reasoning datasets.
AKIBoards: A Structure-Following Multiagent System for Predicting Acute Kidney Injury
Gordon, David, Petousis, Panayiotis, Nicholas, Susanne B., Bui, Alex A. T.
Diagnostic reasoning entails a physician's local (mental) model based on an assumed or known shared perspective (global model) to explain patient observations with evidence assigned towards a clinical assessment. But in several (complex) medical situations, multiple experts work together as a team to optimize health evaluation and decision-making by leveraging different perspectives. Such consensus-driven reasoning reflects individual knowledge contributing toward a broader perspective on the patient. In this light, we introduce STRUCture-following for Multiagent Systems (STRUC-MAS), a framework automating the learning of these global models and their incorporation as prior beliefs for agents in multiagent systems (MAS) to follow. We demonstrate proof of concept with a prosocial MAS application for predicting acute kidney injuries (AKIs). In this case, we found that incorporating a global structure enabled multiple agents to achieve better performance (average precision, AP) in predicting AKI 48 hours before onset (structure-following-fine-tuned, SF-FT, AP=0.195; SF-FT-retrieval-augmented generation, SF-FT-RAG, AP=0.194) vs. baseline (non-structure-following-FT, NSF-FT, AP=0.141; NSF-FT-RAG, AP=0.180) for balanced precision-weighted-recall-weighted voting. Markedly, SF-FT agents with higher recall scores reported lower confidence levels in the initial round on true positive and false negative cases. But after explicit interactions, their confidence in their decisions increased (suggesting reinforced belief). In contrast, the SF-FT agent with the lowest recall decreased its confidence in true positive and false negative cases (suggesting a new belief). This approach suggests that learning and leveraging global structures in MAS is necessary prior to achieving competitive classification and diagnostic reasoning performance.
Improving trajectory continuity in drone-based crowd monitoring using a set of minimal-cost techniques and deep discriminative correlation filters
Drone-based crowd monitoring is the key technology for applications in surveillance, public safety, and event management. However, maintaining tracking continuity and consistency remains a significant challenge. Traditional detection-assignment tracking methods struggle with false positives, false negatives, and frequent identity switches, leading to degraded counting accuracy and making in-depth analysis impossible. This paper introduces a point-oriented online tracking algorithm that improves trajectory continuity and counting reliability in drone-based crowd monitoring. Our method builds on the Simple Online and Real-time Tracking (SORT) framework, replacing the original bounding-box assignment with a point-distance metric. The algorithm is enhanced with three cost-effective techniques: camera motion compensation, altitude-aware assignment, and classification-based trajectory validation. Further, Deep Discriminative Correlation Filters (DDCF) that re-use spatial feature maps from localisation algorithms for increased computational efficiency through neural network resource sharing are integrated to refine object tracking by reducing noise and handling missed detections. The proposed method is evaluated on the DroneCrowd and newly shared UP-COUNT-TRACK datasets, demonstrating substantial improvements in tracking metrics, reducing counting errors to 23% and 15%, respectively. The results also indicate a significant reduction of identity switches while maintaining high tracking accuracy, outperforming baseline online trackers and even an offline greedy optimisation method.
Pediatric Asthma Detection with Googles HeAR Model: An AI-Driven Respiratory Sound Classifier
Ehtesham, Abul, Kumar, Saket, Singh, Aditi, Khoei, Tala Talaei
Early detection of asthma in children is crucial to prevent long-term respiratory complications and reduce emergency interventions. This work presents an AI-powered diagnostic pipeline that leverages Googles Health Acoustic Representations (HeAR) model to detect early signs of asthma from pediatric respiratory sounds. The SPRSound dataset, the first open-access collection of annotated respiratory sounds in children aged 1 month to 18 years, is used to extract 2-second audio segments labeled as wheeze, crackle, rhonchi, stridor, or normal. Each segment is embedded into a 512-dimensional representation using HeAR, a foundation model pretrained on 300 million health-related audio clips, including 100 million cough sounds. Multiple classifiers, including SVM, Random Forest, and MLP, are trained on these embeddings to distinguish between asthma-indicative and normal sounds. The system achieves over 91\% accuracy, with strong performance on precision-recall metrics for positive cases. In addition to classification, learned embeddings are visualized using PCA, misclassifications are analyzed through waveform playback, and ROC and confusion matrix insights are provided. This method demonstrates that short, low-resource pediatric recordings, when powered by foundation audio models, can enable fast, noninvasive asthma screening. The approach is especially promising for digital diagnostics in remote or underserved healthcare settings.
Transforming Evidence Synthesis: A Systematic Review of the Evolution of Automated Meta-Analysis in the Age of AI
Li, Lingbo, Mathrani, Anuradha, Susnjak, Teo
Exponential growth in scientific literature has heightened the demand for efficient evidence-based synthesis, driving the rise of the field of Automated Meta-analysis (AMA) powered by natural language processing and machine learning. This PRISMA systematic review introduces a structured framework for assessing the current state of AMA, based on screening 978 papers from 2006 to 2024, and analyzing 54 studies across diverse domains. Findings reveal a predominant focus on automating data processing (57%), such as extraction and statistical modeling, while only 17% address advanced synthesis stages. Just one study (2%) explored preliminary full-process automation, highlighting a critical gap that limits AMA's capacity for comprehensive synthesis. Despite recent breakthroughs in large language models (LLMs) and advanced AI, their integration into statistical modeling and higher-order synthesis, such as heterogeneity assessment and bias evaluation, remains underdeveloped. This has constrained AMA's potential for fully autonomous meta-analysis. From our dataset spanning medical (67%) and non-medical (33%) applications, we found that AMA has exhibited distinct implementation patterns and varying degrees of effectiveness in actually improving efficiency, scalability, and reproducibility. While automation has enhanced specific meta-analytic tasks, achieving seamless, end-to-end automation remains an open challenge. As AI systems advance in reasoning and contextual understanding, addressing these gaps is now imperative. Future efforts must focus on bridging automation across all meta-analysis stages, refining interpretability, and ensuring methodological robustness to fully realize AMA's potential for scalable, domain-agnostic synthesis.
Self-Healing Software Systems: Lessons from Nature, Powered by AI
Baqar, Mohammad, Khanda, Rajat, Naqvi, Saba
As modern software systems grow in complexity and scale, their ability to autonomously detect, diagnose, and recover from failures becomes increasingly vital. Drawing inspiration from biological healing--where the human body detects damage, signals the brain, and activates targeted recovery--this paper explores the concept of self-healing software driven by artificial intelligence. We propose a novel framework that mimics this biological model: system observability tools serve as sensory inputs, AI models function as the cognitive core for diagnosis and repair, and healing agents apply targeted code and test modifications. By combining log analysis, static code inspection, and AI-driven generation of patches or test updates, our approach aims to reduce downtime, accelerate debugging, and enhance software resilience. We evaluate the effectiveness of this model through case studies and simulations, comparing it against traditional manual debugging and recovery workflows. This work paves the way toward intelligent, adaptive, and self-reliant software systems capable of continuous healing, akin to living organisms.
DNAD: Differentiable Neural Architecture Distillation
Rao, Xuan, Zhao, Bo, Liu, Derong
To meet the demand for designing efficient neural networks with appropriate trade-offs between model performance (e.g., classification accuracy) and computational complexity, the differentiable neural architecture distillation (DNAD) algorithm is developed based on two cores, namely search by deleting and search by imitating. Primarily, to derive neural architectures in a space where cells of the same type no longer share the same topology, the super-network progressive shrinking (SNPS) algorithm is developed based on the framework of differentiable architecture search (DARTS), i.e., search by deleting. Unlike conventional DARTS-based approaches which yield neural architectures with simple structures and derive only one architecture during the search procedure, SNPS is able to derive a Pareto-optimal set of architectures with flexible structures by forcing the dynamic super-network shrink from a dense structure to a sparse one progressively. Furthermore, since knowledge distillation (KD) has shown great effectiveness to train a compact network with the assistance of an over-parameterized model, we integrate SNPS with KD to formulate the DNAD algorithm, i.e., search by imitating. By minimizing behavioral differences between the super-network and teacher network, the over-fitting of one-level DARTS is avoided and well-performed neural architectures are derived. Experiments on CIFAR-10 and ImageNet classification tasks demonstrate that both SNPS and DNAD are able to derive a set of architectures which achieve similar or lower error rates with fewer parameters and FLOPs. Particularly, DNAD achieves the top-1 error rate of 23.7% on ImageNet classification with a model of 6.0M parameters and 598M FLOPs, which outperforms most DARTS-based methods.
WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution
Bongini, Pietro, Mandelli, Sara, Montibeller, Andrea, Casu, Mirko, Pontorno, Orazio, Ragaglia, Claudio Vittorio, Zanchetta, Luca, Aquilina, Mattia, Wani, Taiba Majid, Guarnera, Luca, Tondi, Benedetta, Boato, Giulia, Bestagini, Paolo, Amerini, Irene, De Natale, Francesco, Battiato, Sebastiano, Barni, Mauro
Synthetic image source attribution is an open challenge, with an increasing number of image generators being released yearly. The complexity and the sheer number of available generative techniques, as well as the scarcity of high-quality open source datasets of diverse nature for this task, make training and benchmarking synthetic image source attribution models very challenging. WILD is a new in-the-Wild Image Linkage Dataset designed to provide a powerful training and benchmarking tool for synthetic image attribution models. The dataset is built out of a closed set of 10 popular commercial generators, which constitutes the training base of attribution models, and an open set of 10 additional generators, simulating a real-world in-the-wild scenario. Each generator is represented by 1,000 images, for a total of 10,000 images in the closed set and 10,000 images in the open set. Half of the images are post-processed with a wide range of operators. WILD allows benchmarking attribution models in a wide range of tasks, including closed and open set identification and verification, and robust attribution with respect to post-processing and adversarial attacks. Models trained on WILD are expected to benefit from the challenging scenario represented by the dataset itself. Moreover, an assessment of seven baseline methodologies on closed and open set attribution is presented, including robustness tests with respect to post-processing.
Interpretable machine learning-guided design of Fe-based soft magnetic alloys
Nachnani, Aditi, Li-Caldwell, Kai K., Biswas, Saptarshi, Sharma, Prince, Ouyang, Gaoyuan, Singh, Prashant
We present a machine-learning guided approach to predict saturation magnetization (MS) and coercivity (HC) in Fe-rich soft magnetic alloys, particularly Fe-Si-B systems. ML models trained on experimental data reveals that increasing Si and B content reduces MS from 1.81T (DFT~2.04 T) to ~1.54 T (DFT~1.56T) in Fe-Si-B, which is attributed to decreased magnetic density and structural modifications. Experimental validation of ML predicted magnetic saturation on Fe-1Si-1B (2.09T), Fe-5Si-5B (2.01T) and Fe-10Si-10B (1.54T) alloy compositions further support our findings. These trends are consistent with density functional theory (DFT) predictions, which link increased electronic disorder and band broadening to lower MS values. Experimental validation on selected alloys confirms the predictive accuracy of the ML model, with good agreement across compositions. Beyond predictive accuracy, detailed uncertainty quantification and model interpretability including through feature importance and partial dependence analysis reveals that MS is governed by a nonlinear interplay between Fe content, early transition metal ratios, and annealing temperature, while HC is more sensitive to processing conditions such as ribbon thickness and thermal treatment windows. The ML framework was further applied to Fe-Si-B/Cr/Cu/Zr/Nb alloys in a pseudo-quaternary compositional space, which shows comparable magnetic properties to NANOMET (Fe84.8Si0.5B9.4Cu0.8 P3.5C1), FINEMET (Fe73.5Si13.5B9 Cu1Nb3), NANOPERM (Fe88Zr7B4Cu1), and HITPERM (Fe44Co44Zr7B4Cu1. Our fundings demonstrate the potential of ML framework for accelerated search of high-performance, Co- and Ni-free, soft magnetic materials.