Goto

Collaborating Authors

 Oceania


AdaDPIGU: Differentially Private SGD with Adaptive Clipping and Importance-Based Gradient Updates for Deep Neural Networks

arXiv.org Machine Learning

Differential privacy has been proven effective for stochastic gradient descent; however, existing methods often suffer from performance degradation in high-dimensional settings, as the scale of injected noise increases with dimensionality. To tackle this challenge, we propose AdaDPIGU--a new differentially private SGD framework with importance-based gradient updates tailored for deep neural networks. In the pretraining stage, we apply a differentially private Gaussian mechanism to estimate the importance of each parameter while preserving privacy. During the gradient update phase, we prune low-importance coordinates and introduce a coordinate-wise adaptive clipping mechanism, enabling sparse and noise-efficient gradient updates. Theoretically, we prove that AdaDPIGU satisfies $(\varepsilon, δ)$-differential privacy and retains convergence guarantees. Extensive experiments on standard benchmarks validate the effectiveness of AdaDPIGU. All results are reported under a fixed retention ratio of 60%. On MNIST, our method achieves a test accuracy of 99.12% under a privacy budget of $ε= 8$, nearly matching the non-private model. Remarkably, on CIFAR-10, it attains 73.21% accuracy at $ε= 4$, outperforming the non-private baseline of 71.12%, demonstrating that adaptive sparsification can enhance both privacy and utility.


Instance-Wise Monotonic Calibration by Constrained Transformation

arXiv.org Machine Learning

Deep neural networks often produce miscalibrated probability estimates, leading to overconfident predictions. A common approach for calibration is fitting a post-hoc calibration map on unseen validation data that transforms predicted probabilities. A key desirable property of the calibration map is instance-wise monotonicity (i.e., preserving the ranking of probability outputs). However, most existing post-hoc calibration methods do not guarantee monotonicity. Previous monotonic approaches either use an under-parameterized calibration map with limited expressive ability or rely on black-box neural networks, which lack interpretability and robustness. In this paper, we propose a family of novel monotonic post-hoc calibration methods, which employs a constrained calibration map parameterized linearly with respect to the number of classes. Our proposed approach ensures expressiveness, robustness, and interpretability while preserving the relative ordering of the probability output by formulating the proposed calibration map as a constrained optimization problem. Our proposed methods achieve state-of-the-art performance across datasets with different deep neural network models, outperforming existing calibration methods while being data and computation-efficient. Our code is available at https://github.com/YunruiZhang/Calibration-by-Constrained-Transformation


Neural Canonical Polyadic Factorization for Traffic Analysis

arXiv.org Machine Learning

Modern intelligent transportation systems rely on accurate spatiotemporal traffic analysis to optimize urban mobility and infrastructure resilience. However, pervasive missing data caused by sensor failures and heterogeneous sensing gaps fundamentally hinders reliable traffic modeling. This paper proposes a Neural Canonical Polyadic Factorization (NCPF) model that synergizes low-rank tensor algebra with deep representation learning for robust traffic data imputation. The model innovatively embeds CP decomposition into neural architecture through learnable embedding projections, where sparse traffic tensors are encoded into dense latent factors across road segments, time intervals, and mobility metrics. A hierarchical feature fusion mechanism employs Hadamard products to explicitly model multilinear interactions, while stacked multilayer perceptron layers nonlinearly refine these representations to capture complex spatiotemporal couplings. Extensive evaluations on six urban traffic datasets demonstrate NCPF's superiority over six state-of-the-art baselines. By unifying CP decomposition's interpretable factor analysis with neural network's nonlinear expressive power, NCPF provides a principled yet flexible approaches for high-dimensional traffic data imputation, offering critical support for next-generation transportation digital twins and adaptive traffic control systems.


Distribution-free inference for LightGBM and GLM with Tweedie loss

arXiv.org Machine Learning

Prediction uncertainty quantification is a key research topic in recent years scientific and business problems. In insurance industries (\cite{parodi2023pricing}), assessing the range of possible claim costs for individual drivers improves premium pricing accuracy. It also enables insurers to manage risk more effectively by accounting for uncertainty in accident likelihood and severity. In the presence of covariates, a variety of regression-type models are often used for modeling insurance claims, ranging from relatively simple generalized linear models (GLMs) to regularized GLMs to gradient boosting models (GBMs). Conformal predictive inference has arisen as a popular distribution-free approach for quantifying predictive uncertainty under relatively weak assumptions of exchangeability, and has been well studied under the classic linear regression setting. In this work, we propose new non-conformity measures for GLMs and GBMs with GLM-type loss. Using regularized Tweedie GLM regression and LightGBM with Tweedie loss, we demonstrate conformal prediction performance with these non-conformity measures in insurance claims data. Our simulation results favor the use of locally weighted Pearson residuals for LightGBM over other methods considered, as the resulting intervals maintained the nominal coverage with the smallest average width.


Constella: Supporting Storywriters' Interconnected Character Creation through LLM-based Multi-Agents

arXiv.org Artificial Intelligence

Creating a cast of characters by attending to their relational dynamics is a critical aspect of most long-form storywriting. However, our formative study (N=14) reveals that writers struggle to envision new characters that could influence existing ones, to balance similarities and differences among characters, and to intricately flesh out their relationships. Based on these observations, we designed Constella, an LLM-based multi-agent tool that supports storywriters' interconnected character creation process. Constella suggests related characters (FRIENDS DISCOVERY feature), reveals the inner mindscapes of several characters simultaneously (JOURNALS feature), and manifests relationships through inter-character responses (COMMENTS feature). Our 7-8 day deployment study with storywriters (N=11) shows that Constella enabled the creation of expansive communities composed of related characters, facilitated the comparison of characters' thoughts and emotions, and deepened writers' understanding of character relationships. We conclude by discussing how multi-agent interactions can help distribute writers' attention and effort across the character cast.


Preemptive Solving of Future Problems: Multitask Preplay in Humans and Machines

arXiv.org Artificial Intelligence

Humans can pursue a near-infinite variety of tasks, but typically can only pursue a small number at the same time. We hypothesize that humans leverage experience on one task to preemptively learn solutions to other tasks that were accessible but not pursued. We formalize this idea as Multitask Preplay, a novel algorithm that replays experience on one task as the starting point for "preplay" -- counterfactual simulation of an accessible but unpursued task. Preplay is used to learn a predictive representation that can support fast, adaptive task performance later on. We first show that, compared to traditional planning and predictive representation methods, multitask preplay better predicts how humans generalize to tasks that were accessible but not pursued in a small grid-world, even when people didn't know they would need to generalize to these tasks. We then show these predictions generalize to Craftax, a partially observable 2D Minecraft environment. Finally, we show that Multitask Preplay enables artificial agents to learn behaviors that transfer to novel Craftax worlds sharing task co-occurrence structure. These findings demonstrate that Multitask Preplay is a scalable theory of how humans counterfactually learn and generalize across multiple tasks; endowing artificial agents with the same capacity can significantly improve their performance in challenging multitask environments.


Narrowing the Gap: Supervised Fine-Tuning of Open-Source LLMs as a Viable Alternative to Proprietary Models for Pedagogical Tools

arXiv.org Artificial Intelligence

Frontier Large language models (LLMs) like ChatGPT and Gemini can decipher cryptic compiler errors for novice programmers, but their computational scale, cost, and tendency to over-assist make them problematic for widespread pedagogical adoption. This work demonstrates that smaller, specialised language models, enhanced via Supervised Fine-Tuning (SFT), present a more viable alternative for educational tools. We utilise a new dataset of 40,000 C compiler error explanations, derived from real introductory programming (CS1/2) student-generated programming errors, which we used to fine-tune three open-source models: Qwen3-4B, Llama-3.1-8B, and Qwen3-32B. We performed a dual evaluation, combining expert human reviews with a large-scale automated analysis of 8,000 responses using a validated LLM-as-judge ensemble. Our results show that SFT significantly boosts the pedagogical quality of smaller models, achieving performance comparable to much larger models. We analyse the trade-offs between model size and quality, confirming that fine-tuning compact, efficient models on high-quality, domain-specific data is a potent strategy for creating specialised models to drive educational tools. We provide a replicable methodology to foster broader access to generative AI capabilities in educational contexts.


ASSURE: Metamorphic Testing for AI-powered Browser Extensions

arXiv.org Artificial Intelligence

The integration of Large Language Models (LLMs) into browser extensions has revolutionized web browsing, enabling sophisticated functionalities like content summarization, intelligent translation, and context-aware writing assistance. However, these AI-powered extensions introduce unprecedented challenges in testing and reliability assurance. Traditional browser extension testing approaches fail to address the non-deterministic behavior, context-sensitivity, and complex web environment integration inherent to LLM-powered extensions. Similarly, existing LLM testing methodologies operate in isolation from browser-specific contexts, creating a critical gap in effective evaluation frameworks. To bridge this gap, we present ASSURE, a modular automated testing framework specifically designed for AI-powered browser extensions. ASSURE comprises three principal components: (1) a modular test case generation engine that supports plugin-based extension of testing scenarios, (2) an automated execution framework that orchestrates the complex interactions between web content, extension processing, and AI model behavior, and (3) a configurable validation pipeline that systematically evaluates behavioral consistency and security invariants rather than relying on exact output matching. Our evaluation across six widely-used AI browser extensions demonstrates ASSURE's effectiveness, identifying 531 distinct issues spanning security vulnerabilities, metamorphic relation violations, and content alignment problems. ASSURE achieves 6.4x improved testing throughput compared to manual approaches, detecting critical security vulnerabilities within 12.4 minutes on average. This efficiency makes ASSURE practical for integration into development pipelines, offering a comprehensive solution to the unique challenges of testing AI-powered browser extensions.


An autonomous agent for auditing and improving the reliability of clinical AI models

arXiv.org Artificial Intelligence

The deployment of AI models in clinical practice faces a critical challenge: models achieving expert-level performance on benchmarks can fail catastrophically when confronted with real-world variations in medical imaging. Minor shifts in scanner hardware, lighting or demographics can erode accuracy, but currently reliability auditing to identify such catastrophic failure cases before deployment is a bespoke and time-consuming process. Practitioners lack accessible and interpretable tools to expose and repair hidden failure modes. Here we introduce ModelAuditor, a self-reflective agent that converses with users, selects task-specific metrics, and simulates context-dependent, clinically relevant distribution shifts. ModelAuditor then generates interpretable reports explaining how much performance likely degrades during deployment, discussing specific likely failure modes and identifying root causes and mitigation strategies. Our comprehensive evaluation across three real-world clinical scenarios - inter-institutional variation in histopathology, demographic shifts in dermatology, and equipment heterogeneity in chest radiography - demonstrates that ModelAuditor is able correctly identify context-specific failure modes of state-of-the-art models such as the established SIIM-ISIC melanoma classifier. Its targeted recommendations recover 15-25% of performance lost under real-world distribution shift, substantially outperforming both baseline models and state-of-the-art augmentation methods. These improvements are achieved through a multi-agent architecture and execute on consumer hardware in under 10 minutes, costing less than US$0.50 per audit.


AI-Based Demand Forecasting and Load Balancing for Optimising Energy use in Healthcare Systems: A real case study

arXiv.org Artificial Intelligence

- This paper addresses the critical need for efficient energy management in healthcare facilities, where fluctuating energy demands challenge both operational and sustainability goals. Traditional energy management methods often fall short in healthcare settings, lead ing to inefficiencies and increased costs. To address this, the paper explores AI - driven approaches for demand forecasting and load balancing, introducing a novel integration of LSTM (Long Short - Term Memory), g enetic a lgorithm, and SHAP (Shapley Additive E xplanations) specifically tailored for healthcare energy management. While LSTM has been widely used for time - series forecasting, its application in healthcare energy demand prediction is underexplored. Here, LSTM is demonstrated to significantly outperfor m ARIMA and Prophet models in handling complex, non - linear demand patterns. Results show that LSTM achieved a Mean Absolute Error (MAE) of 21.69 and Root Mean Square Error (RMSE) of 29.96, significantly improving upon Prophet (MAE: 59.78, RMSE: 81.22) and ARIMA (MAE: 87.73, RMSE: 125.22), highlighting its superior forecasting capability. Genetic algorithm is employed not only for optimising forecasting model parameters but also for dynamically improving load balancing strategies, ensuring adaptability to real - time energy fluctuations. Additionally, SHAP analysis is used to interpret the models and understan d the impact of various input features on predictions, enhancing model transparency and trustworthiness in energy decision - making. The combined LSTM - GA - SH AP approach offers a comprehensive framework that improves forecasting accuracy, enhances energy efficiency, and supports sustainability in healthcare environments. Future work could focus on real - time implementation and further hybridisation with reinforc ement learning for continuous optimisation. This study establishes a strong foundation for leveraging AI in healthcare energy management, showcasing its potential for scalability, efficiency, and resilience. Introduction Australia has a big capacity of using renewable energy in different regions ( Holloway, R, 2023; Rahimi et al., 2025) . Australian healthcare system plays a major role in using renewable energies. Optimising energy use in healthcare systems is essential due to the high and often unpredictable energy demands needed to run medical equipment, keep environmental conditions stable, and support constant patient care.