Oceania
Hyperspectral in situ remote sensing of water surface nitrate in the Fitzroy River estuary, Queensland, Australia, using deep learning
Guo, Yiqing, Cherukuru, Nagur, Lehmann, Eric, Unnithan, S. L. Kesav, Kerrisk, Gemma, Malthus, Tim, Islam, Faisal
Nitrate ($\text{NO}_3^-$) is a form of dissolved inorganic nitrogen derived primarily from anthropogenic sources. The recent increase in river-discharged nitrate poses a major risk of coral bleaching in the Great Barrier Reef (GBR) lagoon. Although nitrate is an optically inactive (i.e., colourless) constituent, previous studies have demonstrated an indirect, non-causal relationship between water surface nitrate and water-leaving reflectance, mediated through optically active water quality parameters such as total suspended solids and coloured dissolved organic matter. This work aims to advance our understanding of this relationship by measuring time-series nitrate and simultaneous hyperspectral reflectance at the Fitzroy River estuary, Queensland, Australia. Time-series observations revealed periodic cycles in nitrate loads driven by tidal influence at the estuarine study site. Water surface nitrate loads were predicted from hyperspectral reflectance and water salinity measurements: hyperspectral reflectance indicates the concentrations of optically active variables, and salinity indicates the mixing proportions of river water and seawater. An accuracy assessment of model-predicted against in situ measured nitrate showed that the predicted values correlated well with the ground-truth data, with an $R^2$ score of 0.86 and an RMSE of 0.03 mg/L. This work demonstrates the feasibility of predicting water surface nitrate from hyperspectral reflectance and salinity measurements.
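The $R^2$ and RMSE figures above are standard regression metrics. As a minimal NumPy sketch of how they are computed for predicted against measured nitrate (the function names and the five sample values are invented for illustration and are not data from the study):

```python
import numpy as np

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    """Root-mean-square error, in the units of the inputs (here mg/L)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical nitrate concentrations (mg/L), NOT values from the study.
measured = [0.10, 0.15, 0.20, 0.25, 0.30]
predicted = [0.12, 0.14, 0.21, 0.23, 0.31]
print(f"R^2 = {r2_score(measured, predicted):.3f}, "
      f"RMSE = {rmse(measured, predicted):.3f} mg/L")  # R^2 = 0.956, RMSE = 0.015 mg/L
```

RMSE is reported in the units of the target, which is why the abstract quotes it in mg/L, while $R^2$ is dimensionless.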
keepitsimple at SemEval-2025 Task 3: LLM-Uncertainty based Approach for Multilingual Hallucination Span Detection
Vemula, Saketh Reddy, Krishnamurthy, Parameswari
Identifying hallucination spans in text generated by black-box language models is essential for real-world applications. A recent effort in this direction is SemEval-2025 Task 3, Mu-SHROOM, a Multilingual Shared Task on Hallucinations and Related Observable Over-generation Errors. In this work, we present our solution to this problem, which exploits the variability of stochastically sampled responses to identify hallucinated spans. Our hypothesis is that if a language model is certain of a fact, its sampled responses will be consistent, whereas hallucinated facts will yield divergent and conflicting results. We measure this divergence through entropy-based analysis, which allows hallucinated segments to be identified accurately. Our method requires no additional training and is therefore cost-effective and adaptable. In addition, we conduct extensive hyperparameter tuning and error analysis, which give us crucial insights into model behavior.
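A minimal sketch of the entropy idea, under simplifying assumptions: each candidate span of the main response has already been aligned with the value that each stochastically sampled response gives for it, and a span is flagged when the Shannon entropy of those values exceeds a threshold. The alignment step, the function names, and the 1-bit threshold are hypothetical and are not the authors' pipeline:

```python
import math
from collections import Counter

def shannon_entropy(values):
    """Shannon entropy (bits) of the empirical distribution of sampled values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def flag_spans(span_samples, threshold=1.0):
    """Return spans whose sampled values disagree by more than `threshold` bits.
    `threshold` is a hypothetical hyperparameter, not a value from the paper."""
    return [span for span, samples in span_samples.items()
            if shannon_entropy(samples) > threshold]

# Hypothetical alignment: what five stochastic samples said for two spans.
span_samples = {
    "Paris": ["Paris", "Paris", "Paris", "Paris", "Paris"],  # consistent -> 0 bits
    "1887": ["1887", "1889", "1901", "1887", "1875"],        # conflicting -> ~1.92 bits
}
print(flag_spans(span_samples))  # ['1887']
```

Consistent samples give zero entropy, while conflicting samples spread probability mass over several values and raise it, which is the signal the hypothesis relies on.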
Liouville PDE-based sliced-Wasserstein flow for fair regression
The sliced Wasserstein flow (SWF), a nonparametric and implicit generative gradient flow, is applied to fair regression. We improve the SWF in two respects. First, the stochastic diffusive term of the Fokker-Planck equation-based Monte Carlo scheme is replaced by Liouville partial differential equation (PDE)-based transport with density estimation, which has no diffusive term. Second, the computation of the Wasserstein barycenter is approximated by the SWF barycenter, with Kantorovich potentials prescribed for the induced gradient flow that generates its samples. These two changes improve convergence in training and testing the SWF and SWF barycenters, with reduced variance. Applying the generative SWF barycenter to fair regression yields competitive accuracy-fairness Pareto curves.
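The SWF is built on the sliced Wasserstein distance, which compares distributions through random one-dimensional projections, where optimal transport reduces to sorting. A minimal NumPy sketch of its Monte Carlo estimate between two equal-size point clouds follows; it illustrates only the distance, not the Liouville PDE-based flow or the barycenter computation, and the function name and defaults are assumptions:

```python
import numpy as np

def sliced_wasserstein2(x, y, n_projections=200, seed=0):
    """Monte Carlo estimate of the squared sliced 2-Wasserstein distance between
    two equally weighted point clouds x and y, each of shape (n, d)."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=(n_projections, x.shape[1]))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)  # uniform random unit directions
    # In 1-D, optimal transport between equal-size samples pairs sorted values.
    px = np.sort(x @ theta.T, axis=0)
    py = np.sort(y @ theta.T, axis=0)
    # Average the per-direction squared W2 estimates over samples and directions.
    return float(np.mean((px - py) ** 2))

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 2))
shift = np.array([2.0, 0.0])
# For a pure shift, the expectation over directions is |shift|^2 / d = 2.0.
print(sliced_wasserstein2(x, x + shift))
```

Sorting makes each projected comparison cost $O(n \log n)$, which is what makes sliced distances, and flows driven by them, cheap compared with full optimal transport.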
Discrete Neural Flow Samplers with Locally Equivariant Transformer
Ou, Zijing, Zhang, Ruixiang, Li, Yingzhen
Sampling from unnormalised discrete distributions is a fundamental problem across various domains. While Markov chain Monte Carlo offers a principled approach, it often suffers from slow mixing and poor convergence. In this paper, we propose Discrete Neural Flow Samplers (DNFS), a trainable and efficient framework for discrete sampling. DNFS learns the rate matrix of a continuous-time Markov chain such that the resulting dynamics satisfy the Kolmogorov equation. As this objective involves the intractable partition function, we employ control variates to reduce the variance of its Monte Carlo estimation, leading to a coordinate descent learning algorithm. To further improve computational efficiency, we propose the locally equivariant Transformer, a novel parameterisation of the rate matrix that significantly improves training efficiency while preserving the network's expressive power. Empirically, we demonstrate the efficacy of DNFS in a wide range of applications, including sampling from unnormalised distributions, training discrete energy-based models, and solving combinatorial optimisation problems.
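To make the object that DNFS learns concrete, the toy sketch below builds a valid continuous-time Markov chain rate matrix $Q$ (non-negative off-diagonal rates, rows summing to zero) and integrates the Kolmogorov forward equation $\frac{d}{dt}p_t = p_t Q$ with explicit Euler steps. It assumes a two-state chain with hand-picked rates; DNFS instead parameterises $Q$ with a neural network over large discrete state spaces, which this sketch does not attempt:

```python
import numpy as np

def make_rate_matrix(off_diag_rates):
    """Turn non-negative off-diagonal rates into a valid CTMC rate matrix Q:
    Q[i, j] >= 0 for i != j, and every row sums to zero."""
    q = np.array(off_diag_rates, dtype=float)
    np.fill_diagonal(q, 0.0)                 # ignore any supplied diagonal
    np.fill_diagonal(q, -q.sum(axis=1))      # diagonal = minus total exit rate
    return q

def evolve(p0, q, t_end=1.0, n_steps=1000):
    """Explicit Euler integration of the Kolmogorov forward equation dp/dt = p Q.
    Total probability is preserved because the rows of Q sum to zero."""
    p = np.array(p0, dtype=float)
    dt = t_end / n_steps
    for _ in range(n_steps):
        p = p + dt * (p @ q)
    return p

# Hypothetical symmetric two-state chain with unit switching rates.
q = make_rate_matrix([[0.0, 1.0], [1.0, 0.0]])
p = evolve([1.0, 0.0], q, t_end=1.0)
# Analytic solution at t = 1: p[0] = 0.5 + 0.5 * exp(-2), about 0.568.
print(p)
```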
Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures
Thai, Tuan, Nguyen, TrungTin, Do, Dat, Ho, Nhat, Drovandi, Christopher
Mixture of Experts (MoE) models constitute a widely utilized class of ensemble learning approaches in statistics and machine learning, known for their flexibility and computational efficiency. They have become integral components in numerous state-of-the-art deep neural network architectures, particularly for analyzing heterogeneous data across diverse domains. Despite their practical success, the theoretical understanding of model selection, especially concerning the optimal number of mixture components or experts, remains limited and poses significant challenges. These challenges primarily stem from the inclusion of covariates in both the Gaussian gating functions and expert networks, which introduces intrinsic interactions governed by partial differential equations with respect to their parameters. In this paper, we revisit the concept of dendrograms of mixing measures and introduce a novel extension to Gaussian-gated Gaussian MoE models that enables consistent estimation of the true number of mixture components and achieves the pointwise optimal convergence rate for parameter estimation in overfitted scenarios. Notably, this approach circumvents the need to train and compare a range of models with varying numbers of components, thereby alleviating the computational burden, particularly in high-dimensional or deep neural network settings. Experimental results on synthetic data demonstrate the effectiveness of the proposed method in accurately recovering the number of experts. It outperforms common criteria such as the Akaike information criterion, the Bayesian information criterion, and the integrated completed likelihood, while achieving optimal convergence rates for parameter estimation and accurately approximating the regression function.
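As an illustration of how a dendrogram of a mixing measure is built, the sketch below repeatedly merges the two closest atoms of a discrete mixing measure $\sum_k w_k \delta_{\theta_k}$ and records the merge heights. The merge rule (dissimilarity $\|\theta_i - \theta_j\|^2 / (w_i^{-1} + w_j^{-1})$, merged weight $w_i + w_j$, weighted-average atom) and the flattening of each expert's parameters into one vector are assumptions made for illustration, not the paper's exact construction for Gaussian-gated MoE models:

```python
import numpy as np

def merge_closest(weights, atoms):
    """One agglomeration step on sum_k w_k * delta(atom_k).
    Assumed dissimilarity: ||a_i - a_j||^2 / (1/w_i + 1/w_j)."""
    k = len(weights)
    best, pair = np.inf, None
    for i in range(k):
        for j in range(i + 1, k):
            d = np.sum((atoms[i] - atoms[j]) ** 2) / (1 / weights[i] + 1 / weights[j])
            if d < best:
                best, pair = d, (i, j)
    i, j = pair
    w = weights[i] + weights[j]
    merged = (weights[i] * atoms[i] + weights[j] * atoms[j]) / w  # weighted average atom
    keep = [m for m in range(k) if m not in (i, j)]
    return np.append(weights[keep], w), np.vstack([atoms[keep], merged]), best

def dendrogram_heights(weights, atoms):
    """Merge down to a single atom, recording the dissimilarity at each merge."""
    w, a = np.asarray(weights, dtype=float), np.asarray(atoms, dtype=float)
    heights = []
    while len(w) > 1:
        w, a, h = merge_closest(w, a)
        heights.append(h)
    return heights

# Hypothetical overfitted fit: three atoms, two of them nearly identical.
print(dendrogram_heights([0.3, 0.2, 0.5], [[0.0], [0.1], [5.0]]))
```

A small first merge height followed by a large jump suggests that the overfitted three-atom fit really contains two components, which is the kind of signal a single fitted model can provide without training and comparing several models.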
Quantifying uncertainty in spectral clusterings: expectations for perturbed and incomplete data
Dölz, Jürgen, Weygandt, Jolanda
Spectral clustering is a popular unsupervised learning technique that can partition unlabelled data into disjoint clusters of distinct shapes. However, the data under consideration are often experimental, so they are subject to measurement errors, and measurements may even be lost or invalid. These uncertainties in the corrupted input data induce corresponding uncertainties in the resulting clusters, making the clusterings unreliable. Modelling the uncertainties as random processes, we discuss a mathematical framework based on random set theory for the computational Monte Carlo approximation of statistically expected clusterings in the case of corrupted, i.e., perturbed, incomplete, and possibly even additional, data. We propose several computationally accessible quantities of interest and analyze their consistency in the limits of infinitely many data points and infinitely many Monte Carlo samples. Numerical experiments are provided to illustrate and compare the proposed quantities.
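The paper's quantities of interest are formulated with random set theory. As a much simpler stand-in, the sketch below estimates one label-invariant quantity, the probability that two points fall in the same cluster, by Monte Carlo over copies of the data perturbed with Gaussian noise, using two-way spectral clustering on a normalised graph Laplacian. The noise model, kernel width, and function names are assumptions, and missing or additional data are not handled:

```python
import numpy as np

def spectral_bipartition(x, sigma=1.0):
    """Two-way spectral clustering: sign of the Fiedler vector of the symmetric
    normalised Laplacian built from a Gaussian affinity (assumes no isolated points)."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(w, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(w.sum(axis=1))
    lap = np.eye(len(x)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)           # eigenvalues in ascending order
    fiedler = d_inv_sqrt * vecs[:, 1]       # random-walk form; same signs as vecs[:, 1]
    return (fiedler > 0).astype(int)

def expected_coassociation(x, noise_std, n_samples=100, seed=0):
    """Monte Carlo estimate of P(points i and j share a cluster) under additive
    Gaussian perturbations (hypothetical noise model). Invariant to label swaps."""
    rng = np.random.default_rng(seed)
    acc = np.zeros((len(x), len(x)))
    for _ in range(n_samples):
        labels = spectral_bipartition(x + rng.normal(scale=noise_std, size=x.shape))
        acc += labels[:, None] == labels[None, :]
    return acc / n_samples

# Two hypothetical, well-separated groups of four points each.
x = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [3.0, 0.0], [3.1, 0.0], [3.0, 0.1], [3.1, 0.1]])
co = expected_coassociation(x, noise_std=0.05, n_samples=50)
```

Co-association is used here because raw cluster labels can swap between Monte Carlo runs, so averaging labels directly would be meaningless; entries of `co` between 0 and 1 then indicate how reliable each pairwise assignment is under the assumed noise.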
A Principled Bayesian Framework for Training Binary and Spiking Neural Networks
Walker, James A., Khajehnejad, Moein, Razi, Adeel
We propose a Bayesian framework for training binary and spiking neural networks that achieves state-of-the-art performance without normalisation layers. Unlike commonly used surrogate gradient methods, which are often heuristic and sensitive to hyperparameter choices, our approach is grounded in a probabilistic model of noisy binary networks, enabling fully end-to-end gradient-based optimisation. We introduce importance-weighted straight-through (IW-ST) estimators, a unified class generalising straight-through and relaxation-based estimators. We characterise the bias-variance trade-off in this family and derive a bias-minimising objective implemented via an auxiliary loss. Building on this, we introduce Spiking Bayesian Neural Networks (SBNNs), a variational inference framework that uses posterior noise to train binary and spiking neural networks with IW-ST. This Bayesian approach minimises gradient bias, regularises parameters, and introduces dropout-like noise. By linking low-bias conditions, vanishing gradients, and the KL term, we enable training of deep residual networks without normalisation. Experiments on CIFAR-10, DVS Gesture, and SHD show that our method matches or exceeds existing approaches without normalisation or hand-tuned gradients.
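The straight-through estimator that IW-ST generalises can be given a probabilistic reading. The sketch below does not implement IW-ST; it assumes a binary unit $b = \operatorname{sign}(x + \varepsilon)$ with $\varepsilon \sim \mathrm{Uniform}(-1, 1)$, for which $\mathbb{E}[b] = \operatorname{clip}(x, -1, 1)$, so the familiar clipped straight-through gradient is exactly $d\,\mathbb{E}[b]/dx$. The noise model and function names are illustrative assumptions:

```python
import numpy as np

def noisy_sign(x, rng):
    """Stochastic binary unit (assumed noise model): b = sign(x + eps),
    eps ~ Uniform(-1, 1), so P(b = +1) = clip((1 + x) / 2, 0, 1)."""
    eps = rng.uniform(-1.0, 1.0, size=np.shape(x))
    return np.where(x + eps >= 0, 1.0, -1.0)

def expected_output(x):
    """E[b] = 2 P(b = +1) - 1 = clip(x, -1, 1) under the noise model above."""
    return np.clip(x, -1.0, 1.0)

def ste_grad(x, grad_out):
    """Clipped straight-through gradient: pass grad_out where |x| < 1, else 0.
    Under the uniform-noise model this equals d E[b] / dx exactly."""
    return grad_out * (np.abs(x) < 1.0)

rng = np.random.default_rng(0)
x = np.array([-2.0, -0.5, 0.3, 1.5])
samples = np.stack([noisy_sign(x, rng) for _ in range(20000)])
print(samples.mean(axis=0), expected_output(x), ste_grad(x, np.ones_like(x)))
```

Under this reading the STE is unbiased for the gradient of the expected output, but other noise models lead to other estimators, which is the design space the bias-variance analysis in the abstract addresses.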
Object-level Cross-view Geo-localization with Location Enhancement and Multi-Head Cross Attention
Huang, Zheyang, Aryal, Jagannath, Nahavandi, Saeid, Lu, Xuequan, Lim, Chee Peng, Wei, Lei, Zhou, Hailing
Cross-view geo-localization determines the location of a query image, captured by a drone or ground-based camera, by matching it to a geo-referenced satellite image. While traditional approaches focus on image-level localization, many applications, such as search-and-rescue, infrastructure inspection, and precision delivery, demand object-level accuracy. Object-level localization lets users select a specific object with a single click on a drone image and retrieve its precise geo-tagged information. However, variations in viewpoint, timing, and imaging conditions pose significant challenges, especially when identifying visually similar objects in extensive satellite imagery. To address these challenges, we propose an Object-level Cross-view Geo-localization Network (OCGNet). It integrates user-specified click locations using Gaussian Kernel Transfer (GKT) to preserve location information throughout the network. This cue is embedded in both the feature encoder and the feature matching blocks, ensuring robust object-specific localization. Additionally, OCGNet incorporates a Location Enhancement (LE) module and a Multi-Head Cross Attention (MHCA) module to adaptively emphasize object-specific features or expand focus to relevant contextual regions when necessary. It also demonstrates few-shot learning capabilities, generalizing effectively from limited examples, which makes it suitable for diverse applications (https://github.com/ZheyangH/OCGNet).
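The abstract does not spell out how GKT encodes a click. One common way to turn a click into a dense location cue is a Gaussian heatmap centred on the clicked pixel, rescaled to each feature-map resolution. The sketch below assumes that encoding; the function names and the default σ are hypothetical:

```python
import numpy as np

def click_heatmap(click_xy, height, width, sigma=8.0):
    """Encode a click (x, y) in pixel coordinates as a 2-D Gaussian map of shape
    (height, width) with peak value 1.0 at the click. sigma is a hypothetical default."""
    x0, y0 = click_xy
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2))

def heatmap_at_scale(click_xy, image_hw, feat_hw, sigma=8.0):
    """Rescale the click position and sigma to a feature-map resolution so the
    same location cue could be injected at several network stages (assumption)."""
    sy = feat_hw[0] / image_hw[0]
    sx = feat_hw[1] / image_hw[1]
    x0, y0 = click_xy
    return click_heatmap((x0 * sx, y0 * sy), feat_hw[0], feat_hw[1], sigma * min(sx, sy))

# Hypothetical 256x256 drone image clicked at (x=128, y=64), mapped to a 64x64 feature map.
cue = heatmap_at_scale((128, 64), (256, 256), (64, 64))
```

A soft Gaussian, rather than a single hot pixel, keeps the cue informative after downsampling and tolerates small click inaccuracies.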
Signals from the Floods: AI-Driven Disaster Analysis through Multi-Source Data Fusion
Gong, Xian, McCarthy, Paul X., Tian, Lin, Rizoiu, Marian-Andrei
Massive and diverse web data are increasingly vital for government disaster response, as demonstrated by the 2022 floods in New South Wales (NSW), Australia. This study examines how X (formerly Twitter) and public inquiry submissions provide insights into public behaviour during crises. We analyse more than 55,000 flood-related tweets and 1,450 submissions to identify behavioural patterns during extreme weather events. While social media posts are short and fragmented, inquiry submissions are detailed, multi-page documents offering structured insights. Our methodology integrates Latent Dirichlet Allocation (LDA) for topic modelling with Large Language Models (LLMs) to enhance semantic understanding. LDA reveals distinct opinions and geographic patterns, while LLMs improve filtering by identifying flood-relevant tweets using public submissions as a reference. This Relevance Index method reduces noise and prioritizes actionable content, improving situational awareness for emergency responders. By combining these complementary data streams, our approach introduces a novel AI-driven method to refine crisis-related social media content, improve real-time disaster response, and inform long-term resilience planning.
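The Relevance Index in the paragraph relies on LLMs. Purely to illustrate the idea of scoring tweets against public submissions as a reference, the sketch below swaps in a much simpler lexical technique: bag-of-words cosine similarity, taking each tweet's best match over the submissions. It is not the authors' LLM-based method, and the function names and example texts are hypothetical:

```python
import math
import re
from collections import Counter

def bow(text):
    """Lower-cased bag-of-words term counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def relevance_scores(tweets, submissions):
    """Lexical stand-in for a relevance index: each tweet's best cosine
    similarity to any reference submission."""
    refs = [bow(s) for s in submissions]
    return [max(cosine(bow(t), r) for r in refs) for t in tweets]

# Hypothetical texts, not from the study's data.
submissions = ["Flood water entered our home and the evacuation warning came too late"]
tweets = ["evacuation warning too late, flood water in home", "great coffee this morning"]
print(relevance_scores(tweets, submissions))
```

A lexical score like this misses paraphrases and context, which is the gap that LLM-based semantic filtering is meant to close.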
Constrained Non-negative Matrix Factorization for Guided Topic Modeling of Minority Topics
Ebrahimi, Seyedeh Fatemeh, Peltonen, Jaakko
Topic models often fail to capture low-prevalence, domain-critical themes, so-called minority topics, such as mental health themes in online comments. While some existing methods can incorporate domain knowledge, such as expected topical content, methods that allow guidance may require overly detailed expected topics, hindering the discovery of topic divisions and variation. We propose a topic modeling solution based on a specially constrained non-negative matrix factorization (NMF). We incorporate a seed word list characterizing the minority content of interest, but we do not require experts to pre-specify how it divides across minority topics. Through prevalence constraints on minority topics and constraints on seed word content across topics, we learn distinct data-driven minority topics as well as majority topics. The constrained NMF is fitted via Karush-Kuhn-Tucker (KKT) conditions with multiplicative updates. We outperform several baselines on synthetic data in terms of topic purity and normalized mutual information, and we also evaluate topic quality using Jensen-Shannon divergence (JSD). We conduct a case study on YouTube vlog comments, analyzing viewer discussion of mental health content; our model successfully identifies and reveals this domain-relevant minority content.
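For reference, the standard unconstrained NMF, $V \approx WH$ with the squared Frobenius loss, can be fitted with Lee-Seung multiplicative updates, whose fixed points satisfy the KKT complementary-slackness conditions of that problem. The minimal NumPy sketch below shows only that baseline; the paper's prevalence and seed-word constraints would modify these updates and are not reproduced here:

```python
import numpy as np

def nmf_multiplicative(v, rank, n_iter=500, seed=0, eps=1e-10):
    """Unconstrained NMF, V ~= W H, via Lee-Seung multiplicative updates for the
    squared Frobenius loss. Entries stay non-negative because each update
    multiplies by a ratio of non-negative terms; eps guards against division by zero."""
    rng = np.random.default_rng(seed)
    n, m = v.shape
    w = rng.random((n, rank)) + eps
    h = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Fixed points satisfy H * (W^T W H - W^T V) = 0 (complementary slackness).
        h *= (w.T @ v) / (w.T @ w @ h + eps)
        w *= (v @ h.T) / (w @ h @ h.T + eps)
    return w, h

# Hypothetical exactly rank-2, non-negative "document-term" matrix.
rng = np.random.default_rng(42)
v = rng.random((6, 2)) @ rng.random((2, 5))
w, h = nmf_multiplicative(v, rank=2, n_iter=2000)
print(np.linalg.norm(v - w @ h) / np.linalg.norm(v))  # relative reconstruction error
```

In a topic model, rows of `h` play the role of topic-word weights and rows of `w` the document-topic mixtures; constraints such as those in the abstract act on these factors.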