Oceania
FedSKC: Federated Learning with Non-IID Data via Structural Knowledge Collaboration
Wang, Huan, Li, Haoran, Chen, Huaming, Yan, Jun, Wang, Lijuan, Shi, Jiahua, Chen, Shiping, Shen, Jun
With the advancement of edge computing, federated learning (FL) shows great promise as a privacy-preserving collaborative learning paradigm. However, a major challenge for FL is data heterogeneity, i.e., biased labeling preferences across clients, which harms convergence and model performance. Most previous FL methods tackle data heterogeneity either locally or globally, neglecting the underlying class-wise structural information contained in each client. In this paper, we first study how data heterogeneity affects model divergence and decompose it into three sub-problems: local, global, and sampling drift. To explore the potential of intra-client class-wise structural knowledge for handling these drifts, we propose Federated Learning with Structural Knowledge Collaboration (FedSKC). The key idea of FedSKC is to extract and transfer domain preferences from inter-client data distributions, offering diverse class-relevant knowledge and a fair convergent signal. FedSKC comprises three components: i) local contrastive learning, which prevents weight divergence resulting from local training; ii) global discrepancy aggregation, which addresses the parameter deviation between the server and clients; and iii) global period review, which corrects the sampling drift introduced by the server's random selection of devices. We theoretically analyze FedSKC under non-convex objectives and empirically validate its superiority through extensive experiments.
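The abstract characterizes heterogeneity as biased label preferences across clients. A common way to simulate that label-skewed setting in FL experiments is Dirichlet partitioning, sketched below. This is a generic experimental technique, not part of FedSKC itself, and the function name and parameters are illustrative assumptions.

```python
import random
from collections import defaultdict


def dirichlet_label_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients with label skew (illustrative helper).

    For each class, client proportions are drawn from Dirichlet(alpha, ..., alpha)
    and that class's samples are handed out accordingly. Smaller alpha gives
    more skewed (more non-IID) clients.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)

    clients = [[] for _ in range(num_clients)]
    for _, idxs in sorted(by_class.items()):
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of Gamma(alpha, 1) draws, normalised.
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(draws)
        props = [d / total for d in draws]
        # Turn the proportions into cumulative cut points over this class.
        cuts, acc = [], 0.0
        for p in props[:-1]:
            acc += p
            cuts.append(int(round(acc * len(idxs))))
        bounds = [0] + cuts + [len(idxs)]
        for c in range(num_clients):
            clients[c].extend(idxs[bounds[c]:bounds[c + 1]])
    return clients
```

With `alpha` around 0.1 most clients end up dominated by a few classes; with large `alpha` the split approaches IID.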
Agent-Based Decentralized Energy Management of EV Charging Station with Solar Photovoltaics via Multi-Agent Reinforcement Learning
Fan, Jiarong, Huang, Chenghao, Wang, Hao
In the pursuit of net-zero energy in smart cities, transportation electrification plays a pivotal role. As the adoption of Electric Vehicles (EVs) keeps increasing, energy management of EV charging stations becomes critically important. While previous studies have reduced the energy cost of EV charging while maintaining grid stability, they often overlook the robustness of EV charging management against various forms of uncertainty, such as varying charging behaviors and possible faults in some chargers. To address this gap, we propose a novel Multi-Agent Reinforcement Learning (MARL) approach that treats each charger as an agent and coordinates all agents in an EV charging station with solar photovoltaics in a more realistic scenario where system faults may occur. A Long Short-Term Memory (LSTM) network is incorporated into the MARL algorithm to extract temporal features from time-series data. Additionally, a dense reward mechanism is designed for training the agents to improve the EV charging experience. Through validation on a real-world dataset, we show that our approach is robust against system uncertainties and faults, and is also effective in minimizing EV charging costs and maximizing charging service satisfaction.
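The abstract does not specify the dense reward, so the sketch below is entirely hypothetical: it only illustrates what "dense" means here, namely that each charger-agent receives feedback at every time step instead of only when the EV departs. The state fields, the weights `w_cost` and `w_service`, and the assumption that PV energy is free are all illustrative choices, not the paper's design.

```python
from dataclasses import dataclass


@dataclass
class ChargerStep:
    """Hypothetical per-charger state for one time step (not the paper's state space)."""
    energy_kwh: float            # energy delivered to the EV during this step
    grid_price: float            # grid price per kWh during this step
    solar_kwh: float             # PV energy available to this charger during this step
    remaining_demand_kwh: float  # energy the EV still needs before it departs
    steps_to_departure: int      # time steps left until the EV departs


def dense_reward(s: ChargerStep, w_cost: float = 1.0, w_service: float = 0.5) -> float:
    """Hypothetical dense reward, emitted at every step rather than only at departure.

    Cost term: energy drawn from the grid beyond the available PV (PV assumed free).
    Service term: remaining demand per remaining step, a crude urgency signal
    that grows as departure approaches with the EV still undercharged.
    """
    grid_kwh = max(0.0, s.energy_kwh - s.solar_kwh)
    cost = grid_kwh * s.grid_price
    urgency = s.remaining_demand_kwh / max(1, s.steps_to_departure)
    return -w_cost * cost - w_service * urgency
```

A sparse alternative would score satisfaction only at departure, which gives agents little signal during long charging sessions; the per-step urgency term is one simple way to avoid that.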
ReflectGAN: Modeling Vegetation Effects for Soil Carbon Estimation from Satellite Imagery
Datta, Dristi, Paul, Manoranjan, Murshed, Manzur, Teng, Shyh Wei, Schmidtke, Leigh M.
Soil organic carbon (SOC) is a critical indicator of soil health, but its accurate estimation from satellite imagery is hindered in vegetated regions by spectral contamination from plant cover, which obscures soil reflectance and reduces model reliability. This study proposes the Reflectance Transformation Generative Adversarial Network (ReflectGAN), a novel paired GAN-based framework designed to reconstruct accurate bare-soil reflectance from satellite observations of vegetated soil. Using the LUCAS 2018 dataset and corresponding Landsat 8 imagery, we trained multiple learning-based models on both original and ReflectGAN-reconstructed reflectance inputs. Models trained on ReflectGAN outputs consistently outperformed those using existing vegetation correction methods. Models using ReflectGAN also outperformed their counterparts when applied to another dataset, i.e., Sentinel-2 imagery. These findings demonstrate the potential of ReflectGAN to improve SOC estimation accuracy in vegetated landscapes, supporting more reliable soil monitoring.
Soil organic carbon (SOC) is a fundamental indicator of soil health, influencing agricultural productivity, carbon sequestration, soil moisture retention, and overall ecosystem sustainability. Accurate estimation of SOC is essential for promoting sustainable agriculture, improving soil management practices, and monitoring environmental changes [1], [2]. Traditional methods for estimating SOC rely on laboratory-based soil analyses, which, although precise, are labor-intensive, costly, and limited in spatial coverage [3], [4]. Laboratory-based hyperspectral imaging (HSI) provides a powerful tool for SOC estimation by offering high spatial and spectral resolution, enabling detailed analysis of soil properties without the need for destructive sampling [5]-[7]. Numerous studies have validated the effectiveness of HSI in accurately estimating SOC levels [7], [8]. However, the widespread deployment of HSI is constrained by the high cost of equipment and limited accessibility, making it impractical for large-scale applications.
D. Datta and M. Paul are with the School of Computing, Mathematics, and Engineering, Charles Sturt University, Bathurst, NSW 2795, Australia, and also with the Cooperative Research Centre for High Performance Soils, Callaghan, NSW 2308, Australia (e-mail: ddatta@csu.edu.au; …). M. Murshed is with the School of Information Technology, Deakin University, Burwood, VIC 3125, Australia (e-mail: manzur.murshed@deakin.edu.au). S. W. Teng is with the Institute of Innovation, Science and Sustainability, Federation University, Mount Helen, VIC 3350, Australia, and also with the Cooperative Research Centre for High Performance Soils, Callaghan, NSW 2308, Australia (e-mail: s.w.teng@federation.edu.au).
FRAME-C: A knowledge-augmented deep learning pipeline for classifying multi-electrode array electrophysiological signals
Ranasinghe, Nisal, Do-Ha, Dzung, Maksour, Simon, Malepathirana, Tamasha, Seneviratne, Sachith, Ooi, Lezanne, Halgamuge, Saman
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disorder characterized by motor neuron degeneration, with alterations in neural excitability serving as key indicators. Recent advances in induced pluripotent stem cell (iPSC) technology have enabled the generation of human iPSC-derived neuronal cultures, which, when combined with multi-electrode array (MEA) electrophysiology, provide rich spatial and temporal electrophysiological data. Traditionally, MEA data are analyzed using handcrafted features based on potentially imperfect domain knowledge, which, while useful, may not capture all the informative characteristics inherent in the data. Machine learning, in particular deep learning, has the potential to learn relevant characteristics (features) automatically from raw data without relying solely on handcrafted feature extraction. However, handcrafted features remain critical for encoding domain knowledge and improving model interpretability, especially with limited or noisy data, as is often the case in experimental studies. This study introduces FRAME-C, a knowledge-augmented machine learning pipeline that combines domain knowledge, raw spike waveform data, and deep learning techniques to classify MEA signals and identify ALS-specific phenotypes. FRAME-C uses deep learning to learn important features from spike waveforms while also incorporating handcrafted features such as spike amplitude, inter-spike interval, and spike duration, thus preserving key spatial and temporal information. We validate FRAME-C on both simulated and real-world MEA data from human iPSC-derived neuronal cultures, demonstrating superior performance over existing methods for MEA classification: FRAME-C improves test accuracy by more than 11% on real-world data and by up to 25% on simulated data.
Moreover, we show that FRAME-C can be used to evaluate the importance of each handcrafted feature, thereby contributing to the interpretation of the classification results. Permutation feature importances are calculated for these handcrafted features, providing further insights into ALS phenotypes.
Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease that leads to a progressive loss of motor neurons. At the onset of ALS, symptoms may include limb weakness and difficulty swallowing. However, the disease invariably progresses to paralysis and respiratory failure within three to five years [1]. A small proportion of ALS cases (5-10%) are familial (fALS) and can be linked to a family history of ALS. However, the majority (90-95%) are sporadic (sALS), with no known family history.
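FRAME-C reports permutation importances for its handcrafted features. The sketch below shows the generic permutation-importance technique: shuffle one feature column, re-score the model, and record the drop in accuracy. The rule-based `amplitude_rule` classifier and the synthetic rows are stand-ins assumed for illustration; they are not FRAME-C's deep network or real MEA data.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled independently."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild the rows with only column j permuted.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def amplitude_rule(row):
    """Toy stand-in classifier that only looks at feature 0 (spike amplitude)."""
    return int(row[0] > 50)


rng = random.Random(1)
# Synthetic rows: [spike amplitude, inter-spike interval, spike duration].
X = [[rng.uniform(0, 100), rng.uniform(0, 1), rng.uniform(0, 2)] for _ in range(200)]
y = [amplitude_rule(row) for row in X]
importances = permutation_importance(amplitude_rule, X, y)
```

Because the toy model ignores the interval and duration columns, their importances come out as exactly zero, while shuffling amplitude drops accuracy substantially.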
Large Language Models' Expert-level Global History Knowledge Benchmark (HiST-LLM)
Large Language Models (LLMs) have the potential to transform humanities and social science research, yet their knowledge and comprehension of history at a graduate level remain untested. Benchmarking LLMs in history is particularly challenging, given that human knowledge of history is inherently unbalanced, with more information available on Western history and recent periods. We introduce the History Seshat Test for LLMs (HiST-LLM), based on a subset of the Seshat Global History Databank, which provides a structured representation of human historical knowledge, containing 36,000 data points across 600 historical societies and over 2,700 scholarly references. This dataset covers every major world region from the Neolithic period to the Industrial Revolution and includes information reviewed and assembled by history experts and graduate research assistants. We find that, in a four-choice format, LLMs have a balanced accuracy ranging from 33.6% (Llama-3.1-8B)
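Balanced accuracy matters here because a model that favors one answer option can look strong on plain accuracy when answers are unevenly distributed. The abstract does not spell out its computation; the sketch below assumes the standard definition (the mean of per-class recall, as in scikit-learn's `balanced_accuracy_score`), under which chance level in a four-choice format is 25%.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every true class counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)
```

For example, always answering "A" on a skewed set with 90 "A" and 10 "B" questions gives 90% plain accuracy but only 50% balanced accuracy.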
Fit for our purpose, not yours: Benchmark for a low-resource, Indigenous language
Influential and popular benchmarks in AI are largely irrelevant to developing NLP tools for low-resource, Indigenous languages. With the primary goal of measuring the performance of general-purpose AI systems, these benchmarks fail to give due consideration and care to individual language communities, especially those of low-resource languages. The datasets contain numerous grammatical and orthographic errors, poor pronunciation, and limited vocabulary, and their content lacks cultural relevance to the language community. To overcome the issues with these benchmarks, we have created a dataset for te reo Māori (the Indigenous language of Aotearoa/New Zealand) to pursue NLP tools that are 'fit for our purpose'. This paper demonstrates how low-resourced, Indigenous languages can develop tailored, high-quality benchmarks that: i.
SolarCube: An Integrative Benchmark Dataset Harnessing Satellite and In-situ Observations for Large-scale Solar Energy Forecasting
Solar power is a critical source of renewable energy, offering significant potential to lower greenhouse gas emissions and mitigate climate change. However, the cloud-induced variability of solar radiation reaching the Earth's surface makes it challenging to integrate solar power into the grid (e.g., storage and backup management). The new generation of geostationary satellites, such as GOES-16, has become an important data source for large-scale, high-temporal-frequency solar radiation forecasting. However, no machine-learning-ready dataset has integrated geostationary satellite data with fine-grained solar radiation information to support forecasting model development and benchmarking with consistent metrics. SolarCube covers 19 study areas distributed over multiple continents: North America, South America, Asia, and Oceania.
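The abstract stresses benchmarking with consistent metrics but does not list them here. One metric widely used in solar forecasting, assumed for this sketch and not confirmed as part of SolarCube, is forecast skill relative to a persistence baseline: 1 minus the ratio of the model's RMSE to the RMSE of simply repeating the current value. The helper names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def persistence_forecast(series, horizon):
    """Naive baseline: the value `horizon` steps ahead equals the current value."""
    return series[:-horizon]


def skill_score(series, model_pred, horizon):
    """1 - RMSE(model) / RMSE(persistence); positive means the model beats persistence.

    model_pred[i] is the model's forecast for series[i + horizon].
    Undefined (division by zero) if persistence is perfect, e.g. a constant series.
    """
    target = series[horizon:]
    return 1.0 - rmse(target, model_pred) / rmse(target, persistence_forecast(series, horizon))
```

A perfect forecast scores 1.0 and a forecast identical to persistence scores 0.0, which makes results comparable across sites with very different cloud regimes.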
The facial feature that means you're more likely to have a son
You might think that having a boy or a girl is completely up to chance. But expectant parents might be able to hazard a good guess, depending on what the father's facial features are like. Researchers wanted to find out whether certain traits in parents were linked to the sex of their firstborn. The team, from the University of Michigan, recruited 104 pairs of parents with at least one child. Both parents were asked to submit facial photographs, which were rated for attractiveness, dominance, and masculinity or femininity by university students.
Man who posted deepfake images of prominent Australian women could face $450,000 penalty
The online safety regulator wants a $450,000 maximum penalty imposed on a man who posted deepfake images of prominent Australian women to a website, in the first case of its kind heard in an Australian court. The eSafety commissioner has launched proceedings against Anthony Rotondo over his failure to remove "intimate images" of several prominent Australian women from a deepfake pornography website. The federal court has kept the names of the women confidential. Rotondo initially refused to comply with the order while he was based in the Philippines, the court heard, but the commissioner launched the case once he returned to Australia. Rotondo posted the images to the MrDeepFakes website, which has since been shut down.
Graph Attention Neural Network for Botnet Detection: Evaluating Autoencoder, VAE and PCA-Based Dimension Reduction
Wasswa, Hassan, Abbass, Hussein, Lynar, Timothy
With the rise of IoT-based botnet attacks, researchers have explored various learning models for detection, including traditional machine learning, deep learning, and hybrid approaches. A key advancement involves deploying attention mechanisms to capture long-term dependencies among features, significantly improving detection accuracy. However, most models treat attack instances independently, overlooking inter-instance relationships. Graph Neural Networks (GNNs) address this limitation by learning an embedding space via iterative message passing, in which similar instances are placed closer together based on node features and relationships, enhancing classification performance. To further improve detection, attention mechanisms have been embedded within GNNs, leveraging both long-range dependencies and inter-instance connections. However, transforming high-dimensional IoT attack datasets into graph-structured datasets poses challenges, such as large graph structures leading to computational overhead. To mitigate this, this paper proposes a framework that first reduces the dimensionality of the NetFlow-based IoT attack dataset before transforming it into a graph dataset. We evaluate three dimension reduction techniques, namely a Variational Autoencoder (VAE-encoder), a classical autoencoder (AE-encoder), and Principal Component Analysis (PCA), and compare their effects on a Graph Attention neural network (GAT) model for botnet attack detection.
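The "reduce first, then build the graph" idea can be sketched for the PCA branch as follows. PCA is implemented here with power iteration and deflation so the example needs only the standard library. The abstract does not say how instances are connected into a graph; the k-nearest-neighbour construction in `knn_edges` is an assumption for illustration, and both function names are hypothetical.

```python
import math
import random


def pca(X, k, iters=100, seed=0):
    """Project the rows of X onto their top-k principal components.

    Uses power iteration with deflation on the sample covariance matrix,
    which is fine for small illustrative data but not for large datasets.
    """
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    cov = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]

    def matvec(M, v):
        return [sum(M[a][b] * v[b] for b in range(d)) for a in range(d)]

    rng = random.Random(seed)
    components = []
    for _ in range(k):
        # Random unit start vector, then repeated multiplication by the covariance.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(cov, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left to explain
            v = [x / norm for x in w]
        eigval = sum(a * b for a, b in zip(v, matvec(cov, v)))
        # Deflate: remove this component so the next pass finds the next one.
        cov = [[cov[a][b] - eigval * v[a] * v[b] for b in range(d)] for a in range(d)]
        components.append(v)
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in Xc]


def knn_edges(Z, k):
    """Directed k-nearest-neighbour edges (i, j) in the reduced space, no self-loops."""
    edges = []
    for i, zi in enumerate(Z):
        nearest = sorted((math.dist(zi, zj), j) for j, zj in enumerate(Z) if j != i)
        edges.extend((i, j) for _, j in nearest[:k])
    return edges
```

Computing neighbours in the reduced space keeps each distance calculation cheap, which is the kind of overhead the abstract's framework aims to cut; an autoencoder or VAE encoder would simply replace `pca` as the reduction step.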