Oceania
Can an AI chatbot of Dr Karl change climate sceptics' minds? He's willing to give it a try
There's arguably no face, voice or collection of exuberant, patterned shirts more recognisable than those belonging to Dr Karl Kruszelnicki. The bespectacled boffin has been answering curly listener questions about science, with characteristic excitement and passion, for more than 40 years. Despite a seemingly tireless work ethic, Kruszelnicki, now 77 years old, can't be everywhere all at once. Those questions now come in waves, across social media platforms at all hours of the day. "Sometimes I get 300 requests a day on Twitter to answer an involved question about climate change," Kruszelnicki says.
Tax relief and Carmen Sandiego: Australia's once-dismissed video game industry is finally getting a leg-up
The idea that video games are not "serious things", says Ross Symons, overlooks the benefits they offer to gamers feeling isolated. "One thing that struck me during Covid is that games were the way that people connected and stayed together." The chief executive of Big Ant Studios, a Melbourne-based game developer, recalls when in 2010 the then opposition leader Tony Abbott dismissed the national broadband network as being for "internet-based television, video entertainment and gaming". Symons says that dismissiveness of the video game industry has not stood the test of time. Last year alone, Australians spent 3.8bn on video games, according to the Interactive Games and Entertainment Association (IGEA).
Teenager who lost his legs in crash will 'never forgive' driver
Ken Banks and Louise Hosie, BBC Scotland News
A teenager who lost his lower legs in a crash says he "will never forgive" the drink-driver at the wheel. Young footballer Adam Golebiewski, 18, had been a passenger in Arran Paterson's car in Macduff, Aberdeenshire, in September last year. Paterson, 19, admitted dangerous driving, being over the drink-drive limit and driving without insurance at Aberdeen Sheriff Court. Adam walked into court unaided on prosthetic legs following intensive rehabilitation. He said: "I want to try to enjoy life again and stay positive."
T3Time: Tri-Modal Time Series Forecasting via Adaptive Multi-Head Alignment and Residual Fusion
Chowdhury, Abdul Monaf, Akter, Rabeya, Arib, Safaeid Hossain
Multivariate time series forecasting (MTSF) seeks to model temporal dynamics among variables to predict future trends. Transformer-based models and large language models (LLMs) have shown promise due to their ability to capture long-range dependencies and patterns. However, current methods often rely on rigid inductive biases, ignore inter-variable interactions, or apply static fusion strategies that limit adaptability across forecast horizons. These limitations create bottlenecks in capturing nuanced, horizon-specific relationships in time-series data. To solve this problem, we propose T3Time, a novel trimodal framework consisting of time, spectral, and prompt branches, in which the dedicated frequency encoding branch captures periodic structures and a gating mechanism learns how to prioritize temporal and spectral features based on the prediction horizon. We also propose a mechanism that adaptively aggregates multiple cross-modal alignment heads by dynamically weighting the importance of each head based on the features. Extensive experiments on benchmark datasets demonstrate that our model consistently outperforms state-of-the-art baselines, achieving an average reduction of 3.28% in MSE and 2.29% in MAE. Furthermore, it shows strong generalization in few-shot learning settings: with 5% training data, we see a reduction in MSE and MAE of 4.13% and 1.91%, respectively; and with 10% data, of 3.62% and 1.98% on average.
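The horizon-dependent gating described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the scalar sigmoid parameterization, the fixed `weight` and `bias` values (which would be learned in a real model), and the example horizons are all assumptions made for illustration.

```python
import math

def horizon_gate(horizon, weight=0.01, bias=-1.0):
    """Return a gate value in (0, 1) that depends on the forecast horizon.

    Hypothetical parameterization: in a trained model, weight and bias
    would be learned rather than fixed.
    """
    return 1.0 / (1.0 + math.exp(-(weight * horizon + bias)))

def fuse(temporal, spectral, horizon, weight=0.01, bias=-1.0):
    """Blend temporal and spectral feature vectors with a scalar gate.

    Assumption: the gate weights the spectral features and (1 - gate)
    weights the temporal features.
    """
    g = horizon_gate(horizon, weight, bias)
    return [g * s + (1.0 - g) * t for t, s in zip(temporal, spectral)]

# Toy feature vectors (illustrative values only).
temporal = [1.0, 2.0, 3.0]
spectral = [3.0, 2.0, 1.0]

short_fused = fuse(temporal, spectral, horizon=96)
long_fused = fuse(temporal, spectral, horizon=720)
print(horizon_gate(96), short_fused)
print(horizon_gate(720), long_fused)
```

With these illustrative parameters the gate grows with the horizon, so the longer forecast leans more on the spectral features. The direction a trained model would choose is not stated in the abstract.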
Generative Flexible Latent Structure Regression (GFLSR) model
Grazian, Clara, Jin, Qian, De Micheaux, Pierre Lafaye
Latent structure methods, specifically linear continuous latent structure methods, are a fundamental type of statistical learning strategy. They are widely used for dimension reduction, regression and prediction in fields such as chemometrics, economics and social science. However, because they lack model inference and a generative form and have unidentifiable parameters, most of these methods are used as algorithms rather than as models. This paper proposes a Generative Flexible Latent Structure Regression (GFLSR) model structure to address this problem. Moreover, we show that most linear continuous latent variable methods can be represented under the proposed framework. The recursive structure allows potential model inference and residual analysis. We then focus on traditional Partial Least Squares (PLS) and show that it can be expressed as a special case of the proposed model structure, named Generative-PLS. With a model structure, we analyse the convergence of the parameters and the latent variables. Under additional distribution assumptions, we show that the proposed model structure can lead to model inference without solving the probabilistic model. Additionally, we propose a novel bootstrap algorithm that enables uncertainty quantification for parameters and for predictions on new datasets. A simulation study and a real-world dataset are used to verify the proposed Generative-PLS model structure. Although traditional PLS is a special case, the proposed GFLSR model structure leads to a potential inference structure for all linear continuous latent variable methods.
ContextASR-Bench: A Massive Contextual Speech Recognition Benchmark
Wang, He, Ma, Linhan, Guo, Dake, Wang, Xiong, Xie, Lei, Xu, Jin, Lin, Junyang
Automatic Speech Recognition (ASR) has been extensively investigated, yet prior benchmarks have largely focused on assessing the acoustic robustness of ASR models, leaving evaluations of their linguistic capabilities relatively underexplored. This largely stems from the limited parameter sizes and training corpora of conventional ASR models, leaving them with insufficient world knowledge, which is crucial for accurately recognizing named entities across diverse domains, such as drug and treatment names in medicine or specialized technical terms in engineering. Recent breakthroughs in Large Language Models (LLMs) and corresponding Large Audio Language Models (LALMs) have markedly enhanced the visibility of advanced context modeling and general artificial intelligence capabilities. Leveraging LLMs, we envision a unified system capable of robust speech recognition across diverse real-world domains, yet existing benchmarks are inadequate for evaluating this objective. To address this gap, we propose ContextASR-Bench: a comprehensive, large-scale benchmark designed to assess the linguistic competence of ASR systems using corpora that feature numerous named entities across multiple domains. It encompasses up to 40,000 data entries with more than 300,000 named entities across over 10 domains. Beyond the audio and its transcription, each sample provides the domain it belongs to and a list of the named entities it contains, which are referred to as the context. Based on this, we introduce three evaluation modes to assess how effectively models can exploit such context to improve ASR accuracy. Extensive evaluation on ContextASR-Bench highlights that LALMs outperform conventional ASR models by a large margin thanks to the strong world knowledge and context modeling of LLMs, yet there remains ample room for further improvement. The dataset and evaluation code have been released.
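To make the notion of per-sample context concrete, the sketch below models a sample carrying a transcript, a domain and an entity list, and scores a hypothesis by the fraction of listed entities it contains. The `Sample` class, the `entity_recall` function, the substring-matching rule and the example sentence are hypothetical and are not taken from the released evaluation code, whose metrics the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Sample:
    """Hypothetical record layout: transcript plus its context (domain, entities)."""
    transcript: str
    domain: str
    entities: List[str]

def entity_recall(hypothesis: str, entities: List[str]) -> float:
    """Fraction of reference entities found verbatim (case-insensitive) in the hypothesis.

    Assumption: simple substring matching stands in for whatever matching
    rule a real benchmark would use.
    """
    if not entities:
        return 1.0
    hyp = hypothesis.lower()
    hits = sum(1 for entity in entities if entity.lower() in hyp)
    return hits / len(entities)

# Illustrative sample, not drawn from the benchmark.
sample = Sample(
    transcript="the patient takes metformin and insulin glargine",
    domain="medicine",
    entities=["metformin", "insulin glargine"],
)

# A hypothesis that misrecognizes one of the two drug names.
hypothesis = "the patient takes met forman and insulin glargine"
recall = entity_recall(hypothesis, sample.entities)
print(recall)
```

A score like this isolates entity errors that a corpus-level word error rate can hide, which is the kind of linguistic competence the abstract says earlier benchmarks underexplore.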
Experimental Analysis of Productive Interaction Strategy with ChatGPT: User Study on Function and Project-level Code Generation Tasks
Hyun, Sangwon, Kim, Hyunjun, Jang, Jinhyuk, Choi, Hyojin, Babar, M. Ali
The application of Large Language Models (LLMs) is growing in the productive completion of Software Engineering tasks. Yet, studies investigating productive prompting techniques have often employed a limited problem space, primarily focusing on well-known prompting patterns and mainly targeting function-level SE practices. We identify significant gaps in real-world workflows that involve complexities beyond class-level (e.g., multi-class dependencies) and different features that can impact Human-LLM Interaction (HLI) processes in code generation. To address these issues, we designed an experiment that comprehensively analyzed the HLI features with respect to code generation productivity. Our study presents two project-level benchmark tasks, extending beyond function-level evaluations. We conducted a user study with 36 participants from diverse backgrounds, asking them to solve the assigned tasks by interacting with the GPT assistant using specific prompting patterns. We also examined the participants' experience and their behavioral features during interactions by analyzing screen recordings and GPT chat logs. Our statistical and empirical investigation revealed (1) three out of 15 HLI features that significantly impacted productivity in code generation; (2) five primary guidelines for enhancing productivity in HLI processes; and (3) a taxonomy of 29 runtime and logic errors that can occur during HLI processes, along with suggested mitigation plans.
A Robust and Efficient Pipeline for Enterprise-Level Large-Scale Entity Resolution
Kannangara, Sandeepa, Abrahamyan, Arman, Elias, Daniel, Kilby, Thomas, Dar, Nadav, Pizzato, Luiz, Leontjeva, Anna, Jermyn, Dan
Entity resolution (ER) remains a significant challenge in data management, especially when dealing with large datasets. This paper introduces MERAI (Massive Entity Resolution using AI), a robust and efficient pipeline designed to address record deduplication and linkage issues in high-volume datasets at an enterprise level. The pipeline's resilience and accuracy have been validated through various large-scale record deduplication and linkage projects. To evaluate MERAI's performance, we compared it with two well-known entity resolution libraries, Dedupe and Splink. While Dedupe failed to scale beyond 2 million records due to memory constraints, MERAI successfully processed datasets of up to 15.7 million records and produced accurate results across all experiments. Experimental data demonstrates that MERAI outperforms both baseline systems in terms of matching accuracy, with consistently higher F1 scores in both deduplication and record linkage tasks. MERAI offers a scalable and reliable solution for enterprise-level large-scale entity resolution, ensuring data integrity and consistency in real-world applications.
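Since the comparison above is reported in F1 scores, a short sketch of one common way to compute F1 for deduplication or linkage output may help: treat each predicted match as an unordered pair of record IDs and compare against the true pairs. The abstract does not say whether MERAI's evaluation is pairwise, so this formulation, the function name and the toy record IDs are assumptions.

```python
def pairwise_f1(predicted_pairs, true_pairs):
    """Compute F1 over unordered record-ID pairs.

    Each pair is normalized to a frozenset so (1, 2) and (2, 1) count as
    the same match.
    """
    predicted = {frozenset(pair) for pair in predicted_pairs}
    truth = {frozenset(pair) for pair in true_pairs}
    true_positives = len(predicted & truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Toy example with hypothetical record IDs.
predicted = [(1, 2), (3, 4), (5, 6)]
actual = [(2, 1), (3, 4), (7, 8), (9, 10)]

# Two of three predictions are correct (precision 2/3) and two of four
# true matches are found (recall 1/2), giving F1 = 4/7.
score = pairwise_f1(predicted, actual)
print(round(score, 4))
```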
Multimodal Video Emotion Recognition with Reliable Reasoning Priors
Wang, Zhepeng, Zhu, Yingjian, Dong, Guanghao, Yi, Hongzhu, Chen, Feng, Wang, Xinming, Xie, Jun
This study investigates the integration of trustworthy prior reasoning knowledge from MLLMs into multimodal emotion recognition. We employ Gemini to generate fine-grained, modality-separable reasoning traces, which are injected as priors during the fusion stage to enrich cross-modal interactions. To mitigate the pronounced class-imbalance in multimodal emotion recognition, we introduce Balanced Dual-Contrastive Learning, a loss formulation that jointly balances inter-class and intra-class distributions. Applied to the MER2024 benchmark, our prior-enhanced framework yields substantial performance gains, demonstrating that the reliability of MLLM-derived reasoning can be synergistically combined with the domain adaptability of lightweight fusion networks for robust, scalable emotion recognition.
CoughViT: A Self-Supervised Vision Transformer for Cough Audio Representation Learning
Luong, Justin, Xue, Hao, Salim, Flora D.
Physicians routinely assess respiratory sounds during the diagnostic process, providing insight into the condition of a patient's airways. In recent years, AI-based diagnostic systems operating on respiratory sounds have demonstrated success in respiratory disease detection. These systems represent a crucial advancement in early and accessible diagnosis, which is essential for timely treatment. However, label and data scarcity remain key challenges, especially for conditions beyond COVID-19, limiting diagnostic performance and reliable evaluation. In this paper, we propose CoughViT, a novel pre-training framework for learning general-purpose cough sound representations to enhance diagnostic performance in tasks with limited data. To address label scarcity, we employ masked data modelling to train a feature encoder in a self-supervised manner. We evaluate our approach against other pre-training strategies on three diagnostically important cough classification tasks. Experimental results show that our representations match or exceed current state-of-the-art supervised audio representations in enhancing performance on downstream tasks.
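The masked data modelling step mentioned above can be sketched in its most basic form: hide a random subset of input patches and keep them as reconstruction targets, so that no labels are needed. The function name, the 75% mask ratio and the stand-in patch values below are assumptions for illustration. The abstract does not give CoughViT's actual masking scheme or ratio.

```python
import random

def mask_patches(patches, mask_ratio=0.75, seed=0):
    """Split patches into visible inputs and masked reconstruction targets.

    Returns two lists of (index, patch) tuples. The mask_ratio default is
    an assumed value, not one reported in the source.
    """
    rng = random.Random(seed)
    n_masked = int(len(patches) * mask_ratio)
    masked_indices = set(rng.sample(range(len(patches)), n_masked))
    visible = [(i, p) for i, p in enumerate(patches) if i not in masked_indices]
    targets = [(i, p) for i, p in enumerate(patches) if i in masked_indices]
    return visible, targets

# Stand-in for 16 spectrogram patches; each "patch" is just a small list here.
patches = [[float(i)] * 4 for i in range(16)]
visible, targets = mask_patches(patches)

# An encoder would see only the visible patches; a decoder would be trained
# to reconstruct the masked ones from the encoder output.
print(len(visible), len(targets))
```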