Oceania
Multiple LLM Agents Debate for Equitable Cultural Alignment
Ki, Dayeon, Rudinger, Rachel, Zhou, Tianyi, Carpuat, Marine
Large Language Models (LLMs) need to adapt their predictions to diverse cultural contexts to benefit diverse communities across the world. While previous efforts have focused on single-LLM, single-turn approaches, we propose to exploit the complementary strengths of multiple LLMs to promote cultural adaptability. We introduce a Multi-Agent Debate framework, where two LLM-based agents debate over a cultural scenario and collaboratively reach a final decision. We propose two variants: one where either LLM agents exclusively debate and another where they dynamically choose between self-reflection and debate during their turns. We evaluate these approaches on 7 open-weight LLMs (and 21 LLM combinations) using the NormAd-ETI benchmark for social etiquette norms in 75 countries. Experiments show that debate improves both overall accuracy and cultural group parity over single-LLM baselines. Notably, multi-agent debate enables relatively small LLMs (7-9B) to achieve accuracies comparable to that of a much larger model (27B parameters).
Enhancing Reliability in LLM-Integrated Robotic Systems: A Unified Approach to Security and Safety
Zhang, Wenxiao, Kong, Xiangrui, Dewitt, Conan, Brรคunl, Thomas, Hong, Jin B.
Integrating Large Language Models (LLMs) into robotic systems has revolutionised embodied artificial intelligence, enabling advanced decision-making and adaptability. However, ensuring reliability--encompassing both security against adversarial attacks and safety in complex environments--remains a critical challenge. To address this, we propose a unified framework that mitigates prompt injection attacks while enforcing operational safety through robust validation mechanisms. Our approach combines prompt assembling, state management, and safety validation, evaluated using both performance and security metrics. Experiments show a 30.8% improvement under injection attacks and up to a 325% improvement in complex environment settings under adversarial conditions compared to baseline scenarios. The framework is open-sourced with simulation and physical deployment demos at https://llmeyesim.vercel.app/. Introduction The integration of Large Language Models (LLMs) into embodied robotic systems represents a significant leap in robotic autonomy and adaptability [11]. Recent advances enable robots to interpret natural language instructions, fuse multimodal sensor data, and make planning decisions using the general-purpose reasoning capabilities of models like GPT -4o [22]. These capabilities promise generalist agents that can execute complex, interactive tasks without task-specific training [14]. By drawing on vast internet-scale training corpora, LLMs can produce structured action plans from ambiguous user goals, acting as high-level controllers in dynamic and unpredictable environments [12]. However, these benefits come with risks. Unlike traditional robotic architectures that rely on modular safety subsystems, such as collision avoidance, mission timeouts, and hardware constraints, LLM-based controllers can bypass these safeguards via incorrect inference or adversarial inputs. The semantic sensitivity of LLMs to phrasing, ambiguity, or hallucinated knowledge introduces vulnerabilities not addressed by existing robotics safety protocols [6]. Moreover, integrating multimodal perception (e.g., camera, LiDAR) expands the input space but also introduces new failure modes, where partial, spoofed, or contextually misleading inputs can lead to unsafe behaviours [31]. The current literature lacks a unified methodology to secure and validate the behaviour of LLM-driven robots. Most prior work evaluates vision-language reasoning or robotic planning in isolation and does not consider how prompt injection attacks or input spoofing a ffect downstream physical actions. Similarly, existing LLM safety work focuses on digital assistants or text-only settings, leaving a critical gap in embodied use cases such as autonomous navigation and exploration [37, 20]. As robots begin to operate in open-world human environments, the absence of integrated security and safety layers poses real risks to both mission success and human-robot interaction.
HiGraph: A Large-Scale Hierarchical Graph Dataset for Malware Analysis
Chen, Han, Wang, Hanchen, Chen, Hongmei, Zhang, Ying, Qin, Lu, Zhang, Wenjie
The advancement of graph-based malware analysis is critically limited by the absence of large-scale datasets that capture the inherent hierarchical structure of software. Existing methods often oversimplify programs into single level graphs, failing to model the crucial semantic relationship between high-level functional interactions and low-level instruction logic. To bridge this gap, we introduce \dataset, the largest public hierarchical graph dataset for malware analysis, comprising over \textbf{200M} Control Flow Graphs (CFGs) nested within \textbf{595K} Function Call Graphs (FCGs). This two-level representation preserves structural semantics essential for building robust detectors resilient to code obfuscation and malware evolution. We demonstrate HiGraph's utility through a large-scale analysis that reveals distinct structural properties of benign and malicious software, establishing it as a foundational benchmark for the community. The dataset and tools are publicly available at https://higraph.org.
AI-Driven Marine Robotics: Emerging Trends in Underwater Perception and Ecosystem Monitoring
Raine, Scarlett, Fischer, Tobias
Marine ecosystems face increasing pressure due to climate change, driving the need for scalable, AI-powered monitoring solutions. This paper examines the rapid emergence of underwater AI as a major research frontier and analyzes the factors that have transformed marine perception from a niche application into a catalyst for AI innovation. We identify three convergent drivers: environmental necessity for ecosystem-scale monitoring, democratization of underwater datasets through citizen science platforms, and researcher migration from saturated terrestrial computer vision domains. Our analysis reveals how unique underwater challenges - turbidity, cryptic species detection, expert annotation bottlenecks, and cross-ecosystem generalization - are driving fundamental advances in weakly supervised learning, open-set recognition, and robust perception under degraded conditions. We survey emerging trends in datasets, scene understanding and 3D reconstruction, highlighting the paradigm shift from passive observation toward AI-driven, targeted intervention capabilities. The paper demonstrates how underwater constraints are pushing the boundaries of foundation models, self-supervised learning, and perception, with methodological innovations that extend far beyond marine applications to benefit general computer vision, robotics, and environmental monitoring.
Learning Longitudinal Stress Dynamics from Irregular Self-Reports via Time Embeddings
Simon, Louis, Chetouani, Mohamed
The widespread adoption of mobile and wearable sensing technologies has enabled continuous and personalized monitoring of affect, mood disorders, and stress. When combined with ecological self-report questionnaires, these systems offer a powerful opportunity to explore longitudinal modeling of human behaviors. However, challenges arise from missing data and the irregular timing of self-reports, which make challenging the prediction of human states and behaviors. In this study, we investigate the use of time embeddings to capture time dependencies within sequences of Ecological Momentary Assessments (EMA). We introduce a novel time embedding method, Ema2Vec, designed to effectively handle irregularly spaced self-reports, and evaluate it on a new task of longitudinal stress prediction. Our method outperforms standard stress prediction baselines that rely on fixed-size daily windows, as well as models trained directly on longitudinal sequences without time-aware representations. These findings emphasize the importance of incorporating time embeddings when modeling irregularly sampled longitudinal data.
Robust Knowledge Editing via Explicit Reasoning Chains for Distractor-Resilient Multi-Hop QA
Wu, Yuchen, Ding, Liang, Shen, Li, Tao, Dacheng
Large language models (LLMs) encode vast amounts of world knowledge but remain static once trained, making the timely integration of emerging facts prohibitively expensive via full retraining. Knowledge-editing techniques have thus emerged to inject or overwrite specific facts into LLMs, yet they either over-rely on superficial cues or incur complex, iterative pipelines that collapse under noisy, multi-hop conditions. We introduce Reason-KE, an end-to-end reasoning-chain-based editing framework that steers a pretrained LLM through four structured stages-fact acknowledgment, relevance determination, selective application, and final reasoning-to filter distractors in a single pass. Trained on MQuAKE-CF with up to four irrelevant facts, Reason-KE elevates Qwen2.5-7B's multi-hop QA accuracy to 90.2% while suffering merely a 6.3% drop under heavy distraction and <1% when answers are leaked. Our quantitative analysis confirms Reason-KE's resilience and efficiency, establishing a new state-of-the-art for reliable LLM knowledge updates.
Reinforcement Learning Driven Generalizable Feature Representation for Cross-User Activity Recognition
Ye, Xiaozhou, Wang, Kevin I-Kai
Human Activity Recognition (HAR) using wearable sensors is crucial for healthcare, fitness tracking, and smart environments, yet cross-user variability -- stemming from diverse motion patterns, sensor placements, and physiological traits -- hampers generalization in real-world settings. Conventional supervised learning methods often overfit to user-specific patterns, leading to poor performance on unseen users. Existing domain generalization approaches, while promising, frequently overlook temporal dependencies or depend on impractical domain-specific labels. We propose Temporal-Preserving Reinforcement Learning Domain Generalization (TPRL-DG), a novel framework that redefines feature extraction as a sequential decision-making process driven by reinforcement learning. TPRL-DG leverages a Transformer-based autoregressive generator to produce temporal tokens that capture user-invariant activity dynamics, optimized via a multi-objective reward function balancing class discrimination and cross-user invariance. Key innovations include: (1) an RL-driven approach for domain generalization, (2) autoregressive tokenization to preserve temporal coherence, and (3) a label-free reward design eliminating the need for target user annotations. Evaluations on the DSADS and PAMAP2 datasets show that TPRL-DG surpasses state-of-the-art methods in cross-user generalization, achieving superior accuracy without per-user calibration. By learning robust, user-invariant temporal patterns, TPRL-DG enables scalable HAR systems, facilitating advancements in personalized healthcare, adaptive fitness tracking, and context-aware environments.
AI-driven Dispensing of Coral Reseeding Devices for Broad-scale Restoration of the Great Barrier Reef
Raine, Scarlett, Moshirian, Benjamin, Fischer, Tobias
Coral reefs are on the brink of collapse, with climate change, ocean acidification, and pollution leading to a projected 70-90% loss of coral species within the next decade. Restoration efforts are crucial, but their success hinges on introducing automation to upscale efforts. We present automated deployment of coral re-seeding devices powered by artificial intelligence, computer vision, and robotics. Specifically, we perform automated substrate classification, enabling detection of areas of the seafloor suitable for coral growth, thus significantly reducing reliance on human experts and increasing the range and efficiency of restoration. Real-world testing of the algorithms on the Great Barrier Reef leads to deployment accuracy of 77.8%, sub-image patch classification of 89.1%, and real-time model inference at 5.5 frames per second. Further, we present and publicly contribute a large collection of annotated substrate image data to foster future research in this area.
Ranking of Bangla Word Graph using Graph-based Ranking Algorithms
Ranking words is an important way to summarize a text or to retrieve information. A word graph is a way to represent the words of a sentence or a text as the vertices of a graph and to show the relationship among the words. It is also useful to determine the relative importance of a word among the words in the word-graph. In this research, the ranking of Bangla words are calculated, representing Bangla words from a text in a word graph using various graph based ranking algorithms. There is a lack of a standard Bangla word database. In this research, the Indian Language POS-tag Corpora is used, which has a rich collection of Bangla words in the form of sentences with their parts of speech tags. For applying a word graph to various graph based ranking algorithms, several standard procedures are applied. The preprocessing steps are done in every word graph and then applied to graph based ranking algorithms to make a comparison among these algorithms. This paper illustrate the entire procedure of calculating the ranking of Bangla words, including the construction of the word graph from text. Experimental result analysis on real data reveals the accuracy of each ranking algorithm in terms of F1 measure.
Look Beyond: Two-Stage Scene View Generation via Panorama and Video Diffusion
Kang, Xueyang, Xiang, Zhengkang, Zhang, Zezheng, Khoshelham, Kourosh
Novel view synthesis (NVS) from a single image is highly ill-posed due to large unobserved regions, especially for views that deviate significantly from the input. While existing methods focus on consistency between the source and generated views, they often fail to maintain coherence and correct view alignment across long-range or looped trajectories. We propose a model that addresses this by decomposing single-view NVS into a 360-degree scene extrapolation followed by novel view interpolation. This design ensures long-term view and scene consistency by conditioning on keyframes extracted and warped from a generated panoramic representation. In the first stage, a panorama diffusion model learns the scene prior from the input perspective image. Perspective keyframes are then sampled and warped from the panorama and used as anchor frames in a pre-trained video diffusion model, which generates novel views through a proposed spatial noise diffusion process. Compared to prior work, our method produces globally consistent novel views -- even in loop closure scenarios -- while enabling flexible camera control. Experiments on diverse scene datasets demonstrate that our approach outperforms existing methods in generating coherent views along user-defined trajectories. Our implementation is available at https://github.com/YiGuYT/LookBeyond.