Oceania
Adversarial Attacks and Detection in Visual Place Recognition for Safer Robot Navigation
Malone, Connor, Claxton, Owen, Shames, Iman, Milford, Michael
Stand-alone Visual Place Recognition (VPR) systems have little defence against a well-designed adversarial attack, which can lead to disastrous consequences when deployed for robot navigation. We then propose how to close the loop between VPR, an Adversarial Attack Detector (AAD), and active navigation decisions by demonstrating the performance benefit of simulated AADs in a novel experiment paradigm - which we detail for the robotics community to use as a system framework. In the proposed experiment paradigm, we see the addition of AADs across a range of detection accuracies can improve performance over baseline; demonstrating a significant improvement - such as a 50% reduction in the mean along-track localization error - can be achieved with True Positive and False Positive detection rates of only 75% and up to 25% respectively. We examine a variety of metrics including: Along-Track Error, Percentage of Time Attacked, Percentage of Time in an 'Unsafe' State, and Longest Continuous Time Under Attack. Expanding further on these results, we provide the first investigation into the efficacy of the Fast Gradient Sign Method (FGSM) adversarial attack for VPR. The analysis in this work highlights the need for AADs in real-world systems for trustworthy navigation, and informs quantitative requirements for system design. Although the impact of adversity in Visual Place Recognition (VPR) is widely understood, with state-of-the-art models offering increasing levels of robustness [1]-[4], the effects of adversarial attacks remain under-explored. Adversarial attacks generally refer to perturbations made to signals or input data by adversaries, with the goal of forcing the output of a system to be incorrect [5]. There has been a significant amount of work researching their effects on perception tasks such as image classification and object detection [5]-[9], yet they have not been widely investigated in the context of VPR.
Adversarial attacks on perception systems vary depending on the level of access and information available to an attacker, including digital, physical-world, subtle, or overt attacks [5].
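For readers unfamiliar with FGSM, the idea is a single step of size epsilon along the sign of the loss gradient with respect to the input. The following is a minimal sketch under stated assumptions: it uses a toy logistic-regression model as a hypothetical stand-in for a perception network, and the names `fgsm_perturb`, `bce_loss`, and the example weights are illustrative only, not taken from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    # Toy linear model (assumption): probability that x belongs to class 1.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(x, y, w, b):
    # Binary cross-entropy of the toy model on a single example.
    p = predict(x, w, b)
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturb(x, y, w, b, epsilon):
    # For logistic regression, d(loss)/dx = (p - y) * w.
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    # Move each input by epsilon in the direction that increases the loss,
    # then clip to the valid [0, 1] intensity range.
    return [min(1.0, max(0.0, xi + epsilon * sign(gi))) for xi, gi in zip(x, grad)]


# Illustrative values only.
w = [0.8, -1.2, 0.5, 0.3]
b = -0.1
x = [0.6, 0.4, 0.7, 0.5]
y = 1.0
x_adv = fgsm_perturb(x, y, w, b, epsilon=0.05)
```

The perturbation is bounded by epsilon per input element, which is what makes this kind of attack subtle; a real attack on a VPR model would obtain the gradient by backpropagation through the network rather than from a closed form.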
Secret koala population discovered near Australian city
When you think of koalas (Phascolarctos cinereus), chances are that words like cute or fluffy come to mind, not cryptic or stealthy. And yet, researchers in southeastern Australia have just discovered hundreds of previously undocumented koalas living surprisingly close to the city of Newcastle. The team conducted what they claim to be the largest and most accurate peer-reviewed koala survey to date. As detailed in a study published this month in the journal Biological Conservation, the survey estimates that a population of 4,357 koalas across 166,302 acres of land is living in the state of New South Wales.
'We're all connected – but it's not the connection I imagined': Hideo Kojima on Death Stranding 2
Hideo Kojima – the acclaimed video game director who helmed the stealth-action Metal Gear series for decades before founding his own company to make Death Stranding, a supernatural post-apocalyptic delivery game this publication described as "2019's most interesting blockbuster" – is still starstruck, or perhaps awestruck. "George [Miller] is my sensei, my God," he proclaims gleefully. Kojima is visiting Australia for a sold-out chat with Miller, the creator of the Mad Max film franchise, at the Sydney film festival. The two struck up an unlikely but fierce friendship nearly a decade ago, and Kojima says that, as a teenager, the first two Mad Max films inspired him to become a movie director and thus, eventually, a video game maker. At the panel later, Miller is equally effusive, calling Kojima "almost my brother"; the Australian even lent his appearance to a major character in Kojima's latest game, Death Stranding 2. It's actually because of Miller that much of this latest game is set in a heavily fictionalised version of Australia, Kojima jokes.
Efficient and Generalizable Environmental Understanding for Visual Navigation
Wang, Ruoyu, Li, Xinshu, Wang, Chen, Yao, Lina
Visual Navigation is a core task in Embodied AI, enabling agents to navigate complex environments toward given objectives. Across diverse settings within Navigation tasks, many necessitate the modelling of sequential data accumulated from preceding time steps. While existing methods perform well, they typically process all historical observations simultaneously, overlooking the internal association structure within the data, which may limit the potential for further improvements in task performance. We address this by examining the unique characteristics of Navigation tasks through the lens of causality, introducing a causal framework to highlight the limitations of conventional sequential methods. Leveraging this insight, we propose Causality-Aware Navigation (CAN), which incorporates a Causal Understanding Module to enhance the agent's environmental understanding capability. Empirical evaluations show that our approach consistently outperforms baselines across various tasks and simulation environments. Extensive ablation studies attribute these gains to the Causal Understanding Module, which generalizes effectively in both Reinforcement and Supervised Learning settings without computational overhead.
Optimal Convergence Rates of Deep Neural Network Classifiers
Zhang, Zihan, Shi, Lei, Zhou, Ding-Xuan
In this paper, we study the binary classification problem on $[0,1]^d$ under the Tsybakov noise condition (with exponent $s \in [0,\infty]$) and the compositional assumption. This assumption requires the conditional class probability function of the data distribution to be the composition of $q+1$ vector-valued multivariate functions, where each component function is either a maximum value function or a Hölder-$β$ smooth function that depends only on $d_*$ of its input variables. Notably, $d_*$ can be significantly smaller than the input dimension $d$. We prove that, under these conditions, the optimal convergence rate for the excess 0-1 risk of classifiers is $$ \left( \frac{1}{n} \right)^{\frac{β\cdot(1\wedgeβ)^q}{{\frac{d_*}{s+1}+(1+\frac{1}{s+1})\cdotβ\cdot(1\wedgeβ)^q}}}\;\;\;, $$ which is independent of the input dimension $d$. Additionally, we demonstrate that ReLU deep neural networks (DNNs) trained with hinge loss can achieve this optimal convergence rate up to a logarithmic factor. This result provides theoretical justification for the excellent performance of ReLU DNNs in practical classification tasks, particularly in high-dimensional settings. The technique used to establish these results extends the oracle inequality presented in our previous work. The generalized approach is of independent interest.
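As a reading aid, the exponent in the rate above can be simplified by plain algebra. The shorthand $\beta_q := \beta\cdot(1\wedge\beta)^q$ is our notation, not the authors'. Multiplying numerator and denominator by $s+1$ gives $$ \frac{\beta_q}{\frac{d_*}{s+1}+\left(1+\frac{1}{s+1}\right)\beta_q} = \frac{(s+1)\,\beta_q}{d_*+(s+2)\,\beta_q}\;. $$ Two limiting cases follow directly: for $s=0$ the exponent is $\frac{\beta_q}{d_*+2\beta_q}$, and as $s\to\infty$ it tends to $1$, so the rate approaches $\frac{1}{n}$. In both cases the exponent depends on $d_*$ and not on $d$, consistent with the statement above.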
Free Privacy Protection for Wireless Federated Learning: Enjoy It or Suffer from It?
Li, Weicai, Lv, Tiejun, Zhao, Xiyu, Yuan, Xin, Ni, Wei
Inherent communication noises have the potential to preserve privacy for wireless federated learning (WFL) but have been overlooked in digital communication systems predominantly using floating-point number standards, e.g., IEEE 754, for data storage and transmission. This is due to the potentially catastrophic consequences of bit errors in floating-point numbers, e.g., on the sign or exponent bits. This paper presents a novel channel-native bit-flipping differential privacy (DP) mechanism tailored for WFL, where transmit bits are randomly flipped and communication noises are leveraged, to collectively preserve the privacy of WFL in digital communication systems. The key idea is to interpret the bit perturbation at the transmitter and bit errors caused by communication noises as a bit-flipping DP process. This is achieved by designing a new floating-point-to-fixed-point conversion method that only transmits the bits in the fraction part of model parameters, hence eliminating the need for transmitting the sign and exponent bits and preventing the catastrophic consequence of bit errors. We analyze a new metric to measure the bit-level distance of the model parameters and prove that the proposed mechanism satisfies (λ, ϵ)-Rényi DP and does not violate the WFL convergence. Experiments validate privacy and convergence analysis of the proposed mechanism and demonstrate its superiority to the state-of-the-art Gaussian mechanisms that are channel-agnostic and add Gaussian noise for privacy protection. Privacy-preserving federated learning (FL) integrates privacy models into a distributed machine learning (ML) framework, offering provable privacy assurances [1]-[5]. Combining DP with FL permits clients to train their local models within specified privacy protection levels [11], [12]. This paper was supported in part by the National Natural Science Foundation of China under Grant No. 62271068, and the Beijing Natural Science Foundation under Grant No. L222046.
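The intuition behind sending only fraction bits can be illustrated with a small sketch: if a value is represented purely as fixed-point fraction bits, any combination of bit flips still decodes to a number in [0, 1), so the error is bounded. The code below is a generic illustration under stated assumptions, not the paper's conversion method or its privacy analysis: it assumes the parameter has already been scaled into [0, 1), and the function names and the independent per-bit flip probability are hypothetical.

```python
import random


def to_fraction_bits(value, num_bits):
    # Quantize a value in [0, 1) to num_bits fixed-point fraction bits, MSB first.
    if not 0.0 <= value < 1.0:
        raise ValueError("value must lie in [0, 1)")
    q = int(value * (1 << num_bits))
    return [(q >> (num_bits - 1 - i)) & 1 for i in range(num_bits)]


def from_fraction_bits(bits):
    # Decode fraction bits back to a float: bit i has weight 2^-(i+1).
    return sum(bit / (1 << (i + 1)) for i, bit in enumerate(bits))


def flip_bits(bits, flip_prob, rng):
    # Flip each bit independently with probability flip_prob
    # (stands in for transmitter perturbation plus channel bit errors).
    return [bit ^ 1 if rng.random() < flip_prob else bit for bit in bits]


rng = random.Random(0)
bits = to_fraction_bits(0.625, 8)
noisy = flip_bits(bits, flip_prob=0.1, rng=rng)
decoded = from_fraction_bits(noisy)  # always stays within [0, 1)
```

Contrast this with an IEEE 754 float, where a single flipped exponent bit can change the value by many orders of magnitude.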
OpenAI boss accuses Meta of trying to poach staff with $100m sign-on bonuses
The boss of OpenAI has claimed that Mark Zuckerberg's Meta has tried to poach his top artificial intelligence experts with "crazy" signing bonuses of $100m (£74m), as the scramble for talent in the booming sector intensifies. Sam Altman spoke about the offers in a podcast on Tuesday. They have not been confirmed by Meta. OpenAI, the company that developed ChatGPT, said it had nothing to add beyond its chief executive's comments. "They started making these giant offers to a lot of people on our team – $100m signing bonuses, more than that comp [compensation] per year," Altman told the Uncapped podcast, which is presented by his brother, Jack.
Public Acceptance of Cybernetic Avatars in the service sector: Evidence from a Large-Scale Survey in Dubai
Aymerich-Franch, Laura, Taha, Tarek, Miyashita, Takahiro, Kamide, Hiroko, Ishiguro, Hiroshi, Dario, Paolo
Cybernetic avatars are hybrid interaction robots or digital representations that combine autonomous capabilities with teleoperated control. This study investigates the acceptance of cybernetic avatars in the highly multicultural society of Dubai, with particular emphasis on robotic avatars for customer service. Specifically, we explore how acceptance varies as a function of robot appearance (e.g., android, robotic-looking, cartoonish), deployment settings (e.g., shopping malls, hotels, hospitals), and functional tasks (e.g., providing information, patrolling). To this end, we conducted a large-scale survey with over 1,000 participants. Overall, cybernetic avatars received a high level of acceptance, with physical robot avatars receiving higher acceptance than digital avatars. In terms of appearance, robot avatars with a highly anthropomorphic robotic appearance were the most accepted, followed by cartoonish designs and androids. Animal-like appearances received the lowest level of acceptance. Among the tasks, providing information and guidance was rated as the most valued. Shopping malls, airports, public transport stations, and museums were the settings with the highest acceptance, whereas healthcare-related spaces received lower levels of support. An analysis by community cluster revealed, among other findings, that Emirati respondents showed significantly greater acceptance of android appearances compared to the overall sample, while participants from the 'Other Asia' cluster were significantly more accepting of cartoonish appearances. Our study underscores the importance of incorporating citizen feedback into the design and deployment of cybernetic avatars from the early stages to enhance acceptance of this technology in society.
LingoLoop Attack: Trapping MLLMs via Linguistic Context and State Entrapment into Endless Loops
Fu, Jiyuan, Jiang, Kaixun, Hong, Lingyi, Li, Jinglun, Guo, Haijing, Yang, Dingkang, Chen, Zhaoyu, Zhang, Wenqiang
Multimodal Large Language Models (MLLMs) have shown great promise but require substantial computational resources during inference. Attackers can exploit this by inducing excessive output, leading to resource exhaustion and service degradation. Prior energy-latency attacks aim to increase generation time by broadly shifting the output token distribution away from the EOS token, but they neglect the influence of token-level Part-of-Speech (POS) characteristics on EOS and sentence-level structural patterns on output counts, limiting their efficacy. To address this, we propose LingoLoop, an attack designed to induce MLLMs to generate excessively verbose and repetitive sequences. First, we find that the POS tag of a token strongly affects the likelihood of generating an EOS token. Based on this insight, we propose a POS-Aware Delay Mechanism to postpone EOS token generation by adjusting attention weights guided by POS information. Second, we identify that constraining output diversity to induce repetitive loops is effective for sustained generation. We introduce a Generative Path Pruning Mechanism that limits the magnitude of hidden states, encouraging the model to produce persistent loops. Extensive experiments demonstrate LingoLoop can increase generated tokens by up to 30 times and energy consumption by a comparable factor on models like Qwen2.5-VL-3B, consistently driving MLLMs towards their maximum generation limits. These findings expose significant vulnerabilities in MLLMs, posing challenges for their reliable deployment. The code will be released publicly following the paper's acceptance.
Investigating the Potential of Large Language Model-Based Router Multi-Agent Architectures for Foundation Design Automation: A Task Classification and Expert Selection Study
Youwai, Sompote, Phim, David, Murcia, Vianne Gayl, Onas, Rianne Clair
This study investigates router-based multi-agent systems for automating foundation design calculations through intelligent task classification and expert selection. Three approaches were evaluated: single-agent processing, multi-agent designer-checker architecture, and router-based expert selection. Performance assessment utilized baseline models including DeepSeek R1, ChatGPT 4 Turbo, Grok 3, and Gemini 2.5 Pro across shallow foundation and pile design scenarios. The router-based configuration achieved performance scores of 95.00% for shallow foundations and 90.63% for pile design, representing improvements of 8.75 and 3.13 percentage points over standalone Grok 3 performance respectively. The system outperformed conventional agentic workflows by 10.0 to 43.75 percentage points. Grok 3 demonstrated superior standalone performance without external computational tools, indicating advances in direct LLM mathematical reasoning for engineering applications. The dual-tier classification framework successfully distinguished foundation types, enabling appropriate analytical approaches. Results establish router-based multi-agent systems as optimal for foundation design automation while maintaining professional documentation standards. Given safety-critical requirements in civil engineering, continued human oversight remains essential, positioning these systems as advanced computational assistance tools rather than autonomous design replacements in professional practice.