Oceania
Inferring Implicit Goals Across Differing Task Models
Tulli, Silvia, Vasileiou, Stylianos Loukas, Chetouani, Mohamed, Sreedharan, Sarath
[...] value-aligned behavior is to not only account for the specified user objectives but also any implicit or unspecified user requirements. The existence of such implicit requirements could be particularly common in settings where the user's understanding of the task model may differ from the agent's estimate of the model. Under this scenario, the user may incorrectly expect some agent behavior to be inevitable or guaranteed. This paper addresses such expectation mismatch in the presence of differing models by capturing the possibility of unspecified [...]
This should be all well and good, provided the human bottleneck states are also bottleneck states for the agent. Otherwise, the agent must make an effort to figure out what the user's underlying subgoals may be. To see how such problems may arise, consider an agent tasked with guiding a tourist to a famous art museum. The tourist simply says, "Get me a plan to get to the art museum," unaware of the city's metro system and expecting an above-ground route passing certain landmarks. The agent, however, might plan a route using the metro system. For the agent's metro route, bottlenecks might include entering the metro, making transfers, and exiting at the correct station.
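The tourist example can be made concrete with a toy sketch. It is only an illustration under simplifying assumptions: the task is reduced to a deterministic graph rather than the paper's formulation, a bottleneck is taken to be any intermediate node that every start-to-goal path must pass through, and all place names and both graphs are hypothetical.

```python
from collections import deque


def reachable(graph, start, goal, removed=None):
    """Breadth-first search from start to goal that ignores the `removed` node."""
    if start == removed or goal == removed:
        return False
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def bottlenecks(graph, start, goal):
    """Intermediate nodes whose removal disconnects start from goal."""
    nodes = set(graph) | {n for nbrs in graph.values() for n in nbrs}
    return {
        n for n in nodes - {start, goal}
        if not reachable(graph, start, goal, removed=n)
    }


# Hypothetical human model: only the above-ground route exists.
human_model = {
    "hotel": ["old_bridge"],
    "old_bridge": ["cathedral_square"],
    "cathedral_square": ["museum"],
}

# Hypothetical agent model: the same streets plus a metro line.
agent_model = {
    **human_model,
    "hotel": ["old_bridge", "metro_entrance"],
    "metro_entrance": ["transfer_station"],
    "transfer_station": ["museum_station"],
    "museum_station": ["museum"],
}

human_b = bottlenecks(human_model, "hotel", "museum")
agent_b = bottlenecks(agent_model, "hotel", "museum")

# Landmarks the tourist treats as unavoidable but the agent does not:
# candidates for implicit subgoals the agent may need to ask about.
implicit_candidates = human_b - agent_b
print(sorted(implicit_candidates))
```

In this toy setup the landmarks are bottlenecks only in the human model, so a metro plan would silently skip them. That gap is the kind of mismatch the passage describes.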
Be careful with DeepSeek, Australia says - so is it safe to use?
"For any openly available AI model, with a web or app interface - including but not limited to DeepSeek - the prompts, or questions that are asked of the AI, then become available to the makers of that model, as are the answers," said Emily Taylor, chief executive of Oxford Information Labs "So, anyone working on confidential or national security areas needs to be aware of those risks," she told the BBC. Dr Richard Whittle from University of Salford said he had "various concerns about data and privacy" with the app, but said there were "plenty of concerns" with the models used in the US too. "Consumers should always be wary, especially in the hype and fear of missing out on a new, highly popular, app," he said. The UK data regulator, the Information Commissioner's Office has urged the public to be aware of their rights around their information being used to train AI models. Asked by BBC News if it shared the Australian government's concerns, it said in a statement: "Generative AI developers and deployers need to make sure people have meaningful, concise and easily accessible information about the use of their personal data and have clear and effective processes for enabling people to exercise their information rights.
Australia's new chief scientist open to nuclear power but focused on energy forms available 'right now'
Australia's new chief scientist has said he is open to the prospect of nuclear power playing a role in the country's energy mix, but remained focused on forms of energy that were "available to help us right now". On his first day in the job, Prof Tony Haymet said new energy-intensive technologies like artificial intelligence could be powered by renewables, but that he thought serious discussions about nuclear in Australia were likely to be years away. "If you go back and look at Chernobyl and Three Mile Island and so on, there wasn't enough transparency and openness. I think the nuclear industry has accepted the fact that they have to rebuild their social licence to operate," Haymet told a press conference when asked about small modular reactors (SMRs). "You know, for the next chief scientist in 2030 or 2040, I think you can re-ask your question."
Why China's AI startup DeepSeek is sending shockwaves through global tech
DeepSeek, a little-known Chinese startup, has sent shockwaves through the global tech sector with the release of an artificial intelligence (AI) model whose capabilities rival the creations of Google and OpenAI. DeepSeek-R1's creator says its model was developed using less advanced, and fewer, computer chips than those employed by tech giants in the United States. In a research paper released last week, the model's development team said they had spent less than $6m on computing power to train the model – a fraction of the multibillion-dollar AI budgets enjoyed by US tech giants such as OpenAI, Alphabet and Meta. Marc Andreessen, one of the most influential tech venture capitalists in Silicon Valley, hailed the release of the model as "AI's Sputnik moment". The sudden emergence of a small Chinese startup capable of rivalling Silicon Valley's top players has challenged assumptions about US dominance in AI and raised fears that the sky-high market valuations of companies such as Nvidia, Alphabet and Meta may be detached from reality. On Monday, Nvidia, which holds a near-monopoly on producing the semiconductors that power generative AI, lost nearly $600bn in market capitalisation after its shares plummeted 17 percent.
One Head Eight Arms: Block Matrix based Low Rank Adaptation for CLIP-based Few-Shot Learning
Zhou, Chunpeng, Shen, Qianqian, Yu, Zhi, Bu, Jiajun, Wang, Haishuai
Recent advancements in fine-tuning Vision-Language Foundation Models (VLMs) have garnered significant attention for their effectiveness in downstream few-shot learning tasks. While these recent approaches exhibit some performance improvements, they often suffer from excessive trainable parameters and high computational costs. To address these challenges, we propose a novel block matrix-based low-rank adaptation framework, called Block-LoRA, for fine-tuning VLMs on downstream few-shot tasks. Inspired by recent work on Low-Rank Adaptation (LoRA), Block-LoRA partitions the original low-rank decomposition matrix of LoRA into a series of sub-matrices while sharing all down-projection sub-matrices. This structure not only reduces the number of trainable parameters, but also transforms certain complex matrix multiplication operations into simpler matrix addition, significantly lowering the computational cost of fine-tuning. Notably, Block-LoRA enables fine-tuning CLIP on the ImageNet few-shot benchmark using a single 24GB GPU. We also show that Block-LoRA has a tighter generalization error bound than vanilla LoRA. Without bells and whistles, extensive experiments demonstrate that Block-LoRA achieves competitive performance compared to state-of-the-art CLIP-based few-shot methods, while maintaining a low trainable parameter count and reduced computational overhead.
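The claim that sharing the down-projection turns multiplications into additions can be checked on a small example. The sketch below is one reading of the abstract, not the authors' implementation: it assumes the rank-r update is split into n blocks of rank r/n, that every block reuses a single down-projection matrix (named A_shared here), and that the update is the sum of the per-block products. All dimensions and variable names are hypothetical, and plain Python lists stand in for tensors.

```python
import random


def matmul(X, Y):
    """Naive matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def matadd(X, Y):
    """Element-wise sum of two matrices with the same shape."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: output dim d, input dim k, LoRA rank r, number of blocks n.
d, k, r, n = 6, 5, 4, 2
block_rank = r // n
rng = random.Random(0)

A_shared = rand_matrix(block_rank, k, rng)                      # shared down-projection
B_blocks = [rand_matrix(d, block_rank, rng) for _ in range(n)]  # per-block up-projections

# Block-by-block update: n matrix multiplications, then summed.
delta_blockwise = matmul(B_blocks[0], A_shared)
for B in B_blocks[1:]:
    delta_blockwise = matadd(delta_blockwise, matmul(B, A_shared))

# Because A_shared is common to all blocks, the up-projections can be
# added first, leaving a single matrix multiplication.
B_sum = B_blocks[0]
for B in B_blocks[1:]:
    B_sum = matadd(B_sum, B)
delta_fused = matmul(B_sum, A_shared)

max_gap = max(
    abs(a - b)
    for row_a, row_b in zip(delta_blockwise, delta_fused)
    for a, b in zip(row_a, row_b)
)

# Trainable parameter counts under the assumptions above.
lora_params = r * k + d * r                                # vanilla LoRA: A is r x k, B is d x r
block_lora_params = block_rank * k + n * d * block_rank    # one shared A block, n B blocks
print(max_gap < 1e-9, lora_params, block_lora_params)
```

Under these assumptions the two ways of computing the update agree up to floating-point error, and the shared down-projection accounts for the saving in trainable parameters.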
General Scene Adaptation for Vision-and-Language Navigation
Hong, Haodong, Qiao, Yanyuan, Wang, Sen, Liu, Jiajun, Wu, Qi
Vision-and-Language Navigation (VLN) tasks mainly evaluate agents based on one-time execution of individual instructions across multiple environments, aiming to develop agents capable of functioning in any environment in a zero-shot manner. However, real-world navigation robots often operate in persistent environments with relatively consistent physical layouts, visual observations, and language styles from instructors. Such a gap in the task setting presents an opportunity to improve VLN agents by incorporating continuous adaptation to specific environments. To better reflect these real-world conditions, we introduce GSA-VLN, a novel task requiring agents to execute navigation instructions within a specific scene and simultaneously adapt to it for improved performance over time. To evaluate the proposed task, one has to address two challenges in existing VLN datasets: the lack of out-of-distribution (OOD) data, and the limited number and style diversity of instructions for each scene. Therefore, we propose a new dataset, GSA-R2R, which significantly expands the diversity and quantity of environments and instructions for the R2R dataset to evaluate agent adaptability in both in-distribution (ID) and OOD contexts. Furthermore, we design a three-stage instruction orchestration pipeline that leverages LLMs to refine speaker-generated instructions and applies role-playing techniques to rephrase instructions into different speaking styles. This is motivated by the observation that each individual user often has consistent signatures or preferences in their instructions. We conducted extensive experiments on GSA-R2R to thoroughly evaluate our dataset and benchmark various methods. Based on our findings, we propose a novel method, GR-DUET, which incorporates memory-based navigation graphs with an environment-specific training strategy, achieving state-of-the-art results on all GSA-R2R splits.
Amplifier: Bringing Attention to Neglected Low-Energy Components in Time Series Forecasting
Fei, Jingru, Yi, Kun, Fan, Wei, Zhang, Qi, Niu, Zhendong
We propose an energy amplification technique to address the issue that existing models easily overlook low-energy components in time series forecasting. This technique comprises an energy amplification block and an energy restoration block. The energy amplification block enhances the energy of low-energy components to improve the model's learning efficiency for these components, while the energy restoration block returns the energy to its original level. Moreover, considering that the energy-amplified data typically displays two distinct energy peaks in the frequency spectrum, we integrate the energy amplification technique with a seasonal-trend forecaster to model the temporal relationships of these two peaks independently, serving as the backbone for our proposed model, Amplifier. Additionally, we propose a semi-channel interaction temporal relationship enhancement block for Amplifier, which enhances the model's ability to capture temporal relationships from the perspective of the commonality and specificity of each channel in the data. Extensive experiments on eight time series forecasting benchmarks consistently demonstrate our model's superiority in both effectiveness and efficiency compared to state-of-the-art methods.
The Right to AI
Mushkani, Rashid, Berard, Hugo, Cohen, Allison, Koeski, Shin
This paper proposes a Right to AI, which asserts that individuals and communities should meaningfully participate in the development and governance of the AI systems that shape their lives. Motivated by the increasing deployment of AI in critical domains and inspired by Henri Lefebvre's concept of the Right to the City, we reconceptualize AI as a societal infrastructure, rather than merely a product of expert design. In this paper, we critically evaluate how generative agents, large-scale data extraction, and diverse cultural values bring new complexities to AI oversight. The paper proposes that grassroots participatory methodologies can mitigate biased outcomes and enhance social responsiveness. It asserts that data is socially produced and should be managed and owned collectively. Drawing on Sherry Arnstein's Ladder of Citizen Participation and analyzing nine case studies, the paper develops a four-tier model for the Right to AI that situates the current paradigm and envisions an aspirational future. It proposes recommendations for inclusive data ownership, transparent design processes, and stakeholder-driven oversight. We also discuss market-led and state-centric alternatives and argue that participatory approaches offer a better balance between technical efficiency and democratic legitimacy.
MultiChallenge: A Realistic Multi-Turn Conversation Evaluation Benchmark Challenging to Frontier LLMs
Sirdeshmukh, Ved, Deshpande, Kaustubh, Mols, Johannes, Jin, Lifeng, Cardona, Ed-Yeremai, Lee, Dean, Kritz, Jeremy, Primack, Willow, Yue, Summer, Xing, Chen
We present MultiChallenge, a pioneering benchmark evaluating large language models (LLMs) on conducting multi-turn conversations with human users, a crucial yet underexamined capability for their applications. MultiChallenge identifies four categories of challenges in multi-turn conversations that are not only common and realistic among current human-LLM interactions, but are also challenging to all current frontier LLMs. All four challenges require accurate instruction-following, context allocation, and in-context reasoning at the same time. We also develop an LLM-as-judge approach with instance-level rubrics, enabling automatic evaluation that shows fair agreement with experienced human raters. Despite achieving near-perfect scores on existing multi-turn evaluation benchmarks, all frontier models have less than 50% accuracy on MultiChallenge, with the top-performing Claude 3.5 Sonnet (June 2024) achieving just a 41.4% average accuracy.
Poisoning Attacks and Defenses to Federated Unlearning
Wang, Wenbin, Ma, Qiwen, Zhang, Zifan, Liu, Yuchen, Liu, Zhuqing, Fang, Minghong
Federated learning allows multiple clients to collaboratively train a global model with the assistance of a server. However, its distributed nature makes it susceptible to poisoning attacks, where malicious clients can compromise the global model by sending harmful local model updates to the server. To unlearn an accurate global model from a poisoned one after identifying malicious clients, federated unlearning has been introduced. Yet, current research on federated unlearning has primarily concentrated on its effectiveness and efficiency, overlooking the security challenges it presents. In this work, we bridge the gap by proposing BadUnlearn, the first poisoning attack targeting federated unlearning. In BadUnlearn, malicious clients send specifically designed local model updates to the server during the unlearning process, aiming to ensure that the resulting unlearned model remains poisoned. To mitigate these threats, we propose UnlearnGuard, a robust federated unlearning framework that is provably robust against both existing poisoning attacks and our BadUnlearn. The core concept of UnlearnGuard is for the server to estimate the clients' local model updates during the unlearning process and employ a filtering strategy to verify the accuracy of these estimations. Theoretically, we prove that the model unlearned through UnlearnGuard closely resembles one obtained by train-from-scratch. Empirically, we show that BadUnlearn can effectively corrupt existing federated unlearning methods, while UnlearnGuard remains secure against poisoning attacks.