Objects matter: object-centric world models improve reinforcement learning in visually complex environments

arXiv.org Artificial Intelligence

Deep reinforcement learning has achieved remarkable success in learning control policies from pixels across a wide range of tasks, yet its application remains hindered by low sample efficiency, requiring significantly more environment interactions than humans to reach comparable performance. Model-based reinforcement learning (MBRL) offers a solution by leveraging learnt world models to generate simulated experience, thereby improving sample efficiency. However, in visually complex environments, small or dynamic elements can be critical for decision-making, and traditional MBRL methods in pixel-based environments typically rely on auto-encoding with an $L_2$ loss, which is dominated by large areas and often fails to capture decision-relevant details. To address these limitations, we propose an object-centric MBRL pipeline, which integrates recent advances in computer vision to allow agents to focus on key decision-related elements. Our approach consists of four main steps: (1) annotating key objects related to rewards and goals with segmentation masks, (2) extracting object features using a pre-trained, frozen foundation vision model, (3) incorporating these object features with the raw observations to predict environmental dynamics, and (4) training the policy using imagined trajectories generated by this object-centric world model. Building on the efficient MBRL algorithm STORM, we call this pipeline OC-STORM. We demonstrate OC-STORM's practical value in overcoming the limitations of conventional MBRL approaches on both Atari games and the visually complex game Hollow Knight.
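The claim that an $L_2$ reconstruction loss is dominated by large areas can be illustrated with a toy calculation. The sketch below is not taken from the paper: the frame size, intensities, and the names `squared_error`, `recon_missing_object`, and `recon_shifted_background` are assumptions chosen only to make the effect visible.

```python
# Toy illustration: per-pixel squared error summed over a frame.
# All sizes and intensity values are made-up assumptions.

def squared_error(frame_a, frame_b):
    """Sum of squared per-pixel differences between two equally sized frames."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
    )

SIZE = 64

# Target frame: uniform background (0.5) with a small 2x2 bright object (1.0).
target = [[0.5] * SIZE for _ in range(SIZE)]
for y in range(10, 12):
    for x in range(10, 12):
        target[y][x] = 1.0

# Reconstruction A: perfect background, but the object is missing entirely.
recon_missing_object = [[0.5] * SIZE for _ in range(SIZE)]

# Reconstruction B: object reproduced, background off by 0.05 everywhere.
recon_shifted_background = [[0.55] * SIZE for _ in range(SIZE)]
for y in range(10, 12):
    for x in range(10, 12):
        recon_shifted_background[y][x] = 1.0

loss_missing_object = squared_error(target, recon_missing_object)
loss_shifted_background = squared_error(target, recon_shifted_background)

print("object missing:     ", loss_missing_object)
print("background shifted: ", loss_shifted_background)
```

In this toy setup, dropping the object entirely is penalised less than a barely visible background shift, which is one way to see why a pixel-wise loss alone may neglect small, decision-relevant elements.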


What is Formal Verification without Specifications? A Survey on mining LTL Specifications

arXiv.org Artificial Intelligence

Virtually all verification techniques using formal methods rely on the availability of a formal specification, which describes the design requirements precisely. However, formulating specifications remains a manual task that is notoriously challenging and error-prone. To address this bottleneck in formal verification, recent research has thus focussed on automatically generating specifications for formal verification from examples of (desired and undesired) system behavior. In this survey, we list and compare recent advances in mining specifications in Linear Temporal Logic (LTL), the de facto standard specification language for reactive systems. Several approaches have been designed for learning LTL formulas, which address different aspects and settings of specification design. Moreover, the approaches rely on a diverse range of techniques such as constraint solving, neural network training, enumerative search, etc. We survey the current state-of-the-art techniques and compare them for the convenience of the formal methods practitioners.
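To make the idea of mining a specification from examples concrete, here is a minimal sketch of enumerative search, one of the technique families mentioned above. It is not drawn from any surveyed tool. It assumes finite-trace semantics (standard LTL is defined over infinite words), non-empty traces, a small operator set, and a hypothetical tuple encoding of formulas; `holds`, `enumerate_formulas`, and `mine` are illustrative names.

```python
from itertools import product

def holds(formula, trace, i=0):
    """Evaluate a formula at position i of a non-empty finite trace.

    A trace is a list of sets of atomic propositions. Formulas are tuples,
    e.g. ("G", ("ap", "p")). "X" is the strong next (false at the last step).
    """
    op = formula[0]
    if op == "ap":
        return formula[1] in trace[i]
    if op == "not":
        return not holds(formula[1], trace, i)
    if op == "and":
        return holds(formula[1], trace, i) and holds(formula[2], trace, i)
    if op == "X":
        return i + 1 < len(trace) and holds(formula[1], trace, i + 1)
    if op == "F":
        return any(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "G":
        return all(holds(formula[1], trace, j) for j in range(i, len(trace)))
    if op == "U":
        return any(
            holds(formula[2], trace, j)
            and all(holds(formula[1], trace, k) for k in range(i, j))
            for j in range(i, len(trace))
        )
    raise ValueError(f"unknown operator: {op}")

def enumerate_formulas(props, max_depth):
    """Yield distinct formulas, shallow ones first."""
    level = [("ap", p) for p in props]
    seen = list(level)
    emitted = set(level)
    yield from level
    for _ in range(max_depth):
        new = [(op, f) for f in level for op in ("not", "X", "F", "G")]
        new += [(op, f, g) for f, g in product(seen, repeat=2) for op in ("and", "U")]
        level = []
        for f in new:
            if f not in emitted:
                emitted.add(f)
                level.append(f)
                yield f
        seen += level

def mine(props, positives, negatives, max_depth=2):
    """Return the first formula that accepts all positives and rejects all negatives."""
    for f in enumerate_formulas(props, max_depth):
        if all(holds(f, t) for t in positives) and not any(holds(f, t) for t in negatives):
            return f
    return None

positives = [[{"p"}, {"p"}, {"p"}], [{"p"}, {"p", "q"}]]
negatives = [[{"p"}, set(), {"p"}], [set(), {"p"}]]
mined = mine(["p", "q"], positives, negatives)
print(mined)
```

Brute-force enumeration like this grows very quickly with formula depth, which is part of why constraint solving and learning-based approaches are also used.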


Distributionally Robust Graph Out-of-Distribution Recommendation via Diffusion Model

arXiv.org Machine Learning

Distributionally robust optimization (DRO)-based graph neural network methods improve recommendation systems' out-of-distribution (OOD) generalization by optimizing the model's worst-case performance. However, these studies fail to consider the impact of noisy samples in the training data, which results in diminished generalization capabilities and lower accuracy. Through experimental and theoretical analysis, this paper reveals that current DRO-based graph recommendation methods assign greater weight to the noise distribution, leading to model parameter learning being dominated by it. When the model overly focuses on fitting noise samples in the training data, it may learn irrelevant or meaningless features that cannot be generalized to OOD data. To address this challenge, we design a Distributionally Robust Graph model for OOD recommendation (DRGO). Specifically, our method first employs a simple and effective diffusion paradigm to alleviate the noisy effect in the latent space. Additionally, an entropy regularization term is introduced in the DRO objective function to avoid extreme sample weights in the worst-case distribution. Finally, we provide a theoretical proof of the generalization error bound of DRGO as well as a theoretical analysis of how our approach mitigates noisy sample effects, which helps to better understand the proposed framework from a theoretical perspective. We conduct extensive experiments on four datasets to evaluate the effectiveness of our framework against three typical distribution shifts, and the results demonstrate its superiority in both independent and identically distributed (IID) and OOD settings.
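The effect of an entropy term on worst-case sample weights can be sketched with a generic textbook form, which may differ from the exact objective used in DRGO. Assume the adversary maximises the weighted loss plus a temperature times the Shannon entropy of the weights over the probability simplex; the maximiser is then a softmax of the per-sample losses. The function name `worst_case_weights`, the temperature values, and the loss values are illustrative assumptions.

```python
import math

def worst_case_weights(losses, temperature):
    """Softmax weights that maximise sum(w * loss) + temperature * entropy(w).

    Assumes temperature > 0. Subtracting the max loss keeps exp() stable.
    """
    top = max(losses)
    exps = [math.exp((loss - top) / temperature) for loss in losses]
    total = sum(exps)
    return [e / total for e in exps]

# Three ordinary samples and one hypothetical noisy sample with a large loss.
losses = [0.2, 0.3, 0.25, 5.0]

sharp = worst_case_weights(losses, temperature=0.1)   # weak regularisation
smooth = worst_case_weights(losses, temperature=5.0)  # strong regularisation

print("weight on noisy sample, weak entropy term:  ", sharp[3])
print("weight on noisy sample, strong entropy term:", smooth[3])
```

With a weak entropy term nearly all weight falls on the high-loss sample, while a stronger term spreads the weight more evenly, which is the sense in which entropy regularization avoids extreme sample weights.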


Kernels of Selfhood: GPT-4o shows humanlike patterns of cognitive consistency moderated by free choice

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have surprised the scientific community and even their creators by exhibiting emergent abilities once thought to be uniquely human, such as advanced cognition and reasoning (1-6), although the full extent of these accomplishments is debated (3, 7-10). These capabilities align with the rational and deliberative aspects of human nature, but humans are not purely rational creatures, and it is unclear whether LLMs will mimic a broader spectrum of human psychological tendencies. Here we test whether OpenAI's GPT-4o replicates behaviors associated with the human tendency toward cognitive consistency as well as human sensitivity to choice, characterized by greater attitude shifts when the behaviors inducing these changes are freely chosen. Decades of research demonstrate that humans will irrationally twist their attitudes to align with behaviors they were induced to perform. For example, consider an individual who opposes single-payer healthcare, but volunteers, in response to a request for help, to craft an argument in favor of the policy. Rationally, this individual's attitude toward single-payer healthcare should not move in a more supportive direction; they should be able to discriminate between their genuine attitude and the opposing one that they have articulated only to be helpful.


Physics-Trained Neural Network as Inverse Problem Solver for Potential Fields: An Example of Downward Continuation between Arbitrary Surfaces

arXiv.org Artificial Intelligence

We treat downward continuation as an inverse problem that relies on solving a forward problem defined by the formula for upward continuation, and we propose a new physics-trained deep neural network (DNN)-based solution for this task. We hard-code the upward continuation process into the DNN's learning framework, where the DNN itself learns to act as the inverse problem solver and can perform downward continuation without ever being shown any ground truth data. We test the proposed method on both synthetic magnetic data and real-world magnetic data from West Antarctica. The preliminary results demonstrate its effectiveness through comparison with selected benchmarks, opening future avenues for the combined use of DNNs and established geophysical theories to address broader potential field inverse problems, such as density and geometry modelling.

Introduction

Downward continuation of potential fields, such as gravity or magnetic fields, refers to transferring the data from one observation surface to a lower surface that is closer to the source of the field. The goal is to enhance the resolution of the continued field and amplify the shallow geological signals. Airborne surveys are typically flown at uneven heights, making continuation from these surfaces a common requirement. Downward continuation is a critical task in the processing of potential field data, impacting the success of various downstream analyses, such as revealing the density structure and boundaries of anomalous bodies, especially for detecting and highlighting shallow anomalous sources. Many methods have been developed for the task of downward continuation (e.g.


CENSOR: Defense Against Gradient Inversion via Orthogonal Subspace Bayesian Sampling

arXiv.org Artificial Intelligence

Federated learning collaboratively trains a neural network on a global server, where each local client receives the current global model weights and sends back parameter updates (gradients) based on its local private data. The process of sending these model updates may leak the client's private data. Existing gradient inversion attacks can exploit this vulnerability to recover private training instances from a client's gradient vectors. Recently, researchers have proposed advanced gradient inversion techniques that existing defenses struggle to handle effectively. In this work, we present a novel defense tailored for large neural network models. Our defense capitalizes on the high dimensionality of the model parameters to perturb gradients within a subspace orthogonal to the original gradient. By leveraging cold posteriors over orthogonal subspaces, our defense implements a refined gradient update mechanism. This enables the selection of an optimal gradient that not only safeguards against gradient inversion attacks but also maintains model utility. We conduct comprehensive experiments across three different datasets and evaluate our defense against various state-of-the-art attacks and defenses. Code is available at https://censor-gradient.github.io.
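The geometric idea of perturbing a gradient only within the subspace orthogonal to it can be sketched as follows. This is not the CENSOR algorithm: the Bayesian sampling with cold posteriors and the selection of an optimal gradient are omitted, and the names `orthogonal_noise`, `scale`, and the 1000-dimensional toy gradient are assumptions. The sketch only shows a single Gaussian sample with its component along the gradient projected out.

```python
import random

def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonal_noise(gradient, scale, rng):
    """Sample Gaussian noise and remove its component along `gradient`."""
    noise = [rng.gauss(0.0, scale) for _ in gradient]
    grad_norm_sq = dot(gradient, gradient)
    if grad_norm_sq == 0.0:
        return noise  # every direction is orthogonal to the zero vector
    coeff = dot(noise, gradient) / grad_norm_sq
    return [n - coeff * g for n, g in zip(noise, gradient)]

rng = random.Random(0)
gradient = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for a flattened gradient
noise = orthogonal_noise(gradient, scale=0.1, rng=rng)
perturbed = [g + n for g, n in zip(gradient, noise)]

print("noise . gradient     =", dot(noise, gradient))
print("perturbed . gradient =", dot(perturbed, gradient))
print("gradient . gradient  =", dot(gradient, gradient))
```

Because the noise has no component along the original gradient, the perturbed update keeps the same projection onto that gradient, while in high dimensions there are many orthogonal directions available for the perturbation.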


Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values?

arXiv.org Artificial Intelligence

Existing research primarily evaluates the values of LLMs by examining their stated inclinations towards specific values. However, the "Value-Action Gap," a phenomenon rooted in environmental and social psychology, reveals discrepancies between individuals' stated values and their actions in real-world contexts. To what extent do LLMs exhibit a similar gap between their stated values and their actions informed by those values? This study introduces ValueActionLens, an evaluation framework to assess the alignment between LLMs' stated values and their value-informed actions. The framework encompasses the generation of a dataset comprising 14.8k value-informed actions across twelve cultures and eleven social topics, and two tasks to evaluate how well LLMs' stated value inclinations and value-informed actions align across three different alignment measures. Extensive experiments reveal that the alignment between LLMs' stated values and actions is sub-optimal, varying significantly across scenarios and models. Analysis of misaligned results identifies potential harms from certain value-action gaps. We also find that leveraging reasoned explanations improves performance in predicting value-action gaps. These findings underscore the risks of relying solely on the LLMs' stated values to predict their behaviors and emphasize the importance of context-aware evaluations of LLM values and value-action gaps.


Contextual Knowledge Sharing in Multi-Agent Reinforcement Learning with Decentralized Communication and Coordination

arXiv.org Artificial Intelligence

Decentralized Multi-Agent Reinforcement Learning (Dec-MARL) has emerged as a pivotal approach for addressing complex tasks in dynamic environments. Existing Multi-Agent Reinforcement Learning (MARL) methodologies typically assume a shared objective among agents and rely on centralized control. However, many real-world scenarios feature agents with individual goals and limited observability of other agents, complicating coordination and hindering adaptability. Existing Dec-MARL strategies prioritize either communication or coordination, lacking an integrated approach that leverages both. This paper presents a novel Dec-MARL framework that integrates peer-to-peer communication and coordination, incorporating goal-awareness and time-awareness into the agents' knowledge-sharing processes. Our framework equips agents with the ability to (i) share contextually relevant knowledge to assist other agents, and (ii) reason based on information acquired from multiple agents, while considering their own goals and the temporal context of prior knowledge. We evaluate our approach through several complex multi-agent tasks in environments with dynamically appearing obstacles. Our work demonstrates that incorporating goal-aware and time-aware knowledge sharing significantly enhances overall performance.


LLM-powered Multi-agent Framework for Goal-oriented Learning in Intelligent Tutoring System

arXiv.org Artificial Intelligence

Intelligent Tutoring Systems (ITSs) have revolutionized education by offering personalized learning experiences. However, as goal-oriented learning, which emphasizes efficiently achieving specific objectives, becomes increasingly important in professional contexts, existing ITSs often struggle to deliver this type of targeted learning experience. In this paper, we propose GenMentor, an LLM-powered multi-agent framework designed to deliver goal-oriented, personalized learning within ITSs. GenMentor begins by accurately mapping learners' goals to required skills using a fine-tuned LLM trained on a custom goal-to-skill dataset. After identifying the skill gap, it schedules an efficient learning path using an evolving optimization approach, driven by a comprehensive and dynamic profile of learners' multifaceted status. Additionally, GenMentor tailors learning content with an exploration-drafting-integration mechanism to align with individual learner needs. Extensive automated and human evaluations demonstrate GenMentor's effectiveness in learning guidance and content quality. Furthermore, we have deployed it in practice and also implemented it as an application. A practical human study with professional learners further highlights its effectiveness in goal alignment and resource targeting, leading to enhanced personalization. Supplementary resources are available at https://github.com/GeminiLight/gen-mentor.


Refined climatologies of future precipitation over High Mountain Asia using probabilistic ensemble learning

arXiv.org Machine Learning

High Mountain Asia holds the largest concentration of frozen water outside the polar regions, serving as a crucial water source for more than 1.9 billion people. In the face of climate change, precipitation represents the largest source of uncertainty for hydrological modelling in this area. Future precipitation predictions remain challenging due to complex orography, lack of in situ hydrological observations, and limitations in climate model resolution and parametrisation for this region. To address the uncertainty posed by these challenges, climate models are often aggregated into multi-model ensembles. While multi-model ensembles are known to improve the predictive accuracy and analysis of future climate projections, consensus regarding how models are aggregated is lacking. In this study, we propose a probabilistic machine learning framework to systematically combine 13 regional climate models from the Coordinated Regional Downscaling Experiment (CORDEX) over High Mountain Asia. Our approach accounts for seasonal and spatial biases within the models, enabling the prediction of more faithful precipitation distributions. The framework is validated against gridded historical precipitation data and is used to generate projections for the near-future (2036-2065) and far-future (2066-2095) under RCP4.5 and RCP8.5 scenarios.