
Collaborating Authors

 Wang, Siyu


Policy-Guided Causal State Representation for Offline Reinforcement Learning Recommendation

arXiv.org Artificial Intelligence

In offline reinforcement learning-based recommender systems (RLRS), learning effective state representations is crucial for capturing user preferences that directly impact long-term rewards. However, raw state representations often contain high-dimensional, noisy information and components that are not causally relevant to the reward. Additionally, missing transitions in offline data make it challenging to accurately identify features that are most relevant to user satisfaction. To address these challenges, we propose Policy-Guided Causal Representation (PGCR), a novel two-stage framework for causal feature selection and state representation learning in offline RLRS. In the first stage, we learn a causal feature selection policy that generates modified states by isolating and retaining only the causally relevant components (CRCs) while altering irrelevant components. This policy is guided by a reward function based on the Wasserstein distance, which measures the causal effect of state components on the reward and encourages the preservation of CRCs that directly influence user interests. In the second stage, we train an encoder to learn compact state representations by minimizing the mean squared error (MSE) loss between the latent representations of the original and modified states, ensuring that the representations focus on CRCs. We provide a theoretical analysis proving the identifiability of causal effects from interventions, validating the ability of PGCR to isolate critical state components for decision-making. Extensive experiments demonstrate that PGCR significantly improves recommendation performance, confirming its effectiveness for offline RL-based recommender systems.
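As a rough illustration of the second stage described above, the sketch below trains an encoder so that the latent codes of an original state and its modified counterpart coincide. The architecture, dimensions, and the split between causally relevant and irrelevant components are placeholder assumptions, not the paper's implementation.

```python
# Hedged sketch of the stage-two objective: align the latent codes of an
# original state and a "modified" state that shares only the causally
# relevant components (CRCs). Dimensions and the CRC split are assumptions.
import torch
import torch.nn as nn

state_dim, latent_dim, crc_dim = 64, 16, 32

encoder = nn.Sequential(
    nn.Linear(state_dim, 128), nn.ReLU(),
    nn.Linear(128, latent_dim),
)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
mse = nn.MSELoss()

# Toy batch: modified states keep the first crc_dim components and
# randomize the rest, mimicking an intervention on irrelevant components.
s = torch.randn(256, state_dim)
s_mod = s.clone()
s_mod[:, crc_dim:] = torch.randn(256, state_dim - crc_dim)

for _ in range(100):
    loss = mse(encoder(s), encoder(s_mod))  # latents should agree on CRCs
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Note that this objective alone admits a collapsed solution (a constant encoder), so a full implementation would pair it with an additional term such as a reconstruction or policy-learning loss.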


CAD-GPT: Synthesising CAD Construction Sequence with Spatial Reasoning-Enhanced Multimodal LLMs

arXiv.org Artificial Intelligence

Computer-aided design (CAD) significantly enhances the efficiency, accuracy, and innovation of design processes by enabling precise 2D and 3D modeling, extensive analysis, and optimization. Existing methods for creating CAD models rely on latent vectors or point clouds, which are difficult to obtain and costly to store. Recent advances in Multimodal Large Language Models (MLLMs) have inspired researchers to use natural language instructions and images for CAD model construction. However, these models still struggle with inferring accurate 3D spatial location and orientation, leading to inaccuracies in determining the spatial 3D starting points and extrusion directions for constructing geometries. This work introduces CAD-GPT, a CAD synthesis method with spatial reasoning-enhanced MLLM that takes either a single image or a textual description as input. To achieve precise spatial inference, our approach introduces a 3D Modeling Spatial Mechanism. This method maps 3D spatial positions and 3D sketch plane rotation angles into a 1D linguistic feature space using a specialized spatial unfolding mechanism, while discretizing 2D sketch coordinates into an appropriate planar space to enable precise determination of spatial starting position, sketch orientation, and 2D sketch coordinate translations. Extensive experiments demonstrate that CAD-GPT consistently outperforms existing state-of-the-art methods in CAD model synthesis, both quantitatively and qualitatively.
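The core discretization idea, mapping continuous 3D positions and rotation angles into a 1D token vocabulary, can be sketched as follows. Bin counts, coordinate ranges, and function names are illustrative assumptions rather than CAD-GPT's actual tokenizer.

```python
# Hypothetical sketch: quantize continuous spatial quantities into integer
# tokens so they can live in a language model's 1-D vocabulary.
import numpy as np

N_BINS = 256  # assumed vocabulary granularity per axis


def quantize(value, lo, hi, n_bins=N_BINS):
    """Map a continuous value in [lo, hi] to an integer bin index."""
    idx = int((value - lo) / (hi - lo) * n_bins)
    return min(max(idx, 0), n_bins - 1)


def position_to_tokens(xyz, lo=-1.0, hi=1.0):
    """Unfold a 3D starting point into three 1-D tokens, one per axis."""
    return [quantize(v, lo, hi) for v in xyz]


def angles_to_tokens(angles_rad):
    """Discretize sketch-plane rotation angles into tokens."""
    return [quantize(a, -np.pi, np.pi) for a in angles_rad]


print(position_to_tokens([0.25, -0.5, 0.0]))        # -> [160, 64, 128]
print(angles_to_tokens([np.pi / 2, 0.0, -np.pi / 4]))  # -> [192, 128, 96]
```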


Action-Attentive Deep Reinforcement Learning for Autonomous Alignment of Beamlines

arXiv.org Artificial Intelligence

Synchrotron radiation sources play a crucial role in fields such as materials science, biology, and chemistry. The beamline, a key subsystem of the synchrotron, modulates and directs the radiation to the sample for analysis. However, the alignment of beamlines is a complex and time-consuming process, primarily carried out manually by experienced engineers. Even minor misalignments in optical components can significantly affect the beam's properties, leading to suboptimal experimental outcomes. Current automated methods, such as Bayesian optimization (BO) and reinforcement learning (RL), enhance performance, but limitations remain: the relationship between the current and target beam properties, which is crucial for determining the adjustment, is not fully considered. Additionally, the physical characteristics of optical elements are overlooked, such as the need to adjust specific devices to control the output beam's spot size or position. This paper addresses the alignment of beamlines by modeling it as a Markov Decision Process (MDP) and training an intelligent agent using RL. The agent calculates adjustment values based on the current and target beam states, executes actions, and iterates until optimal parameters are achieved. A policy network with action attention is designed to improve decision-making by considering both state differences and the impact of optical components. Experiments on two simulated beamlines demonstrate that our algorithm outperforms existing methods, with ablation studies highlighting the effectiveness of the action attention-based policy network.
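One plausible reading of the action-attention design is a policy head whose per-device outputs are gated by attention weights computed from the gap between the current and target beam states. The module below is a hedged sketch under that reading; layer sizes and the gating form are assumptions.

```python
# Illustrative policy with action attention: attention over action (device)
# dimensions is derived from current and target beam states and gates the
# raw adjustment values. Not the paper's exact architecture.
import torch
import torch.nn as nn


class ActionAttentionPolicy(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
        )
        self.action_head = nn.Linear(hidden, n_actions)  # raw adjustments
        self.attn_head = nn.Linear(hidden, n_actions)    # which devices matter now

    def forward(self, current, target):
        h = self.backbone(torch.cat([current, target], dim=-1))
        attn = torch.softmax(self.attn_head(h), dim=-1)
        return attn * torch.tanh(self.action_head(h))    # attention-gated actions


policy = ActionAttentionPolicy(state_dim=8, n_actions=5)
adjustment = policy(torch.randn(1, 8), torch.randn(1, 8))
```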


FoLDTree: A ULDA-Based Decision Tree Framework for Efficient Oblique Splits and Feature Selection

arXiv.org Machine Learning

Traditional decision trees are limited by axis-orthogonal splits, which can perform poorly when true decision boundaries are oblique. While oblique decision tree methods address this limitation, they often face high computational costs, difficulties with multi-class classification, and a lack of effective feature selection. In this paper, we introduce LDATree and FoLDTree, two novel frameworks that integrate Uncorrelated Linear Discriminant Analysis (ULDA) and Forward ULDA into a decision tree structure. These methods enable efficient oblique splits, handle missing values, support feature selection, and provide both class labels and probabilities as model outputs. Through evaluations on simulated and real-world datasets, LDATree and FoLDTree consistently outperform axis-orthogonal and other oblique decision tree methods, achieving accuracy levels comparable to the random forest.
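To make the notion of an oblique split concrete, the sketch below scores a single node split along an LDA discriminant direction. scikit-learn's standard LDA stands in for the paper's ULDA and Forward ULDA variants, and the Gini-based threshold search is an illustrative simplification.

```python
# Illustrative oblique split at one tree node: project the data onto a
# discriminant direction, then threshold the 1-D projection.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=500, n_features=5, random_state=0)

lda = LinearDiscriminantAnalysis(n_components=1).fit(X, y)
z = lda.transform(X).ravel()  # 1-D projection defining the oblique axis


def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)


# Choose the threshold on z that minimizes the weighted Gini impurity.
candidates = np.quantile(z, np.linspace(0.05, 0.95, 19))
best = min(
    candidates,
    key=lambda t: (gini(y[z <= t]) * (z <= t).mean()
                   + gini(y[z > t]) * (z > t).mean()),
)
left, right = y[z <= best], y[z > best]
```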


On Causally Disentangled State Representation Learning for Reinforcement Learning based Recommender Systems

arXiv.org Artificial Intelligence

In Reinforcement Learning-based Recommender Systems (RLRS), the complexity and dynamism of user interactions often result in high-dimensional and noisy state spaces, making it challenging to discern which aspects of the state are truly influential in driving the decision-making process. This issue is exacerbated by the evolving nature of user preferences and behaviors, requiring the recommender system to adaptively focus on the most relevant information for decision-making while preserving generalizability. To tackle this problem, we introduce an innovative causal approach for decomposing the state and extracting Causal-InDispensable State Representations (CIDS) in RLRS. Our method concentrates on identifying the Directly Action-Influenced State Variables (DAIS) and Action-Influence Ancestors (AIA), which are essential for making effective recommendations. By leveraging conditional mutual information, we develop a framework that not only discerns the causal relationships within the generative process but also isolates critical state variables from the typically dense and high-dimensional state representations. We provide theoretical evidence for the identifiability of these variables. Then, by making use of the identified causal relationships, we construct causal-indispensable state representations, enabling the training of policies over a more advantageous subset of the agent's state space. We demonstrate the efficacy of our approach through extensive experiments, showing that our method outperforms state-of-the-art methods.
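As a toy illustration of dependence-based state-variable selection, the sketch below ranks state dimensions by their estimated mutual information with the action signal. scikit-learn's unconditional estimator stands in for the conditional mutual information criterion the paper actually uses, and the generative process is invented.

```python
# Rank state variables by estimated dependence on the action signal.
# Plain MI is used here as a stand-in for conditional MI.
import numpy as np
from sklearn.feature_selection import mutual_info_regression

rng = np.random.default_rng(0)
n, d = 2000, 10
states = rng.normal(size=(n, d))
# Toy generative process: the action depends only on state dims 0 and 3.
actions = 2.0 * states[:, 0] - states[:, 3] + 0.1 * rng.normal(size=n)

mi = mutual_info_regression(states, actions, random_state=0)
ranked = np.argsort(mi)[::-1]
print(ranked[:3])  # dims 0 and 3 should dominate
```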


CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models

arXiv.org Artificial Intelligence

Multimodal large language models (MLLMs) have demonstrated promising results in a variety of tasks that combine vision and language. As these models become more integral to research and applications, conducting comprehensive evaluations of their capabilities has grown increasingly important. However, most existing benchmarks fail to consider that, in certain situations, images need to be interpreted within a broader context. In this work, we introduce a new benchmark, named CODIS, designed to assess the ability of models to use context provided in free-form text to enhance visual comprehension. Our findings indicate that MLLMs consistently fall short of human performance on this benchmark. Further analysis confirms that these models struggle to effectively extract and utilize contextual information to improve their understanding of images. This underscores the pressing need to enhance the ability of MLLMs to comprehend visuals in a context-dependent manner. View our project website at https://thunlp-mt.github.io/CODIS.
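The benchmark's premise can be made concrete with a hypothetical item format in which the same image and question receive different gold answers under different free-form contexts. The items, field names, and `ask_mllm` stub below are invented for illustration and are not CODIS data.

```python
# Hypothetical context-dependent items: identical image and question,
# different gold answers depending on the accompanying text context.
items = [
    {"image": "clock.jpg", "question": "What time of day is it?",
     "context": "The photo was taken in London.", "answer": "morning"},
    {"image": "clock.jpg", "question": "What time of day is it?",
     "context": "The photo was taken in Tokyo.", "answer": "evening"},
]


def ask_mllm(image, context, question):
    """Placeholder for whatever model is under evaluation."""
    raise NotImplementedError


def evaluate(items, ask=ask_mllm):
    correct = sum(
        ask(it["image"], it["context"], it["question"]).strip().lower()
        == it["answer"]
        for it in items
    )
    return correct / len(items)
```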


Agent Hospital: A Simulacrum of Hospital with Evolvable Medical Agents

arXiv.org Artificial Intelligence

In this paper, we introduce a hospital simulacrum called Agent Hospital that simulates the entire process of treating illness. All patients, nurses, and doctors are autonomous agents powered by large language models (LLMs). Our central goal is to enable a doctor agent to learn how to treat illness within the simulacrum. To do so, we propose a method called MedAgent-Zero. As the simulacrum can simulate disease onset and progression based on knowledge bases and LLMs, doctor agents can keep accumulating experience from both successful and unsuccessful cases. Simulation experiments show that the treatment performance of doctor agents consistently improves on various tasks. More interestingly, the knowledge the doctor agents acquire in Agent Hospital is applicable to real-world medical benchmarks. After treating around ten thousand patients (which may take real-world doctors over two years), the evolved doctor agent achieves a state-of-the-art accuracy of 93.06% on a subset of the MedQA dataset that covers major respiratory diseases. This work paves the way for advancing the applications of LLM-powered agent techniques in medical scenarios.
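The experience-accumulation loop can be caricatured as a case library that records outcomes and recalls the most similar validated case when a new patient arrives. The data structure and string-similarity retrieval below are illustrative stand-ins, not the MedAgent-Zero implementation, which operates over LLM-generated cases.

```python
# Toy case library: record past cases with outcomes, recall the diagnosis
# of the most similar successful case. Similarity measure is illustrative.
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class CaseLibrary:
    cases: list = field(default_factory=list)  # (symptoms, diagnosis, success)

    def record(self, symptoms, diagnosis, success):
        self.cases.append((symptoms, diagnosis, success))

    def recall(self, symptoms):
        """Return the diagnosis from the most similar successful past case."""
        successes = [c for c in self.cases if c[2]]
        if not successes:
            return None
        best = max(successes,
                   key=lambda c: SequenceMatcher(None, c[0], symptoms).ratio())
        return best[1]


library = CaseLibrary()
library.record("fever, dry cough, fatigue", "influenza", success=True)
library.record("fever, dry cough, fatigue", "common cold", success=False)
print(library.recall("dry cough and fever"))  # -> influenza
```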


Retentive Decision Transformer with Adaptive Masking for Reinforcement Learning based Recommendation Systems

arXiv.org Artificial Intelligence

Reinforcement Learning-based Recommender Systems (RLRS) have shown promise across a spectrum of applications, from e-commerce platforms to streaming services. Yet, they grapple with challenges, notably in crafting reward functions and harnessing large pre-existing datasets within the RL framework. Recent advancements in offline RLRS offer a way to address these two challenges. However, existing methods mainly rely on the transformer architecture, which, as sequence lengths increase, can introduce challenges associated with computational resources and training costs. Additionally, the prevalent methods employ fixed-length input trajectories, restricting their capacity to capture evolving user preferences. In this study, we introduce a new offline RLRS method to deal with the above problems. We reinterpret the RLRS challenge by modeling sequential decision-making as an inference task, leveraging adaptive masking configurations. This adaptive approach selectively masks input tokens, transforming the recommendation task into an inference challenge based on varying token subsets, thereby enhancing the agent's ability to infer across diverse trajectory lengths. Furthermore, we incorporate a multi-scale segmented retention mechanism that facilitates efficient modeling of long sequences, significantly enhancing computational efficiency. Our experimental analysis, conducted on both an online simulator and offline datasets, clearly demonstrates the advantages of our proposed method.
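A minimal sketch of the adaptive-masking idea follows: each training step samples a different keep-ratio per sequence, so the model must infer from varying token subsets and effective lengths. Shapes, ratios, and the sampling scheme are assumptions, not the paper's configuration.

```python
# Sample a different mask configuration per sequence so the model learns
# to infer from varying token subsets (1 = keep, 0 = mask).
import torch


def sample_adaptive_mask(seq_len, batch_size, min_keep=0.3, max_keep=1.0):
    """Per sequence, keep a randomly chosen fraction of tokens."""
    keep_ratio = torch.empty(batch_size, 1).uniform_(min_keep, max_keep)
    scores = torch.rand(batch_size, seq_len)
    return (scores < keep_ratio).float()


tokens = torch.randn(4, 20, 32)               # (batch, seq, embed)
mask = sample_adaptive_mask(seq_len=20, batch_size=4)
masked = tokens * mask.unsqueeze(-1)          # zero out masked positions
```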


Uncertainty-aware Distributional Offline Reinforcement Learning

arXiv.org Artificial Intelligence

Offline reinforcement learning (RL) presents distinct challenges as it relies solely on observational data. A central concern in this context is ensuring the safety of the learned policy by quantifying uncertainties associated with various actions and environmental stochasticity. Traditional approaches primarily emphasize mitigating epistemic uncertainty by learning risk-averse policies, often overlooking environmental stochasticity. In this study, we propose an uncertainty-aware distributional offline RL method to simultaneously address both epistemic uncertainty and environmental stochasticity. Specifically, we present a model-free offline RL algorithm capable of learning risk-averse policies and characterizing the entire distribution of discounted cumulative rewards, as opposed to merely maximizing the expected value of accumulated discounted returns. Our method is rigorously evaluated through comprehensive experiments on both risk-sensitive and risk-neutral benchmarks, demonstrating its superior performance.
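One standard way to realize the distributional ingredient is a critic that outputs quantiles of the return distribution, from which a risk-averse statistic such as CVaR can be computed. The sketch below shows that pattern with assumed dimensions and risk level; it does not claim to match the paper's exact algorithm.

```python
# Quantile critic plus CVaR: the critic predicts return quantiles, and the
# policy is optimized against the mean of the worst alpha-fraction of them.
import torch
import torch.nn as nn

N_QUANTILES = 32

critic = nn.Sequential(
    nn.Linear(17 + 6, 256), nn.ReLU(),   # state_dim + action_dim (assumed)
    nn.Linear(256, N_QUANTILES),         # one output per return quantile
)


def cvar(quantiles, alpha=0.1):
    """Mean of the worst alpha-fraction of return quantiles (lower tail)."""
    sorted_q, _ = torch.sort(quantiles, dim=-1)
    k = max(1, int(alpha * quantiles.shape[-1]))
    return sorted_q[..., :k].mean(dim=-1)


sa = torch.randn(8, 23)                  # batch of state-action pairs
risk_averse_value = cvar(critic(sa))     # optimize the policy against this
```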


Computation Offloading for Multi-server Multi-access Edge Vehicular Networks: A DDQN-based Method

arXiv.org Artificial Intelligence

In this paper, we investigate a multi-user offloading problem in the overlapping domain of a multi-server mobile edge computing system. We divide the original problem into two stages: the offloading decision-making stage and the request scheduling stage. To prevent the terminal from going out of the service area during offloading, we consider the mobility parameter of the terminal according to the human behaviour model when making the offloading decision, and then introduce a server evaluation mechanism based on both the mobility parameter and the server load to select the optimal offloading server. In order to fully utilise the server resources, we design a double deep Q-network (DDQN)-based reward evaluation algorithm that considers the priority of tasks when scheduling offload requests.

With the development of Multi-access Edge Computing (MEC) technology, MEC servers are moving closer to the terminal devices (TDs), which can be served more efficiently as the transmission latency is greatly reduced [1]. The authors of [3] proposed an effective task scheduling algorithm based on dynamic priority, which significantly reduced task completion time and improved QoS. In [4], the authors proposed a hybrid task offloading scheme based on deep reinforcement learning that achieved vehicle-to-edge and vehicle-to-vehicle …
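For reference, the double deep Q-network mechanism such a method builds on replaces the vanilla DQN target with decoupled action selection and evaluation, which reduces overestimation bias. The sketch below shows that target computation with placeholder dimensions; the paper's priority-aware reward evaluation would enter through the reward term.

```python
# Double DQN target: the online network selects the next action, the target
# network evaluates it. Network shapes here are placeholders.
import torch
import torch.nn as nn

n_states, n_actions, gamma = 16, 4, 0.99
online, target = (nn.Linear(n_states, n_actions) for _ in range(2))


def ddqn_target(reward, next_state, done):
    with torch.no_grad():
        best_a = online(next_state).argmax(dim=-1, keepdim=True)    # select
        q_next = target(next_state).gather(-1, best_a).squeeze(-1)  # evaluate
    return reward + gamma * q_next * (1.0 - done)


y = ddqn_target(torch.ones(8), torch.randn(8, n_states), torch.zeros(8))
```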