Song, Rui
NukesFormers: Unpaired Hyperspectral Image Generation with Non-Uniform Domain Alignment
Li, Jiaojiao, Duan, Shiyao, Xu, Haitao, Song, Rui
The inherent difficulty of acquiring accurately co-registered RGB-hyperspectral image (HSI) pairs has significantly impeded the practical deployment of current data-driven Hyperspectral Image Generation (HIG) networks in engineering applications. Meanwhile, the ill-posed nature of the alignment constraints, compounded by the complexity of mining cross-domain features, also hinders the advancement of unpaired HIG (UnHIG) tasks. In this paper, we address these challenges by modeling UnHIG as range-space interaction and null-space compensation through the Range-Null Space Decomposition (RND) methodology. Specifically, the introduced contrastive learning effectively aligns the geometric and spectral distributions of unpaired data by building range-space interaction, exploiting the features that remain consistent under the degradation process. Following this, we map the dual-domain inputs to frequency representations and thoroughly mine the null space, including degraded and high-frequency components, through the proposed Non-uniform Kolmogorov-Arnold Networks. Extensive comparative experiments demonstrate that our method establishes a new benchmark in UnHIG.
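The range-null space split underlying RND can be illustrated with a linear degradation operator: for any signal x and degradation A (e.g., a spectral response mapping HSI to RGB), the pseudo-inverse splits x into a range-space part fixed by the observation and a null-space part invisible to A, which a generator is free to fill in. A minimal NumPy sketch; the toy operator and dimensions are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 31))       # toy degradation: 31 spectral bands -> RGB
x = rng.normal(size=31)            # a toy hyperspectral pixel
A_pinv = np.linalg.pinv(A)

x_range = A_pinv @ A @ x           # range-space part: determined by the observation A @ x
x_null = x - x_range               # null-space part: invisible to A, must be generated

# the null-space component carries no information about the RGB observation
assert np.allclose(A @ x_null, np.zeros(3), atol=1e-8)
assert np.allclose(x_range + x_null, x)
```

Any reconstruction that keeps x_range consistent with the observation and synthesizes only x_null is automatically data-consistent, which is what makes the decomposition attractive for ill-posed generation tasks.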
A Review of Causal Decision Making
Ge, Lin, Cai, Hengrui, Wan, Runzhe, Xu, Yang, Song, Rui
To make effective decisions, it is important to have a thorough understanding of the causal relationships among actions, environments, and outcomes. This review surfaces three crucial aspects of decision-making through a causal lens: 1) the discovery of causal relationships through causal structure learning, 2) understanding the impacts of these relationships through causal effect learning, and 3) applying the knowledge gained from the first two aspects to support decision making via causal policy learning. Moreover, we identify challenges that hinder the broader adoption of causal decision-making and discuss recent advances in overcoming them. Finally, we provide future research directions to address these challenges and to further enhance the implementation of causal decision-making in practice, illustrated with real-world applications. We aim to offer a comprehensive methodology and practical implementation framework by consolidating various methods in this area into a Python-based collection. URL: https://causaldm.github.io/Causal-Decision-Making.
Dynamic Causal Structure Discovery and Causal Effect Estimation
Wang, Jianian, Song, Rui
To represent the causal relationships between variables, a directed acyclic graph (DAG) is widely utilized in many areas, such as social sciences, epidemics, and genetics. Many approaches have been developed to learn the hidden causal structure, often utilizing deep learning. However, these approaches implicitly assume that the causal relationships remain unchanged over time, which may not hold in real life. In this paper, we develop a new framework to model dynamic causal graphs in which the causal relations are allowed to be time-varying. We incorporate the basis approximation method into the score-based causal discovery approach to capture the dynamic pattern of the causal graphs. Utilizing the autoregressive model structure, we can capture both contemporaneous and time-lagged causal relationships while allowing them to vary with time. We propose an algorithm that provides both past-time estimates and future-time predictions of the causal graphs, and conduct simulations to demonstrate the usefulness of the proposed method. We also apply the proposed method to COVID data analysis and provide causal estimates of how the effects of policy restrictions change over time.
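The basis-approximation idea can be sketched in a toy setting: a time-varying causal coefficient a(t) on a single edge Z → X is expanded in a fixed basis, and the basis weights are recovered by least squares. The polynomial basis and the single-edge model below are hypothetical simplifications; the paper's score-based autoregressive formulation is richer:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
t = np.linspace(0, 1, T)
a_true = np.sin(2 * np.pi * t)            # time-varying causal effect of Z on X
Z = rng.normal(size=T)
X = a_true * Z + 0.1 * rng.normal(size=T)

# expand a(t) in a polynomial basis: a(t) ~ sum_k theta_k * t**k
K = 8
B = np.vander(t, K, increasing=True)      # T x K basis matrix
design = B * Z[:, None]                   # regress X on Z through the basis
theta, *_ = np.linalg.lstsq(design, X, rcond=None)
a_hat = B @ theta                         # recovered time-varying coefficient

err = np.mean(np.abs(a_hat - a_true))     # recovery error of the varying effect
```

Replacing a constant edge weight with a basis expansion is what lets a single estimation pass return the whole trajectory of the causal effect, including interpolation at past times and extrapolation via the fitted basis.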
ROLO-SLAM: Rotation-Optimized LiDAR-Only SLAM in Uneven Terrain with Ground Vehicle
Wang, Yinchuan, Ren, Bin, Zhang, Xiang, Wang, Pengyu, Wang, Chaoqun, Song, Rui, Li, Yibin, Meng, Max Q. -H.
LiDAR-based SLAM is recognized as an effective method for providing localization guidance in rough environments. However, off-the-shelf LiDAR-based SLAM methods suffer from significant pose estimation drift, particularly in components relevant to the vertical direction, when traversing uneven terrain. This deficiency typically leads to a conspicuously distorted global map. In this article, a LiDAR-based SLAM method is presented to improve the accuracy of pose estimation for ground vehicles in rough terrain, termed Rotation-Optimized LiDAR-Only (ROLO) SLAM. The method exploits a forward location prediction to coarsely eliminate the location difference between consecutive scans, thereby enabling separate and accurate determination of location and orientation at the front-end. Furthermore, we adopt a parallel-capable spatial voxelization for correspondence matching and develop a spherical alignment-guided rotation registration within each voxel to estimate the vehicle's rotation. By incorporating geometric alignment, we introduce a motion constraint into the optimization formulation to enhance the rapid and effective estimation of the LiDAR's translation. Subsequently, we extract several keyframes to construct a submap and align the current scan to the submap for precise pose estimation. Meanwhile, a global-scale factor graph is established to aid in reducing cumulative errors. Diverse experiments have been conducted in various scenes to evaluate our method. The results demonstrate that ROLO-SLAM excels in pose estimation of ground vehicles and outperforms existing state-of-the-art LiDAR SLAM frameworks.
Shortcut Learning in In-Context Learning: A Survey
Song, Rui, Li, Yingji, Shi, Lida, Giunchiglia, Fausto, Xu, Hao
Shortcut learning refers to the phenomenon where models employ simple, non-robust decision rules in practical tasks, which hinders their generalization and robustness. With the rapid development of large language models (LLMs) in recent years, a growing number of studies have shown the impact of shortcut learning on LLMs. This paper provides a novel perspective for reviewing research on shortcut learning in In-Context Learning (ICL). It conducts a detailed exploration of the types of shortcuts in ICL tasks, their causes, available benchmarks, and strategies for mitigating them. Based on these observations, it summarizes the unresolved issues in existing research and outlines the future research landscape of shortcut learning.
A Review of Reinforcement Learning in Financial Applications
Bai, Yahui, Gao, Yuhe, Wan, Runzhe, Zhang, Sheng, Song, Rui
A financial market is a marketplace where financial instruments such as stocks and bonds are bought and sold (Fama 1970). Individuals and organizations play crucial roles in financial markets by facilitating the allocation of capital. Market participants face diverse challenges, such as portfolio management, which aims to maximize investment returns over time, and market-making, which seeks to profit from the bid-ask spread while managing inventory risk. As the volume of financial data has increased dramatically over time, new opportunities and challenges have arisen in the analysis process, leading to the increased adoption of advanced Machine Learning (ML) models. Reinforcement Learning (RL) (Sutton & Barto 2018), one of the main categories of ML, has revolutionized the field of artificial intelligence by empowering agents to interact with their environment, learn from it, and improve their performance. The success of RL has been demonstrated in various fields, including games, robotics, and mobile health (Nash Jr 1950, Kalman 1960, Murphy 2003). In finance, applications such as market making, portfolio management, and order execution can benefit from the ability of RL algorithms to learn and adapt to changing environments. Compared to traditional models that rely on statistical and econometric methods such as time series models (ARMA, ARIMA), factor models, and panel models, the RL framework empowers agents to learn decision-making by interacting with an environment and deducing the consequences of past actions to maximize cumulative rewards (Charpentier et al. 2021).
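The learn-by-interaction loop that the review contrasts with econometric models can be made concrete with tabular Q-learning on a toy two-regime market. The regimes, payoffs, and transition probabilities below are invented purely for illustration and do not come from the review:

```python
import numpy as np

rng = np.random.default_rng(4)
# toy MDP: states are market regimes {0: calm, 1: volatile}; actions {0: hold, 1: trade}
# trading earns on average in the calm regime and loses in the volatile one
P = np.array([[0.9, 0.1],          # regime transition probabilities
              [0.3, 0.7]])

Q = np.zeros((2, 2))               # action-value table
alpha, gamma, eps = 0.1, 0.9, 0.1
s = 0
for _ in range(20000):
    # epsilon-greedy action selection
    a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
    r = (1.0 if s == 0 else -1.0) * a + 0.1 * rng.normal()  # noisy trading P&L
    s_next = int(rng.choice(2, p=P[s]))
    # one-step temporal-difference update
    Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
    s = s_next
# Q now reflects the regime-dependent value of trading vs. holding
```

The agent never sees the transition matrix or the payoff function; it deduces a regime-dependent policy purely from sampled rewards, which is the qualitative difference from fitting an ARMA-style model and optimizing against it.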
Online Posterior Sampling with a Diffusion Prior
Kveton, Branislav, Oreshkin, Boris, Park, Youngsuk, Deshmukh, Aniket, Song, Rui
Posterior sampling in contextual bandits with a Gaussian prior can be implemented exactly or approximately using the Laplace approximation. The Gaussian prior is computationally efficient but it cannot describe complex distributions. In this work, we propose approximate posterior sampling algorithms for contextual bandits with a diffusion model prior. The key idea is to sample from a chain of approximate conditional posteriors, one for each stage of the reverse process, which are estimated in a closed form using the Laplace approximation. Our approximations are motivated by posterior sampling with a Gaussian prior, and inherit its simplicity and efficiency. They are asymptotically consistent and perform well empirically on a variety of contextual bandit problems.
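For context, posterior sampling with a Gaussian prior, the baseline the diffusion-prior approximations are motivated by, keeps a closed-form Gaussian posterior over the reward parameter and samples from it each round. A self-contained sketch on a synthetic linear bandit; the dimension, noise level, and arm distribution are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d, sigma2, lam = 5, 0.25, 1.0          # illustrative dimension, noise, prior precision
theta_star = rng.normal(size=d)        # unknown reward parameter

G = lam * np.eye(d)                    # posterior precision (Gaussian prior N(0, I/lam))
b = np.zeros(d)                        # precision-weighted mean accumulator
regret = 0.0

for _ in range(2000):
    arms = rng.normal(size=(10, d))                      # fresh context features
    cov = np.linalg.inv(G)
    theta_tilde = rng.multivariate_normal(cov @ b, cov)  # posterior sample
    a = int(np.argmax(arms @ theta_tilde))               # act greedily on the sample
    r = arms[a] @ theta_star + np.sqrt(sigma2) * rng.normal()
    G += np.outer(arms[a], arms[a]) / sigma2             # exact conjugate update
    b += arms[a] * r / sigma2
    regret += np.max(arms @ theta_star) - arms[a] @ theta_star
```

The conjugate update is exact here precisely because the prior is Gaussian; with a diffusion prior the posterior has no such closed form, which is the gap the chain of per-stage Laplace approximations is designed to fill.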
Self-supervised Preference Optimization: Enhance Your Language Model with Preference Degree Awareness
Li, Jian, Huang, Haojing, Zhang, Yujia, Xu, Pengfei, Chen, Xi, Song, Rui, Shi, Lida, Wang, Jingwen, Xu, Hao
Recently, there has been significant interest in replacing the reward model in Reinforcement Learning with Human Feedback (RLHF) methods for Large Language Models (LLMs), such as Direct Preference Optimization (DPO) and its variants. These approaches commonly use a binary cross-entropy mechanism on pairwise samples, i.e., minimizing and maximizing the loss based on preferred or dis-preferred responses, respectively. However, while this training strategy omits the reward model, it also overlooks the varying degrees of preference within different responses. We hypothesize that this is a key factor hindering LLMs from sufficiently understanding human preferences. To address this problem, we propose a novel Self-supervised Preference Optimization (SPO) framework, which constructs a self-supervised preference degree loss combined with the alignment loss, thereby helping LLMs improve their ability to understand the degree of preference. Extensive experiments are conducted on two widely used datasets covering different tasks. The results demonstrate that SPO can be seamlessly integrated with existing preference optimization methods and significantly boost their performance, achieving state-of-the-art results. We also conduct detailed analyses that offer comprehensive insights into SPO and verify its effectiveness. The code is available at https://github.com/lijian16/SPO.
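The pairwise binary cross-entropy that DPO applies, and that the abstract argues is blind to preference degree, reduces to -log sigmoid of a scaled log-likelihood-ratio margin between the preferred and dis-preferred responses. A small NumPy sketch, where scalar log-probabilities stand in for summed token log-probs and the value of beta is illustrative:

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Pairwise BCE on (preferred, dis-preferred) log-probs against a reference model."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))  # -log sigmoid(margin)

# the loss depends only on the margin: pairs with equal margins are treated
# identically no matter how strongly one response is actually preferred
loss_mild = dpo_loss(1.0, 0.0, 0.0, 0.0)
loss_equal_margin = dpo_loss(5.0, 4.0, 4.0, 4.0)
```

That degree-blindness is exactly what an additional preference-degree loss, as in SPO, aims to restore.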
Linear Contextual Bandits with Interference
Xu, Yang, Lu, Wenbin, Song, Rui
Interference, a key concept in causal inference, extends the reward modeling process by accounting for the impact of one unit's actions on the rewards of others. In contextual bandit (CB) settings, where multiple units are present in the same round, potential interference can significantly affect the estimation of expected rewards for different arms, thereby influencing the decision-making process. Although some prior work has explored multi-agent and adversarial bandits in interference-aware settings, the effect of interference in CB, as well as the underlying theory, remains significantly underexplored. In this paper, we introduce a systematic framework to address interference in Linear CB (LinCB), bridging the gap between causal inference and online decision-making. We propose a series of algorithms that explicitly quantify the interference effect in the reward modeling process and provide comprehensive theoretical guarantees, including sublinear regret bounds, finite sample upper bounds, and asymptotic properties. The effectiveness of our approach is demonstrated through simulations and synthetic data generated from the MovieLens dataset.
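The kind of bias that motivates modeling interference can be seen in a two-unit toy example: when one unit's reward depends on a neighbor's action and the two actions are correlated, a reward model that ignores the neighbor misattributes the interference effect. The coefficients and logging policy below are invented for illustration and are not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000
# two units per round; unit 1's reward depends on unit 2's action (interference),
# and unit 2's action is correlated with unit 1's (a confounded logging policy)
a1 = rng.integers(0, 2, size=n)
a2 = (rng.random(n) < 0.2 + 0.6 * a1).astype(float)
r1 = 1.0 * a1 + 0.5 * a2 + 0.1 * rng.normal(size=n)  # 0.5 = interference effect

X_naive = np.column_stack([np.ones(n), a1])          # ignores the neighbor's action
X_full = np.column_stack([np.ones(n), a1, a2])       # models interference explicitly
beta_naive, *_ = np.linalg.lstsq(X_naive, r1, rcond=None)
beta_full, *_ = np.linalg.lstsq(X_full, r1, rcond=None)
# beta_naive[1] over-estimates unit 1's direct effect (~1.3 instead of 1.0);
# beta_full recovers both the direct and the interference coefficients
```

An interference-aware LinCB algorithm effectively fits the full model online, so that arm selection is driven by unbiased estimates of each unit's own effect.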
History-Aware Planning for Risk-free Autonomous Navigation on Unknown Uneven Terrain
Wang, Yinchuan, Du, Nianfei, Qin, Yongsen, Zhang, Xiang, Song, Rui, Wang, Chaoqun
It is challenging for a mobile robot to achieve autonomous, mapless navigation in unknown environments with uneven terrain. In this study, we present a layered and systematic pipeline. At the local level, we maintain a tree structure that is dynamically extended as navigation proceeds. This structure unifies planning with terrain identification and helps explicitly identify hazardous areas on uneven terrain. In particular, certain nodes of the tree are consistently kept to form a sparse graph at the global level, which records the history of the exploration. A series of subgoals obtained from the tree and the graph are utilized to guide the navigation. To determine a subgoal, we develop an evaluation method whose input elements can be efficiently obtained from the layered structure. We conduct both simulation and real-world experiments to evaluate the developed method and its key modules. The experimental results demonstrate the effectiveness and efficiency of our method. The robot can travel through unknown uneven regions safely and reach the target rapidly without a preconstructed map.