Collaborating Authors

 Chen, Xiaowei


MPPO: Multi Pair-wise Preference Optimization for LLMs with Arbitrary Negative Samples

arXiv.org Artificial Intelligence

Aligning Large Language Models (LLMs) with human feedback is crucial for their development. Existing preference optimization methods such as DPO and KTO, while improvements over Reinforcement Learning from Human Feedback (RLHF), are inherently derived from PPO: they require a reference model that consumes additional GPU memory and rely heavily on abundant preference data. Meanwhile, current preference optimization research mainly targets single-question scenarios with two replies, neglecting scenarios with multiple replies, which wastes data in practice. This study introduces the MPPO algorithm, which leverages the average likelihood of model responses to fit the reward function and maximizes the utilization of preference data. Through a comparison of Point-wise, Pair-wise, and List-wise implementations, we found that the Pair-wise approach achieves the best performance, significantly enhancing the quality of model responses. Experimental results demonstrate MPPO's outstanding performance across various benchmarks. On MT-Bench, MPPO outperforms DPO, ORPO, and SimPO. Notably, on Arena-Hard, MPPO surpasses DPO and ORPO by substantial margins. These results underscore the advantages of MPPO in preference optimization tasks.
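As a rough illustration of the idea described above, the sketch below shows one way a pair-wise objective built on length-averaged (average-likelihood) response scores could be implemented without a reference model; the function names, the logistic loss form, and the `beta` scaling are illustrative assumptions rather than the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def avg_loglik(token_logps: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Length-averaged log-likelihood of each response.

    token_logps: (num_responses, seq_len) per-token log-probabilities
    mask:        (num_responses, seq_len) 1 for real tokens, 0 for padding
    """
    return (token_logps * mask).sum(-1) / mask.sum(-1).clamp(min=1)

def pairwise_preference_loss(chosen_logps, chosen_mask,
                             rejected_logps, rejected_mask,
                             beta: float = 1.0) -> torch.Tensor:
    """Pair the single chosen reply against K rejected replies.

    The average log-likelihood acts as an implicit reward, so no reference
    model is needed; the chosen reply is compared with every rejected reply
    and the pair-wise logistic losses are averaged.
    """
    r_chosen = avg_loglik(chosen_logps, chosen_mask)        # shape (1,)
    r_rejected = avg_loglik(rejected_logps, rejected_mask)  # shape (K,)
    margins = beta * (r_chosen - r_rejected)                # broadcast to (K,)
    return -F.logsigmoid(margins).mean()
```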


Pricing Catastrophe Bonds -- A Probabilistic Machine Learning Approach

arXiv.org Artificial Intelligence

Catastrophe (CAT) bonds have become increasingly vital in managing and transferring catastrophic risk. These bonds offer a source of capital to cover losses arising from natural disasters, allowing investors to diversify their portfolios while helping issuers mitigate potentially devastating financial consequences. Understanding the pricing dynamics of CAT bonds is essential, both for investors seeking informed decisions and for issuers optimizing their risk management strategies. This paper introduces a probabilistic machine-learning-based predictive framework for the pricing of CAT bonds, aiming to enhance empirical pricing accuracy and discover previously undetected nonlinear dependence between the key risk factors and CAT bond spreads. Early research by Lane (2000) laid the groundwork for the CAT bond pricing literature, proposing a log-linear regression model employing conditional expected loss and probability of first loss as predictors. Subsequent studies expanded on this linear framework, incorporating additional predictors and examining pricing under diverse conditions. Gürtler et al. (2016) incorporated bond characteristics like trigger type and bond rating, while Braun (2016) integrated market condition indices, such as the Lane Synthetic Rate on Line index and the BB corporate bond spread. Götze and Gürtler (2020a) explored sponsor-related pricing inefficiencies across different market conditions, and Morana and Sbrana (2019) focused on the impact of climate change on CAT bond returns. Further extending the research scope, Zhao and Yu (2020) utilized actual catastrophe data to forecast CAT bond prices using market-based methods, Braun et al. (2022) developed factor pricing models for cross-sectional CAT bond returns, and Herrmann and Hibbeln (2023) investigated liquidity premiums in the secondary market.
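For context, a minimal sketch of the Lane-style log-linear baseline mentioned above is given below, regressing the log spread on the log probability of first loss and the log conditional expected loss; the synthetic numbers and coefficient values are purely illustrative and not taken from any dataset in the paper.

```python
import numpy as np

# Illustrative synthetic inputs (not real CAT bond data):
# pfl = probability of first loss, cel = conditional expected loss.
rng = np.random.default_rng(0)
pfl = rng.uniform(0.01, 0.10, size=200)
cel = rng.uniform(0.05, 0.60, size=200)
spread = 2.5 * pfl**0.45 * cel**0.55 * rng.lognormal(0.0, 0.1, size=200)

# Log-linear regression: log(spread) ~ const + a*log(pfl) + b*log(cel)
X = np.column_stack([np.ones_like(pfl), np.log(pfl), np.log(cel)])
y = np.log(spread)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print("intercept and elasticities w.r.t. PFL and CEL:", coef)
```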


Distance and Hop-wise Structures Encoding Enhanced Graph Attention Networks

arXiv.org Artificial Intelligence

Many works have shown that existing neighbor-averaging Graph Neural Networks cannot efficiently capture structure information; in some cases such GNNs cannot even capture degree features. The reason is intuitive: since neighbor-averaging GNNs can only combine neighbors' feature vectors for every node, if the neighbors' feature vectors contain no structure information, hop-wise neighbor-averaging GNNs can at best capture degree information ([1]; [2]; [3]). So, as an intuitive idea, injecting structure information into feature vectors may improve the performance of GNNs. Numerous works have shown that injecting structure, distance, position, or spatial information can significantly improve the performance of neighbor-averaging GNNs ([4]; [5]; [6]; [7]; [8]; [9]; [10]). However, existing works have their problems. Some have very high computational complexity and cannot be applied to large-scale graphs (MotifNet [4]). Some simply concatenate structure information with the intrinsic feature vector (ID-GNN [6]; P-GNN [8]; DE-GNN [9]), which may confuse signals from different features. For example, in the ogbn-arxiv dataset the intrinsic feature is a semantic embedding of the headline or abstract, which provides a totally different signal from structure information. Some are graph-level-task oriented and only handle small graphs (Graphormer [7]; SubGNN [10]).
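To make the injection idea concrete, the sketch below computes a simple hop-wise structure encoding (counts of nodes first reached at each hop) and combines it with intrinsic features through separate projections rather than raw concatenation; the encoding choice and module names are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

def hopwise_counts(adj: torch.Tensor, num_hops: int = 3) -> torch.Tensor:
    """Per-node counts of nodes first reached at hops 1..num_hops.

    adj: (N, N) binary adjacency matrix (float).  Returns an (N, num_hops)
    structure encoding; the hop-1 counts are just node degrees.
    """
    frontier = adj.clone()
    seen = adj + torch.eye(adj.size(0))
    feats = [frontier.sum(-1, keepdim=True)]
    for _ in range(num_hops - 1):
        frontier = ((frontier @ adj) > 0).float() * (seen == 0).float()
        seen = ((seen + frontier) > 0).float()
        feats.append(frontier.sum(-1, keepdim=True))
    return torch.cat(feats, dim=-1)

class StructureAwareInput(nn.Module):
    """Project intrinsic and structural features separately and sum them,
    instead of concatenating two signals of very different nature."""
    def __init__(self, feat_dim: int, struct_dim: int, hidden_dim: int):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, hidden_dim)
        self.struct_proj = nn.Linear(struct_dim, hidden_dim)

    def forward(self, x: torch.Tensor, s: torch.Tensor) -> torch.Tensor:
        return self.feat_proj(x) + self.struct_proj(s)

# Usage (shapes only): s = hopwise_counts(adj); h = StructureAwareInput(d, 3, 128)(x, s)
```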


Multi-layered Network Exploration via Random Walks: From Offline Optimization to Online Learning

arXiv.org Artificial Intelligence

The multi-layered network exploration (MuLaNE) problem is an important problem abstracted from many applications. In MuLaNE, there are multiple network layers where each node has an importance weight and each layer is explored by a random walk. The MuLaNE task is to allocate a total random walk budget $B$ across the network layers so that the total weight of the unique nodes visited by the random walks is maximized. We systematically study this problem from offline optimization to online learning. For the offline optimization setting, where the network structure and node weights are known, we provide greedy-based constant-ratio approximation algorithms for overlapping networks, and greedy- or dynamic-programming-based optimal solutions for non-overlapping networks. For the online learning setting, neither the network structure nor the node weights are known initially. We adapt the combinatorial multi-armed bandit framework and design algorithms that learn random-walk-related parameters and node weights while optimizing the budget allocation over multiple rounds, and we prove that they achieve logarithmic regret bounds. Finally, we conduct experiments on a real-world social network dataset to validate our theoretical results.
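As an illustration of the exact solution for non-overlapping layers mentioned above, the sketch below runs a dynamic program over (layer, remaining budget) pairs; the per-layer value table is assumed to be given (e.g. precomputed expected visit weights), and the toy numbers are made up.

```python
def dp_allocate(budget: int, values):
    """Exact budget split across layers by dynamic programming.

    values[l][b] is the expected total weight of unique nodes visited when
    layer l receives budget b (with values[l][0] == 0).  For non-overlapping
    layers the objective is the sum of per-layer values, so a DP over
    (layer, remaining budget) recovers an optimal allocation.
    """
    num_layers = len(values)
    best = [0.0] * (budget + 1)                      # best value using layers seen so far
    choice = [[0] * (budget + 1) for _ in range(num_layers)]
    for l in range(num_layers):
        new_best = [0.0] * (budget + 1)
        for b in range(budget + 1):
            for k in range(b + 1):                   # budget handed to layer l
                cand = best[b - k] + values[l][k]
                if cand > new_best[b]:
                    new_best[b] = cand
                    choice[l][b] = k
        best = new_best
    alloc, b = [0] * num_layers, budget              # backtrack the allocation
    for l in reversed(range(num_layers)):
        alloc[l] = choice[l][b]
        b -= alloc[l]
    return alloc, best[budget]

# Toy value tables for three layers and budget 5 (numbers are illustrative only).
vals = [[0, 3.0, 5.0, 6.0, 6.5, 6.8],
        [0, 2.0, 4.0, 6.0, 8.0, 9.0],
        [0, 1.0, 1.5, 1.8, 2.0, 2.1]]
print(dp_allocate(5, vals))
```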


Community Exploration: From Offline Optimization to Online Learning

Neural Information Processing Systems

We introduce the community exploration problem, which has various real-world applications such as online advertising. In the problem, an explorer allocates a limited budget to explore communities so as to maximize the number of members he could meet. We provide a systematic study of the community exploration problem, from offline optimization to online learning. For the offline setting, where the sizes of the communities are known, we prove that greedy methods are optimal for both non-adaptive and adaptive exploration. For the online setting, where the sizes of the communities are not known and need to be learned from multi-round explorations, we propose an ``upper confidence''-like algorithm that achieves logarithmic regret bounds. By combining the feedback from different rounds, we can achieve a constant regret bound.
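To make the offline greedy method concrete, the sketch below uses the closed-form expected number of distinct members met, assuming each visit meets a uniformly random community member, and allocates the budget unit by unit to the community with the largest marginal gain; this is a simplified illustration, not the paper's exact algorithm or its online ``upper confidence'' variant.

```python
def expected_distinct(size: int, b: int) -> float:
    """Expected number of distinct members met after b visits to a community
    of the given size, assuming each visit meets a uniformly random member."""
    return size * (1.0 - (1.0 - 1.0 / size) ** b)

def greedy_explore(sizes, budget: int):
    """Non-adaptive greedy allocation: give each budget unit to the community
    with the largest marginal gain in expected distinct members met."""
    alloc = [0] * len(sizes)
    for _ in range(budget):
        gains = [expected_distinct(m, b + 1) - expected_distinct(m, b)
                 for m, b in zip(sizes, alloc)]
        alloc[gains.index(max(gains))] += 1
    return alloc

# Example: three communities of sizes 5, 20 and 50, total budget 10.
print(greedy_explore([5, 20, 50], 10))
```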


Community Exploration: From Offline Optimization to Online Learning

arXiv.org Machine Learning

We introduce the community exploration problem, which has many real-world applications such as online advertising. In the problem, an explorer allocates a limited budget to explore communities so as to maximize the number of members he could meet. We provide a systematic study of the community exploration problem, from offline optimization to online learning. For the offline setting, where the sizes of the communities are known, we prove that greedy methods are optimal for both non-adaptive and adaptive exploration. For the online setting, where the sizes of the communities are not known and need to be learned from multi-round explorations, we propose an `upper confidence'-like algorithm that achieves logarithmic regret bounds. By combining the feedback from different rounds, we can achieve a constant regret bound.