A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

Neural Information Processing Systems

Parametric policy search algorithms are one of the methods of choice for the optimisation of Markov Decision Processes, with Expectation Maximisation and natural gradient ascent being popular methods in this field. In this article we provide a unifying perspective of these two algorithms by showing that their search-directions in the parameter space are closely related to the search-direction of an approximate Newton method. This analysis leads naturally to the consideration of this approximate Newton method as an alternative optimisation method for Markov Decision Processes. We are able to show that the algorithm has numerous desirable properties, absent in the naive application of Newton's method, that make it a viable alternative to either Expectation Maximisation or natural gradient ascent. Empirical results suggest that the algorithm has excellent convergence and robustness properties, performing strongly in comparison to both Expectation Maximisation and natural gradient ascent.


How to Hedge an Option Against an Adversary: Black-Scholes Pricing is Minimax Optimal

Neural Information Processing Systems

We consider a popular problem in finance, option pricing, through the lens of an online learning game between Nature and an Investor. In the Black-Scholes option pricing model from 1973, the Investor can continuously hedge the risk of an option by trading the underlying asset, assuming that the asset's price fluctuates according to Geometric Brownian Motion (GBM). We consider a worst-case model, in which Nature chooses a sequence of price fluctuations under a cumulative quadratic volatility constraint, and the Investor can make a sequence of hedging decisions. Our main result is to show that the value of our proposed game, which is the "regret" of the hedging strategy, converges to the Black-Scholes option price. We use significantly weaker assumptions than previous work (for instance, we allow large jumps in the asset price) and show that the Black-Scholes hedging strategy is near-optimal for the Investor even in this non-stochastic framework.
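
For orientation, the Black-Scholes price and hedge that the abstract refers to can be sketched as below. This is the standard closed-form formula for a European call under GBM, not the minimax game from the paper; the function and parameter names (`black_scholes_call`, `spot`, `strike`, `rate`, `sigma`, `maturity`) are illustrative assumptions, and the abstract does not say which option type or interest rate the authors use.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _d1(spot, strike, rate, sigma, maturity):
    # Shared term of the Black-Scholes price and delta.
    return (math.log(spot / strike) + (rate + 0.5 * sigma ** 2) * maturity) / (
        sigma * math.sqrt(maturity)
    )


def black_scholes_call(spot, strike, rate, sigma, maturity):
    """Price of a European call option (hypothetical helper for illustration)."""
    d1 = _d1(spot, strike, rate, sigma, maturity)
    d2 = d1 - sigma * math.sqrt(maturity)
    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def black_scholes_delta(spot, strike, rate, sigma, maturity):
    """Units of the underlying asset held by the Black-Scholes hedge."""
    return norm_cdf(_d1(spot, strike, rate, sigma, maturity))


if __name__ == "__main__":
    price = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
    delta = black_scholes_delta(100.0, 100.0, 0.05, 0.2, 1.0)
    print(f"call price = {price:.4f}, delta = {delta:.4f}")
```

With an at-the-money strike of 100, a 5% rate, 20% volatility and one year to maturity, the call price comes out at roughly 10.45. The delta is the quantity the Investor would rebalance at each hedging step.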


fe709c654eac84d5239d1a12a4f71877-Reviews.html

Neural Information Processing Systems

The main idea is to sample several determinizations of the system in the form of roll-out trees where each state/action pair has only one sampled successor. A combination of breadth-first and best-first search is used to explore the deterministic trees, and then they are recombined to create a stochastic model from which a policy can be calculated. The algorithm is proven to be consistent (as the number of trees and number of nodes in each tree both approach infinity, the value at the root can be arbitrarily approximated with high probability). The algorithm is empirically compared to a planning algorithm that requires a full transition model and performs well in comparison.


Learning to Prune in Metric and Non-Metric Spaces

Neural Information Processing Systems

Our focus is on approximate nearest neighbor retrieval in metric and non-metric spaces. We employ a VP-tree and explore two simple yet effective learning-to-prune approaches: density estimation through sampling and "stretching" of the triangle inequality. Both methods are evaluated using data sets with metric (Euclidean) and non-metric (KL-divergence and Itakura-Saito) distance functions. Conditions on spaces where the VP-tree is applicable are discussed. The VP-tree with a learned pruner is compared against the recently proposed state-of-the-art approaches: the bbtree, the multi-probe locality sensitive hashing (LSH), and permutation methods. Our method was competitive against state-of-the-art methods and, in most cases, was more efficient for the same rank approximation quality.
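
A minimal sketch of what "stretching" the triangle inequality could look like in a VP-tree nearest-neighbor search is given below. It assumes the pruning bound on each side of the pivot radius is scaled by a coefficient (`alpha_left`, `alpha_right`, both hypothetical names); with both set to 1 the search is exact in a metric space, and larger values prune more aggressively at the cost of accuracy. In the paper the pruner is learned; here the coefficients are fixed by hand purely for illustration.

```python
import math
import random


class VPNode:
    def __init__(self, pivot, radius, inner, outer):
        self.pivot = pivot      # vantage point stored at this node
        self.radius = radius    # median distance from the pivot to its points
        self.inner = inner      # subtree of points with distance <= radius
        self.outer = outer      # subtree of points with distance > radius


def build_vp_tree(points, dist, rng):
    """Recursively build a VP-tree by splitting at the median pivot distance."""
    if not points:
        return None
    points = list(points)
    pivot = points.pop(rng.randrange(len(points)))
    if not points:
        return VPNode(pivot, 0.0, None, None)
    dists = [dist(pivot, p) for p in points]
    radius = sorted(dists)[len(dists) // 2]
    inner = [p for p, d in zip(points, dists) if d <= radius]
    outer = [p for p, d in zip(points, dists) if d > radius]
    return VPNode(pivot, radius,
                  build_vp_tree(inner, dist, rng),
                  build_vp_tree(outer, dist, rng))


def nearest(node, query, dist, alpha_left=1.0, alpha_right=1.0, best=None):
    """Return (distance, point) of the nearest neighbor found so far.

    A subtree is skipped when the stretched lower bound on its distance to
    the query exceeds the best distance found so far.
    """
    if node is None:
        return best
    d = dist(node.pivot, query)
    if best is None or d < best[0]:
        best = (d, node.pivot)
    if d <= node.radius:
        # Query lies inside the ball: search inside first, then maybe outside.
        best = nearest(node.inner, query, dist, alpha_left, alpha_right, best)
        if alpha_left * (node.radius - d) <= best[0]:
            best = nearest(node.outer, query, dist, alpha_left, alpha_right, best)
    else:
        # Query lies outside the ball: search outside first, then maybe inside.
        best = nearest(node.outer, query, dist, alpha_left, alpha_right, best)
        if alpha_right * (d - node.radius) <= best[0]:
            best = nearest(node.inner, query, dist, alpha_left, alpha_right, best)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.random(), rng.random()) for _ in range(500)]
    tree = build_vp_tree(data, math.dist, rng)
    query = (0.5, 0.5)
    print("exact search:    ", nearest(tree, query, math.dist))
    print("stretched search:", nearest(tree, query, math.dist, 2.0, 2.0))
```

The stretched search may return a slightly farther point than the exact one, but it typically visits fewer nodes, which is the efficiency-versus-accuracy trade-off the abstract describes.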


A Multi-population Integrated Approach for Capacitated Location Routing

arXiv.org Artificial Intelligence

The capacitated location-routing problem involves determining the depots from a set of candidate capacitated depot locations and finding the required routes from the selected depots to serve a set of customers while minimizing a cost function that includes the cost of opening the chosen depots, the fixed utilization cost per vehicle used, and the total cost (distance) of the routes. This paper presents a multi-population integrated framework in which a multi-depot edge assembly crossover generates promising offspring solutions from the perspective of both depot location and route edge assembly. The method includes an effective neighborhood-based local search, a feasibility-restoring procedure and a diversification-oriented mutation. Of particular interest is the multi-population scheme which organizes the population into multiple subpopulations based on depot configurations. Extensive experiments on 281 benchmark instances from the literature show that the algorithm performs remarkably well, improving 101 best-known results (new upper bounds) and matching 84 best-known results. Additional experiments are presented to gain insight into the role of the key elements of the algorithm.
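
The three-part cost function described above can be illustrated with a short sketch. The data layout is an assumption made for this example only: each route is a list of node names that starts and ends at its depot, `coords` maps names to 2D coordinates, and route cost is Euclidean distance. Capacity constraints on depots and vehicles are not checked here.

```python
import math


def clrp_cost(open_depots, routes, opening_cost, vehicle_cost, coords):
    """Total cost = depot opening costs + fixed vehicle costs + route distances.

    All argument names are hypothetical; they are not taken from the paper.
    """
    # Cost of opening each selected depot.
    opening = sum(opening_cost[d] for d in open_depots)
    # One vehicle per route, each with the same fixed utilization cost.
    vehicles = vehicle_cost * len(routes)
    # Sum of edge lengths along every route (depot -> customers -> depot).
    travel = 0.0
    for route in routes:
        for a, b in zip(route, route[1:]):
            travel += math.dist(coords[a], coords[b])
    return opening + vehicles + travel


if __name__ == "__main__":
    coords = {"A": (0, 0), "c1": (3, 0), "c2": (3, 4)}
    cost = clrp_cost(
        open_depots=["A"],
        routes=[["A", "c1", "c2", "A"]],
        opening_cost={"A": 100.0, "B": 80.0},
        vehicle_cost=10.0,
        coords=coords,
    )
    print(cost)  # 100 (depot) + 10 (vehicle) + 12 (route 3 + 4 + 5) = 122.0
```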


Surrogate Assisted Monte Carlo Tree Search in Combinatorial Optimization

arXiv.org Artificial Intelligence

Industries frequently adjust their facility networks by opening new branches in promising areas and closing branches in areas where they expect low profits. In this paper, we examine a particular class of facility location problems. Our objective is to minimize the loss of sales resulting from the removal of several retail stores. However, estimating sales accurately is expensive and time-consuming. To overcome this challenge, we leverage Monte Carlo Tree Search (MCTS) assisted by a surrogate model that computes evaluations faster. Results suggest that MCTS supported by a fast surrogate function can generate solutions faster, while maintaining solution quality consistent with MCTS that does not benefit from the surrogate function.


Counter-Samples: A Stateless Strategy to Neutralize Black Box Adversarial Attacks

arXiv.org Artificial Intelligence

Our paper presents a novel defence against black box attacks, where attackers use the victim model as an oracle to craft their adversarial examples. Unlike traditional preprocessing defences that rely on sanitizing input samples, our stateless strategy counters the attack process itself. For every query we evaluate a counter-sample instead, where the counter-sample is the original sample optimized against the attacker's objective. By countering every black box query with a targeted white box optimization, our strategy effectively introduces an asymmetry to the game to the defender's advantage. This defence not only effectively misleads the attacker's search for an adversarial example, it also preserves the model's accuracy on legitimate inputs and is generic to multiple types of attacks. We demonstrate that our approach is remarkably effective against state-of-the-art black box attacks and outperforms existing defences for both the CIFAR-10 and ImageNet datasets. We also show that the proposed defence is robust against strong adversaries.


MMSR: Symbolic Regression is a Multimodal Task

arXiv.org Artificial Intelligence

Mathematical formulas are the crystallization of human wisdom in exploring the laws of nature for thousands of years. Describing the complex laws of nature with a concise mathematical formula is a constant pursuit of scientists and a great challenge for artificial intelligence. This field is called symbolic regression (SR). Symbolic regression was originally formulated as a combinatorial optimization problem, and genetic programming (GP) and reinforcement learning algorithms were used to solve it. However, GP is sensitive to hyperparameters, and both types of algorithms are inefficient. To address this, researchers have treated the mapping from data to expressions as a translation problem and introduced corresponding large-scale pre-trained models. However, data and expression skeletons do not have the clear word-level correspondences that two languages do. Instead, they are more like two modalities (e.g., image and text). Therefore, in this paper, we propose MMSR, which solves the SR problem as a purely multimodal problem and introduces contrastive learning in the training process for modal alignment to facilitate later modal feature fusion. To better promote modal feature fusion, we train the contrastive learning loss and the other losses at the same time, which requires only one-step training, instead of training the contrastive learning loss first and the other losses afterwards; our experiments show that training them together lets the feature extraction module and the feature fusion module work together better. Experimental results show that, compared with multiple large-scale pre-training baselines, MMSR achieves state-of-the-art results on multiple mainstream datasets, including SRBench.


Efficient Combinatorial Optimization via Heat Diffusion

arXiv.org Machine Learning

Combinatorial optimization problems are widespread but inherently challenging due to their discrete nature. The primary limitation of existing methods is that they can only access a small fraction of the solution space at each iteration, resulting in limited efficiency when searching for the global optimum. To overcome this challenge, diverging from conventional efforts of expanding the solver's search scope, we focus on enabling information to actively propagate to the solver through heat diffusion. By transforming the target function while preserving its optima, heat diffusion facilitates information flow from distant regions to the solver, providing more efficient navigation. Utilizing heat diffusion, we propose a framework for solving general combinatorial optimization problems. The proposed methodology demonstrates superior performance across a range of the most challenging and widely encountered combinatorial optimization problems. Echoing recent advancements in harnessing thermodynamics for generative artificial intelligence, our study further reveals its significant potential in advancing combinatorial optimization.