Goto

Collaborating Authors

 Optimization


Metagame Autobalancing for Competitive Multiplayer Games

arXiv.org Artificial Intelligence

Automated game balancing has often focused on single-agent scenarios. In this paper we present a tool for balancing multi-player games during game design. Our approach requires a designer to construct an intuitive graphical representation of their meta-game target, representing the relative scores that high-level strategies (or decks, or character types) should experience. This permits more sophisticated balance targets to be defined beyond a simple requirement of equal win chances. We then find a parameterization of the game that meets this target using simulation-based optimization to minimize the distance to the target graph. We show the capabilities of this tool on examples inheriting from Rock-Paper-Scissors, and on a more complex asymmetric fighting game.


Integer Programming for Multi-Robot Planning: A Column Generation Approach

arXiv.org Artificial Intelligence

In this paper, we tackle multi-robot planning (MRP), which aims to route a fleet of robots in a warehouse so as to achieve the maximum reward in a limited amount of time, while not having the robots collide and obeying the constraints of individual robots. In MRP, individual robots may make multiple trips over a given time window and may carry multiple items on each trip. We optimize the efficiency of the warehouse, not the makespan, since we expect new orders to be continuously added. Our contributions are that (1) we adapt the integer linear programming (ILP) formulation and column generation (CG) approach for (prize collecting) vehicle routing (Desrochers et al. 1992, Stenger et al. 2013) to MRP and (2) adapt the seminal work of (Boland et al. 2017) to permit efficient optimization by avoiding consideration of every time increment. Routing problems for a fleet of robots in a warehouse are often treated as Multi-Agent Pathfinding problems (MAPF) (Stern et al. 2019).


Exact and heuristic methods for the discrete parallel machine scheduling location problem

arXiv.org Artificial Intelligence

Scheduling and facility location represent two classes of well-studied combinatorial optimization problems. The main motivation for studying them relies on the broad range of applications (e.g., in public services, industry, logistics, project management, production planning, data processing, etc.), as well as on the challenge in providing efficient solutions, since many of these problems are classified as NPhard (see, e.g., Pinedo 2009, Pinedo 2016, Drezner and Hamacher 2002, and Laporte et al. 2015). Since the 1960s, many works on these topics have been published, but only a few of them has focused on studying these problems in an integrated fashion. Due to the limited capacity of the computers of two decades ago, it was usual to solve integrated combinatorial optimization problems using sequential approaches, i.e., solving each problem separately in such a way that the solution of one represents an input to the other. However, this strategy does not guarantee the optimality of the overall solution and, in addition, the input solutions may not be feasible for the successor problems. With the recent advances in technology, especially in the computational field, solving integrated combinatorial optimization problems using integrated approaches is becoming more accessible. In this context, the ScheLoc problem combines the job scheduling and facility location in a single and integrated problem.


Procrustean Orthogonal Sparse Hashing

arXiv.org Machine Learning

Hashing is one of the most popular methods for similarity search because of its speed and efficiency. Dense binary hashing is prevalent in the literature. Recently, insect olfaction was shown to be structurally and functionally analogous to sparse hashing [6]. Here, we prove that this biological mechanism is the solution to a well-posed optimization problem. Furthermore, we show that orthogonality increases the accuracy of sparse hashing. Next, we present a novel method, Procrustean Orthogonal Sparse Hashing (POSH), that unifies these findings, learning an orthogonal transform from training data compatible with the sparse hashing mechanism. We provide theoretical evidence of the shortcomings of Optimal Sparse Lifting (OSL) [22] and BioHash [30], two related olfaction-inspired methods, and propose two new methods, Binary OSL and SphericalHash, to address these deficiencies. We compare POSH, Binary OSL, and SphericalHash to several state-of-the-art hashing methods and provide empirical results for the superiority of the proposed methods across a wide range of standard benchmarks and parameter settings.


Tightening Exploration in Upper Confidence Reinforcement Learning

arXiv.org Machine Learning

The upper confidence reinforcement learning (UCRL2) strategy introduced in (Jaksch et al., 2010) is a popular method to perform regret minimization in unknown discrete Markov Decision Processes under the average-reward criterion. Despite its nice and generic theoretical regret guarantees, this strategy and its variants have remained until now mostly theoretical as numerical experiments on simple environments exhibit long burn-in phases before the learning takes place. Motivated by practical efficiency, we present UCRL3, following the lines of UCRL2, but with two key modifications: First, it uses state-of-the-art time-uniform concentration inequalities, to compute confidence sets on the reward and transition distributions for each state-action pair. To further tighten exploration, we introduce an adaptive computation of the support of each transition distributions. This enables to revisit the extended value iteration procedure to optimize over distributions with reduced support by disregarding low probability transitions, while still ensuring near-optimism. We demonstrate, through numerical experiments on standard environments, that reducing exploration this way yields a substantial numerical improvement compared to UCRL2 and its variants. On the theoretical side, these key modifications enable to derive a regret bound for UCRL3 improving on UCRL2, that for the first time makes appear a notion of local diameter and effective support, thanks to variance-aware concentration bounds.


Learning pose variations within shape population by constrained mixtures of factor analyzers

arXiv.org Machine Learning

Mining and learning the shape variability of underlying population has benefited the applications including parametric shape modeling, 3D animation, and image segmentation. The current statistical shape modeling method works well on learning unstructured shape variations without obvious pose changes (relative rotations of the body parts). Studying the pose variations within a shape population involves segmenting the shapes into different articulated parts and learning the transformations of the segmented parts. This paper formulates the pose learning problem as mixtures of factor analyzers. The segmentation is obtained by components posterior probabilities and the rotations in pose variations are learned by the factor loading matrices. To guarantee that the factor loading matrices are composed by rotation matrices, constraints are imposed and the corresponding closed form optimal solution is derived. Based on the proposed method, the pose variations are automatically learned from the given shape populations. The method is applied in motion animation where new poses are generated by interpolating the existing poses in the training set. The obtained results are smooth and realistic.


Tuning a variational autoencoder for data accountability problem in the Mars Science Laboratory ground data system

arXiv.org Machine Learning

The Mars Curiosity rover is frequently sending back engineering and science data that goes through a pipeline of systems before reaching its final destination at the mission operations center making it prone to volume loss and data corruption. A ground data system analysis (GDSA) team is charged with the monitoring of this flow of information and the detection of anomalies in that data in order to request a re-transmission when necessary. This work presents $\Delta$-MADS, a derivative-free optimization method applied for tuning the architecture and hyperparameters of a variational autoencoder trained to detect the data with missing patches in order to assist the GDSA team in their mission.


Frank-Wolfe optimization for deep networks

arXiv.org Machine Learning

Deep neural networks is today one of the most popular choices in classification, regression and function approximation. However, the training of such deep networks is far from trivial as there are often millions of parameters to tune. Typically, one use some optimization method that hopefully converges towards some minimum. The most popular and successful methods are based on gradient descent. In this paper, another optimization method, Frank-Wolfe optimization, is applied to a small deep network and compared to gradient descent. Although the optimization does converge, it does so slowly and not close to the speed of gradient descent. Further, in a stochastic setting, the optimization becomes very unstable and does not seem to converge unless one uses a line search approach.


SONIA: A Symmetric Blockwise Truncated Optimization Algorithm

arXiv.org Machine Learning

This work presents a new algorithm for empirical risk minimization. The algorithm bridges the gap between first- and second-order methods by computing a search direction that uses a second-order-type update in one subspace, coupled with a scaled steepest descent step in the orthogonal complement. To this end, partial curvature information is incorporated to help with ill-conditioning, while simultaneously allowing the algorithm to scale to the large problem dimensions often encountered in machine learning applications. Theoretical results are presented to confirm that the algorithm converges to a stationary point in both the strongly convex and nonconvex cases. A stochastic variant of the algorithm is also presented, along with corresponding theoretical guarantees. Numerical results confirm the strengths of the new approach on standard machine learning problems.


Constrained Reinforcement Learning for Dynamic Optimization under Uncertainty

arXiv.org Machine Learning

Dynamic real-time optimization (DRTO) is a challenging task due to the fact that optimal operating conditions must be computed in real time. The main bottleneck in the industrial application of DRTO is the presence of uncertainty. Many stochastic systems present the following obstacles: 1) plant-model mismatch, 2) process disturbances, 3) risks in violation of process constraints. To accommodate these difficulties, we present a constrained reinforcement learning (RL) based approach. RL naturally handles the process uncertainty by computing an optimal feedback policy. However, no state constraints can be introduced intuitively. To address this problem, we present a chance-constrained RL methodology. We use chance constraints to guarantee the probabilistic satisfaction of process constraints, which is accomplished by introducing backoffs, such that the optimal policy and backoffs are computed simultaneously. Backoffs are adjusted using the empirical cumulative distribution function to guarantee the satisfaction of a joint chance constraint. The advantage and performance of this strategy are illustrated through a stochastic dynamic bioprocess optimization problem, to produce sustainable high-value bioproducts.