Optimization


Learning to shortcut and shortlist order fulfillment deciding

arXiv.org Artificial Intelligence

As more fulfillment options and business objectives are taken into consideration in the deciding process, order fulfillment deciding is becoming more and more complex. For example, with the advent of ship-from-store, retailers now have many more fulfillment nodes to consider, and it is now common to take into account many and varied business goals in making fulfillment decisions. With increasing complexity, the efficiency of the deciding process can become a real concern. Finding the optimal fulfillment assignments among all possible ones may be too costly to do for every order, especially during peak times. In this work, we explore the possibility of exploiting regularity in the fulfillment decision process to reduce the burden on the deciding system. By using data mining, we aim to find patterns in past fulfillment decisions that can be used to efficiently predict the most likely assignments for future decisions. Essentially, those assignments that can be predicted with high confidence can be used to shortcut, or bypass, the expensive deciding process; otherwise, a set of most likely assignments can be used for shortlisting, that is, sending a much smaller set of candidates for consideration by the fulfillment deciding system.
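The shortcut-versus-shortlist decision described in this abstract can be illustrated with a minimal sketch. It assumes that some predictive model has already produced a confidence score per candidate fulfillment assignment; the function name, the threshold of 0.9, the shortlist size, and the node names are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def shortcut_or_shortlist(
    scores: Dict[str, float],
    confidence_threshold: float = 0.9,  # hypothetical cut-off, not from the paper
    shortlist_size: int = 3,            # hypothetical shortlist length
) -> Tuple[str, List[str]]:
    """Decide whether to bypass the expensive optimizer or to narrow its input.

    `scores` maps each candidate assignment to a predicted confidence.
    Returns ("shortcut", [best]) when the top prediction is confident enough,
    otherwise ("shortlist", top-k candidates) for the full deciding system.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    # Rank candidates from most to least likely.
    ranked = sorted(scores, key=scores.get, reverse=True)
    best = ranked[0]
    if scores[best] >= confidence_threshold:
        return "shortcut", [best]
    return "shortlist", ranked[:shortlist_size]


# Hypothetical usage with made-up node names and scores.
mode, candidates = shortcut_or_shortlist(
    {"store_12": 0.95, "dc_east": 0.03, "store_7": 0.02}
)
```

In this sketch the threshold trades optimizer load against the risk of a suboptimal assignment: a lower threshold shortcuts more orders but relies more heavily on the predictor.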


Optimal Placement of Roadside Infrastructure Sensors towards Safer Autonomous Vehicle Deployments

arXiv.org Artificial Intelligence

Vehicles with driving automation are increasingly being developed for deployment across the world. However, the onboard sensing and perception capabilities of such automated or autonomous vehicles (AV) may not be sufficient to ensure safety under all scenarios and contexts. Infrastructure-augmented environment perception using roadside infrastructure sensors can be considered as an effective solution, at least for selected regions of interest such as urban road intersections or curved roads that present occlusions to the AV. However, they incur significant costs for procurement, installation and maintenance. Therefore these sensors must be placed strategically and optimally to yield maximum benefits in terms of the overall safety of road users. In this paper, we propose a novel methodology towards obtaining an optimal placement of V2X (Vehicle-to-everything) infrastructure sensors, which is particularly attractive to urban AV deployments, with various considerations including costs, coverage and redundancy. We combine the latest advances made in raycasting and linear optimization literature to deliver a tool for urban city planners, traffic analysis and AV deployment operators. Through experimental evaluation in representative environments, we prove the benefits and practicality of our approach.
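As a rough illustration of the placement problem, the sketch below treats it as a small 0/1 covering problem: choose the cheapest set of candidate sites whose combined coverage includes every region of interest. It is not the paper's formulation. The coverage sets are assumed to be precomputed (for example by raycasting), redundancy is ignored, and exhaustive enumeration stands in for a linear optimization solver, so it is only practical for a handful of sites. All names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Optional, Set, Tuple


def cheapest_cover(
    coverage: Dict[str, Set[int]],  # site -> ids of regions it can observe
    cost: Dict[str, float],         # site -> cost of placing a sensor there
    targets: Set[int],              # regions that must be observed
) -> Optional[Tuple[str, ...]]:
    """Return the minimum-cost set of sites covering all targets, or None."""
    best: Optional[Tuple[str, ...]] = None
    best_cost = float("inf")
    sites = sorted(coverage)
    # Enumerate every subset of sites (exponential; illustration only).
    for size in range(len(sites) + 1):
        for subset in combinations(sites, size):
            covered = set().union(*(coverage[s] for s in subset))
            total = sum(cost[s] for s in subset)
            if targets <= covered and total < best_cost:
                best, best_cost = subset, total
    return best


# Hypothetical intersection with three candidate poles and three regions.
placement = cheapest_cover(
    coverage={"A": {1, 2}, "B": {2, 3}, "C": {1, 2, 3}},
    cost={"A": 1.0, "B": 1.0, "C": 3.0},
    targets={1, 2, 3},
)
```

Here two cheap sensors at A and B together cover everything for less than the single expensive sensor at C, which is the kind of trade-off a cost-aware placement has to resolve.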


Causality and Generalizability: Identifiability and Learning Methods

arXiv.org Machine Learning

This PhD thesis contains several contributions to the field of statistical causal modeling. Statistical causal models are statistical models embedded with causal assumptions that allow for the inference and reasoning about the behavior of stochastic systems affected by external manipulation (interventions). This thesis contributes to the research areas concerning the estimation of causal effects, causal structure learning, and distributionally robust (out-of-distribution generalizing) prediction methods. We present novel and consistent linear and non-linear causal effects estimators in instrumental variable settings that employ data-dependent mean squared prediction error regularization. Our proposed estimators show, in certain settings, mean squared error improvements compared to both canonical and state-of-the-art estimators. We show that recent research on distributionally robust prediction methods has connections to well-studied estimators from econometrics. This connection leads us to prove that general K-class estimators possess distributional robustness properties. We, furthermore, propose a general framework for distributional robustness with respect to intervention-induced distributions. In this framework, we derive sufficient conditions for the identifiability of distributionally robust prediction methods and present impossibility results that show the necessity of several of these conditions. We present a new structure learning method applicable in additive noise models with directed trees as causal graphs. We prove consistency in a vanishing identifiability setup and provide a method for testing substructure hypotheses with asymptotic family-wise error control that remains valid post-selection. Finally, we present heuristic ideas for learning summary graphs of nonlinear time-series models.


Efficiency, Fairness, and Stability in Non-Commercial Peer-to-Peer Ridesharing

arXiv.org Artificial Intelligence

Unlike commercial ridesharing, non-commercial peer-to-peer (P2P) ridesharing has been subject to limited research -- although it can promote viable solutions in non-urban communities. This paper focuses on the core problem in P2P ridesharing: the matching of riders and drivers. We elevate users' preferences as a first-order concern and introduce novel notions of fairness and stability in P2P ridesharing. We propose algorithms for efficient matching while considering user-centric factors, including users' preferred departure time, fairness, and stability. Results suggest that fair and stable solutions can be obtained in reasonable computational times and can improve baseline outcomes based on system-wide efficiency exclusively.


Artificial intelligence for Sustainable Energy: A Contextual Topic Modeling and Content Analysis

arXiv.org Artificial Intelligence

Parallel to the rising debates over sustainable energy and artificial intelligence solutions, the world is currently discussing the ethics of artificial intelligence and its possible negative effects on society and the environment. In these arguments, sustainable AI is proposed, which aims at advancing the pathway toward sustainability, such as sustainable energy. In this paper, we offered a novel contextual topic modeling approach combining LDA, BERT, and clustering. We then combined these computational analyses with content analysis of related scientific publications to identify the main scholarly topics, sub-themes, and cross-topic themes within scientific research on sustainable AI in energy. Our research identified eight dominant topics including sustainable buildings, AI-based DSSs for urban water management, climate artificial intelligence, Agriculture 4, the convergence of AI with IoT, AI-based evaluation of renewable technologies, smart campus and engineering education, and AI-based optimization. We then recommended 14 potential future research strands based on the observed theoretical gaps. Theoretically, this analysis contributes to the existing literature on sustainable AI and sustainable energy; practically, it is intended to act as a general guide for energy engineers and scientists, AI scientists, and social scientists to widen their knowledge of sustainability in AI and energy convergence research.


Surrogate-Based Black-Box Optimization Method for Costly Molecular Properties

arXiv.org Artificial Intelligence

AI-assisted molecular optimization is a very active research field as it is expected to provide the next-generation drugs and molecular materials. An important difficulty is that the properties to be optimized rely on costly evaluations. Machine learning methods are investigated with success to predict these properties, but show generalization issues on less known areas of the chemical space. We propose here a surrogate-based black box optimization method, to tackle jointly the optimization and machine learning problems. It consists in optimizing the expected improvement of the surrogate of a molecular property using an evolutionary algorithm. The surrogate is defined as a Gaussian Process Regression (GPR) model, learned on a relevant area of the search space with respect to the property to be optimized. We show that our approach can successfully optimize a costly property of interest much faster than a purely metaheuristic approach.
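The expected improvement criterion mentioned in this abstract has a standard closed form when the surrogate's prediction at a candidate is Gaussian with mean mu and standard deviation sigma. The sketch below computes that textbook formula for a maximization problem; it is not the paper's code, and the function and parameter names are illustrative. Fitting the Gaussian Process and running the evolutionary search are outside its scope.

```python
import math


def expected_improvement(mean: float, std: float, best_so_far: float) -> float:
    """Closed-form expected improvement for maximization.

    EI = (mean - best) * Phi(z) + std * phi(z), with z = (mean - best) / std,
    where Phi and phi are the standard normal CDF and PDF.
    """
    if std <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


# A candidate predicted below the incumbent still has positive EI
# when the surrogate is uncertain about it.
ei_uncertain = expected_improvement(mean=0.0, std=1.0, best_so_far=1.0)
```

The second term rewards uncertainty, which is why optimizing this quantity tends to balance exploiting well-predicted regions against exploring poorly known ones.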


New Evolutionary Computation Models and their Applications to Machine Learning

arXiv.org Artificial Intelligence

Automatic Programming is one of the most important areas of computer science research today. Hardware speed and capability have increased exponentially, but software is years behind. The demand for software has also increased significantly, but it is still written the old-fashioned way: by humans. There are multiple problems when the work is done by humans: cost, time, and quality. It is costly to pay humans, it is hard to keep them satisfied for a long time, it takes a lot of time to teach and train them, and the quality of their output is in most cases low (in software, mostly due to bugs). The real advances in human civilization appeared during the industrial revolutions. Before the first revolution, most people worked in agriculture. Today, only a small percentage of people work in this field. A similar revolution must appear in the computer programming field. Otherwise, we will have as many people working in this field as we had in the past working in agriculture. How do people know how to write computer programs? Very simple: by learning. Can we do the same for software? Can we make software learn how to write software? It seems that this is possible (to some degree), and the term for it is Machine Learning. It was coined in 1959 by the first person who made a computer perform a serious learning task, namely, Arthur Samuel. However, things are not as easy as with humans (well, truth be told, for some humans it is impossible to learn how to write software). So far we do not have software that can learn perfectly to write software. We have some particular cases where some programs do better than humans, but the examples are sporadic at best. Learning from experience is difficult for computer programs. Instead of trying to simulate how humans teach humans how to write computer programs, we can simulate nature.


Bilevel stochastic methods for optimization and machine learning: Bilevel stochastic descent and DARTS

arXiv.org Machine Learning

Two-level stochastic optimization formulations have become instrumental in a number of machine learning contexts such as neural architecture search, continual learning, adversarial learning, and hyperparameter tuning. Practical stochastic bilevel optimization problems become challenging in optimization or learning scenarios where the number of variables is high or there are constraints. The goal of this paper is twofold. First, we aim at promoting the use of bilevel optimization in large-scale learning and we introduce a practical bilevel stochastic gradient method (BSG-1) that requires neither lower level second-order derivatives nor system solves (and dismisses any matrix-vector products). Our BSG-1 method is close to first-order principles, which allows it to achieve a performance better than those that are not, such as DARTS. Second, we develop bilevel stochastic gradient descent for bilevel problems with lower level constraints, and we introduce a convergence theory that covers the unconstrained and constrained cases and abstracts as much as possible from the specifics of the bilevel gradient calculation.


Simulated annealing for optimization of graphs and sequences

arXiv.org Artificial Intelligence

Optimization of discrete structures aims at generating a new structure with a better property given an existing one, which is a fundamental problem in machine learning. Unlike continuous optimization, realistic applications of discrete optimization (e.g., text generation) are very challenging due to the complex and long-range constraints, including both syntax and semantics, in discrete structures. In this work, we present SAGS, a novel Simulated Annealing framework for Graph and Sequence optimization. The key idea is to integrate powerful neural networks into metaheuristics (e.g., simulated annealing, SA) to restrict the search space in discrete optimization. We start by defining a sophisticated objective function, involving the property of interest and pre-defined constraints (e.g., grammar validity). SAGS searches the discrete space towards this objective by performing a sequence of local edits, where deep generative neural networks propose the editing content and thus can control the quality of editing. We evaluate SAGS on paraphrase generation and molecule generation for sequence optimization and graph optimization, respectively. Extensive results show that our approach achieves state-of-the-art performance compared with existing paraphrase generation methods in terms of both automatic and human evaluations. Further, SAGS also significantly outperforms all the previous methods in molecule generation.
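The search loop that this abstract builds on can be sketched as generic simulated annealing over a sequence with local edits. This is not SAGS: a random bit flip stands in for the neural edit proposals, the objective is a toy count of ones, and the geometric cooling schedule, step count, and all names are assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Tuple

Sequence_ = List[int]


def simulated_annealing(
    initial: Sequence_,
    score: Callable[[Sequence_], float],
    propose: Callable[[Sequence_, random.Random], Sequence_],
    steps: int = 2000,
    t_start: float = 1.0,
    t_end: float = 0.01,
    seed: int = 0,
) -> Tuple[Sequence_, float]:
    """Maximize `score` by local edits; return the best sequence seen."""
    rng = random.Random(seed)
    current = list(initial)
    current_score = score(current)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        frac = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** frac
        candidate = propose(current, rng)
        delta = score(candidate) - current_score
        # Always accept improvements; accept worse edits with
        # probability exp(delta / T), which shrinks as T cools.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, current_score + delta
            if current_score > best_score:
                best, best_score = list(current), current_score
    return best, best_score


def flip_one_bit(seq: Sequence_, rng: random.Random) -> Sequence_:
    """Toy local edit: flip one randomly chosen position."""
    edited = list(seq)
    i = rng.randrange(len(edited))
    edited[i] = 1 - edited[i]
    return edited


best_seq, best_val = simulated_annealing([0] * 8, sum, flip_one_bit)
```

Replacing `flip_one_bit` with a learned proposal and `sum` with an objective that combines the property of interest and constraint terms gives the general shape the abstract describes, under the assumptions above.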


Sparse Quadratic Optimisation over the Stiefel Manifold with Application to Permutation Synchronisation

arXiv.org Machine Learning

We address the non-convex optimisation problem of finding a sparse matrix on the Stiefel manifold (matrices with mutually orthogonal columns of unit length) that maximises (or minimises) a quadratic objective function. Optimisation problems on the Stiefel manifold occur for example in spectral relaxations of various combinatorial problems, such as graph matching, clustering, or permutation synchronisation. Although sparsity is a desirable property in such settings, it is mostly neglected in spectral formulations since existing solvers, e.g. based on eigenvalue decomposition, are unable to account for sparsity while at the same time maintaining global optimality guarantees. We fill this gap and propose a simple yet effective sparsity-promoting modification of the Orthogonal Iteration algorithm for finding the dominant eigenspace of a matrix. By doing so, we can guarantee that our method finds a Stiefel matrix that is globally optimal with respect to the quadratic objective function, while in addition being sparse. As a motivating application we consider the task of permutation synchronisation, which can be understood as a constrained clustering problem that has particular relevance for matching multiple images or 3D shapes in computer vision, computer graphics, and beyond. We demonstrate that the proposed approach outperforms previous methods in this domain.
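For reference, the plain Orthogonal Iteration that this abstract takes as its starting point can be sketched in a few lines. This is the standard algorithm only, without the paper's sparsity-promoting modification. The sketch assumes a small symmetric matrix whose k dominant eigenvalues are separated in magnitude from the rest, uses a fixed starting matrix chosen for reproducibility, and uses Gram-Schmidt in place of a library QR routine.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product of a (n x m) and b (m x k)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def orthonormalize_columns(m: Matrix) -> Matrix:
    """Modified Gram-Schmidt: return a matrix with orthonormal columns."""
    n, k = len(m), len(m[0])
    basis: List[List[float]] = []
    for j in range(k):
        v = [m[i][j] for i in range(n)]
        for q in basis:
            dot = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        if norm == 0.0:
            raise ValueError("columns became linearly dependent")
        basis.append([vi / norm for vi in v])
    return [[basis[j][i] for j in range(k)] for i in range(n)]


def orthogonal_iteration(a: Matrix, k: int, iterations: int = 200) -> Matrix:
    """Approximate an orthonormal basis of the dominant k-dim eigenspace of a."""
    n = len(a)
    # Deterministic full-rank start (Hilbert-like entries); an assumption.
    u = orthonormalize_columns(
        [[1.0 / (1 + i + j) for j in range(k)] for i in range(n)]
    )
    for _ in range(iterations):
        # Multiply by a, then restore orthonormal columns.
        u = orthonormalize_columns(matmul(a, u))
    return u


# The result has orthonormal columns, i.e. it lies on the Stiefel manifold.
u = orthogonal_iteration([[3.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]], k=2)
```

Any orthonormal basis of the dominant eigenspace attains the same objective value, so rotating the basis within that subspace leaves the quadratic objective unchanged.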