When problems are scaled to "big data," researchers must often come up with new solutions, leveraging ideas from multiple research areas--as we frequently witness in today's big data techniques and tools for machine learning, bioinformatics, and data visualization. Beyond these heavily studied topics, there exist other classes of general problems that must be rethought at scale. One such problem is that of large-scale signal reconstruction:4 taking a set of observations of relatively low dimensionality, and using them to reconstruct a high-dimensional, unknown signal. This class of problems arises when we can only observe a subset of a complex environment that we are seeking to model--for instance, placing a few sensors and using their readings to reconstruct an environment's temperature, or monitoring multiple points in a network and using the readings to estimate end-to-end network traffic, or using 2D slices to reconstruct a 3D image. The following paper is notable because it scalably addresses an underserved problem with practical impact, and does so in a clean, insightful, and systematic way. This signal reconstruction problem (SRP) is typically approached as an optimization task, in which we search for the high-dimensional signal that minimizes a loss function comparing it to the known properties of the signal.
Efficient Global Optimization (EGO) is the canonical form of Bayesian optimization that has been successfully applied to solve global optimization of expensive-to-evaluate black-box problems. However, EGO struggles to scale with dimension, and offers limited theoretical guarantees. In this work, we propose and analyze a trust-region-like EGO method (TREGO). TREGO alternates between regular EGO steps and local steps within a trust region. By following a classical scheme for the trust region (based on a sufficient decrease condition), we demonstrate that our algorithm enjoys strong global convergence properties, while departing from EGO only for a subset of optimization steps. Using extensive numerical experiments based on the well-known COCO benchmark, we first analyze the sensitivity of TREGO to its own parameters, then show that the resulting algorithm is consistently outperforming EGO and getting competitive with other state-of-the-art global optimization methods.
As a new generation of Public Bicycle-sharing Systems (PBS), the dockless PBS (DL-PBS) is an important application of cyber-physical systems and intelligent transportation. How to use AI to provide efficient bicycle dispatching solutions based on dynamic bicycle rental demand is an essential issue for DL-PBS. In this paper, we propose a dynamic bicycle dispatching algorithm based on multi-objective reinforcement learning (MORL-BD) to provide the optimal bicycle dispatching solution for DL-PBS. We model the DL-PBS system from the perspective of CPS and use deep learning to predict the layout of bicycle parking spots and the dynamic demand of bicycle dispatching. We define the multi-route bicycle dispatching problem as a multi-objective optimization problem by considering the optimization objectives of dispatching costs, dispatch truck's initial load, workload balance among the trucks, and the dynamic balance of bicycle supply and demand. On this basis, the collaborative multi-route bicycle dispatching problem among multiple dispatch trucks is modeled as a multi-agent MORL model. All dispatch paths between parking spots are defined as state spaces, and the reciprocal of dispatching costs is defined as a reward. Each dispatch truck is equipped with an agent to learn the optimal dispatch path in the dynamic DL-PBS network. We create an elite list to store the Pareto optimal solutions of bicycle dispatch paths found in each action, and finally, get the Pareto frontier. Experimental results on the actual DL-PBS systems show that compared with existing methods, MORL-BD can find a higher quality Pareto frontier with less execution time.
This paper proposes a novel machine-learning approach for predicting AC-OPF solutions that features a fast and scalable training. It is motivated by the two critical considerations: (1) the fact that topology optimization and the stochasticity induced by renewable energy sources may lead to fundamentally different AC-OPF instances; and (2) the significant training time needed by existing machine-learning approaches for predicting AC-OPF. The proposed approach is a 2-stage methodology that exploits a spatial decomposition of the power network that is viewed as a set of regions. The first stage learns to predict the flows and voltages on the buses and lines coupling the regions, and the second stage trains, in parallel, the machine-learning models for each region. Experimental results on the French transmission system (up to 6,700 buses and 9,000 lines) demonstrate the potential of the approach. Within a short training time, the approach predicts AC-OPF solutions with very high fidelity and minor constraint violations, producing significant improvements over the state-of-the-art. The results also show that the predictions can seed a load flow optimization to return a feasible solution within 0.03% of the AC-OPF objective, while reducing running times significantly.
Kinodynamic Motion Planning (KMP) is to find a robot motion subject to concurrent kinematics and dynamics constraints. To date, quite a few methods solve KMP problems and those that exist struggle to find near-optimal solutions and exhibit high computational complexity as the planning space dimensionality increases. To address these challenges, we present a scalable, imitation learning-based, Model-Predictive Motion Planning Networks framework that quickly finds near-optimal path solutions with worst-case theoretical guarantees under kinodynamic constraints for practical underactuated systems. Our framework introduces two algorithms built on a neural generator, discriminator, and a parallelizable Model Predictive Controller (MPC). The generator outputs various informed states towards the given target, and the discriminator selects the best possible subset from them for the extension. The MPC locally connects the selected informed states while satisfying the given constraints leading to feasible, near-optimal solutions. We evaluate our algorithms on a range of cluttered, kinodynamically constrained, and underactuated planning problems with results indicating significant improvements in computation times, path qualities, and success rates over existing methods.
Large scale machine learning and deep models are extremely data-hungry. Unfortunately, obtaining large amounts of labeled data is expensive, and training state-of-the-art models (with hyperparameter tuning) requires significant computing resources and time. Secondly, real-world data is noisy and imbalanced. As a result, several recent papers try to make the training process more efficient and robust. However, most existing work either focuses on robustness or efficiency, but not both. In this work, we introduce Glister, a GeneraLIzation based data Subset selecTion for Efficient and Robust learning framework. We formulate Glister as a mixed discrete-continuous bi-level optimization problem to select a subset of the training data, which maximizes the log-likelihood on a held-out validation set. Next, we propose an iterative online algorithm Glister-Online, which performs data selection iteratively along with the parameter updates and can be applied to any loss-based learning algorithm. We then show that for a rich class of loss functions including cross-entropy, hinge-loss, squared-loss, and logistic-loss, the inner discrete data selection is an instance of (weakly) submodular optimization, and we analyze conditions for which Glister-Online reduces the validation loss and converges. Finally, we propose Glister-Active, an extension to batch active learning, and we empirically demonstrate the performance of Glister on a wide range of tasks including, (a) data selection to reduce training time, (b) robust learning under label noise and imbalance settings, and (c) batch-active learning with several deep and shallow models. We show that our framework improves upon state of the art both in efficiency and accuracy (in cases (a) and (c)) and is more efficient compared to other state-of-the-art robust learning algorithms in case (b).
In the continual effort to improve product quality and decrease operations costs, computational modeling is increasingly being deployed to determine feasibility of product designs or configurations. Surrogate modeling of these computer experiments via local models, which induce sparsity by only considering short range interactions, can tackle huge analyses of complicated input-output relationships. However, narrowing focus to local scale means that global trends must be re-learned over and over again. In this article, we propose a framework for incorporating information from a global sensitivity analysis into the surrogate model as an input rotation and rescaling preprocessing step. We discuss the relationship between several sensitivity analysis methods based on kernel regression before describing how they give rise to a transformation of the input variables. Specifically, we perform an input warping such that the "warped simulator" is equally sensitive to all input directions, freeing local models to focus on local dynamics. Numerical experiments on observational data and benchmark test functions, including a high-dimensional computer simulator from the automotive industry, provide empirical validation.
We propose Preferential MoE, a novel human-ML mixture-of-experts model that augments human expertise in decision making with a data-based classifier only when necessary for predictive performance. Our model exhibits an interpretable gating function that provides information on when human rules should be followed or avoided. The gating function is maximized for using human-based rules, and classification errors are minimized. We propose solving a coupled multi-objective problem with convex subproblems. We develop approximate algorithms and study their performance and convergence. Finally, we demonstrate the utility of Preferential MoE on two clinical applications for the treatment of Human Immunodeficiency Virus (HIV) and management of Major Depressive Disorder (MDD).
Bayesian optimization is a popular method for optimizing expensive black-box functions. The objective functions of hard real world problems are oftentimes characterized by a fluctuated landscape of many local optima. Bayesian optimization risks in over-exploiting such traps, remaining with insufficient query budget for exploring the global landscape. We introduce Coordinate Backoff Bayesian optimization (CobBO) to alleviate those challenges. CobBO captures a smooth approximation of the global landscape by interpolating the values of queried points projected to randomly selected promising coordinate subspaces. Thus also a smaller query budget is required for the Gaussian process regressions applied over the lower dimensional subspaces. This approach can be viewed as a variant of coordinate ascent, tailored for Bayesian optimization, using a stopping rule for backing off from a certain subspace and switching to another coordinate subset. Additionally, adaptive trust regions are dynamically formed to expedite the convergence, and stagnant local optima are escaped by switching trust regions. Further smoothness and acceleration are achieved by filtering out clustered queried points. Through comprehensive evaluations over a wide spectrum of benchmarks, CobBO is shown to consistently find comparable or better solutions, with a reduced trial complexity compared to the state-of-the-art methods in both low and high dimensions.
This paper proposes a discrete knowledge graph (KG) embedding (DKGE) method, which projects KG entities and relations into the Hamming space based on a computationally tractable discrete optimization algorithm, to solve the formidable storage and computation cost challenges in traditional continuous graph embedding methods. The convergence of DKGE can be guaranteed theoretically. Extensive experiments demonstrate that DKGE achieves superior accuracy than classical hashing functions that map the effective continuous embeddings into discrete codes. Besides, DKGE reaches comparable accuracy with much lower computational complexity and storage compared to many continuous graph embedding methods.