Optimization
On Kernel Derivative Approximation with Random Fourier Features
Szabo, Zoltan, Sriperumbudur, Bharath K.
Random Fourier features (RFF) represent one of the most popular and wide-spread techniques in machine learning to scale up kernel algorithms. Despite the numerous successful applications of RFFs, unfortunately, quite little is understood theoretically on their optimality and limitations of their performance. To the best of our knowledge, the only existing areas where precise statistical-computational trade-offs have been established are approximation of kernel values, kernel ridge regression, and kernel principal component analysis. Our goal is to spark the investigation of optimality of RFF-based approximations in tasks involving not only function values but derivatives, which naturally lead to optimization problems with kernel derivatives. Particularly, in this paper, we focus on the approximation quality of RFFs for kernel derivatives and prove that the existing finite-sample guarantees can be improved exponentially in terms of the domain where they hold, using recent tools from unbounded empirical process theory. Our result implies that the same approximation guarantee is achievable for kernel derivatives using RFF as for kernel values.
how_optimization_works_1.html
Optimization is a fancy word for "finding the best way." We can see how it works if we take a closer look at drinking tea. There is a best temperature for tea. If your tea is too hot, it will scald your tongue and you won't be able to taste anything for three days. There is a sweet spot in the middle where it is comfortably hot, warming you from the inside out all the way down your throat and radiating through your belly.
Learning-based Application-Agnostic 3D NoC Design for Heterogeneous Manycore Systems
Joardar, Biresh Kumar, Kim, Ryan Gary, Doppa, Janardhan Rao, Pande, Partha Pratim, Marculescu, Diana, Marculescu, Radu
The rising use of deep learning and other big-data algorithms has led to an increasing demand for hardware platforms that are computationally powerful, yet energy-efficient. Due to the amount of data parallelism in these algorithms, high-performance 3D manycore platforms that incorporate both CPUs and GPUs present a promising direction. However, as systems use heterogeneity (e.g., a combination of CPUs, GPUs, and accelerators) to improve performance and efficiency, it becomes more pertinent to address the distinct and likely conflicting communication requirements (e.g., CPU memory access latency or GPU network throughput) that arise from such heterogeneity. Unfortunately, it is difficult to quickly explore the hardware design space and choose appropriate tradeoffs between these heterogeneous requirements. To address these challenges, we propose the design of a 3D Network-on-Chip (NoC) for heterogeneous manycore platforms that considers the appropriate design objectives for a 3D heterogeneous system and explores various tradeoffs using an efficient ML-based multi-objective optimization technique. The proposed design space exploration considers the various requirements of its heterogeneous components and generates a set of 3D NoC architectures that efficiently trades off these design objectives. Our findings show that by jointly considering these requirements (latency, throughput, temperature, and energy), we can achieve 9.6% better Energy-Delay Product on average at nearly iso-temperature conditions when compared to a thermally-optimized design for 3D heterogeneous NoCs. More importantly, our results suggest that our 3D NoCs optimized for a few applications can be generalized for unknown applications as well. Our results show that these generalized 3D NoCs only incur a 1.8% (36-tile system) and 1.1% (64-tile system) average performance loss compared to application-specific NoCs.
Using machine learning and optimization to improve refugee integration
IMAGE: Andrew Trapp, an associate professor in the Foisie Business School at Worcester Polytechnic Institute (WPI), and PhD student Narges Ahani are working on an NSF-funded grant to develop software to... view more Built upon ongoing work with an international team of computer scientists and economists, the tool integrates machine learning and optimization algorithms, along with complex computation of data, to match refugees to communities where they will find appropriate resources, including employment opportunities. "There is a great deal of information to consider when helping a refugee begin a new life in the United States," said Trapp, associate professor in the Foisie School and lead investigator for the project. "It is a labor-intensive process whose ultimate goal is to help a refugee land in a place where he or she has as many opportunities as possible to successfully integrate and contribute to the community. Technological solutions can have a profound societal impact." Each year, tens of thousands of refugees--many fleeing war, violence, and persecution--are resettled in dozens of host countries around the world.
Gradient Agreement as an Optimization Objective for Meta-Learning
Eshratifar, Amir Erfan, Eigen, David, Pedram, Massoud
This paper presents a novel optimization method for maximizing generalization over tasks in meta-learning. The goal of meta-learning is to learn a model for an agent adapting rapidly when presented with previously unseen tasks. Tasks are sampled from a specific distribution which is assumed to be similar for both seen and unseen tasks. We focus on a family of meta-learning methods learning initial parameters of a base model which can be fine-tuned quickly on a new task, by few gradient steps (MAML). Our approach is based on pushing the parameters of the model to a direction in which tasks have more agreement upon. If the gradients of a task agree with the parameters update vector, then their inner product will be a large positive value. As a result, given a batch of tasks to be optimized for, we associate a positive (negative) weight to the loss function of a task, if the inner product between its gradients and the average of the gradients of all tasks in the batch is a positive (negative) value. Therefore, the degree of the contribution of a task to the parameter updates is controlled by introducing a set of weights on the loss function of the tasks. Our method can be easily integrated with the current meta-learning algorithms for neural networks. Our experiments demonstrate that it yields models with better generalization compared to MAML and Reptile.
Approximate Dynamic Programming for Planning a Ride-Sharing System using Autonomous Fleets of Electric Vehicles
Al-Kanj, Lina, Nascimento, Juliana, Powell, Warren B.
Within a decade, almost every major auto company, along with fleet operators such as Uber, have announced plans to put autonomous vehicles on the road. At the same time, electric vehicles are quickly emerging as a next-generation technology that is cost effective, in addition to offering the benefits of reducing the carbon footprint. The combination of a centrally managed fleet of driverless vehicles, along with the operating characteristics of electric vehicles, is creating a transformative new technology that offers significant cost savings with high service levels. This problem involves a dispatch problem for assigning riders to cars, a planning problem for deciding on the fleet size, and a surge pricing problem for deciding on the price per trip. In this work, we propose to use approximate dynamic programming to develop high-quality operational dispatch strategies to determine which car (given the battery level) is best for a particular trip (considering its length and destination), when a car should be recharged, and when it should be re-positioned to a different zone which offers a higher density of trips. We then discuss surge pricing using an adaptive learning approach to decide on the price for each trip. Finally, we discuss the fleet size problem which depends on the previous two problems.
A Variable Neighborhood Search for Flying Sidekick Traveling Salesman Problem
Freitas, Julia C., Penna, Puca Huachi V.
The efficiency and dynamism of Unmanned Aerial Vehicles (UAVs), or drones, present substantial application opportunities in several industries in the last years. Notably, the logistic companies gave close attention to these vehicles envisioning reduce delivery time and operational cost. A variant of the Traveling Salesman Problem (TSP) called Flying Sidekick Traveling Salesman Problem (FSTSP) was introduced involving drone-assisted parcel delivery. The drone is launched from the truck, proceeds to deliver parcels to a customer and then is recovered by the truck in a third location. While the drone travels through a trip, the truck delivers parcels to other customers as long as the drone has enough battery to hover waiting for the truck. This work proposes a hybrid heuristic that the initial solution is created from the optimal TSP solution reached by a TSP solver. Next, an implementation of the General Variable Neighborhood Search is used to obtain the delivery routes of truck and drone. Computational experiments show the potential of the algorithm to improve the delivery time significantly. Furthermore, we provide a new set of instances based on well-known TSPLIB instances.
ProMP: Proximal Meta-Policy Search
Rothfuss, Jonas, Lee, Dennis, Clavera, Ignasi, Asfour, Tamim, Abbeel, Pieter
Credit assignment in Meta-reinforcement learning (Meta-RL) is still poorly understood. Existing methods either neglect credit assignment to pre-adaptation behavior or implement it naively. This leads to poor sample-efficiency during meta-training as well as ineffective task identification strategies. This paper provides a theoretical analysis of credit assignment in gradient-based Meta-RL. Building on the gained insights we develop a novel meta-learning algorithm that overcomes both the issue of poor credit assignment and previous difficulties in estimating meta-policy gradients. By controlling the statistical distance of both pre-adaptation and adapted policies during meta-policy search, the proposed algorithm endows efficient and stable meta-learning. Our approach leads to superior pre-adaptation policy behavior and consistently outperforms previous Meta-RL algorithms in sample-efficiency, wall-clock time, and asymptotic performance.
On Statistical Learning of Simplices: Unmixing Problem Revisited
Najafi, Amir, Ilchi, Saeed, Saberi, Amir H., Motahari, Abolfazl, Khalaj, Babak H., Rabiee, Hamid R.
Learning of high-dimensional simplices from uniformly-sampled observations, generally known as the "unmixing problem", is a long-studied task in computer science. More recently, a significant interest is focused on this problem from other areas, such as computational biology and remote sensing. In this paper, we have studied the Probably Approximately Correct (PAC)-learnability of simplices with a focus on sample complexity. Our analysis shows that a sufficient sample size for PAC-learning of $K$-simplices is only $O\left(K^2\log K\right)$, yielding a huge improvement over the existing results, i.e. $O\left(K^{22}\right)$. Moreover, a novel continuously-relaxed optimization scheme is proposed which is guaranteed to achieve a PAC-approximation of the simplex, followed by a corresponding scalable algorithm whose performance is extensively tested on synthetic and real-world datasets. Experimental results show that not only being comparable to other existing strategies on noiseless samples, our method is superior to the state-of-the-art in noisy cases. The overall proposed framework is backed with solid theoretical guarantees and provides a rigorous framework for future research in this area.
A Proximal Zeroth-Order Algorithm for Nonconvex Nonsmooth Problems
Abstract-- In this paper, we focus on solving an important class of nonconvex optimization problems which includes many problems for example signal processing over a networked multi-agent system and distributed learning over networks. Motivated by many applications in which the local objective function is the sum of smooth but possibly nonconvex part, and non-smooth but convex part subject to a linear equality constraint, this paper proposes a proximal zeroth-order primal dual algorithm (PZO-PDA) that accounts for the information structure of the problem. This algorithm only utilize the zeroth-order information (i.e., the functional values) of smooth functions, yet the flexibility is achieved for applications that only noisy information of the objective function is accessible, where classical methods cannot be applied. We prove convergence and rate of convergence for PZO-PDA. Numerical experiments are provided to validate the theoretical results.