 Optimization


Fast Moving Natural Evolution Strategy for High-Dimensional Problems

arXiv.org Machine Learning

In this work, we propose a new variant of natural evolution strategies (NES) for high-dimensional black-box optimization problems. The proposed method, CR-FM-NES, extends a recently proposed state-of-the-art NES, Fast Moving Natural Evolution Strategy (FM-NES), so that it is applicable to high-dimensional problems. CR-FM-NES builds on the idea of using a restricted representation of the covariance matrix instead of a full covariance matrix, while inheriting the efficiency of FM-NES. The restricted representation enables CR-FM-NES to update the parameters of a multivariate normal distribution in linear time and space complexity, which makes it applicable to high-dimensional problems. Our experimental results reveal that CR-FM-NES does not lose the efficiency of FM-NES; on the contrary, it achieves a significant speedup over FM-NES on some benchmark problems. Furthermore, our numerical experiments on 200-, 600-, and 1000-dimensional benchmark problems demonstrate that CR-FM-NES is effective compared with the scalable baseline methods VD-CMA and Sep-CMA.
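To illustrate why a restricted covariance representation allows linear-time operations, here is a minimal sketch of sampling from a Gaussian whose covariance has the form sigma^2 * D (I + v v^T) D, with D diagonal. The abstract does not spell out the representation, so this particular form (the one associated with VD-CMA) is an assumption, and the function name and parameters are hypothetical. It shows the sampling step only, not the CR-FM-NES update rules.

```python
import numpy as np

def sample_restricted_gaussian(mean, sigma, d, v, rng):
    """Draw one sample from N(mean, sigma^2 * D (I + v v^T) D), D = diag(d).

    Assumed (hypothetical) parameterization; uses O(n) time and memory
    because the n-by-n covariance matrix is never formed.
    """
    z = rng.standard_normal(mean.shape[0])
    v_norm_sq = float(v @ v)
    if v_norm_sq > 0.0:
        # Closed-form square root of the rank-one update:
        # (I + v v^T)^(1/2) = I + (sqrt(1 + |v|^2) - 1) * v v^T / |v|^2
        coeff = (np.sqrt(1.0 + v_norm_sq) - 1.0) / v_norm_sq
        y = z + coeff * (v @ z) * v
    else:
        y = z
    # Elementwise scaling by d applies the diagonal matrix D.
    return mean + sigma * d * y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    x = sample_restricted_gaussian(np.zeros(n), 0.5, np.ones(n), rng.standard_normal(n), rng)
    print(x.shape)
```

Only the vectors `d` and `v` are stored, so memory and per-sample cost grow linearly with the dimension rather than quadratically.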


DiGamma: Domain-aware Genetic Algorithm for HW-Mapping Co-optimization for DNN Accelerators

arXiv.org Artificial Intelligence

The design of DNN accelerators includes two key parts: HW resource configuration and mapping strategy. Intensive research has been conducted to optimize each of them independently. Unfortunately, optimizing both together is extremely challenging because of the very large cross-coupled search space. To address this, we propose a HW-Mapping co-optimization framework, an efficient encoding of the immense design space constructed by HW and Mapping, and a domain-aware genetic algorithm, named DiGamma, with specialized operators for improving search efficiency. We evaluate DiGamma with seven popular DNN models with different properties. Our evaluations show that DiGamma can achieve (geomean) 3.0x and 10.0x speedup, compared to the best-performing baseline optimization algorithms, in edge and cloud settings, respectively.


DNNFuser: Generative Pre-Trained Transformer as a Generalized Mapper for Layer Fusion in DNN Accelerators

arXiv.org Artificial Intelligence

Dataflow/mapping determines the compute and energy efficiency of DNN accelerators. Many mappers have been proposed to tackle the intra-layer map-space. However, mappers for the inter-layer map-space (aka layer-fusion map-space) have rarely been discussed. In this work, we propose a mapper, DNNFuser, that focuses specifically on this layer-fusion map-space. While existing SOTA DNN mapping explorations rely on search-based mappers, this is the first work, to the best of our knowledge, to propose a one-shot inference-based mapper. We use the well-known language model GPT as our DNN architecture to learn layer-fusion optimization as a sequence modeling problem. Further, the trained DNNFuser can generalize its knowledge and infer new solutions for unseen conditions. Within one inference pass, DNNFuser can infer solutions with performance comparable to those found by a highly optimized search-based mapper while being 66x-127x faster.


Minimax Demographic Group Fairness in Federated Learning

arXiv.org Artificial Intelligence

Machine learning models are increasingly being adopted to make decisions in a range of domains, such as finance, insurance, medical diagnosis, recruitment, and many more [2]. Therefore, we are often confronted with the need - sometimes imposed by regulatory bodies - to ensure that such machine learning models do not lead to decisions that discriminate against individuals from a certain demographic group. The development of machine learning models that are fair across different (demographic) groups has been well studied in traditional learning setups where a single entity is responsible for learning a model based on a local dataset holding data from individuals of the various groups. However, there are settings where the data representing different demographic groups is spread across multiple entities rather than concentrated on a single entity/server. For example, consider a scenario where various hospitals wish to learn a diagnostic machine learning model that is fair (or performs reasonably well) across different demographic groups, but each hospital may only hold training data from certain groups because - in view of its geo-location - it serves predominantly individuals of a given demographic [5]. This new setup, along with the conventional centralized one, is depicted in Figure 1.


The First AI4TSP Competition: Learning to Solve Stochastic Routing Problems

arXiv.org Artificial Intelligence

The TSP is one of the classical combinatorial optimization problems, with many variants inspired by real-world applications. This first competition asked the participants to develop algorithms to solve a time-dependent orienteering problem with stochastic weights and time windows (TD-OPSWTW). It focused on two types of learning approaches: surrogate-based optimization and deep reinforcement learning. In this paper, we describe the problem, the setup of the competition, and the winning methods, and give an overview of the results. The winning methods described in this work have advanced the state of the art in using AI for stochastic routing problems. Overall, by organizing this competition we have introduced routing problems as an interesting problem setting for AI researchers. The simulator of the problem has been made open-source and can be used by other researchers as a benchmark for new AI methods.


Safe AI -- How is this Possible?

arXiv.org Artificial Intelligence

A new generation of increasingly autonomous and self-learning cyber-physical systems (CPS) is being developed for control applications in the real world. These systems are AI-based in that they leverage techniques from the field of artificial intelligence (AI) to flexibly cope with imprecision, inconsistency, and incompleteness, to learn from experience, and to adapt to changing and even unforeseen situations. This extra flexibility of AI systems makes it harder to predict their behavior. Moreover, AI systems are usually safety-critical in that they may cause real harm in (and to) the real world. Consequently, the central question regarding the development of such systems is how to handle or even overcome this basic dichotomy between unpredictable and safe behavior of AI systems. In other words, how can we best construct systems that exploit AI techniques, without incurring the frailties of "AI-like" behavior?


AI-Aided Integrated Terrestrial and Non-Terrestrial 6G Solutions for Sustainable Maritime Networking

arXiv.org Artificial Intelligence

The maritime industry is experiencing a technological revolution that affects shipbuilding, operation of both seagoing and inland vessels, cargo management, and working practices in harbors. This ongoing transformation is driven by the ambition to make the ecosystem more sustainable and cost-efficient. Digitalization and automation help achieve these goals by transforming shipping and cruising into a much more cost- and energy-efficient, and decarbonized industry segment. The key enablers in these processes are always-available connectivity and content delivery services, which can not only aid shipping companies in improving their operational efficiency and reducing carbon emissions but also contribute to enhanced crew welfare and passenger experience. Due to recent advancements in integrating high-capacity and ultra-reliable terrestrial and non-terrestrial networking technologies, ubiquitous maritime connectivity is becoming a reality. To cope with the increased complexity of managing these integrated systems, this article advocates the use of artificial intelligence and machine learning-based approaches to meet the service requirements and energy efficiency targets in various maritime communications scenarios.


Convex Analysis of the Mean Field Langevin Dynamics

arXiv.org Machine Learning

As an example of the nonlinear Fokker-Planck equation, the mean field Langevin dynamics has attracted attention due to its connection to (noisy) gradient descent on infinitely wide neural networks in the mean field regime, and hence the convergence property of the dynamics is of great theoretical interest. In this work, we give a simple and self-contained convergence rate analysis of the mean field Langevin dynamics with respect to the (regularized) objective function in both continuous and discrete time settings. The key ingredient of our proof is a proximal Gibbs distribution $p_q$ associated with the dynamics, which, in combination with techniques in [Vempala and Wibisono (2019)], allows us to develop a convergence theory parallel to classical results in convex optimization. Furthermore, we reveal that $p_q$ connects to the duality gap in the empirical risk minimization setting, which enables efficient empirical evaluation of the algorithm convergence.


Sharpness-Aware Minimization

#artificialintelligence

This post deals with a recent optimization method for training neural networks described in the paper Sharpness-Aware Minimization for Efficiently Improving Generalization by P. Foret et al. (December 2020). Honestly, the first time I read the paper's details, I really thought the procedure described therein (or something similar) had already been explored many years before by tons of people… I was even surprised to read that it worked in some contexts. Modern models are trained with optimization methods that rely only on the training loss. These models can easily memorize the training data and are prone to overfitting. They have more parameters than needed, and this large number of parameters provides no guarantee of proper generalization to the test set.
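For readers who want a concrete picture of the procedure, here is a minimal NumPy sketch of one sharpness-aware update as I understand it from the paper: take the gradient at the current weights, step a distance rho along its normalized direction, and then apply the gradient measured at that perturbed point to the original weights. The function name `sam_step`, the toy quadratic loss, and the default values of `lr` and `rho` are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware update (illustrative sketch, L2-norm perturbation)."""
    g = grad_fn(w)
    # Perturbation of length rho in the direction of steepest ascent;
    # the small constant avoids division by zero at a stationary point.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Gradient evaluated at the perturbed weights.
    g_sam = grad_fn(w + eps)
    # The update is applied to the original, unperturbed weights.
    return w - lr * g_sam

if __name__ == "__main__":
    # Toy loss 0.5 * |w|^2, whose gradient is simply w (hypothetical example).
    w = np.array([3.0, 4.0])
    for _ in range(3):
        w = sam_step(w, lambda u: u)
    print(w)
```

Each update costs two gradient evaluations instead of one, which is the main overhead compared with plain gradient descent.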


Evolutionary Computation for Expensive Optimization: A Survey - Machine Intelligence Research

#artificialintelligence

Expensive optimization problems (EOPs) arise widely in various significant real-world applications. However, an EOP requires expensive or even unaffordable costs for evaluating candidate solutions, which makes it costly for an algorithm to find a satisfactory solution. Moreover, due to fast-growing application demands in the economy and society, such as the emergence of smart cities, the internet of things, and the big data era, solving EOPs more efficiently has become increasingly essential in various fields, which poses great challenges to the problem-solving ability of optimization approaches. Among various optimization approaches, evolutionary computation (EC) is a promising global optimization tool that has been widely used for solving EOPs efficiently in the past decades. Given the fruitful advancements of EC for EOPs, it is essential to review them in order to synthesize previous research experience and provide references that aid the development of relevant research fields and real-world applications.