Goto

Collaborating Authors

 Optimization


Admissible Policy Teaching through Reward Design

arXiv.org Artificial Intelligence

We study reward design strategies for incentivizing a reinforcement learning agent to adopt a policy from a set of admissible policies. The goal of the reward designer is to modify the underlying reward function cost-efficiently while ensuring that any approximately optimal deterministic policy under the new reward function is admissible and performs well under the original reward function. This problem can be viewed as a dual to the problem of optimal reward poisoning attacks: instead of forcing an agent to adopt a specific policy, the reward designer incentivizes an agent to avoid taking actions that are inadmissible in certain states. Perhaps surprisingly, and in contrast to the problem of optimal reward poisoning attacks, we first show that the reward design problem for admissible policy teaching is computationally challenging, and it is NP-hard to find an approximately optimal reward modification. We then proceed by formulating a surrogate problem whose optimal solution approximates the optimal solution to the reward design problem in our setting, but is more amenable to optimization techniques and analysis. For this surrogate problem, we present characterization results that provide bounds on the value of the optimal solution. Finally, we design a local search algorithm to solve the surrogate problem and showcase its utility using simulation-based experiments.


Super-Reparametrizations of Weighted CSPs: Properties and Optimization Perspective

arXiv.org Artificial Intelligence

The notion of reparametrizations of Weighted CSPs (WCSPs) (also known as equivalence-preserving transformations of WCSPs) is well-known and finds its use in many algorithms to approximate or bound the optimal WCSP value. In contrast, the concept of super-reparametrizations (which are changes of the weights that keep or increase the WCSP objective for every assignment) was already proposed but never studied in detail. To fill this gap, we present a number of theoretical properties of super-reparametrizations and compare them to those of reparametrizations. Furthermore, we propose a framework for computing upper bounds on the optimal value of the (maximization version of) WCSP using super-reparametrizations. We show that it is in principle possible to employ arbitrary (under some technical conditions) constraint propagation rules to improve the bound. For arc consistency in particular, the method reduces to the known Virtual AC (VAC) algorithm. Newly, we implemented the method for singleton arc consistency (SAC) and compared it to other strong local consistencies in WCSPs on a public benchmark. The results show that the bounds obtained from SAC are superior for many instance groups.


Jointly Learning Environments and Control Policies with Projected Stochastic Gradient Ascent

Journal of Artificial Intelligence Research

We consider the joint design and control of discrete-time stochastic dynamical systems over a finite time horizon. We formulate the problem as a multi-step optimization problem under uncertainty seeking to identify a system design and a control policy that jointly maximize the expected sum of rewards collected over the time horizon considered. The transition function, the reward function and the policy are all parametrized, assumed known and differentiable with respect to their parameters. We then introduce a deep reinforcement learning algorithm combining policy gradient methods with model-based optimization techniques to solve this problem. In essence, our algorithm iteratively approximates the gradient of the expected return via Monte-Carlo sampling and automatic differentiation and takes projected gradient ascent steps in the space of environment and policy parameters. This algorithm is referred to as Direct Environment and Policy Search (DEPS). We assess the performance of our algorithm in three environments concerned with the design and control of a mass-spring-damper system, a small-scale off-grid power system and a drone, respectively. In addition, our algorithm is benchmarked against a state-of-the-art deep reinforcement learning algorithm used to tackle joint design and control problems. We show that DEPS performs at least as well or better in all three environments, consistently yielding solutions with higher returns in fewer iterations. Finally, solutions produced by our algorithm are also compared with solutions produced by an algorithm that does not jointly optimize environment and policy parameters, highlighting the fact that higher returns can be achieved when joint optimization is performed.


GLAN: A Graph-based Linear Assignment Network

arXiv.org Artificial Intelligence

Differentiable solvers for the linear assignment problem (LAP) have attracted much research attention in recent years, which are usually embedded into learning frameworks as components. However, previous algorithms, with or without learning strategies, usually suffer from the degradation of the optimality with the increment of the problem size. In this paper, we propose a learnable linear assignment solver based on deep graph networks. Specifically, we first transform the cost matrix to a bipartite graph and convert the assignment task to the problem of selecting reliable edges from the constructed graph. Subsequently, a deep graph network is developed to aggregate and update the features of nodes and edges. Finally, the network predicts a label for each edge that indicates the assignment relationship. The experimental results on a synthetic dataset reveal that our method outperforms state-of-the-art baselines and achieves consistently high accuracy with the increment of the problem size. Furthermore, we also embed the proposed solver, in comparison with state-of-the-art baseline solvers, into a popular multi-object tracking (MOT) framework to train the tracker in an end-to-end manner. The experimental results on MOT benchmarks illustrate that the proposed LAP solver improves the tracker by the largest margin.


Dynamic GPU Energy Optimization for Machine Learning Training Workloads

arXiv.org Artificial Intelligence

GPUs are widely used to accelerate the training of machine learning workloads. As modern machine learning models become increasingly larger, they require a longer time to train, leading to higher GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To characterize the target workload behavior, GPOEO utilizes GPU performance counters. To reduce the performance counter profiling overhead, it uses an analytical model to detect the training iteration change and only collects performance counter data when an iteration shift is detected. GPOEO employs multi-objective models based on gradient boosting and a local search algorithm to find a trade-off between execution time and energy consumption. We evaluate the GPOEO by applying it to 71 machine learning workloads from two AI benchmark suites running on an NVIDIA RTX3080Ti GPU. Compared with the NVIDIA default scheduling strategy, GPOEO delivers a mean energy saving of 16.2% with a modest average execution time increase of 5.1%.


Learning Control Policies for Fall prevention and safety in bipedal locomotion

arXiv.org Artificial Intelligence

The ability to recover from an unexpected external perturbation is a fundamental motor skill in bipedal locomotion. An effective response includes the ability to not just recover balance and maintain stability but also to fall in a safe manner when balance recovery is physically infeasible. For robots associated with bipedal locomotion, such as humanoid robots and assistive robotic devices that aid humans in walking, designing controllers which can provide this stability and safety can prevent damage to robots or prevent injury related medical costs. This is a challenging task because it involves generating highly dynamic motion for a high-dimensional, non-linear and under-actuated system with contacts. Despite prior advancements in using model-based and optimization methods, challenges such as requirement of extensive domain knowledge, relatively large computational time and limited robustness to changes in dynamics still make this an open problem. In this thesis, to address these issues we develop learning-based algorithms capable of synthesizing push recovery control policies for two different kinds of robots : Humanoid robots and assistive robotic devices that assist in bipedal locomotion. Our work can be branched into two closely related directions : 1) Learning safe falling and fall prevention strategies for humanoid robots and 2) Learning fall prevention strategies for humans using a robotic assistive devices. To achieve this, we introduce a set of Deep Reinforcement Learning (DRL) algorithms to learn control policies that improve safety while using these robots.


A integrating critic-waspas group decision making method under interval-valued q-rung orthogonal fuzzy enviroment

arXiv.org Artificial Intelligence

This paper provides a new tool for multi-attribute multi-objective group decision-making with unknown weights and attributes' weights. An interval-valued generalized orthogonal fuzzy group decision-making method is proposed based on the Yager operator and CRITIC-WASPAS method with unknown weights. The method integrates Yager operator, CRITIC, WASPAS, and interval value generalized orthogonal fuzzy group. Its merits lie in allowing decision-makers greater freedom, avoiding bias due to decision-makers' weight, and yielding accurate evaluation. The research includes: expanding the interval value generalized distance measurement method for comparison and application of similarity measurement and decision-making methods; developing a new scoring function for comparing the size of interval value generalized orthogonal fuzzy numbers,and further existing researches. The proposed interval-valued Yager weighted average operator (IVq-ROFYWA) and Yager weighted geometric average operator (IVq-ROFYWG) are used for information aggregation. The CRITIC-WASPAS combines the advantages of CRITIC and WASPAS, which not only work in the single decision but also serve as the basis of the group decision. The in-depth study of the decision-maker's weight matrix overcomes the shortcomings of taking the decision as a whole, and weighs the decision-maker's information aggregation. Finally, the group decision algorithm is used for hypertension risk management. The results are consistent with decision-makers' opinions. Practice and case analysis have proved the effectiveness of the method proposed in this paper. At the same time, it is compared with other operators and decision-making methods, which proves the method effective and feasible.


Vision Transformer Slimming: Multi-Dimension Searching in Continuous Optimization Space

arXiv.org Artificial Intelligence

This paper explores the feasibility of finding an optimal sub-model from a vision transformer and introduces a pure vision transformer slimming (ViT-Slim) framework that can search such a sub-structure from the original model end-to-end across multiple dimensions, including the input tokens, MHSA and MLP modules with state-of-the-art performance. Our method is based on a learnable and unified l1 sparsity constraint with pre-defined factors to reflect the global importance in the continuous searching space of different dimensions. The searching process is highly efficient through a single-shot training scheme. For instance, on DeiT-S, ViT-Slim only takes ~43 GPU hours for searching process, and the searched structure is flexible with diverse dimensionalities in different modules. Then, a budget threshold is employed according to the requirements of accuracy-FLOPs trade-off on running devices, and a re-training process is performed to obtain the final models. The extensive experiments show that our ViT-Slim can compress up to 40% of parameters and 40% FLOPs on various vision transformers while increasing the accuracy by ~0.6% on ImageNet. We also demonstrate the advantage of our searched models on several downstream datasets. Our source code will be publicly available.


Feature Selection-based Intrusion Detection System Using Genetic Whale Optimization Algorithm and Sample-based Classification

arXiv.org Artificial Intelligence

Preventing and detecting intrusions and attacks on wireless networks has become an important and serious challenge. On the other hand, due to the limited resources of wireless nodes, the use of monitoring nodes for permanent monitoring in wireless sensor networks in order to prevent and detect intrusion and attacks in this type of network is practically non-existent. Therefore, the solution to overcome this problem today is the discussion of remote-control systems and has become one of the topics of interest in various fields. Remote monitoring of node performance and behavior in wireless sensor networks, in addition to detecting malicious nodes within the network, can also predict malicious node behavior in future. In present research, a network intrusion detection system using feature selection based on a combination of Whale optimization algorithm (WOA) and genetic algorithm (GA) and sample-based classification is proposed. In this research, the standard data set KDDCUP1999 has been used in which the characteristics related to healthy nodes and types of malicious nodes are stored based on the type of attacks in the network. The proposed method is based on the combination of feature selection based on Whale optimization algorithm and genetic algorithm with KNN classification in terms of accuracy criteria, has better results than other previous methods. Based on this, it can be said that the Whale optimization algorithm and the genetic algorithm have extracted the features related to the class label well, and the KNN method has been able to well detect the misconduct nodes in the intrusion detection data set in wireless networks.


HarmoFL: Harmonizing Local and Global Drifts in Federated Learning on Heterogeneous Medical Images

arXiv.org Artificial Intelligence

Multiple medical institutions collaboratively training a model using federated learning (FL) has become a promising solution for maximizing the potential of data-driven models, yet the non-independent and identically distributed (non-iid) data in medical images is still an outstanding challenge in real-world practice. The feature heterogeneity caused by diverse scanners or protocols introduces a drift in the learning process, in both local (client) and global (server) optimizations, which harms the convergence as well as model performance. Many previous works have attempted to address the non-iid issue by tackling the drift locally or globally, but how to jointly solve the two essentially coupled drifts is still unclear. In this work, we concentrate on handling both local and global drifts and introduce a new harmonizing framework called HarmoFL. First, we propose to mitigate the local update drift by normalizing amplitudes of images transformed into the frequency domain to mimic a unified imaging setting, in order to generate a harmonized feature space across local clients. Second, based on harmonized features, we design a client weight perturbation guiding each local model to reach a flat optimum, where a neighborhood area of the local optimal solution has a uniformly low loss. Without any extra communication cost, the perturbation assists the global model to optimize towards a converged optimal solution by aggregating several local flat optima. We have theoretically analyzed the proposed method and empirically conducted extensive experiments on three medical image classification and segmentation tasks, showing that HarmoFL outperforms a set of recent state-of-the-art methods with promising convergence behavior. Code is available at https://github.com/med-air/HarmoFL.