Goto

Collaborating Authors

 Search


On Learning Intrinsic Rewards for Policy Gradient Methods

arXiv.org Artificial Intelligence

In many sequential decision making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal Rewards Framework of Singh et.al. that defines the optimal intrinsic reward function as one that when used by an RL agent achieves behavior that optimizes the task-specifying or extrinsic reward function. Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead search based planning agents. Whether it is possible to learn intrinsic reward functions for learning agents remains an open problem. In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based learning agents. We compare the performance of an augmented agent that uses our algorithm to provide additive intrinsic rewards to an A2C-based policy learner (for Atari games) and a PPO-based policy learner (for Mujoco domains) with a baseline agent that uses the same policy learners but with only extrinsic rewards. Our results show improved performance on most but not all of the domains.


Simplifying the minimax disparity model for determining OWA weights in large-scale problems

arXiv.org Artificial Intelligence

In the context of multicriteria decision making, the ordered weighted averaging (OWA) functions play a crucial role in aggregating multiple criteria evaluations into an overall assessment supporting the decision makers' choice. Determining OWA weights, therefore, is an essential part of this process. Available methods for determining OWA weights, however, often require heavy computational loads in real-life large-scale optimization problems. In this paper, we propose a new approach to simplify the well-known minimax disparity model for determining OWA weights. For this purpose, we use to the binomial decomposition framework in which natural constraints can be imposed on the level of complexity of the weight distribution. The original problem of determining OWA weights is thereby transformed into a smaller scale optimization problem, formulated in terms of the coefficients in the binomial decomposition. Our preliminary results show that a small set of these coefficients can encode for an appropriate full-dimensional set of OWA weights.


SPSA-FSR: Simultaneous Perturbation Stochastic Approximation for Feature Selection and Ranking

arXiv.org Machine Learning

This manuscript presents the following: (1) an improved version of the Binary Simultaneous Perturbation Stochastic Approximation (SPSA) Method for feature selection in machine learning (Aksakalli and Malekipirbazari, Pattern Recognition Letters, Vol. 75, 2016) based on non-monotone iteration gains computed via the Barzilai and Borwein (BB) method, (2) its adaptation for feature ranking, and (3) comparison against popular methods on public benchmark datasets. The improved method, which we call SPSA-FSR, dramatically reduces the number of iterations required for convergence without impacting solution quality. SPSA-FSR can be used for feature ranking and feature selection both for classification and regression problems. After a review of the current state-of-the-art, we discuss our improvements in detail and present three sets of computational experiments: (1) comparison of SPSA-FS as a (wrapper) feature selection method against sequential methods as well as genetic algorithms, (2) comparison of SPSA-FS as a feature ranking method in a classification setting against random forest importance, chi-squared, and information main methods, and (3) comparison of SPSA-FS as a feature ranking method in a regression setting against minimum redundancy maximum relevance (MRMR), RELIEF, and linear correlation methods. The number of features in the datasets we use range from a few dozens to a few thousands. Our results indicate that SPSA-FS converges to a good feature set in no more than 100 iterations and therefore it is quite fast for a wrapper method. SPSA-FS also outperforms popular feature selection as well as feature ranking methods in majority of test cases, sometimes by a large margin, and it stands as a promising new feature selection and ranking method.


Structural Learning of Probabilistic Graphical Models of Cumulative Phenomena

arXiv.org Artificial Intelligence

One of the critical issues when adopting Bayesian networks (BNs) to model dependencies among random variables is to "learn" their structure. This is a well-known NP-hard problem in its most general and classical formulation, which is furthermore complicated by known pitfalls such as the issue of I-equivalence among different structures. In this work we restrict the investigation to a specific class of networks, i.e., those representing the dynamics of phenomena characterized by the monotonic accumulation of events. Such phenomena allow to set specific structural constraints based on Suppes' theory of probabilistic causation and, accordingly, to define constrained BNs, named Suppes-Bayes Causal Networks (SBCNs). Within this framework, we study the structure learning of SBCNs via extensive simulations with various state-of-the-art search strategies, such as canonical local search techniques and Genetic Algorithms. This investigation is intended to be an extension and an in-depth clarification of our previous works on SBCN structure learning. Among the main results, we show that Suppes' constraints do simplify the learning task, by reducing the solution search space and providing a temporal ordering on the variables, which simplifies the complications derived by I-equivalent structures. Finally, we report on tradeoffs among different optimization techniques that can be used to learn SBCNs.


Roster Evaluation Based on Classifiers for the Nurse Rostering Problem

arXiv.org Artificial Intelligence

The personnel scheduling problem is a well-known NP-hard combinatorial problem. Due to the complexity of this problem and the size of the real-world instances, it is not possible to use exact methods, and thus heuristics, meta-heuristics, or hyper-heuristics must be employed. The majority of heuristic approaches are based on iterative search, where the quality of intermediate solutions must be calculated. Unfortunately, this is computationally highly expensive because these problems have many constraints and some are very complex. In this study, we propose a machine learning technique as a tool to accelerate the evaluation phase in heuristic approaches. The solution is based on a simple classifier, which is able to determine whether the changed solution (more precisely, the changed part of the solution) is better than the original or not. This decision is made much faster than a standard cost-oriented evaluation process. However, the classification process cannot guarantee 100% correctness. Therefore, our approach, which is illustrated using a tabu search algorithm in this study, includes a filtering mechanism, where the classifier rejects the majority of the potentially bad solutions and the remaining solutions are then evaluated in a standard manner. We also show how the boosting algorithms can improve the quality of the final solution compared with a simple classifier. We verified our proposed approach and premises, based on standard and real-world benchmark instances, to demonstrate the significant speedup obtained with comparable solution quality.


Graph Search, Shortest Paths, and Data Structures Coursera

#artificialintelligence

About this course: The primary topics in this part of the specialization are: data structures (heaps, balanced search trees, hash tables, bloom filters), graph primitives (applications of breadth-first and depth-first search, connectivity, shortest paths), and their applications (ranging from deduplication to social network analysis).


Regularized Greedy Column Subset Selection

arXiv.org Artificial Intelligence

The Column Subset Selection Problem provides a natural framework for unsupervised feature selection. Despite being a hard combinatorial optimization problem, there exist efficient algorithms that provide good approximations. The drawback of the problem formulation is that it incorporates no form of regularization, and is therefore very sensitive to noise when presented with scarce data. In this paper we propose a regularized formulation of this problem, and derive a correct greedy algorithm that is similar in efficiency to existing greedy methods for the unregularized problem. We study its adequacy for feature selection and propose suitable formulations. Additionally, we derive a lower bound for the error of the proposed problems. Through various numerical experiments on real and synthetic data, we demonstrate the significantly increased robustness and stability of our method, as well as the improved conditioning of its output, all while remaining efficient for practical use.


A Topological Approach to Meta-heuristics: Analytical Results on the BFS vs. DFS Algorithm Selection Problem

arXiv.org Artificial Intelligence

Search is a central problem in artificial intelligence, and breadth-first search (BFS) and depth-first search (DFS) are the two most fundamental ways to search. In this paper we derive estimates for average BFS and DFS runtime. The average runtime estimates can be used to allocate resources or judge the hardness of a problem. They can also be used for selecting the best graph representation, and for selecting the faster algorithm out of BFS and DFS. They may also form the basis for an analysis of more advanced search methods. The paper treats both tree search and graph search. For tree search, we employ a probabilistic model of goal distribution; for graph search, the analysis depends on an additional statistic of path redundancy and average branching factor. As an application, we use the results to predict BFS and DFS runtime on two concrete grammar problems and on the N-puzzle. Experimental verification shows that our analytical approximations come close to empirical reality.


Learning Path: Artificial Intelligence for Apps and Games

@machinelearnbot

With the emergence of big data and modern technologies, artificial intelligence has acquired a lot of relevance in many domains. The increase in demand for automation has generated many applications for artificial intelligence in fields such as robotics, predictive analytics, finance, and many more. So, if you're a developer who wants to upgrade your normal applications to smart and intelligent versions, then go for this Learning Path. Packt's Video Learning Path is a series of individual video products put together in a logical and stepwise manner such that each video builds on the skills learned in the video before it. Let's take a quick look at your learning journey.


Towards Quicker Probabilistic Recognition with Multiple Goal Heuristic Search

AAAI Conferences

Referred to as an approach for either plan or goal recognition, the original method proposed by Ramirez and Geffner introduced a domain-based approach that did not need a library containing specific plan instances. This introduced a more generalizable means of representing tasks to be recognized, but was also very slow due to its need to run simulations via multiple executions of an off-the-shelf classical planner. Several variations have since been proposed for quicker recognition, but each one uses a drastically different approach that must sacrifice other qualities useful for processing the recognition results in more complex systems. We present work in progress that takes advantage of the shared state space between planner executions to perform multiple goal heuristic search. This single execution of a planner will potentially speed up the recognition process using the original method, which also maintains the sacrificed properties and improves some of the assumptions made by Ramirez and Geffner.