Goto

Collaborating Authors

 plp



SupplementaryMaterial

Neural Information Processing Systems

This is the appendix for "A general approximation lower bound inLp norm, with applications to feed-forwardneuralnetworks". Layer L consists of a single node: the output neuron. Note that skip connections are allowed, i.e., there can be connections between non-consecutivelayers. We now explain how to derive Proposition 1 (with an arbitrary range[a,b]) as a straightforward consequenceofProposition7. Proof(ofProposition1). In order to apply Proposition 7, we reduce the problem from[a,b] to [0,1] by translating and rescaling every function inG.



Cross-Domain Policy Transfer by Representation Alignment via Multi-Domain Behavioral Cloning

arXiv.org Artificial Intelligence

Transferring learned skills across diverse situations remains a fundamental challenge for autonomous agents, particularly when agents are not allowed to interact with an exact target setup. While prior approaches have predominantly focused on learning domain translation, they often struggle with handling significant domain gaps or out-of-distribution tasks. In this paper, we present a simple approach for cross-domain policy transfer that learns a shared latent representation across domains and a common abstract policy on top of it. Our approach leverages multi-domain behavioral cloning on unaligned trajectories of proxy tasks and employs maximum mean discrepancy (MMD) as a regularization term to encourage cross-domain alignment. The MMD regularization better preserves structures of latent state distributions than commonly used domain-discriminative distribution matching, leading to higher transfer performance. Moreover, our approach involves training only one multi-domain policy, which makes extension easier than existing methods. Empirical evaluations demonstrate the efficacy of our method across various domain shifts, especially in scenarios where exact domain translation is challenging, such as cross-morphology or cross-viewpoint settings. Our ablation studies further reveal that multi-domain behavioral cloning implicitly contributes to representation alignment alongside domain-adversarial regularization. Humans have an astonishing ability to learn skills in a highly transferable way. Once we learn a route from home to the station, for example, we can get to the destination using various modes of transportation (e.g., walking, cycling, or driving) in different environments (e.g., on a map or in the real world), disregarding irrelevant perturbations (e.g., weather, time, or traffic conditions). We identify the underlying structural similarities across situations, perceive the world, and accumulate knowledge in our way of abstraction. Such abstract knowledge can be readily employed in diverse similar situations. However, it is not easy for autonomous agents. Agents trained with reinforcement learning (RL) or imitation learning (IL) often struggle to transfer knowledge acquired in a specific situation to another. This is because the learned policies are strongly tied to the representations obtained under a particular training configuration, which is not robust to changes in an agent or an environment. Previous studies have attempted to address this problem through various approaches. Domain randomization (Tobin et al., 2017; Peng et al., 2018; Andrychowicz et al., 2020) aims to learn a policy that is robust to environmental changes by utilizing multiple training domains. However, it is unable to handle significant domain gaps that go beyond the assumed domain distribution during training, such as drastically different observations or agent morphologies. Numerous methods have been proposed to overcome such domain discrepancies.


Learning Linear Programs from Optimal Decisions

arXiv.org Machine Learning

We propose a flexible gradient-based framework for learning linear programs from optimal decisions. Linear programs are often specified by hand, using prior knowledge of relevant costs and constraints. In some applications, linear programs must instead be learned from observations of optimal decisions. Learning from optimal decisions is a particularly challenging bi-level problem, and much of the related inverse optimization literature is dedicated to special cases. We tackle the general problem, learning all parameters jointly while allowing flexible parametrizations of costs, constraints, and loss functions. We also address challenges specific to learning linear programs, such as empty feasible regions and non-unique optimal decisions. Experiments show that our method successfully learns synthetic linear programs and minimum-cost multi-commodity flow instances for which previous methods are not directly applicable. We also provide a fast batch-mode PyTorch implementation of the homogeneous interior point algorithm, which supports gradients by implicit differentiation or backpropagation.


Exact and Metaheuristic Approaches for the Production Leveling Problem

arXiv.org Artificial Intelligence

In this paper we introduce a new problem in the field of production planning which we call the Production Leveling Problem. The task is to assign orders to production periods such that the load in each period and on each production resource is balanced, capacity limits are not exceeded and the orders' priorities are taken into account. Production Leveling is an important intermediate step between long-term planning and the final scheduling of orders within a production period, as it is responsible for selecting good subsets of orders to be scheduled within each period. A formal model of the problem is proposed and NP-hardness is shown by reduction from Bin Backing. As an exact method for solving moderately sized instances we introduce a MIP formulation. For solving large problem instances, metaheuristic local search is investigated. A greedy heuristic and two neighborhood structures for local search are proposed, in order to apply them using Variable Neighborhood Descent and Simulated Annealing. Regarding exact techniques, the main question of research is, up to which size instances are solvable within a fixed amount of time. For the metaheuristic approaches the aim is to show that they produce near-optimal solutions for smaller instances, but also scale well to very large instances. A set of realistic problem instances from an industrial partner is contributed to the literature, as well as random instance generators. The experimental evaluation conveys that the proposed MIP model works well for instances with up to 250 orders. Out of the investigated metaheuristic approaches, Simulated Annealing achieves the best results. It is shown to produce solutions with less than 3% average optimality gap on small instances and to scale well up to thousands of orders and dozens of periods and products. The presented metaheuristic methods are already being used in the industry.


Value of Information in Probabilistic Logic Programs

arXiv.org Artificial Intelligence

In medical decision making, we have to choose among several expensive diagnostic tests such that the certainty about a patient's health is maximized while remaining within the bounds of resources like time and money. The expected increase in certainty in the patient's condition due to performing a test is called the value of information (VoI) for that test. In general, VoI relates to acquiring additional information to improve decision-making based on probabilistic reasoning in an uncertain system. This paper presents a framework for acquiring information based on VoI in uncertain systems modeled as Probabilistic Logic Programs (PLPs). Optimal decision-making in uncertain systems modeled as PLPs have already been studied before. But, acquiring additional information to further improve the results of making the optimal decision has remained open in this context. We model decision-making in an uncertain system with a PLP and a set of top-level queries, with a set of utility measures over the distributions of these queries. The PLP is annotated with a set of atoms labeled as "observable"; in the medical diagnosis example, the observable atoms will be results of diagnostic tests. Each observable atom has an associated cost. This setting of optimally selecting observations based on VoI is more general than that considered by any prior work. Given a limited budget, optimally choosing observable atoms based on VoI is intractable in general. We give a greedy algorithm for constructing a "conditional plan" of observations: a schedule where the selection of what atom to observe next depends on earlier observations. We show that, preempting the algorithm anytime before completion provides a usable result, the result improves over time, and, in the absence of a well-defined budget, converges to the optimal solution.


Few-Shot Bayesian Imitation Learning with Logic over Programs

arXiv.org Artificial Intelligence

We describe an expressive class of policies that can be efficiently learned from a few demonstrations. Policies are represented as logical combinations of programs drawn from a small domain-specific language (DSL). We define a prior over policies with a probabilistic grammar and derive an approximate Bayesian inference algorithm to learn policies from demonstrations. In experiments, we study five strategy games played on a 2D grid with one shared DSL. After a few demonstrations of each game, the inferred policies generalize to new game instances that differ substantially from the demonstrations. We argue that the proposed method is an apt choice for policy learning tasks that have scarce training data and feature significant, structured variation between task instances.


On the Semantics and Complexity of Probabilistic Logic Programs

Journal of Artificial Intelligence Research

We examine the meaning and the complexity of probabilistic logic programs that consist of a set of rules and a set of independent probabilistic facts (that is, programs based on Sato's distribution semantics). We focus on two semantics, respectively based on stable and on well-founded models. We show that the semantics based on stable models (referred to as the "credal semantics") produces sets of probability measures that dominate infinitely monotone Choquet capacities; we describe several useful consequences of this result. We then examine the complexity of inference with probabilistic logic programs. We distinguish between the complexity of inference when a probabilistic program and a query are given (the inferential complexity), and the complexity of inference when the probabilistic program is fixed and the query is given (the query complexity, akin to data complexity as used in database theory). We obtain results on the inferential and query complexity for acyclic, stratified, and normal propositional and relational programs; complexity reaches various levels of the counting hierarchy and even exponential levels.


A Syntax-based Framework for Merging Imprecise Probabilistic Logic Programs

AAAI Conferences

In this paper, we address the problem of merging multiple imprecise probabilistic beliefs represented as Probabilistic Logic Programs (PLPs) obtained from multiple sources. Beliefs in each PLP are modeled as conditional events attached with probability bounds. The major task of syntax-based merging is to obtain the most rational probability bound for each conditional event from the original PLPs to form a new PLP. We require the minimal change principle to be followed so that each source gives up its beliefs as little as possible. Some instantiated merging operators are derived from our merging framework. Furthermore, we propose a set of postulates for merging PLPs, some of which extend the postulates for merging classical knowledge bases, whilst others are specific to the merging of probabilistic beliefs.