Belief-State Query Policies for User-Aligned POMDPs

Neural Information Processing Systems

Planning in real-world settings often entails addressing partial observability while aligning with users' requirements. We present a novel framework for expressing users' constraints and preferences about agent behavior in a partially observable setting using parameterized belief-state query (BSQ) policies in the setting of goal-oriented partially observable Markov decision processes (gPOMDPs). We present the first formal analysis of such constraints and prove that while the expected cost function of a parameterized BSQ policy w.r.t. its parameters is not convex, it is piecewise constant and yields an implicit discrete parameter search space that is finite for finite horizons. This theoretical result leads to novel algorithms that optimize gPOMDP agent behavior with guaranteed user alignment. Our analysis proves that these algorithms converge to the optimal user-aligned behavior in the limit. Empirical results show that parameterized BSQ policies provide a computationally feasible approach for user-aligned planning in partially observable settings.
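To illustrate why a piecewise-constant expected cost admits a finite parameter search, consider a hypothetical BSQ policy with a single threshold parameter theta compared against a belief probability. Over a finite horizon only finitely many belief values can arise, so the expected cost can only change where theta crosses one of them; evaluating one candidate theta per interval covers the entire continuous range. The function names and toy cost below are illustrative sketches, not the paper's algorithm.

```python
def candidate_thresholds(belief_values):
    """Given the finite set of belief values reachable within the horizon,
    return one representative threshold per constant-cost interval.
    Illustrative helper, not taken from the paper."""
    vs = sorted(set(belief_values))
    # midpoints between consecutive reachable belief values, plus the
    # two outer intervals [0, v_min) and (v_max, 1]
    reps = [vs[0] / 2.0]
    reps += [(a + b) / 2.0 for a, b in zip(vs, vs[1:])]
    reps.append((vs[-1] + 1.0) / 2.0)
    return reps

def best_threshold(belief_values, expected_cost):
    """Evaluate one theta per interval; because the expected cost is
    piecewise constant in theta, this exhausts the search space."""
    return min(candidate_thresholds(belief_values), key=expected_cost)
```

With reachable belief values {0.25, 0.75}, only three candidate thresholds need to be evaluated, even though theta ranges over a continuum.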



A Theory of Optimistically Universal Online Learnability for General Concept Classes

Neural Information Processing Systems

We provide a full characterization of the concept classes that are optimistically universally online learnable with {0, 1} labels. The notion of optimistically universal online learning was defined in [Hanneke, 2021] in order to understand learnability under minimal assumptions. In this paper, following the philosophy behind that work, we investigate two questions, namely, for every concept class: (1) What are the minimal assumptions on the data process admitting online learnability?


Appendix A Optimal Path Search

Neural Information Processing Systems

For the optimal route search we want to find the shortest path in a modified graph, for which we adapt the standard Dijkstra's algorithm. A priority queue is maintained over all the unvisited nodes, each mapped to the minimum cost of reaching it from the source node via the visited ones. In each iteration, the minimum element is popped from the queue, and its neighbours are updated using the negative log likelihoods obtained from the model and added to the queue (lines 7-11).
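The adaptation above can be sketched as standard Dijkstra with edge costs set to negative log likelihoods, so that the minimum-cost path is the most probable route. This is a minimal illustration; the graph layout and function names are assumptions, and the edge probabilities stand in for the model's likelihoods.

```python
import heapq
import math

def most_likely_route(graph, source, target):
    """Dijkstra on a graph whose edge weights are negative log
    likelihoods; the shortest path is then the most probable route.

    `graph` maps node -> list of (neighbour, probability) pairs.
    """
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)          # pop minimum-cost node
        if node in visited:
            continue
        visited.add(node)
        if node == target:
            break
        for nbr, prob in graph.get(node, []):
            # negative log likelihood: higher probability => lower cost
            nd = d + (-math.log(prob))
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                prev[nbr] = node
                heapq.heappush(queue, (nd, nbr))
    # reconstruct the route by walking predecessors back to the source
    path, node = [], target
    while node != source:
        path.append(node)
        node = prev[node]
    path.append(source)
    return path[::-1]
```

Because costs are sums of negative logs, minimizing the path cost is equivalent to maximizing the product of edge probabilities along the route.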


MLR: Robust & Reliable Route Recommendation on Road Networks

Neural Information Processing Systems

Predicting the most likely route from a source location to a destination is a core functionality in mapping services. Although the problem has been studied in the literature, two key limitations remain to be addressed. First, our study reveals that a significant portion of the routes recommended by existing methods fail to reach the destination. Second, existing techniques are transductive in nature; hence, they fail to recommend routes if unseen roads are encountered at inference time.



e366d105cfd734677897aaccf51e97a3-AuthorFeedback.pdf

Neural Information Processing Systems

Reviewer 1: Thanks for the feedback and for the suggestions as to how to make the paper clearer and the examples less intimidating. Re "...how decomposing the polytope now allows it to be mapped?": if you meant "how does the decomposition help map the problem of computing an optimal correlated..." We'll take all of them into account. Re "broader impact": thanks for the feedback; we agree with all your points. As you correctly recognized, we use the term "social welfare" to mean the sum of utilities of the players, as is typical in the game... The maximum payoff is 15. Gurobi is freely available for academic use, but we'll also mention the open-source... We are definitely the first to compute optimal EFCE in it. We strongly disagree that "this paper just tells us that the work in Farina et al. [12] is..." Extending the construction by Farina et al. to handle the more general... We strongly disagree with that.



Supplement to 'Autoencoders that don't overfit towards the Identity'

Neural Information Processing Systems

This supplement provides: in Section 2, the proof of the Theorem in the paper; in Section 3, the derivation of the ADMM equations for optimizing Eq. 10 in the paper; in Section 4, the derivation of the update equations for optimizing Eq. 11 in the paper; and in Section 5, the generalization of Section 3 in the paper to dropout at different layers in a deep network. The first section of the proof provides an overview: we start with the objective function of Eq. 1 in the paper (re-stated in Eq. 2 below) and show that it is equal to the objective function in the Theorem in the paper (see Eq. 8 below) up to the factor ap + bq, which is an irrelevant constant when optimizing for B. In the following, we provide the detailed steps: we first present the full sequence of manipulations at once, and then describe each step in the text below. We start by re-stating Eq. 1 in the paper. Line 5 states the analytic simplifications obtained for parts (a) and (b), respectively, when the number n of training epochs approaches infinity (for convergence); the details are outlined in Sections 2.2 and 2.3 below.