Goto

Collaborating Authors

 perp


Replay-Guided Adversarial Environment Design

Neural Information Processing Systems

Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD). Crucially, DCD includes both PLR and a popular UED algorithm, PAIRED, as special cases and inherits similar theoretical guarantees. This connection allows us to develop novel theory for PLR, providing a version with a robustness guarantee at Nash equilibria. Furthermore, our theory suggests a highly counterintuitive improvement to PLR: by stopping the agent from updating its policy on uncurated levels (training on less data), we can improve the convergence to Nash equilibria. Indeed, our experiments confirm that our new method, PLR$^{\perp}$, obtains better results on a suite of out-of-distribution, zero-shot transfer tasks, in addition to demonstrating that PLR$^{\perp}$ improves the performance of PAIRED, from which it inherited its theoretical framework.


Replay-Guided Adversarial Environment Design

Neural Information Processing Systems

Deep reinforcement learning (RL) agents may successfully generalize to new settings if trained on an appropriately diverse set of environment and task configurations. Unsupervised Environment Design (UED) is a promising self-supervised RL paradigm, wherein the free parameters of an underspecified environment are automatically adapted during training to the agent's capabilities, leading to the emergence of diverse training environments. Here, we cast Prioritized Level Replay (PLR), an empirically successful but theoretically unmotivated method that selectively samples randomly-generated training levels, as UED. We argue that by curating completely random levels, PLR, too, can generate novel and complex levels for effective training. This insight reveals a natural class of UED methods we call Dual Curriculum Design (DCD).


Reviews: On Neuronal Capacity

Neural Information Processing Systems

The main result of the paper is giving tight bounds on the number of n-variable boolean functions that can be expressed as degree d-polynomial threshold functions for any fixed d (d growing very mildly with n) also works. To me the rest of the results while interesting seem to be mostly easy applications of the key result to other more fashionable neural network models. However, the tightness of the main result does not translate to tight bounds when other neuroidal models are considered, because of the kind of non-linearities or weight constraints involved. The main result is highly non-trivial, the proof quite lengthy though elegant, and resolves a 25 year old open problem. Although the proof uses a lot of heavy mathematics, the key contribution seems to be generalizing random matrix theory to a random tensor theory -- the key result being that a large number of stochastically independent random vectors in low-dimension (and hence clearly not linearly indpendent) still yield a high degree of linear independence when tensorized.


Convergence analysis of t-SNE as a gradient flow for point cloud on a manifold

Jeong, Seonghyeon, Wu, Hau-Tieng

arXiv.org Artificial Intelligence

We present a theoretical foundation regarding the boundedness of the t-SNE algorithm. t-SNE employs gradient descent iteration with Kullback-Leibler (KL) divergence as the objective function, aiming to identify a set of points that closely resemble the original data points in a high-dimensional space, minimizing KL divergence. Investigating t-SNE properties such as perplexity and affinity under a weak convergence assumption on the sampled dataset, we examine the behavior of points generated by t-SNE under continuous gradient flow. Demonstrating that points generated by t-SNE remain bounded, we leverage this insight to establish the existence of a minimizer for KL divergence.


Towards Automated Readable Proofs of Ruler and Compass Constructions

Marinković, Vesna, Šukilović, Tijana, Marić, Filip

arXiv.org Artificial Intelligence

Although there are several systems that successfully generate construction steps for ruler and compass construction problems, none of them provides readable synthetic correctness proofs for generated constructions. In the present work, we demonstrate how our triangle construction solver ArgoTriCS can cooperate with automated theorem provers for first order logic and coherent logic so that it generates construction correctness proofs, that are both human-readable and formal (can be checked by interactive theorem provers such as Coq or Isabelle/HOL). These proofs currently rely on many high-level lemmas and our goal is to have them all formally shown from the basic axioms of geometry.


PeRP: Personalized Residual Policies For Congestion Mitigation Through Co-operative Advisory Systems

Hasan, Aamir, Chakraborty, Neeloy, Chen, Haonan, Cho, Jung-Hoon, Wu, Cathy, Driggs-Campbell, Katherine

arXiv.org Artificial Intelligence

Intelligent driving systems can be used to mitigate congestion through simple actions, thus improving many socioeconomic factors such as commute time and gas costs. However, these systems assume precise control over autonomous vehicle fleets, and are hence limited in practice as they fail to account for uncertainty in human behavior. Piecewise Constant (PC) Policies address these issues by structurally modeling the likeness of human driving to reduce traffic congestion in dense scenarios to provide action advice to be followed by human drivers. However, PC policies assume that all drivers behave similarly. To this end, we develop a co-operative advisory system based on PC policies with a novel driver trait conditioned Personalized Residual Policy, PeRP. PeRP advises drivers to behave in ways that mitigate traffic congestion. We first infer the driver's intrinsic traits on how they follow instructions in an unsupervised manner with a variational autoencoder. Then, a policy conditioned on the inferred trait adapts the action of the PC policy to provide the driver with a personalized recommendation. Our system is trained in simulation with novel driver modeling of instruction adherence. We show that our approach successfully mitigates congestion while adapting to different driver behaviors, with 4 to 22% improvement in average speed over baselines.