
Unsupervised State Representation Learning in Atari
Evan Racah

Neural Information Processing Systems

State representation learning, or the ability to capture latent generative factors of an environment, is crucial for building intelligent agents that can perform a wide variety of tasks. Learning such representations without supervision from rewards is a challenging open problem. We introduce a method that learns state representations by maximizing mutual information across spatially and temporally distinct features of a neural encoder of the observations. We also introduce a new benchmark based on Atari 2600 games where we evaluate representations based on how well they capture the ground truth state variables. We believe this new framework for evaluating representation learning models will be crucial for future representation learning research. Finally, we compare our technique with other state-of-the-art generative and contrastive representation learning methods.
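The objective described above is contrastive at its core: an encoder's features for one observation should identify the temporally adjacent observation among distractors. Below is a minimal PyTorch sketch of such a temporal InfoNCE-style loss; the function name and the global-features-only formulation are our simplification for illustration (the method also contrasts spatially local features), not the authors' exact code.

```python
import torch
import torch.nn.functional as F

def temporal_infonce_loss(feats_t, feats_tp1, temperature=0.1):
    """Contrast encoder features of frame t against frame t+1: the pair from
    the same trajectory is the positive; other batch entries are negatives."""
    # feats_t, feats_tp1: (batch, dim) encoder outputs for consecutive frames
    logits = feats_t @ feats_tp1.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(feats_t.size(0), device=feats_t.device)
    return F.cross_entropy(logits, targets)          # diagonal entries are the positives
```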




Efficient Probabilistic Inference in the Quest for Physics Beyond the Standard Model

Neural Information Processing Systems

We present a novel probabilistic programming framework that couples directly to existing large-scale simulators through a cross-platform probabilistic execution protocol, which allows general-purpose inference engines to record and control random number draws within simulators in a language-agnostic way. The execution of existing simulators as probabilistic programs enables highly interpretable posterior inference in the structured model defined by the simulator code base. We demonstrate the technique in particle physics, on a scientifically accurate simulation of the τ (tau) lepton decay, which is a key ingredient in establishing the properties of the Higgs boson. Inference efficiency is achieved via inference compilation, where a deep recurrent neural network is trained to parameterize proposal distributions and control the stochastic simulator in a sequential importance sampling scheme, at a fraction of the computational cost of a Markov chain Monte Carlo baseline.
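To make the "record and control random number draws" idea concrete, here is a toy Python sketch of a controller that intercepts a simulator's draws by address, samples from a proposal in place of the prior, and accumulates the importance weight, in the spirit of the protocol described above. All names and the Gaussian setup are illustrative assumptions, not the paper's actual API.

```python
import math
import random

class ImportanceController:
    """Toy stand-in for a probabilistic execution protocol: intercepts the
    simulator's random draws by address, samples from a proposal instead of
    the prior, and accumulates the importance weight log p(x)/q(x)."""

    def __init__(self, proposals):
        self.proposals = proposals   # address -> (mean, std) of a Gaussian proposal
        self.log_weight = 0.0

    def sample_normal(self, address, prior_mean, prior_std):
        q_mean, q_std = self.proposals.get(address, (prior_mean, prior_std))
        x = random.gauss(q_mean, q_std)
        log_p = -0.5 * ((x - prior_mean) / prior_std) ** 2 - math.log(prior_std)
        log_q = -0.5 * ((x - q_mean) / q_std) ** 2 - math.log(q_std)
        self.log_weight += log_p - log_q
        return x

def simulator(ctrl):
    # An "existing simulator" whose randomness is routed through the controller.
    energy = ctrl.sample_normal("decay_energy", prior_mean=45.0, prior_std=10.0)
    smear = ctrl.sample_normal("detector_smear", prior_mean=0.0, prior_std=1.0)
    return energy + smear

# In the paper, proposals come from a trained recurrent network; here they are fixed.
ctrl = ImportanceController({"decay_energy": (60.0, 5.0)})
observable = simulator(ctrl)
log_w = ctrl.log_weight
```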


Supplementary Materials for On the Effects of Data Scale on Computer Control Agents

Neural Information Processing Systems

For completeness, in the following we include a datasheet based on the format of [1].

For what purpose was the dataset created? Was there a specific task in mind?
Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., company, institution, organization)?
What do the instances that comprise the dataset represent (e.g., documents, photos, people, countries)?
How many instances are there in total (of each type, if appropriate)?
What data does each instance consist of?


On the Effects of Data Scale on UI Control Agents

Neural Information Processing Systems

Autonomous agents that control user interfaces to accomplish human tasks are emerging. Leveraging LLMs to power such agents has been of special interest, but unless they are fine-tuned on human-collected task demonstrations, their performance remains relatively low. In this work we study whether fine-tuning alone is a viable approach for building real-world UI control agents.


Rethinking Generative Mode Coverage: A Pointwise Guaranteed Approach

Neural Information Processing Systems

Many generative models must combat missing modes. The conventional wisdom is to reduce, through training, a statistical distance (such as an f-divergence) between the generated distribution and the provided data distribution. But this is more a heuristic than a guarantee: a statistical distance measures global, not local, similarity between two distributions, so even when the distance is small, it does not imply plausible mode coverage.
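A tiny numeric check makes the point concrete (our illustration, not the paper's): a generator that drops a rare mode entirely can still be very close to the data distribution under a global metric such as total variation.

```python
# Data distribution has a rare mode B; the generator ignores it completely.
p = {"mode_A": 0.99, "mode_B": 0.01}   # provided data distribution
q = {"mode_A": 1.00, "mode_B": 0.00}   # generated distribution

total_variation = 0.5 * sum(abs(p[m] - q[m]) for m in p)
print(total_variation)  # 0.01 -- globally "close", yet mode_B has zero coverage
```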


Adding a comparison with other uncoupled regression methods: Since Reviewers 1 and 4 share the concern about the SVMRank baseline

Neural Information Processing Systems

We thank the reviewers for their thoughtful and useful feedback. We will also fix minor typos in the final version of the paper. However, we may obtain a linear model by using the methods discussed in Hsu et al. [1] or Pananjady et al. As you suspect, the discussion should have referred to Figure 2; we will fix it in the final version. On the error bars seeming strange: this is because we used a log-scale plot in the figures. Hence, we compare our methods to the SVMRank benchmark, which is the closest to our setting.
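For readers puzzled by the log-plot remark: error bars that are symmetric in linear units look asymmetric once the y-axis is logarithmic. A small self-contained illustration (ours, not part of the rebuttal):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(1, 6)
y = 10.0 ** -x          # values spanning several orders of magnitude
err = 0.4 * y           # error bars symmetric in linear units

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(8, 3))
ax_lin.errorbar(x, y, yerr=err, fmt="o")
ax_lin.set_title("linear scale")
ax_log.errorbar(x, y, yerr=err, fmt="o")
ax_log.set_yscale("log")   # the same bars now look lopsided
ax_log.set_title("log scale")
plt.show()
```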


Multivariate Triangular Quantile Maps for Novelty Detection
Jingjing Wang (University of Waterloo), Sun Sun

Neural Information Processing Systems

Novelty detection, a fundamental task in machine learning, has drawn a lot of recent attention due to its wide-ranging applications and the rise of neural approaches. In this work, we present a general framework for neural novelty detection that centers around a multivariate extension of the univariate quantile function. Our framework unifies and extends many classical and recent novelty detection algorithms, and opens the way to exploit recent advances in flow-based neural density estimation. We adapt the multiple gradient descent algorithm to obtain the first efficient end-to-end implementation of our framework that is free of tuning hyperparameters. Extensive experiments over a number of real datasets confirm the efficacy of our proposed method against state-of-the-art alternatives.
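To illustrate the central object, here is a toy two-dimensional triangular (Knothe-Rosenblatt style) quantile map in Python: the i-th output depends only on the first i uniform inputs. The Gaussian conditionals are our illustrative assumption; the paper learns such maps with flow-based neural density estimators.

```python
import numpy as np
from scipy.stats import norm

def triangular_quantile_map(u):
    """Toy 2-D triangular quantile map: component i depends on u_1, ..., u_i."""
    u1, u2 = u
    x1 = norm.ppf(u1)                           # Q_1(u1): marginal quantile of x1
    x2 = norm.ppf(u2, loc=0.5 * x1, scale=1.0)  # Q_2(u2 | x1): conditional quantile
    return np.array([x1, x2])

sample = triangular_quantile_map(np.random.uniform(size=2))  # one draw from the model
```

Inverting such a map sends data back to the uniform reference, and the change-of-variables density under the map can serve as a novelty score.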


Sampling Sketches for Concave Sublinear Functions of Frequencies

Neural Information Processing Systems

We consider massive distributed datasets that consist of elements modeled as key-value pairs and the task of computing statistics or aggregates where the contribution of each key is weighted by a function of its frequency (the sum of the values of its elements). This fundamental problem has a wealth of applications in data analytics and machine learning, in particular with concave sublinear functions of the frequencies that mitigate the disproportionate effect of keys with high frequency. The family of concave sublinear functions includes low frequency moments (p ≤ 1), capping, logarithms, and their compositions. A common approach is to sample keys, ideally proportionally to their contributions, and estimate statistics from the sample. A simple but costly way to do this is to aggregate the data to produce a table of keys and their frequencies, apply the function to the frequency values, and then apply a weighted sampling scheme. Our main contribution is the design of composable sampling sketches that can be tailored to any concave sublinear function of the frequencies. Our sketch structure size is very close to the desired sample size, and our samples provide statistical guarantees on the estimation quality that are very close to those of an ideal sample of the same size computed over aggregated data. Finally, we demonstrate experimentally the simplicity and effectiveness of our methods.
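As a concrete reference point, here is the "simple but costly" aggregate-then-sample baseline described above, sketched in Python. The Efraimidis-Spirakis keying is our choice of weighted sampling scheme for the illustration, not the paper's composable sketch.

```python
import math
import random
from collections import Counter

def concave_weighted_sample(elements, f, k):
    """Aggregate key frequencies, weight each key by the concave sublinear
    f(frequency), and draw k keys without replacement."""
    freq = Counter(elements)                          # full aggregation pass
    rank = {key: random.random() ** (1.0 / f(n))      # Efraimidis-Spirakis key
            for key, n in freq.items()}
    return sorted(rank, key=rank.get, reverse=True)[:k]

data = ["a", "a", "a", "b", "b", "c", "d", "d", "d", "d"]
sample = concave_weighted_sample(data, f=math.sqrt, k=2)  # sqrt: frequency moment p = 1/2
```

The composable sketches in the paper achieve a comparable sample with structure size close to k, without ever materializing the full frequency table.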