Next state prediction gives rise to entangled, yet compositional representations of objects
Saanum, Tankred, Schulze Buschoff, Luca M., Dayan, Peter, Schulz, Eric
ABSTRACT

Compositional representations are thought to enable humans to generalize across combinatorially vast state spaces. Models with learnable object slots, which encode information about objects in separate latent codes, have shown promise for this type of generalization but rely on strong architectural priors. Models with distributed representations, on the other hand, use overlapping, potentially entangled neural codes, and their ability to support compositional generalization remains underexplored. In this paper we examine whether distributed models can develop linearly separable representations of objects, like slotted models, through unsupervised training on videos of object interactions. We show that, surprisingly, models with distributed representations often match or outperform models with object slots in downstream prediction tasks. Furthermore, we find that linearly separable object representations can emerge without object-centric priors, with auxiliary objectives like next-state prediction playing a key role. Finally, we observe that distributed models' object representations are never fully disentangled, even if they are linearly separable: Multiple objects can be encoded through partially overlapping neural populations while still being highly separable with a linear classifier. We hypothesize that maintaining partially shared codes enables distributed models to better compress object dynamics, potentially enhancing generalization.

1 INTRODUCTION

Humans naturally decompose scenes, events and processes in terms of the objects that feature in them (Tenenbaum et al., 2011; Lake et al., 2017). These object-centric construals have been argued to explain humans' ability to reason and generalize successfully (Goodman et al., 2008; Lake et al., 2015; Schulze Buschoff et al., 2023). It has therefore long been a chief aim in machine learning research to design models and agents that learn to represent the world compositionally, e.g. in terms of the building blocks that compose it.
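The abstract's central measurement, linear separability of object codes, can be illustrated with a simple probe. The sketch below is not the paper's method, only an assumed setup: synthetic "distributed" codes in which two object identities drive overlapping latent dimensions, probed with a logistic-regression classifier trained by plain gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic distributed codes: two object identities are encoded through
# partially overlapping (dense) directions of a shared 16-d latent space.
n, d = 200, 16
labels = rng.integers(0, 2, size=n)          # which identity is present
basis_a = rng.normal(size=d)                 # population driven by identity 1
basis_b = rng.normal(size=d)                 # population driven by identity 0
codes = np.outer(labels, basis_a) + np.outer(1 - labels, basis_b)
codes += 0.1 * rng.normal(size=(n, d))       # entangling noise

def linear_probe_accuracy(x, y, epochs=200, lr=0.1):
    """Train a logistic-regression probe with plain gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))   # sigmoid predictions
        grad = p - y                              # cross-entropy gradient
        w -= lr * x.T @ grad / len(y)
        b -= lr * grad.mean()
    preds = (x @ w + b) > 0
    return (preds == y).mean()

acc = linear_probe_accuracy(codes, labels)
print(f"probe accuracy: {acc:.2f}")   # near 1.0: separable despite overlap
```

Even though every latent dimension carries information about both identities, the probe separates them almost perfectly, which is the sense in which entangled codes can still be linearly separable.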
Heuristic Search for Multi-Objective Probabilistic Planning
Chen, Dillon, Trevizan, Felipe, Thiébaux, Sylvie
Heuristic search is a powerful approach that has successfully been applied to a broad class of planning problems, including classical planning, multi-objective planning, and probabilistic planning modelled as a stochastic shortest path (SSP) problem. Here, we extend the reach of heuristic search to a more expressive class of problems, namely multi-objective stochastic shortest paths (MOSSPs), which require computing a coverage set of non-dominated policies. We design new heuristic search algorithms MOLAO* and MOLRTDP, which extend well-known SSP algorithms to the multi-objective case. We further construct a spectrum of domain-independent heuristic functions differing in their ability to take into account the stochastic and multi-objective features of the problem to guide the search. Our experiments demonstrate the benefits of these algorithms and the relative merits of the heuristics.
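The coverage set of non-dominated policies that MOSSP solvers must compute rests on Pareto dominance over cost vectors. The following minimal sketch (not the paper's MOLAO*/MOLRTDP algorithms) shows the dominance test and a naive coverage-set filter for a handful of candidate cost vectors, assuming lower cost is better in every objective:

```python
def dominates(u, v):
    """u dominates v if u is no worse in every objective and strictly
    better in at least one (costs: lower is better)."""
    return all(a <= b for a, b in zip(u, v)) and \
           any(a < b for a, b in zip(u, v))

def coverage_set(points):
    """Keep only the non-dominated cost vectors (a Pareto coverage set)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

costs = [(3, 1), (2, 2), (1, 3), (2, 3), (4, 4)]
print(coverage_set(costs))   # [(3, 1), (2, 2), (1, 3)]
```

Here (2, 3) is dropped because (2, 2) is at least as good in both objectives and strictly better in one; the three survivors are mutually non-dominated trade-offs.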
SparCA: Sparse Compressed Agglomeration for Feature Extraction and Dimensionality Reduction
Barnard, Leland, Ali, Farwa, Botha, Hugo, Jones, David T.
The most effective dimensionality reduction procedures produce interpretable features from the raw input space while also providing good performance for downstream supervised learning tasks. For many methods, this requires optimizing one or more hyperparameters for a specific task, which can limit generalizability. In this study we propose sparse compressed agglomeration (SparCA), a novel dimensionality reduction procedure that involves a multistep hierarchical feature grouping, compression, and feature selection process. We demonstrate the characteristics and performance of the SparCA method across heterogeneous synthetic and real-world datasets, including images, natural language, and single cell gene expression data. Our results show that SparCA is applicable to a wide range of data types, produces highly interpretable features, and shows compelling performance on downstream supervised learning tasks without the need for hyperparameter tuning.
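The grouping-then-compression idea behind feature agglomeration can be sketched with a greedy correlation-based variant (an illustrative stand-in, not SparCA's actual multistep procedure): correlated raw features are grouped, and each group is compressed to a single mean feature that remains interpretable as "the average of these inputs".

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: 12 raw features generated from 3 latent factors, so
# features within a group of four are highly correlated with each other.
latents = rng.normal(size=(100, 3))
x = np.repeat(latents, 4, axis=1) + 0.05 * rng.normal(size=(100, 12))

def agglomerate(x, threshold=0.8):
    """Greedily group features whose |correlation| exceeds the threshold,
    then compress each group to its mean (one interpretable feature)."""
    corr = np.abs(np.corrcoef(x, rowvar=False))
    groups, assigned = [], set()
    for i in range(x.shape[1]):
        if i in assigned:
            continue
        group = [j for j in range(x.shape[1])
                 if j not in assigned and corr[i, j] >= threshold]
        assigned.update(group)
        groups.append(group)
    compressed = np.column_stack([x[:, g].mean(axis=1) for g in groups])
    return groups, compressed

groups, z = agglomerate(x)
print(len(groups), z.shape)   # 3 groups, one compressed feature each
```

On this toy input the procedure recovers the three underlying factors without any task-specific hyperparameter search, which is the property the abstract emphasizes.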
Conditional Object-Centric Learning from Video
Kipf, Thomas, Elsayed, Gamaleldin F., Mahendran, Aravindh, Stone, Austin, Sabour, Sara, Heigold, Georg, Jonschkowski, Rico, Dosovitskiy, Alexey, Greff, Klaus
Object-centric representations are a promising path toward more systematic generalization by providing flexible abstractions upon which compositional world models can be built. Recent work on simple 2D and 3D datasets has shown that models with object-centric inductive biases can learn to segment and represent meaningful objects from the statistical structure of the data alone without the need for any supervision. However, such fully-unsupervised methods still fail to scale to diverse realistic data, despite the use of increasingly complex inductive biases such as priors for the size of objects or the 3D geometry of the scene. In this paper, we instead take a weakly-supervised approach and focus on how 1) using the temporal dynamics of video data in the form of optical flow and 2) conditioning the model on simple object location cues can be used to enable segmenting and tracking objects in significantly more realistic synthetic data. We introduce a sequential extension to Slot Attention which we train to predict optical flow for realistic looking synthetic scenes and show that conditioning the initial state of this model on a small set of hints, such as center of mass of objects in the first frame, is sufficient to significantly improve instance segmentation. These benefits generalize beyond the training distribution to novel objects, novel backgrounds, and to longer video sequences. We also find that such initial-state-conditioning can be used during inference as a flexible interface to query the model for specific objects or parts of objects, which could pave the way for a range of weakly-supervised approaches and allow more effective interaction with trained models.
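The two ingredients the abstract combines, slot-based competition over inputs and conditioning the initial slot state on location hints, can be illustrated with a stripped-down, learning-free iteration (an assumed simplification of Slot Attention: no learned projections, GRU, or optical-flow targets). Softmax over slots makes slots compete for input features, and initializing slots near object locations plays the role of the conditioning cues:

```python
import numpy as np

rng = np.random.default_rng(0)

def slot_attention_step(slots, inputs, eps=1e-8):
    """One simplified slot-competition iteration: softmax over slots
    assigns each input to slots, then each slot moves to the weighted
    mean of the inputs it claimed."""
    logits = inputs @ slots.T                        # (n_inputs, n_slots)
    attn = np.exp(logits - logits.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)          # compete over slots
    weights = attn / (attn.sum(axis=0, keepdims=True) + eps)
    return weights.T @ inputs                        # weighted-mean update

# Two well-separated clusters of 2-d input features stand in for objects;
# initializing slots near the cluster centers mimics the location hints.
cluster_a = rng.normal(loc=(5.0, 0.0), scale=0.1, size=(20, 2))
cluster_b = rng.normal(loc=(0.0, 5.0), scale=0.1, size=(20, 2))
inputs = np.vstack([cluster_a, cluster_b])
slots = np.array([[4.0, 1.0], [1.0, 4.0]])           # hint-conditioned init

for _ in range(5):
    slots = slot_attention_step(slots, inputs)
print(slots.round(1))   # slot 0 -> cluster A mean, slot 1 -> cluster B mean
```

Because each slot starts near "its" object, the competition converges to one slot per cluster, which is the interface property the paper exploits to query specific objects at inference time.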
Momentum in Reinforcement Learning
Vieillard, Nino, Scherrer, Bruno, Pietquin, Olivier, Geist, Matthieu
We adapt the concept of momentum from optimization to reinforcement learning. Seeing state-action value functions as an analog to the gradients in optimization, we interpret momentum as an average of consecutive $q$-functions. We derive Momentum Value Iteration (MoVI), a variation of Value Iteration that incorporates this momentum idea. Our analysis shows that this allows MoVI to average errors over successive iterations. We show that the proposed approach can be readily extended to deep learning. Specifically, we propose a simple improvement on DQN based on MoVI, and evaluate it on Atari games.
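The core idea, averaging consecutive $q$-functions the way momentum averages gradients, can be sketched on a tiny MDP. This is an assumed minimal rendering, not the paper's exact algorithm: a running average $h$ of the Bellman-backed $q$-functions supplies the greedy values for the next backup.

```python
import numpy as np

# A tiny deterministic MDP: 2 states, 2 actions.
# P[s, a] is the next state, R[s, a] the reward.
P = np.array([[0, 1], [0, 1]])
R = np.array([[0.0, 1.0], [2.0, 0.0]])
gamma = 0.9

def movi_sketch(beta=0.5, iters=200):
    """Momentum-style value iteration: h is a running average of the
    successive q-functions, and greedy values are read off h."""
    q = np.zeros((2, 2))
    h = np.zeros((2, 2))
    for _ in range(iters):
        v = h.max(axis=1)                  # greedy values from averaged q
        q = R + gamma * v[P]               # Bellman backup
        h = (1 - beta) * h + beta * q      # momentum: average in the new q
    return h

h = movi_sketch()
print(h.round(2))   # converges to q*: optimal loop 0 -> 1 -> 0 with rewards 1, 2
```

Because the environment here is noiseless the averaging only damps convergence, but with noisy backups (the regime the paper analyzes) the same average cancels errors across iterations.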
MOVI: A Model-Free Approach to Dynamic Fleet Management
Modern vehicle fleets, e.g., for ridesharing platforms and taxi companies, can reduce passengers' waiting times by proactively dispatching vehicles to locations where pickup requests are anticipated in the future. Yet it is unclear how to best do this: optimal dispatching requires optimizing over several sources of uncertainty, including vehicles' travel times to their dispatched locations, as well as coordinating between vehicles so that they do not attempt to pick up the same passenger. While prior works have developed models for this uncertainty and used them to optimize dispatch policies, in this work we introduce a model-free approach. Specifically, we propose MOVI, a Deep Q-network (DQN)-based framework that directly learns the optimal vehicle dispatch policy. Since DQNs scale poorly with a large number of possible dispatches, we streamline our DQN training and let each individual vehicle independently learn its own optimal policy, ensuring scalability at the cost of less coordination between vehicles. We then formulate a centralized receding-horizon control (RHC) policy to compare with our DQN policies. To compare these policies, we design and build MOVI as a large-scale realistic simulator based on 15 million taxi trip records that simulates policy-agnostic responses to dispatch decisions. We show that the DQN dispatch policy reduces the number of unserviced requests by 76% compared to without dispatch and 20% compared to the RHC approach, emphasizing the benefits of a model-free approach and suggesting that there is limited value to coordinating vehicle actions. This finding may help to explain the success of ridesharing platforms, for which drivers make individual decisions.
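The "each vehicle independently learns its own policy" idea can be illustrated with tabular Q-learning on a toy dispatch problem (an assumed stand-in for the paper's DQN setup, with a made-up one-dimensional strip of zones in place of a city map):

```python
import random

random.seed(0)

# Toy dispatch problem: one vehicle on a strip of 5 zones; requests
# appear at zone 4, so the learned policy should drift rightward.
N_ZONES, REQUEST_ZONE = 5, 4
ACTIONS = (-1, 0, 1)                 # move left, stay, move right
q = {(z, a): 0.0 for z in range(N_ZONES) for a in ACTIONS}
alpha, gamma = 0.1, 0.9

def step(zone, action):
    nxt = min(max(zone + action, 0), N_ZONES - 1)
    reward = 1.0 if nxt == REQUEST_ZONE else 0.0   # served a pickup
    return nxt, reward

# Off-policy Q-learning under a fully random behavior policy: each
# vehicle could run this loop independently, with no coordination.
zone = 0
for _ in range(5000):
    a = random.choice(ACTIONS)
    nxt, r = step(zone, a)
    best_next = max(q[(nxt, x)] for x in ACTIONS)
    q[(zone, a)] += alpha * (r + gamma * best_next - q[(zone, a)])
    zone = nxt

policy = [max(ACTIONS, key=lambda x: q[(z, x)]) for z in range(N_ZONES)]
print(policy)   # zones left of the demand zone learn to move right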