AITopics | state representation

Learning State Representations from Random Deep Action-conditional Predictions

Neural Information Processing SystemsApr-27-2026, 00:28:22 GMT

Our main contribution in this work is an empirical finding that random General Value Functions (GVFs), i.e., deep action-conditional predictions--random both in what feature of observations they predict as well as in the sequence of actions the predictions are conditioned upon--form good auxiliary tasks for reinforcement learning (RL) problems. In particular, we show that random deep action-conditional predictions when used as auxiliary tasks yield state representations that produce control performance competitive with state-of-the-art hand-crafted auxiliary tasks like value prediction, pixel control, and CURL in both Atari and DeepMind Lab tasks. In another set of experiments we stop the gradients from the RL part of the network to the state representation learning part of the network and show, perhaps surprisingly, that the auxiliary tasks alone are sufficient to learn state representations good enough to outperform an end-to-end trained actor-critic baseline.

machine learning, natural language, reinforcement learning, (19 more...)

Neural Information Processing Systems

Country:

North America > United States (1.00)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

454cecc4829279e64d624cd8a8c9ddf1-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 16:05:06 GMT

However, in domains where precise and succinct expert state information is available, agents trained onsuchexpert state features usually outperform agents trained onrichobservations.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Add feedback

26405399c51ad7b13b504e74eb7c696c-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 04:20:31 GMT

artificial intelligence, machine learning, nursing report, (16 more...)

Neural Information Processing Systems

Genre: Workflow (0.50)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Health Care Providers & Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Mitigating Partial Observability in Sequential Decision Processes via the Lambda Discrepancy

Neural Information Processing SystemsMar-21-2026, 03:42:34 GMT

Reinforcement learning algorithms typically rely on the assumption that the environment dynamics and value function can be expressed in terms of a Markovian state representation. However, when state information is only partially observable, how can an agent learn such a state representation, and how can it detect when it has found one? We introduce a metric that can accomplish both objectives, without requiring access to---or knowledge of---an underlying, unobservable state space. Our metric, the λ-discrepancy, is the difference between two distinct temporal difference (TD) value estimates, each computed using TD(λ) with a different value of λ. Since TD(λ=0) makes an implicit Markov assumption and TD(λ=1) does not, a discrepancy between these estimates is a potential indicator of a non-Markovian state representation. Indeed, we prove that the λ-discrepancy is exactly zero for all Markov decision processes and almost always non-zero for a broad class of partially observable environments. We also demonstrate empirically that, once detected, minimizing the λ-discrepancy can help with learning a memory function to mitigate the corresponding partial observability. We then train a reinforcement learning agent that simultaneously constructs two recurrent value networks with different λ parameters and minimizes the difference between them as an auxiliary loss. The approach scales to challenging partially observable domains, where the resulting agent frequently performs significantly better (and never performs worse) than a baseline recurrent agent with only a single value network.

artificial intelligence, machine learning, reinforcement learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Completing State Representations using Spectral Learning

Neural Information Processing SystemsMar-16-2026, 20:54:41 GMT

A central problem in dynamical system modeling is state discovery--that is, finding a compact summary of the past that captures the information needed to predict the future. Predictive State Representations (PSRs) enable clever spectral methods for state discovery; however, while consistent in the limit of infinite data, these methods often suffer from poor performance in the low data regime. In this paper we develop a novel algorithm for incorporating domain knowledge, in the form of an imperfect state representation, as side information to speed spectral learning for PSRs. We prove theoretical results characterizing the relevance of a user-provided state representation, and design spectral algorithms that can take advantage of a relevant representation. Our algorithm utilizes principal angles to extract the relevant components of the representation, and is robust to misspecification. Empirical evaluation on synthetic HMMs, an aircraft identification domain, and a gene splice dataset shows that, even with weak domain knowledge, the algorithm can significantly outperform standard PSR learning.

artificial intelligence, machine learning, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

Mask-basedLatentReconstructionforReinforcement Learning

Neural Information Processing SystemsFeb-19-2026, 09:10:12 GMT

Second, visual signals usually havehigh information densities, which may contain distractions and redundancies for policylearning.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)

Add feedback

Unified Off-Policy Learning to Rank: a Reinforcement Learning Perspective

Neural Information Processing SystemsFeb-19-2026, 08:00:47 GMT

Our key insight is that the user's examination and click behavior

information retrieval, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > Oregon (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Add feedback

A Architectures, Hyper-parameters and Algorithms

Neural Information Processing SystemsFeb-17-2026, 04:43:28 GMT

Our approach, named ORDER, uses a three-step training process. In the next parts of this section, we'll explain the methods, structures, and settings we use in each of After that, we'll talk about how we set up and carried out our experiments. In this section, we'll break down the design of the state encoder, how we decided on the best We used a grid search strategy to find the optimal hyper-parameters for our experiments. This allowed each observation dimension to match up with a state factor. We summarize the training process in Algorithm 1.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

Neural Information Processing Systems

Technology: