AITopics | Reinforcement Learning

Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
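
The trial-and-error idea in the quotation can be illustrated with a small sketch. This is not taken from Sutton and Barto's text: it is a minimal epsilon-greedy learner on a hypothetical three-armed bandit, where the arm means, the step count, and the name `run_bandit` are all assumptions chosen for illustration.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection on a Gaussian multi-armed bandit.

    true_means is hidden from the learner; it only observes sampled rewards.
    """
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # how often each action was tried
    estimates = [0.0] * n   # running average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore: try a random action
        else:
            action = max(range(n), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the sample mean for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

# Hypothetical reward means; the learner is never told which arm is best.
estimates, counts = run_bandit([0.2, 0.5, 1.5])
best = max(range(len(estimates)), key=lambda a: estimates[a])
```

The learner is never told that the third arm pays the most; it has to find that out from the rewards it samples, which is the point the quotation makes.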

Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control

Sai Qian Zhang, Qi Zhang, Jieyu Lin

Neural Information Processing Systems, Oct-2-2025, 04:32:21 GMT

Multi-agent reinforcement learning (MARL) has recently received considerable attention due to its applicability to a wide range of real-world applications.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: North America > Canada (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)

Learning dynamic polynomial proofs

Alhussein Fawzi, Mateusz Malinowski, Hamza Fawzi, Omar Fawzi

Neural Information Processing Systems, Oct-2-2025, 04:31:44 GMT

Polynomial inequalities lie at the heart of many mathematical disciplines.

logic & formal reasoning, machine learning, polynomial, (20 more...)

Country: Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Table 3 List of key terms for reinforcement learning

Neural Information Processing Systems, Oct-2-2025, 03:32:40 GMT

Linear Regression (LR) and Support Vector Machine (SVM) are the ML algorithms used for comparison. We use data from the Dow Jones 30 constituent stocks to construct the environment.

finrl-meta, machine learning, reinforcement learning, (14 more...)

Genre: Workflow (0.46)

Industry:

Information Technology (1.00)
Banking & Finance > Trading (1.00)
Education > Educational Setting > Online (0.46)
Education > Educational Technology > Educational Software > Computer Based Training (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

12ffb0968f2f56e51a59a6beb37b2859-Paper.pdf

Neural Information Processing Systems, Oct-2-2025, 03:32:22 GMT

The choice of the model's prediction horizon constitutes

machine learning, neural information processing system, reinforcement learning, (11 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

12ffb0968f2f56e51a59a6beb37b2859-AuthorFeedback.pdf

Neural Information Processing Systems, Oct-2-2025, 03:32:11 GMT

We thank the reviewers for their insights and suggestions. Answers below will be included in expanded discussions in future versions of the paper. In the case of R3's car example, as long as states from 10 steps into the future are sampled … This is discussed in L211-L215 in Section 6 "Practical Training of γ-Models". The only Monte Carlo trajectory estimates are in the final column for comparison.

experiment, machine learning, reinforcement learning, (14 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.32)

0be44cc1d459731928501cae5699f57a-Paper-Conference.pdf

Neural Information Processing Systems, Oct-2-2025, 03:23:14 GMT

auxiliary loss, evolutionary algorithm, machine learning, (15 more...)

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada > Quebec > Montreal (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
(4 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.94)
(2 more...)

Discovery of Useful Questions as Auxiliary Tasks

Vivek Veeriah, Matteo Hessel, Zhongwen Xu, Janarthanan Rajendran, Richard L. Lewis, Junhyuk Oh, Hado P. van Hasselt, David Silver, Satinder Singh

Neural Information Processing Systems, Oct-2-2025, 03:10:28 GMT

Arguably, intelligent agents ought to be able to discover their own questions so that in learning answers for them they learn unanticipated useful knowledge and skills; this departs from the focus in much of machine learning on agents learning answers to externally defined questions. We present a novel method for a reinforcement learning (RL) agent to discover questions formulated as general value functions or GVFs, a fairly rich form of knowledge representation. Specifically, our method uses non-myopic meta-gradients to learn GVF-questions such that learning answers to them, as an auxiliary task, induces useful representations for the main task faced by the RL agent. We demonstrate that auxiliary tasks based on the discovered GVFs are sufficient, on their own, to build representations that support main task learning, and that they do so better than popular hand-designed auxiliary tasks from the literature. Furthermore, we show, in the context of Atari 2600 videogames, how such auxiliary tasks, meta-learned alongside the main task, can improve the data efficiency of an actor-critic agent.

auxiliary task, machine learning, reinforcement learning, (15 more...)

Country:

North America > Canada (0.46)
North America > United States (0.28)

Genre: Research Report (0.88)

Industry: Leisure & Entertainment > Games > Computer Games (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs

Max Simchowitz, Kevin G. Jamieson

Neural Information Processing Systems, Oct-2-2025, 03:03:17 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Country: North America (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)

Sample Complexity of Episodic Fixed-Horizon Reinforcement Learning

Neural Information Processing Systems, Oct-2-2025, 03:01:38 GMT

Recently, there has been significant progress in understanding reinforcement learning in discounted infinite-horizon Markov decision processes (MDPs) by deriving tight sample complexity bounds. However, in many real-world applications, an interactive learning agent operates for a fixed or bounded period of time, for example tutoring students for exams or handling customer service requests. Such scenarios can often be better treated as episodic fixed-horizon MDPs, for which only looser bounds on the sample complexity exist. A natural notion of sample complexity in this setting is the number of episodes required to guarantee a certain performance with high probability (PAC guarantee).
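
One common way to write such a guarantee formally is sketched below. The notation is an assumption introduced here and is not taken from the abstract: $\pi_k$ is the policy followed in episode $k$, $V^{\pi}$ is its expected return over the fixed horizon from the initial state distribution, $V^{*}$ is the optimal value, and $N(\varepsilon, \delta)$ is a bound that is polynomial in the problem parameters.

```latex
% An algorithm is (\varepsilon, \delta)-PAC if, with probability at least 1 - \delta,
% the number of episodes in which it follows a policy that is more than
% \varepsilon-suboptimal is at most N(\varepsilon, \delta):
\Pr\!\Big( \big|\{\, k : V^{\pi_k} < V^{*} - \varepsilon \,\}\big| \;\le\; N(\varepsilon, \delta) \Big) \;\ge\; 1 - \delta
```

Under this reading, the sample complexity is the smallest such $N(\varepsilon, \delta)$ that an algorithm can guarantee.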

probability, reinforcement learning, sample complexity, (12 more...)

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

Learning to Dispatch for Job Shop Scheduling via Deep Reinforcement Learning

Cong Zhang, Wen Song

Neural Information Processing Systems, Oct-2-2025, 02:53:11 GMT

In the paper, we adopt the Proximal Policy Optimization (PPO) algorithm [36] to train our agent. Here we provide details of our algorithm in terms of pseudocode, as shown in Algorithm 1. Similar … In this section, we show how the baseline PDRs compute the priority index for the operations. Here we present the complete results on Taillard's benchmark. In Table S.1, we report the results of … In Table S.2, we report the generalization performance of our policies trained on … The "UB" column is the best solution from … A similar conclusion can be drawn from the results on the DMU benchmark. In Table S.3, we report results … In Table S.4, which focuses on … The "UB" column is the best solution from … We show training curves for all problems in Figure 1.
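
For readers unfamiliar with PPO, the core of the method is a clipped surrogate objective. The sketch below shows that objective for a single sample; it is not the paper's Algorithm 1, and the function name `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions for illustration rather than settings reported in the excerpt.

```python
def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate objective (to be maximized).

    ratio:     pi_new(a|s) / pi_old(a|s), the probability ratio
    advantage: estimated advantage of action a in state s
    clip_eps:  clipping range (assumed value, not from the paper)
    """
    # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Taking the minimum removes any incentive to push the ratio
    # outside the clipping range in the direction the advantage favors.
    return min(ratio * advantage, clipped_ratio * advantage)
```

In practice this quantity is averaged over a batch of sampled transitions and usually combined with value-function and entropy terms, which are omitted here.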

artificial intelligence, machine learning, reinforcement learning, (9 more...)

Country: Asia (0.14)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.86)
