AITopics | Reinforcement Learning


Reinforcement Learning

"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them."
– Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.


NetworkGym: Reinforcement Learning Environments

Neural Information Processing Systems | Oct-10-2025, 15:31:30 GMT

We make use of four internal 12 GB NVIDIA TITAN Xp GPUs to perform our experiments. At initialization of each environment, four UEs are randomly stationed 1.5 meters above the ... The LTE base station lies at (x, z) = (40 m, 3 m). We use random seed values from 0 to 63, inclusive, for this parameter. We train PTD3 for 10,000 steps, instead of 1,000,000 steps, which we do for TD3+BC.

algorithm, offline dataset, reinforcement learning environment, (13 more...)

Neural Information Processing Systems

Country: North America > United States > California > San Diego County > San Diego (0.05)

Industry: Education (0.51)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)


NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation

Momin Haider (UC Santa Barbara), Ming Yin

Neural Information Processing Systems | Oct-10-2025, 15:31:27 GMT

Mobile devices such as smartphones, laptops, and tablets can often connect to multiple access networks (e.g., Wi-Fi, LTE, and 5G) simultaneously. Recent advancements facilitate seamless integration of these connections below the transport layer, enhancing the experience for apps that lack inherent multi-path support. This optimization hinges on dynamically determining the traffic distribution across networks for each device, a process referred to as multi-access traffic splitting. This paper introduces NetworkGym, a high-fidelity network environment simulator that facilitates generating multiple network traffic flows and multi-access traffic splitting.

algorithm, dataset, offline rl algorithm, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (0.93)
Telecommunications (0.88)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


bec8b667016a73bb195b611aa1f41026-Paper-Conference.pdf

Neural Information Processing Systems | Oct-10-2025, 15:22:16 GMT

algorithm, dbmr-bpi, optimal policy, (13 more...)

Neural Information Processing Systems

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Information Technology (0.92)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(7 more...)


E-Motion: Future Motion Simulation via Event Sequence Diffusion

Song Wu

Neural Information Processing Systems | Oct-10-2025, 15:22:06 GMT

Forecasting a typical object's future motion is a critical task for interpreting and

diffusion model, information, sequence, (14 more...)

Neural Information Processing Systems

Country:

Asia > China > Hong Kong (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)


Self-Labeling the Job Shop Scheduling Problem

Neural Information Processing Systems | Oct-10-2025, 15:21:55 GMT

An obstacle in applying supervised paradigms to such problems is the need for costly target solutions often produced with exact solvers.

algorithm, information, operation, (16 more...)

Neural Information Processing Systems

Country: Europe > Italy (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.82)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)


Reinforcing LLM Agents via Policy Optimization with Action Decomposition

Neural Information Processing Systems | Oct-10-2025, 15:00:45 GMT

Beginning with the simplification of flattening all actions, we theoretically explore the discrepancies between action-level optimization and this naive token-level optimization.

agent, language model, optimization, (12 more...)

Neural Information Processing Systems

Country: Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education > Curriculum > Subject-Specific Education (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)


Integrating Suboptimal Human Knowledge with Hierarchical Reinforcement Learning for Large-Scale Multiagent Systems

Neural Information Processing Systems | Oct-10-2025, 14:40:24 GMT

While agents' learning is data-driven, sampling from millions

agent, human knowledge, knowledge, (13 more...)

Neural Information Processing Systems

Country: Asia > Japan (0.04)

Genre:

Research Report > Experimental Study (0.93)
Overview (0.67)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)


b76bec34ef5e0c0ceedff6edfbefc9f5-Paper-Conference.pdf

Neural Information Processing Systems | Oct-10-2025, 14:21:01 GMT

algorithm, dynamic programming, representation, (16 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


b6e271e596574f2b2dfadec6b3ba22a4-Paper-Conference.pdf

Neural Information Processing Systems | Oct-10-2025, 14:13:48 GMT

algorithm, equilibrium, nash equilibrium, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.28)
North America > United States > Hawaii (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)


SustainDC: Benchmarking for Sustainable Data Center Control Supplementary Information

Neural Information Processing Systems | Oct-10-2025, 14:12:41 GMT

F Reward Evaluation and Customization: F.1 Load Shifting Penalty (LS ...); F.2 Default Reward Function; F.3 Customization of Reward Formulations. Current Workload: The current workload level, which includes both flexible and non-flexible components. The data center modeled is illustrated in Figure 1. The hot air exits the cabinets and returns to the CRAH via the ceiling.

agent, battery, energy consumption, (14 more...)

Neural Information Processing Systems

Country: