AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.34)

Neural Information Processing Systems, Feb-10-2026, 12:32:10 GMT

a7f0d2b95c60161b3f3c82f764b1d1c9-Supplemental.pdf

agent, reward function, xp rd

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)

Genre: Research Report (0.67)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Neural Information Processing Systems, Aug-20-2025, 09:32:59 GMT

f69041d874533096748e2d77480c1fea-AuthorFeedback.pdf

algorithm, efficiency, reward function

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.34)

Neural Information Processing Systems, Aug-16-2025, 15:13:16 GMT

Explicable Reward Design for Reinforcement Learning Agents

A reward function plays the central role during the learning/training process of a reinforcement learning (RL) agent. Given a "task" the agent is expected to perform (i.e., the desired learning outcome), there are typically many different reward specifications under which an optimal policy

artificial intelligence, machine learning, reinforcement learning

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Saarland > Saarbrücken (0.04)

Genre: Research Report (0.67)

Industry: Education (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Adamczyk, Jacob, Makarenko, Volodymyr, Tiomkin, Stas, Kulkarni, Rahul V.

Bootstrapped Reward Shaping

arXiv.org Artificial Intelligence, Jan-1-2025

In reinforcement learning, especially in sparse-reward domains, many environment steps are required to observe reward information. To increase the frequency of such observations, "potential-based reward shaping" (PBRS) has been proposed as a method of providing a denser reward signal while leaving the optimal policy invariant. However, the required "potential function" must be carefully designed with task-dependent knowledge so as not to hinder training performance. In this work, we propose a "bootstrapped" method of reward shaping, termed BSRS, in which the agent's current estimate of the state-value function acts as the potential function for PBRS. We provide convergence proofs for the tabular setting, give insights into training dynamics for deep RL, and show that the proposed method improves training speed in the Atari suite.
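To make the abstract's core idea concrete (the agent's current state-value estimate serving as the PBRS potential), here is a minimal tabular sketch. It is an illustration under stated assumptions, not the authors' implementation: the 5-state chain environment, the ε-greedy Q-learning loop, and the potential scale `eta` are hypothetical choices, and the paper's convergence conditions may differ from the settings used here.

```python
import random


def bsrs_q_learning(n_states=5, episodes=300, alpha=0.5, gamma=0.9,
                    eta=0.5, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning on a toy chain (start at state 0, goal at the right end)
    with a bootstrapped potential Phi(s) = eta * max_a Q(s, a).
    The chain, eta, and all hyperparameters are hypothetical illustration choices."""
    rng = random.Random(seed)
    goal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]  # actions: 0 = left, 1 = right

    def potential(s):
        # Terminal state gets potential 0; otherwise use the current value estimate.
        return 0.0 if s == goal else eta * max(Q[s])

    def greedy(s):
        best = max(Q[s])
        return rng.choice([a for a in (0, 1) if Q[s][a] == best])  # random tie-break

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = rng.choice((0, 1)) if rng.random() < epsilon else greedy(s)
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == goal else 0.0
            # PBRS term F = gamma * Phi(s') - Phi(s), using the estimate *before* this update.
            shaping = gamma * potential(s_next) - potential(s)
            bootstrap = 0.0 if s_next == goal else gamma * max(Q[s_next])
            Q[s][a] += alpha * (r + shaping + bootstrap - Q[s][a])
            s = s_next
            if s == goal:
                break
    return Q


Q = bsrs_q_learning()
# Greedy action per non-terminal state (1 = move right, toward the goal).
print([max((0, 1), key=lambda a: Q[s][a]) for s in range(4)])
```

Solving for the fixed point of this update in the toy setup shows that every action value in state s is shifted by the same amount (−η/(1 + η) times the unshaped optimal value of s), so the greedy policy matches the unshaped one. The step size matters: when the updated action is the current maximum, its old value enters the new one with coefficient 1 − α(1 + η), so α(1 + η) should stay below 2 for this toy update to remain stable.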

potential function, reward function, value function

2501.00989

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Texas (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Pöllabauer, Thomas, Knauthe, Volker, Boller, André, Kuijper, Arjan, Fellner, Dieter

Fast Training Data Acquisition for Object Detection and Segmentation using Black Screen Luminance Keying

arXiv.org Artificial Intelligence, May-13-2024

Deep Neural Networks (DNNs) require large amounts of annotated training data to perform well. Often this data is generated using manual labeling (error-prone and time-consuming) or rendering (requiring geometry and material information). Both approaches make deep learning difficult or uneconomical to apply in many small-scale applications. A fast and straightforward way of acquiring the necessary training data would allow deep learning to be adopted even in the smallest applications. Chroma keying is the process of replacing a color (usually blue or green) with another background. Instead of chroma keying, we propose luminance keying for fast and straightforward training image acquisition. We deploy a black screen with high light absorption (99.99%) to record roughly 1-minute-long videos of our target objects, circumventing typical problems of chroma keying, such as color bleeding or color overlap between background color and object color. Next, we automatically mask our objects using simple brightness thresholding, removing the need for manual annotation. Finally, we automatically place the objects on random backgrounds and train a 2D object detector. We evaluate performance extensively on the widely used YCB-V object set and compare favourably with other conventional techniques such as rendering, without needing 3D meshes, materials or any other information about our target objects, and in a fraction of the time needed by other approaches. Our work demonstrates highly accurate training data acquisition that allows training of state-of-the-art networks to start within minutes.
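The masking and background-replacement steps described above can be sketched in a few lines of plain Python. This is a hedged illustration, not the authors' pipeline: images are small lists of RGB tuples, the BT.601 luma weights and the threshold of 30 are assumptions, and the helper names (`luma`, `luminance_key`, `bounding_box`, `composite`) are hypothetical. Random placement and scaling of the object are omitted.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B), each 0-255
Image = List[List[Pixel]]             # row-major
Mask = List[List[bool]]


def luma(p: Pixel) -> float:
    """Brightness using ITU-R BT.601 luma weights (an assumed choice)."""
    r, g, b = p
    return 0.299 * r + 0.587 * g + 0.114 * b


def luminance_key(frame: Image, threshold: float = 30.0) -> Mask:
    """Foreground = any pixel brighter than the near-black screen (hypothetical threshold)."""
    return [[luma(p) > threshold for p in row] for row in frame]


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """(x_min, y_min, x_max, y_max) of the foreground, usable as a detection label."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, m in enumerate(row) if m]
    if not ys:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def composite(frame: Image, mask: Mask, background: Image) -> Image:
    """Paste masked object pixels onto a same-sized background image."""
    return [[f if m else b for f, m, b in zip(f_row, m_row, b_row)]
            for f_row, m_row, b_row in zip(frame, mask, background)]


# Usage: a 3x4 frame of black screen with a two-pixel bright object.
black = (0, 0, 0)
frame = [[black] * 4,
         [black, (200, 50, 50), (180, 180, 0), black],
         [black] * 4]
mask = luminance_key(frame)
print(bounding_box(mask))  # (1, 1, 2, 1)
```

One consequence of plain thresholding is visible in the sketch: any object region darker than the threshold is treated as background, so very dark object parts would be lost from the mask.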

background, background replacement, chroma

2405.07653

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
North America > United States > Ohio (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

Forbes, Grant C., Gupta, Nitish, Villalobos-Arias, Leonardo, Potts, Colin M., Jhala, Arnav, Roberts, David L.

Potential-Based Reward Shaping For Intrinsic Motivation

arXiv.org Artificial Intelligence, Feb-12-2024

Recently there has been a proliferation of intrinsic motivation (IM) reward-shaping methods to learn in complex and sparse-reward environments. These methods can often inadvertently change the set of optimal policies in an environment, leading to suboptimal behavior. Previous work on mitigating the risks of reward shaping, particularly through potential-based reward shaping (PBRS), has not been applicable to many IM methods, as they are often complex, trainable functions themselves, and therefore dependent on a wider set of variables than the traditional reward functions that PBRS was developed for. We present an extension to PBRS that we prove preserves the set of optimal policies under a more general set of functions than has been previously proven. We also present Potential-Based Intrinsic Motivation (PBIM), a method for converting IM rewards into a potential-based form that is usable without altering the set of optimal policies. Testing in the MiniGrid DoorKey and Cliff Walking environments, we demonstrate that PBIM successfully prevents the agent from converging to a suboptimal policy and can speed up training.
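The standard argument for why potential-based terms leave optimal policies unchanged is that they telescope along a trajectory: the discounted sum of F_t = γΦ_{t+1} − Φ_t collapses to γ^T Φ_T − Φ_0, whatever the intermediate potentials are. The sketch below checks this numerically with made-up potential values that could stand in for the output of any history-dependent or trainable function. It does not reproduce the paper's PBIM conversion, and the helper names are hypothetical.

```python
def shaping_terms(potentials, gamma):
    """Potential-based shaping F_t = gamma * Phi_{t+1} - Phi_t for each transition t."""
    return [gamma * potentials[t + 1] - potentials[t]
            for t in range(len(potentials) - 1)]


def discounted_sum(values, gamma):
    """Sum over t of gamma^t * values[t]."""
    return sum((gamma ** t) * v for t, v in enumerate(values))


# Hypothetical potentials along one trajectory s_0..s_4; they may come from any
# function of the history (for example, a trainable intrinsic-motivation model).
gamma = 0.9
phis = [0.2, 1.5, -0.7, 3.0, 0.4]
F = shaping_terms(phis, gamma)

T = len(phis) - 1
lhs = discounted_sum(F, gamma)
rhs = gamma ** T * phis[T] - phis[0]
print(abs(lhs - rhs) < 1e-12)  # prints True: the intermediate potentials cancel
```

If the potential is set to 0 at termination, the shaped return differs from the original return only by −Φ_0, which no action taken after s_0 can change.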

agent, optimal policy, time step

2402.07411

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)

Jeon, Se Hwan, Heim, Steve, Khazoom, Charles, Kim, Sangbae

Benchmarking Potential Based Rewards for Learning Humanoid Locomotion

arXiv.org Artificial Intelligence, Jul-19-2023

The main challenge in developing effective reinforcement learning (RL) pipelines is often the design and tuning of the reward functions. Well-designed shaping rewards can lead to significantly faster learning. Naively formulated rewards, however, can conflict with the desired behavior and result in overfitting or even erratic performance if not properly tuned. In theory, the broad class of potential-based reward shaping (PBRS) can help guide the learning process without affecting the optimal policy. Although several studies have explored the use of potential-based reward shaping to accelerate learning convergence, most have been limited to grid-worlds and low-dimensional systems, and RL in robotics has predominantly relied on standard forms of reward shaping. In this paper, we benchmark standard forms of shaping against PBRS for a humanoid robot. We find that in this high-dimensional system, PBRS has only marginal benefits in convergence speed. However, the PBRS reward terms are significantly more robust to scaling than typical reward shaping approaches, and thus easier to tune.
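The policy-invariance claim can be written out for a scaled shaping term. Below, c is a shaping weight introduced here for illustration (it is not a symbol from the paper). The standard PBRS result is that adding F shifts each optimal action value by −Φ(s), assuming Φ is 0 at terminal states; with a weight c the shift becomes −cΦ(s), which is the same for every action in s, so the greedy policy is unchanged for any c. This shows only that invariance holds at every scale; it does not by itself explain the tuning robustness the authors measured.

```latex
\begin{align}
F_c(s, a, s') &= c\,\bigl(\gamma\,\Phi(s') - \Phi(s)\bigr), \qquad c \in \mathbb{R}, \\
Q^{*}_{c}(s, a) &= Q^{*}(s, a) - c\,\Phi(s), \\
\arg\max_{a} Q^{*}_{c}(s, a) &= \arg\max_{a} \bigl(Q^{*}(s, a) - c\,\Phi(s)\bigr) = \arg\max_{a} Q^{*}(s, a).
\end{align}
```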

artificial intelligence, machine learning, reinforcement learning

doi: 10.1109/ICRA48891.2023.10160885

2307.10142

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)

Chen, Zhiang, Keating, Devin, Shethwala, Yash, Saravanakumaran, Aravind Adhith Pandian, Arrowsmith, Ramon, Kottke, Albert, Wittich, Christine, Das, Jnaneshwar

Shakebot: A Low-cost, Open-source Robotic Shake Table for Earthquake Research and Education

arXiv.org Artificial Intelligence, Feb-28-2023

Shake tables provide a critical tool for simulating earthquake events and testing the response of structures to seismic forces. However, existing shake tables are either expensive or proprietary. This paper presents the design and implementation of a low-cost, open-source shake table named Shakebot for earthquake engineering research and education, built using Robot Operating System (ROS) and robotic concepts. The Shakebot adapts affordable, high-accuracy components from 3D printers, particularly a closed-loop stepper motor for actuation and a toothed belt for transmission. The stepper motor enables the bed to reach a maximum horizontal acceleration of 11.8 m/s² (1.2 g) and a maximum velocity of 0.5 m/s with a 2 kg specimen. The Shakebot is equipped with an accelerometer and a high-frame-rate camera for bed motion estimation. Its low cost and ease of use make the Shakebot accessible to a wide range of users, including students, educators, and researchers in low-resource settings. An important application of the Shakebot is to examine the dynamics of precariously balanced rocks (PBRs), which are negative indicators of earthquakes in nature. Our earlier research built a virtual shake robot in simulation for the PBR study. The Shakebot provides an approach to validate the simulation through physical experiments. The ROS-based perception and motion software facilitates the code transition from our virtual shake robot to the physical Shakebot. Reusing the control programs ensures that the implemented ground motions are consistent across the simulation and physical experiments, which is critical to validating our simulation experiments.
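The two quoted limits (11.8 m/s² and 0.5 m/s) jointly bound which ground motions the bed can reproduce. As a rough sketch, assuming a pure sinusoid x(t) = A sin(2πft), peak velocity is Aω and peak acceleration is Aω² with ω = 2πf, so the largest usable amplitude at frequency f is the smaller of the two bounds. The travel (stroke) limit of the bed is not given in the abstract and is ignored here, and the helper names are hypothetical.

```python
import math

A_MAX = 11.8   # m/s^2, peak bed acceleration quoted for a 2 kg specimen
V_MAX = 0.5    # m/s, peak bed velocity quoted for a 2 kg specimen


def max_amplitude(freq_hz: float, a_max: float = A_MAX, v_max: float = V_MAX) -> float:
    """Largest displacement amplitude A (m) for x(t) = A*sin(2*pi*f*t) that stays
    within both limits: peak velocity = A*w and peak acceleration = A*w**2."""
    w = 2 * math.pi * freq_hz
    return min(v_max / w, a_max / w ** 2)


def crossover_frequency(a_max: float = A_MAX, v_max: float = V_MAX) -> float:
    """Frequency above which acceleration, rather than velocity, is the binding limit."""
    return a_max / (2 * math.pi * v_max)


for f in (1.0, 2.0, crossover_frequency(), 10.0):
    print(f"{f:5.2f} Hz -> A_max = {1000 * max_amplitude(f):.1f} mm")
```

With these numbers the crossover sits at roughly 3.8 Hz: slower motions are velocity-limited, faster ones acceleration-limited.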

artificial intelligence, displacement, experiment

2212.10763

Country: North America > United States > California (0.29)

Genre: Research Report (0.82)

Industry: Energy (0.89)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

#artificialintelligence, Jun-18-2021, 02:09:47 GMT

The Future of AI in Call Centers

Perhaps one of the most sophisticated and valuable elements of AI in call centers is predictive behavioral routing (PBR). PBR first appeared in 2014 and is designed to connect consumers with the agents best equipped to handle particular personality types. The technology listens to a customer's words and tone and builds a customer profile, which then allows it to route the call to a specific agent rather than a random one, ultimately leading to a better customer experience. The more often PBR is used, the more customer profiles it can create, allowing businesses to match customer profiles with the right employee. The result is positive, natural interactions tailored to each customer's personality, so customers are more likely to feel helped.
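The routing step itself, once a profile exists, can be pictured as a lookup over agent-to-profile match scores. The sketch below is a toy under loud assumptions: the profile labels, the `affinity` score table, and the `route_call` helper are all hypothetical, and the hard part of PBR (inferring a profile from words and tone) is out of scope.

```python
from typing import Dict, Optional, Sequence


def route_call(profile: str,
               available_agents: Sequence[str],
               affinity: Dict[str, Dict[str, float]]) -> Optional[str]:
    """Return the available agent with the highest (hypothetical) match score for
    this customer profile type. Agents with no score for the profile count as 0.0;
    returns None when no agent is free."""
    if not available_agents:
        return None
    return max(available_agents,
               key=lambda agent: affinity.get(agent, {}).get(profile, 0.0))


# Hypothetical match scores, e.g. historical resolution rates per profile type.
affinity = {
    "ana": {"analytical": 0.9, "emotional": 0.4},
    "ben": {"analytical": 0.5, "emotional": 0.8},
}
print(route_call("emotional", ["ana", "ben"], affinity))  # ben
```

Updating the score table as more calls are handled is one simple way to reflect the point above that matching improves as more profiles accumulate.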

call center, customer profile, pbr

Technology:

Information Technology > Artificial Intelligence > The Future (0.40)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.40)