shaping
OASIS: Conditional Distribution Shaping for Offline Safe Reinforcement Learning
Offline safe reinforcement learning (RL) aims to train a policy that satisfies constraints using a pre-collected dataset. Most current methods struggle with the mismatch between imperfect demonstrations and the desired safe and rewarding performance. In this paper, we mitigate this issue from a data-centric perspective and introduce OASIS (cOnditionAl diStributIon Shaping), a new paradigm in offline safe RL designed to overcome these critical limitations. OASIS utilizes a conditional diffusion model to synthesize offline datasets, thus shaping the data distribution toward a beneficial target domain. Our approach makes compliance with safety constraints through effective data utilization and regularization techniques to benefit offline safe RL training. Comprehensive evaluations on public benchmarks and varying datasets showcase OASIS's superiority in benefiting offline safe RL agents to achieve high-reward behavior while satisfying the safety constraints, outperforming established baselines. Furthermore, OASIS exhibits high data efficiency and robustness, making it suitable for real-world applications, particularly in tasks where safety is imperative and high-quality demonstrations are scarce. More details are available at the website https://sites.google.com/view/saferl-oasis/home.
How LLMs are Shaping the Future of Virtual Reality
Özkaya, Süeda, Berrezueta-Guzman, Santiago, Wagner, Stefan
The integration of Large Language Models (LLMs) into Virtual Reality (VR) games marks a paradigm shift in the design of immersive, adaptive, and intelligent digital experiences. This paper presents a comprehensive review of recent research at the intersection of LLMs and VR, examining how these models are transforming narrative generation, non-player character (NPC) interactions, accessibility, personalization, and game mastering. Drawing from an analysis of 62 peer reviewed studies published between 2018 and 2025, we identify key application domains ranging from emotionally intelligent NPCs and procedurally generated storytelling to AI-driven adaptive systems and inclusive gameplay interfaces. We also address the major challenges facing this convergence, including real-time performance constraints, memory limitations, ethical risks, and scalability barriers. Our findings highlight that while LLMs significantly enhance realism, creativity, and user engagement in VR environments, their effective deployment requires robust design strategies that integrate multimodal interaction, hybrid AI architectures, and ethical safeguards. The paper concludes by outlining future research directions in multimodal AI, affective computing, reinforcement learning, and open-source development, aiming to guide the responsible advancement of intelligent and inclusive VR systems.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (4 more...)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Hardware (1.00)
- (2 more...)
Shaping the future with adaptive production
As efforts to revive and modernize local manufacturing accelerate in regions around the world, including North America and Europe, adaptive production could help manufacturers overcome some of their biggest obstacles--firstly, attracting and retaining talent. Nearly 60% of manufacturers cited this as their top challenge in a 2024 US-based survey. Highly automated, technology-led adaptive production methods hold new promise for attracting talent to roles that are safer, less repetitive, and better paid. "The ideal scenario is one where AI enhances human capabilities, leads to new task creation, and empowers the people who are most at risk from automation's impact on certain jobs, particularly those without college degrees," says Simon Johnson, co-director of MIT's Shaping the Future of Work Initiative. Secondly, the digitalization of manufacturing--embedded in the very foundation of adaptive production technologies--allows companies to better address complex sustainability challenges through process and resource optimization and a better understanding of data.
- North America (0.27)
- Europe (0.27)
Shaping the distribution of neural responses with interneurons in a recurrent circuit model
Efficient coding theory posits that sensory circuits transform natural signals into neural representations that maximize information transmission subject to resource constraints. Local interneurons are thought to play an important role in these transformations, shaping patterns of circuit activity to facilitate and direct information flow. However, the relationship between these coordinated, nonlinear, circuit-level transformations and the properties of interneurons (e.g., connectivity, activation functions) remains unknown. Here, we propose a normative computational model that establishes such a relationship. Our model is derived from an optimal transport objective that conceptualizes the circuit's input-response function as transforming the inputs to achieve a target response distribution.
Shaping embodied agent behavior with activity-context priors from egocentric video
Complex physical tasks entail a sequence of object interactions, each with its own preconditions -- which can be difficult for robotic agents to learn efficiently solely through their own experience. We introduce an approach to discover activity-context priors from in-the-wild egocentric video captured with human worn cameras. For a given object, an activity-context prior represents the set of other compatible objects that are required for activities to succeed (e.g., a knife and cutting board brought together with a tomato are conducive to cutting). We encode our video-based prior as an auxiliary reward function that encourages an agent to bring compatible objects together before attempting an interaction. In this way, our model translates everyday human experience into embodied agent skills.
BAMDP Shaping: a Unified Theoretical Framework for Intrinsic Motivation and Reward Shaping
Lidayan, Aly, Dennis, Michael, Russell, Stuart
Intrinsic motivation (IM) and reward shaping are common methods for guiding the exploration of reinforcement learning (RL) agents by adding pseudo-rewards. Designing these rewards is challenging, however, and they can counter-intuitively harm performance. To address this, we characterize them as reward shaping in Bayes-Adaptive Markov Decision Processes (BAMDPs), which formalizes the value of exploration by formulating the RL process as updating a prior over possible MDPs through experience. RL algorithms can be viewed as BAMDP policies; instead of attempting to find optimal algorithms by solving BAMDPs directly, we use it at a theoretical framework for understanding how pseudo-rewards guide suboptimal algorithms. By decomposing BAMDP state value into the value of the information collected plus the prior value of the physical state, we show how psuedo-rewards can help by compensating for RL algorithms' misestimation of these two terms, yielding a new typology of IM and reward shaping approaches. We carefully extend the potential-based shaping theorem to BAMDPs to prove that when pseudo-rewards are BAMDP Potential-based shaping Functions (BAMPFs), they preserve optimal, or approximately optimal, behavior of RL algorithms; otherwise, they can corrupt even optimal learners. We finally give guidance on how to design or convert existing pseudo-rewards to BAMPFs by expressing assumptions about the environment as potential functions on BAMDP states.
Shaping the Outlook for the Autonomy Economy
The Autonomy Economy represents a transformative phase in our society, driven by the integration of autonomous machines such as autonomous vehicles, delivery robots, drones, and more into the provision of goods and services. Central to this revolution is Autonomous Machine Computing (AMC), the computing technological backbone enabling these diverse autonomous systems. This article delves into AMC's critical role in fostering the Autonomy Economy. Originally confined to basic robotics and industrial applications, these autonomous machines now permeate everyday life, signaling a move towards the Autonomy Economy era. For example, in China, when you check into a hotel, it's likely that a delivery robot is going to bring what you need to your room.
- Transportation (0.58)
- Banking & Finance > Economy (0.38)
Policy Shaping: Integrating Human Feedback with Reinforcement Learning
A long term goal of Interactive Reinforcement Learning is to incorporate nonexpert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback.
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
Scaling Opponent Shaping to High Dimensional Games
Khan, Akbir, Willi, Timon, Kwan, Newton, Tacchetti, Andrea, Lu, Chris, Grefenstette, Edward, Rocktäschel, Tim, Foerster, Jakob
In multi-agent settings with mixed incentives, methods developed for zero-sum games have been shown to lead to detrimental outcomes. To address this issue, opponent shaping (OS) methods explicitly learn to influence the learning dynamics of co-players and empirically lead to improved individual and collective outcomes. However, OS methods have only been evaluated in low-dimensional environments due to the challenges associated with estimating higher-order derivatives or scaling model-free meta-learning. Alternative methods that scale to more complex settings either converge to undesirable solutions or rely on unrealistic assumptions about the environment or co-players. In this paper, we successfully scale an OS-based approach to general-sum games with temporally-extended actions and long-time horizons for the first time. After analysing the representations of the meta-state and history used by previous algorithms, we propose a simplified version called Shaper. We show empirically that Shaper leads to improved individual and collective outcomes in a range of challenging settings from literature. We further formalize a technique previously implicit in the literature, and analyse its contribution to opponent shaping. We show empirically that this technique is helpful for the functioning of prior methods in certain environments. Lastly, we show that previous environments, such as the CoinGame, are inadequate for analysing temporally-extended general-sum interactions.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- Leisure & Entertainment > Games (0.93)
- Education (0.67)
Informativeness of Reward Functions in Reinforcement Learning
Devidze, Rati, Kamalaruban, Parameswaran, Singla, Adish
Reward functions are central in specifying the task we want a reinforcement learning agent to perform. Given a task and desired optimal behavior, we study the problem of designing informative reward functions so that the designed rewards speed up the agent's convergence. In particular, we consider expert-driven reward design settings where an expert or teacher seeks to provide informative and interpretable rewards to a learning agent. Existing works have considered several different reward design formulations; however, the key challenge is formulating a reward informativeness criterion that adapts w.r.t. the agent's current policy and can be optimized under specified structural constraints to obtain interpretable rewards. In this paper, we propose a novel reward informativeness criterion, a quantitative measure that captures how the agent's current policy will improve if it receives rewards from a specific reward function. We theoretically showcase the utility of the proposed informativeness criterion for adaptively designing rewards for an agent. Experimental results on two navigation tasks demonstrate the effectiveness of our adaptive reward informativeness criterion.
- Europe > Germany > Saarland > Saarbrücken (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)