safety
A Lyapunov-based Approach to Safe Reinforcement Learning
In many real-world reinforcement learning (RL) problems, an agent must not only optimize its main objective function but also avoid violating a number of constraints. In particular, besides optimizing performance, it is crucial to guarantee the safety of an agent during both training and deployment (e.g., a robot should avoid taking actions, exploratory or not, that irrevocably harm its hardware). To incorporate safety in RL, we derive algorithms under the framework of constrained Markov decision processes (CMDPs), an extension of standard Markov decision processes (MDPs) augmented with constraints on expected cumulative costs.
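For reference, a CMDP is commonly written as maximizing the expected discounted return E[sum_t gamma^t r_t] subject to E[sum_t gamma^t c_t] <= d_0, where c_t is a per-step cost and d_0 is a cost budget. The Python sketch below only estimates both expectations from sampled trajectories and checks whether the cost constraint holds; it is not the Lyapunov-based algorithm itself. The function names, the trajectory format (parallel reward and cost lists) and the parameter name `cost_budget` are hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def discounted_sum(values: Sequence[float], gamma: float) -> float:
    """Return sum_t gamma^t * values[t] for a single trajectory."""
    total = 0.0
    for t, v in enumerate(values):
        total += (gamma ** t) * v
    return total


def evaluate_cmdp(
    trajectories: List[Tuple[Sequence[float], Sequence[float]]],
    gamma: float,
    cost_budget: float,
) -> Tuple[float, float, bool]:
    """Monte Carlo estimate of expected return and expected cumulative cost.

    Hypothetical helper: each trajectory is a (rewards, costs) pair with one
    entry per time step. Returns (mean_return, mean_cost, constraint_ok),
    where constraint_ok checks mean_cost <= cost_budget (the d_0 above).
    """
    if not trajectories:
        raise ValueError("need at least one trajectory")
    returns = [discounted_sum(rewards, gamma) for rewards, _ in trajectories]
    costs = [discounted_sum(cost_seq, gamma) for _, cost_seq in trajectories]
    mean_return = sum(returns) / len(returns)
    mean_cost = sum(costs) / len(costs)
    return mean_return, mean_cost, mean_cost <= cost_budget


if __name__ == "__main__":
    # Two short trajectories: (rewards, costs)
    trajs = [([1.0, 1.0], [0.0, 1.0]), ([0.0, 2.0], [0.0, 0.0])]
    print(evaluate_cmdp(trajs, gamma=0.5, cost_budget=0.3))
```

With gamma = 0.5 the example yields a mean return of 1.25 and a mean cost of 0.25, so a budget of 0.3 is satisfied while a budget of 0.2 would not be. A constrained RL method then searches for a policy that maximizes the first quantity while keeping the second within budget.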
Extending the reward structure in reinforcement learning: an interview with Tanmay Ambadkar
In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. Tanmay Ambadkar is researching the reward structure in reinforcement learning, with the goal of developing generalizable solutions that offer robust guarantees and are easy to deploy. We caught up with Tanmay to find out more about his research and, in particular, the constrained reinforcement learning framework he has been working on.
Tell us a bit about your PhD: where are you studying, and what is the topic of your research?
I am a fourth-year PhD candidate at The Pennsylvania State University, PA, USA.
Starmer 'appeasing' big tech firms, says online safety campaigner
A leading campaigner has accused the prime minister of appeasing big tech companies and being late to the party in regulating social media and artificial intelligence. Crossbench peer Baroness Kidron told the BBC that Sir Keir Starmer needed to get on with it rather than launching more consultations. She also criticised the PM for citing his own experience as a father of two teenage children on social media, arguing that this did not make him an expert on the subject and that his family were sheltered compared with others. The government rejected the claims, with a spokesperson saying it had already introduced some of the strongest online safety protections in the world. Sir Keir has launched a consultation on banning under-16s from social media and promised to crack down on the addictive elements of the apps.
e197fe307eb3467035f892dc100d570a-Supplemental-Conference.pdf
The process for calculating these metrics is described in Appendix C. To make the prediction performance metrics and the driving performance metrics comparable in the radar plot, we normalize all metrics to the range [0, 1]. In the subsequent section, we provide an overview of the DESPOT planner. These two values can only be inferred from history. Safety is represented by the normalized collision rate.
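The snippet does not say how metrics are mapped to [0, 1]. A common choice, assumed here purely for illustration, is min-max normalization of each metric across the methods being compared, x' = (x - min) / (max - min). The function name and the metric names below are hypothetical, and the sketch does not decide whether lower-is-better metrics such as the collision rate should be inverted, since the source does not specify this.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Rescale each metric's values (one value per compared method) to [0, 1].

    Assumption: plain min-max scaling per metric; a metric whose values are
    all equal is mapped to 0.0 to avoid dividing by zero.
    """
    normalized: Dict[str, List[float]] = {}
    for metric, values in scores.items():
        if not values:
            normalized[metric] = []
            continue
        lo, hi = min(values), max(values)
        span = hi - lo
        if span == 0:
            normalized[metric] = [0.0 for _ in values]
        else:
            normalized[metric] = [(v - lo) / span for v in values]
    return normalized


if __name__ == "__main__":
    # Hypothetical raw metrics for three methods
    raw = {"collision_rate": [2.0, 6.0, 4.0], "travel_time": [10.0, 10.0, 10.0]}
    print(min_max_normalize(raw))
```

Normalizing per metric keeps quantities with very different units (rates, times, errors) on a shared axis, which is what a radar plot needs; the trade-off is that the plotted values are only meaningful relative to the set of methods being compared.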