bail
- North America > United States > Arizona > Maricopa County > Phoenix (0.04)
- North America > Canada (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
d55cbf210f175f4a37916eafe6c04f0d-AuthorFeedback.pdf
In terms of testing on alternative domains, we are currently focused on MuJoCo, where Ant and Humanoid are the most challenging environments. In our view, all DRL algorithms are heuristics, and performance guarantees for schemes using neural-network function-approximators are rare. We will make this more clear in the revision. We decided to use L2 regularization in the definition of the upper-envelope since it leads to a clean definition and theory. So it is possible for multiple algorithms to be in bold in a table row.
BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
There has recently been a surge in research in batch Deep Reinforcement Learning (DRL), which aims to learn a high-performing policy from a given dataset without additional interactions with the environment. We propose a new algorithm, Best-Action Imitation Learning (BAIL), which strives for both simplicity and performance. BAIL learns a V function, uses the V function to select actions it believes to be high-performing, and then uses those actions to train a policy network using imitation learning. For the MuJoCo benchmark, we provide a comprehensive experimental study of BAIL, comparing its performance to four other batch Q-learning and imitation-learning schemes for a large variety of batch datasets. Our experiments show that BAIL performs much better than the other schemes and is also computationally much faster than the batch Q-learning schemes.
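The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes V estimates are already available for each state, and the ranking rule (return minus V estimate) and the `keep_fraction` parameter are hypothetical choices made only for illustration. The selected state-action pairs would then be passed to an ordinary imitation-learning (behavior-cloning) step, which is omitted here.

```python
import numpy as np


def monte_carlo_returns(rewards, gamma=0.99):
    """Discounted return-to-go for every step of a single episode."""
    returns = np.zeros(len(rewards))
    running = 0.0
    # Walk backwards so each step reuses the return of its successor.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def select_best_actions(returns, values, keep_fraction=0.25):
    """Pick the samples whose observed return is highest relative to V(s).

    ASSUMPTION: ranking by (return - value) and keeping a fixed fraction
    is an illustrative rule, not the criterion defined in the paper.
    """
    returns = np.asarray(returns, dtype=float)
    values = np.asarray(values, dtype=float)
    gap = returns - values
    k = max(1, int(round(keep_fraction * len(returns))))
    best = np.argsort(-gap, kind="stable")[:k]
    return np.sort(best)


# Toy usage: four samples, keep the best half.
selected = select_best_actions(
    returns=[1.0, 5.0, 2.0, 4.0],
    values=[2.0, 5.0, 4.0, 4.5],
    keep_fraction=0.5,
)
```

The indices in `selected` identify the state-action pairs that a policy network would be trained to imitate.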
The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models
Ensign, Danielle, Sleight, Henry, Fish, Kyle
When given the option, will LLMs choose to leave the conversation (bail)? We investigate this question by giving models the option to bail out of interactions using three different bail methods: a bail tool the model can call, a bail string the model can output, and a bail prompt that asks the model if it wants to leave. On continuations of real world data (Wildchat and ShareGPT), all three of these bail methods find models will bail around 0.28-32% of the time (depending on the model and bail method). However, we find that bail rates can depend heavily on the model used for the transcript, which means we may be overestimating real world bail rates by up to 4x. If we also take into account false positives on bail prompt (22%), we estimate real world bail rates range from 0.06-7%, depending on the model and bail method. We use observations from our continuations of real world data to construct a non-exhaustive taxonomy of bail cases, and use this taxonomy to construct BailBench: a representative synthetic dataset of situations where some models bail. We test many models on this dataset, and observe some bail behavior occurring for most of them. Bail rates vary substantially between models, bail methods, and prompt wordings. Finally, we study the relationship between refusals and bails. We find: 1) 0-13% of continuations of real world conversations resulted in a bail without a corresponding refusal; 2) jailbreaks tend to decrease refusal rates, but increase bail rates; 3) refusal abliteration increases no-refuse bail rates, but only for some bail methods; 4) refusal rate on BailBench does not appear to predict bail rate.
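The two quantities the abstract compares, the overall bail rate and the rate of bails without a corresponding refusal, can be computed from per-conversation labels as in the sketch below. The `Outcome` record and both function names are hypothetical and introduced only for illustration; how a bail or a refusal is actually detected is not specified here and is assumed to have been done upstream.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Labels for one continued conversation (hypothetical record type)."""
    bailed: bool   # the model used some bail method
    refused: bool  # the model's reply was classified as a refusal


def bail_rate(outcomes):
    """Fraction of conversations in which the model bailed."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(o.bailed for o in outcomes) / len(outcomes)


def no_refuse_bail_rate(outcomes):
    """Fraction of conversations with a bail and no corresponding refusal."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    return sum(o.bailed and not o.refused for o in outcomes) / len(outcomes)


# Toy labels, not data from the paper.
toy = [
    Outcome(bailed=True, refused=False),
    Outcome(bailed=True, refused=True),
    Outcome(bailed=False, refused=True),
    Outcome(bailed=False, refused=False),
]
overall = bail_rate(toy)
without_refusal = no_refuse_bail_rate(toy)
```

Keeping the two labels separate per conversation is what allows bails and refusals to be compared, since either can occur without the other.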
- North America > United States > Wyoming (0.04)
- North America > United States > New York (0.04)
- Research Report > Experimental Study (0.67)
- Research Report > New Finding (0.67)
- Information Technology (1.00)
- Media (0.93)
- Health & Medicine > Therapeutic Area (0.93)
- North America > United States > New York (0.04)
- North America > United States > Arizona > Maricopa County > Phoenix (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
Judge orders leaders of cult-like 'Zizian' group to be held without bail
A Maryland court has ordered a blogger known as "Ziz", who leads a cult-like group connected to six killings, to be held without bail. The blogger, Jack LaSota, 34, of Berkeley, California, was arrested Sunday along with Michelle Zajko, 32, of Media, Pennsylvania, and Daniel Blank, 26, of Sacramento, California. The Zizians, as the group is known after its apparent leader, have been tied to the killing of United States Border Patrol agent David Maland last month near the Canadian border, as well as five other killings in three states. LaSota, Zajko and Blank were arrested in Frostburg, Maryland, on Sunday afternoon. The judge in the case ordered LaSota to be held without bail, citing concerns about her being a flight risk and a danger to public safety.
- North America > United States > Maryland (0.47)
- North America > United States > Pennsylvania (0.29)
- North America > Canada (0.27)
Review for NeurIPS paper: BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
Summary and Contributions: --- post author response --- Thank you for the response! The clarifications to the table have improved my understanding of the results. While I think that the results are strong, the discussion section is jumbled and unclear, and the intuition behind some of the design decisions is lacking, which gives an 'ad hoc' impression. Clarifications for this are adequately mentioned in the response, and I will increase my score to a 6, assuming the authors will add these clarifications to the final text and make the experimental results section clearer. This work proposes a batch deep RL algorithm called BAIL. It essentially trains a policy using imitation learning on state-action pairs whose (Monte Carlo) returns lie on what the authors define as the upper envelope of the data.
Review for NeurIPS paper: BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
The authors agreed that the paper makes good contributions to batch RL, and the rebuttal has been very helpful. Some concerns around the empirical evaluation remain, but the paper makes a good contribution. Please make sure that the revised version of the paper actually reflects the rebuttal and reviewer recommendations.