Collaborating Authors

 Wray, Kyle


Entropy-regularized Point-based Value Iteration

arXiv.org Artificial Intelligence

Model-based planners for partially observable problems must accommodate both model uncertainty during planning and goal uncertainty during objective inference. However, model-based planners may be brittle under these types of uncertainty because they rely on an exact model and tend to commit to a single optimal behavior. Inspired by results in the model-free setting, we propose an entropy-regularized model-based planner for partially observable problems. Entropy regularization promotes policy robustness for planning and objective inference by encouraging policies to be no more committed to a single action than necessary. We evaluate the robustness and objective inference performance of entropy-regularized policies in three problem domains. Our results show that entropy-regularized policies outperform non-entropy-regularized baselines, achieving higher expected returns under modeling errors and higher accuracy during objective inference.
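To make the entropy-regularization idea concrete, the sketch below contrasts a standard "hard max" backup with a soft (entropy-regularized) backup at a single belief point. It is only a minimal illustration of the mechanism described in the abstract, not the paper's full point-based value iteration with alpha-vectors; the toy action values and the temperature parameter are invented for illustration.

```python
# Minimal sketch: entropy-regularized ("soft") backup vs. standard backup at one
# belief point. The soft backup replaces max_a Q(b,a) with
# V(b) = tau * logsumexp(Q(b,a)/tau), whose induced softmax policy spreads
# probability across near-optimal actions instead of committing to one.
# All numbers here are illustrative; this is not the paper's full PBVI algorithm.

import numpy as np

def soft_backup(q_values, temperature):
    """Entropy-regularized backup: V(b) = tau * logsumexp(Q(b,a) / tau)."""
    tau = temperature
    q = np.asarray(q_values, dtype=float)
    m = q.max()  # numerically stable log-sum-exp
    value = tau * (m / tau + np.log(np.sum(np.exp((q - m) / tau))))
    policy = np.exp((q - value) / tau)  # softmax over actions
    return value, policy / policy.sum()

def hard_backup(q_values):
    """Standard backup: V(b) = max_a Q(b,a) with a deterministic greedy policy."""
    q = np.asarray(q_values, dtype=float)
    policy = np.zeros_like(q)
    policy[q.argmax()] = 1.0
    return q.max(), policy

if __name__ == "__main__":
    # Two actions with nearly identical action values at some belief b.
    q_at_belief = [1.00, 0.98, 0.10]
    v_soft, pi_soft = soft_backup(q_at_belief, temperature=0.1)
    v_hard, pi_hard = hard_backup(q_at_belief)
    print("soft value %.3f, soft policy" % v_soft, np.round(pi_soft, 3))
    print("hard value %.3f, hard policy" % v_hard, pi_hard)
```

With two actions whose values differ only slightly, the hard backup puts all probability on one of them, while the soft backup keeps meaningful probability on both, which is the robustness property the abstract attributes to entropy regularization.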


Decision Making in Non-Stationary Environments with Policy-Augmented Search

arXiv.org Artificial Intelligence

Sequential decision-making under uncertainty is present in many important problems. Two popular approaches for tackling such problems are reinforcement learning and online search (e.g., Monte Carlo tree search). While the former learns a policy by interacting with the environment (typically done before execution), the latter uses a generative model of the environment to sample promising action trajectories at decision time. Decision-making is particularly challenging in non-stationary environments, where the environment in which an agent operates can change over time. Both approaches have shortcomings in such settings -- on the one hand, policies learned before execution become stale when the environment changes and relearning takes both time and computational effort. Online search, on the other hand, can return sub-optimal actions when there are limitations on allowed runtime. In this paper, we introduce Policy-Augmented Monte Carlo Tree Search (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove theoretical results showing conditions under which PA-MCTS selects the one-step optimal action and also bound the error accrued while following PA-MCTS as a policy. We compare and contrast our approach with AlphaZero, another hybrid planning approach, and Deep Q-Learning on several OpenAI Gym environments. Through extensive experiments, we show that under non-stationary settings with limited time constraints, PA-MCTS outperforms these baselines.
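The core idea of combining a stale policy's action-value estimates with estimates from an online search can be sketched as a single hybrid action-selection step. The convex-combination rule and the names (stale_q, search_q, alpha) below are illustrative assumptions, not the paper's exact PA-MCTS formulation, which builds the combination into the tree search itself.

```python
# Hedged sketch: blend action-value estimates from a pre-trained (possibly stale)
# policy with estimates produced by an online search over an up-to-date model,
# then act greedily on the blend. Illustrative only; not the paper's exact rule.

from typing import Dict, Hashable

Action = Hashable

def policy_augmented_action(stale_q: Dict[Action, float],
                            search_q: Dict[Action, float],
                            alpha: float) -> Action:
    """Pick the action maximizing alpha * Q_search(a) + (1 - alpha) * Q_stale(a).

    alpha near 1 trusts the (possibly time-limited) online search;
    alpha near 0 trusts the (possibly out-of-date) learned policy.
    """
    assert 0.0 <= alpha <= 1.0
    combined = {a: alpha * search_q.get(a, 0.0) + (1.0 - alpha) * stale_q.get(a, 0.0)
                for a in set(stale_q) | set(search_q)}
    return max(combined, key=combined.get)

if __name__ == "__main__":
    # Toy example: the environment has shifted, so the stale policy prefers "left",
    # while a short online search already favors "right".
    stale = {"left": 1.0, "right": 0.6}
    search = {"left": 0.4, "right": 0.9}
    print(policy_augmented_action(stale, search, alpha=0.7))  # -> "right"
```

The blend captures the trade-off the abstract describes: the stale policy supplies cheap but potentially outdated estimates, while the search supplies up-to-date but compute-limited estimates.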


Active teacher selection for reinforcement learning from human feedback

arXiv.org Artificial Intelligence

Specifying objective functions for machine learning systems is challenging, and misspecified objectives can be hacked [1, 2] or incentivize degenerate behavior [3, 4, 5]. Techniques such as reinforcement learning from human feedback (RLHF) enable ML systems to instead learn appropriate objectives from human feedback [6, 7, 8]. These techniques are widely used to fine-tune large language models [9, 10, 11, 12] and to train reinforcement learning agents to perform complex maneuvers in continuous control environments [6, 7]. However, while RLHF is relied upon to ensure that these systems are safe, helpful, and harmless [13], it still faces many limitations and unsolved challenges [14]. In particular, RLHF systems typically rely on the assumption that all feedback comes from a single human teacher, even though feedback is in practice gathered from a range of teachers with varying levels of rationality and expertise. For example, Stiennon et al. [8], Bai et al. [13], and Ouyang et al. [15] assume that all feedback comes from a single teacher, but find that annotators and researchers actually disagree 23% to 37% of the time. Reward learning has been shown to be highly sensitive to incorrect assumptions about the process that generates feedback [16, 17, 18, 19], so this single-teacher assumption exposes these systems to dangerous failures [20]. Ideally, RLHF systems should account for the differences between teachers to improve their safety and reliability. To leverage multiple teachers in RLHF, we introduce a novel problem called a Hidden Utility Bandit (HUB), illustrated in Figure 1.
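To illustrate why teacher differences matter, the sketch below simulates pairwise preference feedback from teachers with different rationality levels under a Boltzmann-rational choice model, a common assumption in reward learning. The utilities, rationality coefficients, and teacher names are invented for illustration; this is not the paper's Hidden Utility Bandit formalization.

```python
# Hedged sketch: feedback quality varies with teacher rationality. Under a
# Boltzmann-rational model, P(choose A over B) = sigmoid(beta * (U_A - U_B)),
# so a low-beta teacher's comparisons are far noisier than a high-beta expert's.
# Illustrative toy example only; not the paper's HUB problem.

import math
import random

def prefers_a(utility_a: float, utility_b: float, beta: float) -> bool:
    """Sample one pairwise comparison from a Boltzmann-rational teacher."""
    p_a = 1.0 / (1.0 + math.exp(-beta * (utility_a - utility_b)))
    return random.random() < p_a

if __name__ == "__main__":
    random.seed(0)
    u_a, u_b = 1.0, 0.8                                   # hidden true utilities
    teachers = {"expert": 10.0, "novice": 1.0, "random": 0.0}  # rationality beta
    for name, beta in teachers.items():
        votes = sum(prefers_a(u_a, u_b, beta) for _ in range(1000))
        print(f"{name:>6}: prefers A in {votes / 10:.1f}% of 1000 comparisons")
```

A reward learner that treats all of this feedback as coming from one teacher conflates the expert's reliable signal with the near-random teacher's noise, which is the failure mode the single-teacher assumption exposes.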