 berlekamp




Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation

Ren, Zhizhou, Liu, Anji, Liang, Yitao, Peng, Jian, Ma, Jianzhu

arXiv.org Artificial Intelligence

Learning new task-specific skills from a few trials is a fundamental challenge for artificial intelligence. Meta reinforcement learning (meta-RL) tackles this problem by learning transferable policies that support few-shot adaptation to unseen tasks. Despite recent advances in meta-RL, most existing methods require access to the environmental reward function of new tasks to infer the task objective, which is not realistic in many practical applications. To bridge this gap, we study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning. We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback. The agent can adapt to new tasks by querying a human's preference between behavior trajectories instead of using per-step numeric rewards. By extending techniques from information theory, our approach can design query sequences to maximize the information gain from human interactions while tolerating the inherent error of a non-expert human oracle. In experiments, we extensively evaluate our method, Adaptation with Noisy OracLE (ANOLE), on a variety of meta-RL benchmark tasks and demonstrate substantial improvement over baseline algorithms in terms of both feedback efficiency and error tolerance.
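To make the feedback model concrete, here is a minimal Python sketch of a preference oracle that compares two trajectories and sometimes answers incorrectly. It is not the ANOLE algorithm. The function names, the use of a scalar return per trajectory, and the fixed `error_rate` are assumptions made purely for illustration.

```python
import random


def noisy_preference(return_a, return_b, error_rate=0.2, rng=None):
    """Hypothetical oracle: index (0 or 1) of the preferred trajectory.

    The true preference goes to the trajectory with the higher return;
    with probability `error_rate` the answer is flipped, mimicking a
    non-expert human who occasionally errs.
    """
    rng = rng or random.Random()
    truth = 0 if return_a >= return_b else 1
    if rng.random() < error_rate:
        return 1 - truth
    return truth


def majority_preference(return_a, return_b, n_queries=5, error_rate=0.2, rng=None):
    """Repeat the same query and take a majority vote (a naive way to tolerate noise)."""
    rng = rng or random.Random()
    votes_for_b = sum(
        noisy_preference(return_a, return_b, error_rate, rng)
        for _ in range(n_queries)
    )
    return 1 if votes_for_b > n_queries / 2 else 0
```

Repeating a query is the simplest way to tolerate errors, but it spends feedback. The abstract instead describes designing query sequences that maximize information gain, which this sketch does not attempt.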


Game of Hex -- from Wolfram MathWorld

AITopics Original Links

Hex is a two-player game invented by Piet Hein in 1942 while he was a student at Niels Bohr's Institute for Theoretical Physics, and subsequently and independently by John Nash in 1948 while he was a mathematics graduate student at Princeton. The game was originally called Nash or John, with the latter name at the same time crediting its inventor and referring to the fact that it was frequently played on the tiled floors of bathrooms (Gardner 1959, pp. The name Hex was invented in 1952, when a commercial version was issued by the game company Parker Brothers. Hex is played on a diamond-shaped board made up of hexagons. The game is usually played on a board of size 11 on a side, for a total of 121 hexagons.
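The board geometry can be sketched in a few lines of Python. The (row, column) coordinate scheme and the function names below are assumptions chosen for illustration, not part of the MathWorld entry. The sketch checks the 121-cell count and shows that an interior hexagon touches six neighbors.

```python
def hex_cells(size=11):
    """All (row, col) cells of a size x size rhombic Hex board."""
    return [(r, c) for r in range(size) for c in range(size)]


def hex_neighbors(row, col, size=11):
    """Cells adjacent to (row, col) on a rhombic board laid out as a skewed grid."""
    # On a skewed grid, each hexagon touches these six offsets.
    offsets = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0)]
    return [
        (row + dr, col + dc)
        for dr, dc in offsets
        if 0 <= row + dr < size and 0 <= col + dc < size
    ]


print(len(hex_cells(11)))          # 11 * 11 cells
print(len(hex_neighbors(5, 5)))    # interior cell
print(len(hex_neighbors(0, 0)))    # acute corner of the diamond
```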


Claude Shannon, the Father of the Information Age, Turns 1100100

The New Yorker

Twelve years ago, Robert McEliece, a mathematician and engineer at Caltech, won the Claude E. Shannon Award, the highest honor in the field of information theory. During his acceptance lecture, at an international symposium in Chicago, he discussed the prize's namesake, who died in 2001. Claude Shannon: Born on the planet Earth (Sol III) in the year 1916 A.D. Generally regarded as the father of the information age, he formulated the notion of channel capacity in 1948 A.D. Within several decades, mathematicians and engineers had devised practical ways to communicate reliably at data rates within one per cent of the Shannon limit. As is sometimes the case with encyclopedias, the crisply worded entry didn't quite do justice to its subject's legacy. That humdrum phrase--"channel capacity"--refers to the maximum rate at which data can travel through a given medium without losing integrity.
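For readers who want a number behind "channel capacity": in the standard special case of a band-limited channel with additive white Gaussian noise, the Shannon-Hartley theorem gives the capacity as C = B log2(1 + S/N), with B the bandwidth in hertz and S/N the linear signal-to-noise ratio. The sketch below simply evaluates that formula. The function name and the example figures are illustrative assumptions, not taken from the article.

```python
import math


def shannon_capacity(bandwidth_hz, snr_linear):
    """Shannon-Hartley capacity in bits per second for an AWGN channel.

    bandwidth_hz: channel bandwidth B in hertz.
    snr_linear:   signal-to-noise ratio S/N as a plain ratio (not in dB).
    """
    if bandwidth_hz < 0 or snr_linear < 0:
        raise ValueError("bandwidth and SNR must be non-negative")
    return bandwidth_hz * math.log2(1.0 + snr_linear)


# Illustrative only: a 1 kHz channel with S/N = 3 supports at most 2000 bit/s.
print(shannon_capacity(1000, 3))
```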


Claude Shannon: Tinkerer, Prankster, and Father of Information Theory

#artificialintelligence

Editor's note: This month marks the centennial of the birth of Claude Shannon, the American mathematician and electrical engineer whose groundbreaking work laid out the theoretical foundation for modern digital communications. To celebrate the occasion, we're republishing online a memorable profile of Shannon that IEEE Spectrum ran in its April 1992 issue. Written by former Spectrum editor John Horgan, who interviewed Shannon at his home in Winchester, Mass., the profile reveals the many facets of Shannon's character: While best known as the father of information theory, Shannon was also an inventor, tinkerer, puzzle solver, and prankster. The 1992 profile included a portrait of Shannon taken by Boston-area photographer Stanley Rowin. On this page we're reproducing that portrait along with other Shannon photos by Rowin that Spectrum has never published. Shannon died in 2001 at age 84 after a long battle with Alzheimer's disease. He is regarded as one of the greatest electrical engineering heroes of all time.


TDS+: Improving Temperature Discovery Search

Zhang, Yeqin (University of Alberta) | Müller, Martin (University of Alberta)

AAAI Conferences

Temperature Discovery Search (TDS) is a forward search method for computing or approximating the temperature of a combinatorial game. Temperature and mean are important concepts in combinatorial game theory, which can be used to develop efficient algorithms for playing well in a sum of subgames. A new algorithm, TDS+, is developed with five enhancements of TDS, which greatly speed up both the exact and approximate versions of TDS. Means and temperatures can be computed faster, and fixed-time approximations, which are important for practical play, can be computed with higher accuracy than before.
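As a small illustration of mean and temperature (not of TDS or TDS+ themselves), consider the simplest hot game, a switch {a | b} with a >= b, where Left can move to the number a and Right to the number b. Its mean is (a + b) / 2 and its temperature is (a - b) / 2. The Python sketch below computes both and picks the hottest switch in a sum, which is a common heuristic sometimes called Hotstrat. The function names are hypothetical.

```python
from fractions import Fraction


def switch_mean_and_temperature(left, right):
    """Mean and temperature of the switch game {left | right}, with left >= right."""
    left, right = Fraction(left), Fraction(right)
    if left < right:
        raise ValueError("expected left >= right for a switch game")
    mean = (left + right) / 2
    temperature = (left - right) / 2
    return mean, temperature


def hottest_switch(switches):
    """Index of the switch with the highest temperature in a sum of switches."""
    return max(
        range(len(switches)),
        key=lambda i: switch_mean_and_temperature(*switches[i])[1],
    )


# {10 | -4} has mean 3 and temperature 7; {2 | 0} has mean 1 and temperature 1.
print(switch_mean_and_temperature(10, -4))
print(hottest_switch([(2, 0), (10, -4)]))
```

General games need a real search over thermographs, which is the problem TDS addresses. This closed form covers only switches.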