AITopics

2001.02192

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.47)

Thornton, Charles E., Buehrer, R. Michael, Martone, Anthony F., Sherbondy, Kelly D.

Experimental Analysis of Reinforcement Learning Techniques for Spectrum Sharing Radar

arXiv.org Machine Learning Jan-6-2020

Abstract--In this work, we first describe a framework for the application of Reinforcement Learning (RL) control to a radar system that operates in a congested spectral setting. We then compare the utility of several RL algorithms through a discussion of experiments performed on Commercial Off-The-Shelf (COTS) hardware. Each RL technique is evaluated in terms of convergence, radar detection performance achieved in a congested spectral environment, and the ability to share 100 MHz spectrum with an uncooperative communications system. We examine policy iteration, which solves an environment posed as a Markov Decision Process (MDP) by directly solving for a stochastic mapping between environmental states and radar waveforms, as well as Deep RL techniques, which utilize a form of Q-Learning to approximate a parameterized function that is used by the radar to select optimal actions. We show that RL techniques are beneficial over a Sense-and-Avoid (SAA) scheme and discuss the conditions under which each approach is most effective. The Third Generation Partnership Project (3GPP) has recently received FCC approval to support 5G New Radio (NR) operation in sub-6 GHz frequency bands that are heavily utilized by radar systems [1], [2]. Thus, there is a significant need for radar systems capable of dynamic spectrum sharing.
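As a rough illustration of the policy iteration idea mentioned in the abstract, the sketch below runs textbook tabular policy iteration on a made-up two-state MDP. Everything here is an assumption for illustration: the states, actions, rewards, and the `policy_iteration` helper are hypothetical and are not the radar environment or the implementation from the paper. The sketch also returns a deterministic greedy policy, whereas the abstract describes a stochastic mapping.

```python
# Toy tabular policy iteration (illustrative only, not the paper's setup).
# Hypothetical MDP: state 0 = "band with interference", state 1 = "clear band".
# Action 0 = "stay in band", action 1 = "hop to the other band".

def policy_iteration(P, R, gamma=0.9, tol=1e-10):
    """P[s][a][t] is the probability of moving s -> t under action a.
    R[s][a] is the expected immediate reward. Returns (policy, values)."""
    n_states = len(R)
    n_actions = len(R[0])

    def q_value(s, a, v):
        # One-step lookahead: immediate reward plus discounted next value.
        return R[s][a] + gamma * sum(P[s][a][t] * v[t] for t in range(n_states))

    policy = [0] * n_states
    v = [0.0] * n_states
    while True:
        # Policy evaluation: sweep until the values stop changing.
        while True:
            delta = 0.0
            for s in range(n_states):
                new_v = q_value(s, policy[s], v)
                delta = max(delta, abs(new_v - v[s]))
                v[s] = new_v
            if delta < tol:
                break
        # Policy improvement: switch only when an action is strictly better.
        stable = True
        for s in range(n_states):
            best = max(range(n_actions), key=lambda a: q_value(s, a, v))
            if q_value(s, best, v) > q_value(s, policy[s], v) + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, v


# Deterministic toy dynamics: "stay" keeps the state, "hop" flips it.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay -> 0, hop -> 1
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: stay -> 1, hop -> 0
]
# Reward 1 whenever the next state is the clear band (state 1).
R = [
    [0.0, 1.0],
    [1.0, 0.0],
]

policy, values = policy_iteration(P, R)
print(policy, values)
```

In this toy problem the learned policy hops out of the interfered band and stays in the clear one; a real spectrum-sharing environment would have far more states and stochastic transitions.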

algorithm, radar, rl technique

2001.01799

Country:

North America > United States > Virginia (0.04)
North America > United States > Maryland > Prince George's County > Adelphi (0.04)

Genre: Research Report (0.40)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Shi, Zheyuan Ryan, Wang, Claire, Fang, Fei

Artificial Intelligence for Social Good: A Survey

knowledge discovery and data mining, twenty-eighth international joint conference, twenty-fourth international joint conference

Its impact is drastic and real: YouTube's AI-driven recommendation system would present sports videos for days if one happens to watch a live baseball game on the platform [1]; email writing becomes much faster with machine learning (ML) based auto-completion [2]; many businesses have adopted natural language processing based chatbots as part of their customer services [3]. AI has also greatly advanced human capabilities in complex decision-making processes ranging from determining how to allocate security resources to protect airports [4] to games such as poker [5] and Go [6]. All such tangible and stunning progress suggests that an "AI summer" is happening. As some put it, "AI is the new electricity" [7]. Meanwhile, in the past decade, an emerging theme in the AI research community is the so-called "AI for social good" (AI4SG): researchers aim at developing AI methods and tools to address problems at the societal level and improve the well-being of society.

2001.01818

Country:

Africa > Uganda (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.92)

Industry:

Transportation > Passenger (1.00)
Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Unger, Thomas A., Bruni, Elia

Incentivizing the Emergence of Grounded Discrete Communication Between General Agents

We converted the recently developed BabyAI grid world platform to a sender/receiver setup in order to test the hypothesis that established deep reinforcement learning techniques are sufficient to incentivize the emergence of a grounded discrete communication protocol between general agents. This is in contrast to previous experiments that employed straight-through estimation or tailored inductive biases. Our results show that these can indeed be avoided, by instead providing proper environmental incentives. Moreover, they show that a longer interval between communications incentivized more abstract semantics. In some cases, the communicating agents adapted to new environments more quickly than monolithic agents, showcasing the potential of emergent discrete communication for transfer learning.

agent, communication, receiver

2001.01772

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Del Verme, Manuel, da Silva, Bruno Castro, Baldassarre, Gianluca

Optimal Options for Multi-Task Reinforcement Learning Under Time Constraints

However, even learning to solve simple tasks can require millions of interactions. A promising approach to improve the learning speed relies on the options framework [6]. An option is a 'chunk of behaviour' that is formally defined as an initiation set, establishing in which states the option is available; a policy, indicating which actions to perform in each state; and a termination condition, establishing when the option execution is terminated. RL systems can benefit from the use of options to support faster exploration and learning, especially when rewards are sparse or when the solution to a problem involves recurring behaviours. An important open problem is how an agent can autonomously learn options that are useful to solve tasks drawn from a given task distribution. Recent approaches have searched for options for specific optimisation problems, but they have not studied how optimal options are affected by different task features such as limited learning time budgets, task rewards, initial states, and the learning algorithm used.
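The three-part definition of an option given above (initiation set, policy, termination condition) maps naturally onto a small data structure. The sketch below is a minimal illustration under stated assumptions: the `Option` class, the `run_option` helper, and the six-state corridor are all hypothetical names invented here, and the termination condition is simplified to a deterministic predicate rather than a termination probability.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List

State = int
Action = int


@dataclass(frozen=True)
class Option:
    """Hypothetical container for the three components of an option."""
    initiation_set: FrozenSet[State]       # states where the option may start
    policy: Callable[[State], Action]      # action to take in each state
    termination: Callable[[State], bool]   # simplified: True means "stop here"

    def available(self, state: State) -> bool:
        return state in self.initiation_set


def run_option(option: Option, state: State,
               step: Callable[[State, Action], State],
               max_steps: int = 100) -> List[State]:
    """Follow the option's policy until it terminates; return visited states."""
    if not option.available(state):
        raise ValueError(f"option cannot be initiated in state {state}")
    trajectory = [state]
    for _ in range(max_steps):
        if option.termination(state):
            break
        state = step(state, option.policy(state))
        trajectory.append(state)
    return trajectory


# Toy environment (an assumption): a corridor with states 0..5,
# where actions are -1 (left) and +1 (right), clipped at the walls.
def corridor_step(state: State, action: Action) -> State:
    return max(0, min(5, state + action))


# "Walk to the right end" as an option available everywhere except the goal.
go_right = Option(
    initiation_set=frozenset(range(5)),
    policy=lambda s: +1,
    termination=lambda s: s == 5,
)

trajectory = run_option(go_right, 2, corridor_step)
print(trajectory)
```

A higher-level agent could then choose among several such options instead of primitive actions, which is where the faster exploration under sparse rewards would come from.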

agent, budget, time budget

2001.0162

Country:

Europe > Italy > Lazio > Rome (0.05)
South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.85)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)

Garcia, Francisco M., Nota, Chris, Thomas, Philip S.

Learning Reusable Options for Multi-Task Reinforcement Learning

One of the main reasons why RL has worked so well in these applications is that we are able to simulate millions of interactions with the environment in a relatively short period of time, allowing the agent to experience a large number of different situations in the environment and learn the consequences of its actions. In many real-world applications, however, where the agent interacts with the physical world, it might not be easy to generate such a large number of interactions. The time and cost associated with training such systems could render RL an infeasible approach for training at large scale. As a concrete example, consider training a large number of humanoid robots (agents) to move quickly, as in the RoboCup competition [Farchy et al., 2013]. Although the agents have similar dynamics, subtle variations mean that a single policy shared across all agents would not be an effective solution.

agent, probability, trajectory

2001.01577

Country:

North America > United States > Massachusetts > Hampshire County > Amherst (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > Massachusetts > Plymouth County > Norwell (0.04)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

#artificialintelligence Jan-5-2020, 15:52:35 GMT

google/trax

Trax helps you understand deep learning. We start with basic maths and go through layers, models, supervised and reinforcement learning. We get to advanced deep learning results, including recent papers and state-of-the-art models. Trax is a successor to the Tensor2Tensor library and is actively used and maintained by researchers and engineers within the Google Brain team and a community of users. We're eager to collaborate with you too, so feel free to open an issue on GitHub or send along a pull request (see our contribution doc).

google trax, learning

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.33)

#artificialintelligence Jan-5-2020, 12:41:58 GMT

How To Build Your Own MuZero AI Using Python (Part 1/3)

If you want to learn how one of the most sophisticated AI systems ever built works, you've come to the right place. In this three-part series, we'll explore the inner workings of the DeepMind MuZero model -- the younger (and even more impressive) brother of AlphaZero. We'll be walking through the pseudocode that accompanies the MuZero paper -- so grab yourself a cup of tea and a comfy chair and let's begin. On 19th November 2019, DeepMind released their latest model-based reinforcement learning algorithm to the world -- MuZero. This is the fourth in a line of DeepMind reinforcement learning papers that have continually smashed through the barriers of possibility, starting with AlphaGo in 2016.

alphazero, own muzero ai, pseudocode

Industry: Leisure & Entertainment > Games (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.80)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

Tasse, Geraud Nangue, James, Steven, Rosman, Benjamin

A Boolean Task Algebra for Reinforcement Learning

arXiv.org Machine Learning Jan-5-2020

We propose a framework for defining a Boolean algebra over the space of tasks. This allows us to formulate new tasks in terms of the negation, disjunction and conjunction of a set of base tasks. We then show that by learning goal-oriented value functions and restricting the transition dynamics of the tasks, an agent can solve these new tasks with no further learning. We prove that by composing these value functions in specific ways, we immediately recover the optimal policies for all tasks expressible under the Boolean algebra. We verify our approach in two domains, including a high-dimensional video game environment requiring function approximation, where an agent first learns a set of base skills, and then composes them to solve a super-exponential number of new tasks.
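To make the idea of composing value functions with negation, disjunction and conjunction more concrete, here is a small sketch. The abstract does not state the composition rules, so the ones used below are an assumption for illustration: disjunction as an elementwise maximum, conjunction as an elementwise minimum, and negation as a reflection between the value functions of a hypothetical "maximum" task and "minimum" task. The function names and the four-goal toy values are likewise invented here.

```python
# Illustrative composition of per-goal values (assumed rules, toy numbers).
# Each list holds one value per goal; 1.0 means "this task wants that goal".

def disjunction(q1, q2):
    # Assumed rule: "task 1 OR task 2" takes the larger value per goal.
    return [max(a, b) for a, b in zip(q1, q2)]


def conjunction(q1, q2):
    # Assumed rule: "task 1 AND task 2" takes the smaller value per goal.
    return [min(a, b) for a, b in zip(q1, q2)]


def negation(q, q_max, q_min):
    # Assumed rule: reflect q between the bounds given by a task that
    # rewards every goal (q_max) and a task that rewards none (q_min).
    return [hi + lo - x for x, hi, lo in zip(q, q_max, q_min)]


# Hypothetical base tasks over four goals.
q_a = [1.0, 1.0, 0.0, 0.0]    # task A rewards goals 0 and 1
q_b = [1.0, 0.0, 1.0, 0.0]    # task B rewards goals 0 and 2
q_max = [1.0, 1.0, 1.0, 1.0]  # rewards every goal
q_min = [0.0, 0.0, 0.0, 0.0]  # rewards no goal

a_or_b = disjunction(q_a, q_b)
a_and_b = conjunction(q_a, q_b)
not_a = negation(q_a, q_max, q_min)

# Exclusive-or built purely from the three operators above.
not_b = negation(q_b, q_max, q_min)
a_xor_b = disjunction(conjunction(q_a, not_b), conjunction(not_a, q_b))

print(a_or_b, a_and_b, not_a, a_xor_b)
```

Because every new task is produced by recombining existing values, no additional learning step appears in the sketch, which mirrors the abstract's claim that composed tasks can be solved with no further learning.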

agent, composition, value function

2001.01394

Country:

North America > United States (0.04)
Europe > France > Hauts-de-France > Nord > Lille (0.04)
Africa > South Africa > Gauteng > Johannesburg (0.04)

Genre: Research Report (0.82)

Industry: Leisure & Entertainment > Games > Computer Games (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.84)

Gebhardt, Christoph, Oulasvirta, Antti, Hilliges, Otmar

Hierarchical Reinforcement Learning as a Model of Human Task Interleaving

arXiv.org Artificial Intelligence Jan-4-2020

How do people decide how long to continue in a task, when to switch, and to which other task? Understanding the mechanisms that underpin task interleaving is a long-standing goal in the cognitive sciences. Prior work suggests greedy heuristics and a policy maximizing the marginal rate of return. However, it is unclear how such a strategy would allow for adaptation to everyday environments that offer multiple tasks with complex switch costs and delayed rewards. Here we develop a hierarchical model of supervisory control driven by reinforcement learning (RL). The supervisory level learns to switch using task-specific approximate utility estimates, which are computed on the lower level. A hierarchically optimal value function decomposition can be learned from experience, even in conditions with multiple tasks and arbitrary and uncertain reward and cost structures. The model reproduces known empirical effects of task interleaving. It yields better predictions of individual-level data than a myopic baseline in a six-task problem (N=211). The results support hierarchical RL as a plausible model of task interleaving.

artificial intelligence, machine learning, reinforcement learning

2001.02122

Genre:

Research Report > Experimental Study (0.69)
Research Report > New Finding (0.47)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)