BADGER: Learning to (Learn [Learning Algorithms] through Multi-Agent Communication)

Rosa, Marek; Afanasjeva, Olga; Andersson, Simon; Davidson, Joseph; Guttenberg, Nicholas; Hlubuček, Petr; Poliak, Martin; Vítku, Jaroslav; Feyereisl, Jan

arXiv.org Artificial Intelligence 

An architecture and a learning procedure where:

- An agent is made up of many experts
- All experts share the same communication policy (expert policy), but have different internal memory states
- There are two levels of learning, an inner loop (with a communication stage) and an outer loop
- Inner loop - Agent's behavior and adaptation should emerge as a result of experts communicating between each other. Experts send messages (of any complexity) to each other and update their internal states based on observations/messages and their internal state from the previous time-step. Expert policy is fixed and does not change during the inner loop
- Inner loop loss need not even be a proper loss function. It can be any kind of structured feedback guiding the adaptation during the agent's lifetime
- Outer loop - An expert policy is discovered over generations of agents, ensuring that strategies that find solutions to problems in diverse environments can quickly emerge in the inner loop
- Agent's objective is to adapt fast to novel tasks

Exhibiting the following novel properties:

- Roles of experts and connectivity among them assigned dynamically at inference time
- Learned communication protocol with context dependent messages of varied complexity
- Generalizes to different numbers and types of inputs/outputs
- Can be trained to handle variations in architecture during both training and testing

Initial empirical results show generalization and scalability along the spectrum of learning types.
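The two-level procedure described above can be illustrated with a toy sketch. It is not the authors' implementation. The tanh layers, the mean aggregation of messages, the hill-climbing outer loop, the toy task (reproduce the observation), and all names and sizes (`STATE_DIM`, `MSG_DIM`, `init_policy`, `inner_loop`, `outer_loop`) are assumptions made only for illustration. What it does reflect from the abstract: all experts share one policy, each expert keeps its own internal state, the policy is fixed during the inner loop, and the policy is changed only across generations in the outer loop.

```python
import math
import random

STATE_DIM = 4  # hypothetical size of each expert's internal memory
MSG_DIM = 2    # hypothetical size of each message


def init_policy(rng):
    """Shared expert policy: two small weight matrices (assumed parameterization)."""
    n_in = STATE_DIM + MSG_DIM + 1  # previous state, incoming message, observation
    return {
        "update": [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(STATE_DIM)],
        "emit": [[rng.gauss(0, 0.5) for _ in range(STATE_DIM)] for _ in range(MSG_DIM)],
    }


def layer(weights, inputs):
    """One tanh layer: weights is a list of rows, inputs is a flat list."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]


def inner_loop(policy, states, observation, steps=5):
    """Experts exchange messages; only their states change, the policy is read-only."""
    for _ in range(steps):
        messages = [layer(policy["emit"], s) for s in states]
        new_states = []
        for i, s in enumerate(states):
            others = [m for j, m in enumerate(messages) if j != i]
            # Assumption: an expert receives the mean of the other experts' messages.
            incoming = ([sum(c) / len(others) for c in zip(*others)]
                        if others else [0.0] * MSG_DIM)
            new_states.append(layer(policy["update"], s + incoming + [observation]))
        states = new_states
    return states


def agent_output(states):
    """Assumption: the agent's output is the mean of each expert's first state unit."""
    return sum(s[0] for s in states) / len(states)


def fitness(policy, rng, n_experts=3, n_tasks=8):
    """Negative mean squared error on a toy task: reproduce the observation."""
    total = 0.0
    for _ in range(n_tasks):
        obs = rng.uniform(-1, 1)
        states = [[rng.gauss(0, 0.1) for _ in range(STATE_DIM)] for _ in range(n_experts)]
        total += (agent_output(inner_loop(policy, states, obs)) - obs) ** 2
    return -total / n_tasks


def outer_loop(generations=30, seed=0):
    """Hill climbing over the shared policy, one candidate per generation."""
    rng = random.Random(seed)
    best = init_policy(rng)
    best_fit = fitness(best, random.Random(seed))
    for _ in range(generations):
        cand = {k: [[w + rng.gauss(0, 0.1) for w in row] for row in m]
                for k, m in best.items()}
        # Re-seeding gives every candidate the same tasks, so scores are comparable.
        cand_fit = fitness(cand, random.Random(seed))
        if cand_fit > best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit
```

Because the policy is applied per expert and messages are averaged, `inner_loop` accepts any number of experts without changing the policy's shape, which is one simple way to picture the claim that the number of experts can vary between training and testing.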
