 Agents


From the Periphery to the Center: Information Brokerage in an Evolving Network

arXiv.org Artificial Intelligence

Interpersonal ties are pivotal to individual efficacy, status and performance in an agent society. This paper explores three important and interrelated themes in social network theory: the center/periphery partition of the network; network dynamics; and social integration of newcomers. We tackle the question: How would a newcomer harness information brokerage to integrate into a dynamic network, going from periphery to center? We model integration as the interplay between the newcomer and the dynamic network and capture information brokerage using a process of relationship building. We analyze theoretical guarantees for the newcomer to reach the center through tactics, proving that a winning tactic always exists for certain types of network dynamics. We then propose three tactics and show their superior performance over alternative methods on four real-world datasets and four network models. In general, our tactics move the newcomer to the center by adding very few new edges on dynamic networks with approximately 14000 nodes.
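The idea of moving a newcomer toward the center by adding edges can be illustrated with a minimal sketch. It assumes the center is the set of nodes with minimum eccentricity, which the abstract does not state, and it uses a small static path graph rather than a dynamic network. The function names (`eccentricity`, `in_center`, `add_edge`) and the choice of which edges to add are hypothetical and are not the paper's tactics.

```python
from collections import deque

def eccentricity(adj, source):
    """Largest shortest-path distance from source (graph assumed connected)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return max(dist.values())

def in_center(adj, node):
    """Assumed definition: a node is central if its eccentricity equals the radius."""
    radius = min(eccentricity(adj, n) for n in adj)
    return eccentricity(adj, node) == radius

def add_edge(adj, u, v):
    """Add an undirected edge, creating the endpoints if needed."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Toy network: a path 0-1-2-3-4 with the newcomer attached at one end.
adj = {}
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    add_edge(adj, u, v)
add_edge(adj, "new", 0)
before = in_center(adj, "new")

# Two hand-picked extra ties (an illustrative choice, not a tactic from the paper).
add_edge(adj, "new", 2)
add_edge(adj, "new", 4)
after = in_center(adj, "new")
```

In this toy case the newcomer starts at the periphery and joins the center after two additional ties; a real tactic would also have to react to edges appearing and disappearing over time.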


Learning with Opponent-Learning Awareness

arXiv.org Artificial Intelligence

Multi-agent settings are quickly gathering importance in machine learning. This includes a plethora of recent work on deep multi-agent reinforcement learning, but can also be extended to hierarchical RL, generative adversarial networks and decentralised optimisation. In all these settings the presence of multiple learning agents renders the training problem non-stationary and often leads to unstable training or undesired final results. We present Learning with Opponent-Learning Awareness (LOLA), a method in which each agent shapes the anticipated learning of the other agents in the environment. The LOLA learning rule includes an additional term that accounts for the impact of one agent's policy on the anticipated parameter update of the other agents. Preliminary results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat and therefore cooperation in the iterated prisoners' dilemma (IPD), while independent learning does not. In this domain, LOLA also receives higher payouts compared to a naive learner, and is robust against exploitation by higher-order gradient-based methods. Applied to repeated matching pennies, LOLA agents converge to the Nash equilibrium. In a round-robin tournament we show that LOLA agents can successfully shape the learning of a range of multi-agent learning algorithms from the literature, resulting in the highest average returns on the IPD. We also show that the LOLA update rule can be efficiently calculated using an extension of the policy gradient estimator, making the method suitable for model-free RL. This method thus scales to large parameter and input spaces and nonlinear function approximators. We also apply LOLA to a grid world task with an embedded social dilemma using deep recurrent policies and opponent modelling. Again, by explicitly considering the learning of the other agent, LOLA agents learn to cooperate out of self-interest.
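The additional term in the learning rule can be made concrete with a small sketch. It assumes a toy two-player zero-sum game with scalar parameters, V1 = theta1 * theta2 and V2 = -theta1 * theta2, chosen only because its gradients can be written by hand; this game, the step sizes, and the function names are illustrative assumptions and do not come from the paper's experiments. The shaping term for each agent is the gradient of its own value with respect to the opponent's parameter, multiplied by the mixed second derivative of the opponent's value.

```python
import math

def step(theta1, theta2, lr=0.1, eta=1.0, lola=True):
    """One simultaneous gradient-ascent update on the toy game
    V1 = theta1 * theta2, V2 = -theta1 * theta2 (an assumed example)."""
    grad1 = theta2     # dV1/dtheta1: naive gradient for agent 1
    grad2 = -theta1    # dV2/dtheta2: naive gradient for agent 2
    if lola:
        # Agent 1 shaping term: (dV1/dtheta2) * d2V2/(dtheta1 dtheta2) = theta1 * (-1)
        grad1 += eta * theta1 * (-1.0)
        # Agent 2 shaping term: (dV2/dtheta1) * d2V1/(dtheta2 dtheta1) = (-theta2) * 1
        grad2 += eta * (-theta2) * 1.0
    return theta1 + lr * grad1, theta2 + lr * grad2

def distance_after(steps, lola):
    """Distance from the equilibrium (0, 0) after a number of updates."""
    theta1, theta2 = 1.0, 1.0
    for _ in range(steps):
        theta1, theta2 = step(theta1, theta2, lola=lola)
    return math.hypot(theta1, theta2)

naive_distance = distance_after(100, lola=False)  # naive learners spiral outward
lola_distance = distance_after(100, lola=True)    # shaped learners contract inward
```

On this toy game the naive learners drift away from the equilibrium at the origin while the learners with the shaping term move toward it. The sketch uses exact gradients; the policy-gradient estimator mentioned in the abstract is not shown.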


AI safety via debate

arXiv.org Machine Learning

To make AI systems broadly useful for challenging real-world tasks, we need them to learn complex human goals and preferences. One approach to specifying complex goals asks humans to judge during training which agent behaviors are safe and useful, but this approach can fail if the task is too complicated for a human to directly judge. To help address this concern, we propose training agents via self-play on a zero-sum debate game. Given a question or proposed action, two agents take turns making short statements up to a limit, then a human judges which of the agents gave the most true, useful information. In an analogy to complexity theory, debate with optimal play can answer any question in PSPACE given polynomial-time judges (direct judging answers only NP questions). In practice, whether debate works involves empirical questions about humans and the tasks we want AIs to perform, plus theoretical questions about the meaning of AI alignment. We report results on an initial MNIST experiment where agents compete to convince a sparse classifier, boosting the classifier's accuracy from 59.4% to 88.9% given 6 pixels and from 48.2% to 85.2% given 4 pixels. Finally, we discuss theoretical and practical aspects of the debate model, focusing on potential weaknesses as the model scales up, and we propose future human and computer experiments to test these properties.


Role Models in AI: Ece Kamar – AI4ALL – Medium

#artificialintelligence

Meet Ece Kamar, a senior researcher at Microsoft who works on human-machine collaboration, AI systems in the real world, and issues around bias, robustness, reliability, and transparency in AI. Ece also co-authored the first report in a 100-year study of artificial intelligence, intended to provide a set of reflections about the field as it progresses. The report offers insights on where AI is headed, policy recommendations, and the importance of reflecting on fairness and transparency in the field. Ece believes that it's unlikely that important tasks will ever be fully automated, as human-AI partnerships will be complementary, rather than a relationship of replacement. See how Ece envisions the future of AI, how her academic exploration in college helped shape her career, and how she sees diversity as key to moving the field in a positive direction.


Play To Transform MISC

#artificialintelligence

Will Playing Make AI More Human? On March 15, 2016, Google's AI program, AlphaGo, beat world champion Lee Sedol four-to-one in one of the most complex strategy games ever devised – the ancient Chinese game of Go. Since that historic match, Google has released a new version of its AI agent, called AlphaGo Zero, which defeated its predecessor by 100 games to 0. Unlike AlphaGo, which relied on big data, machine learning, and advanced algorithms, AlphaGo Zero started learning Go on its own, from scratch. Starting with a very primitive understanding of the game, AlphaGo Zero created a duplicate of itself, playing itself repeatedly and using what it learned in each match to advance and update its algorithms. Beginning with random play, it took AlphaGo Zero only 40 days to master the game and become the world's best player.


How morphological development can guide evolution

arXiv.org Artificial Intelligence

Organisms result from adaptive processes interacting across different time scales. One such interaction is that between development and evolution. Models have shown that development sweeps over several traits in a single agent, sometimes exposing promising static traits. Subsequent evolution can then canalize these rare traits. Thus, development can, under the right conditions, increase evolvability. Here, we report on a previously unknown phenomenon when embodied agents are allowed to develop and evolve: Evolution discovers body plans robust to control changes, these body plans become genetically assimilated, yet controllers for these agents are not assimilated. This allows evolution to continue climbing fitness gradients by tinkering with the developmental programs for controllers within these permissive body plans. This exposes a previously unknown detail about the Baldwin effect: instead of all useful traits becoming genetically assimilated, only traits that render the agent robust to changes in other traits become assimilated. We refer to this as differential canalization. This finding also has implications for the evolutionary design of artificial and embodied agents such as robots: robots robust to internal changes in their controllers may also be robust to external changes in their environment, such as transferal from simulation to reality or deployment in novel environments.


How and why Madison Reed's hair color quiz works

#artificialintelligence

Artificial intelligence makes this possible for the hair color products retailer. The machine-learning algorithm factors in the answers to a 20-question hair coloring quiz more than 4 million shoppers have taken, along with more than 24,0000 product reviews of the retailer's 50 SKUs. It also factors in a shopper's repeat purchase rate, net promoter score and hundreds of thousands of shopper and customer service agent interactions, says Dave King, chief technology officer at Madison Reed. With all of this historical information, Madison Reed's algorithm is the ultimate master colorist that can recommend hair coloring products, King says. When Madison Reed first launched its product recommendation engine, it was coded by humans and did not factor in all of the data points that it does today.



An approach to logical cognition and rationality in artificial intelligence

@machinelearnbot

Our description of logical framing as a process of rational comprehension of perceptual experience by an intelligent agent begins with the definition of "templates" and "objects". Templates are similar to forms and schemata, and objects are similar to perceptual patterns. While both are network-like structures of data, they differ in both content and function. The nodes of an object represent the elements or parts of some external thing, and the links represent the relations between elements. Elements are defined by "descriptive properties" that exist along any number of dimensions (e.g.


Generating Interpretable Fuzzy Controllers using Particle Swarm Optimization and Genetic Programming

arXiv.org Artificial Intelligence

Autonomously training interpretable control strategies, called policies, using pre-existing plant trajectory data is of great interest in industrial applications. Fuzzy controllers have been used in industry for decades as interpretable and efficient system controllers. In this study, we introduce a fuzzy genetic programming (GP) approach called fuzzy GP reinforcement learning (FGPRL) that can select the relevant state features, determine the size of the required fuzzy rule set, and automatically adjust all the controller parameters simultaneously. Each GP individual's fitness is computed using model-based batch reinforcement learning (RL), which first trains a model using available system samples and subsequently performs Monte Carlo rollouts to predict each policy candidate's performance. We compare FGPRL to an extended version of a related method called fuzzy particle swarm reinforcement learning (FPSRL), which uses swarm intelligence to tune the fuzzy policy parameters. Experiments using an industrial benchmark show that FGPRL is able to autonomously learn interpretable fuzzy policies with high control performance.
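The fitness computation described above, in which a model stands in for the real system and rollouts predict a policy's performance, can be sketched as follows. This is a minimal illustration under stated assumptions: `toy_model` is a hand-written stand-in for a model that would normally be trained from system samples, the two policies are simple lambdas rather than fuzzy rule sets, and the horizon and discount values are arbitrary. None of these names or values come from the paper.

```python
import random

def rollout_return(model, policy, start_state, horizon=20, gamma=0.97):
    """Discounted return of one simulated trajectory under the model."""
    state, total, discount = start_state, 0.0, 1.0
    for _ in range(horizon):
        action = policy(state)
        state, reward = model(state, action)
        total += discount * reward
        discount *= gamma
    return total

def estimate_fitness(model, policy, start_states, horizon=20, gamma=0.97):
    """Average return over rollouts from sampled start states."""
    returns = [rollout_return(model, policy, s, horizon, gamma) for s in start_states]
    return sum(returns) / len(returns)

def toy_model(state, action):
    """Hypothetical stand-in for a learned model: the action shifts the state,
    and the reward penalises distance from zero."""
    next_state = state + action
    return next_state, -abs(next_state)

rng = random.Random(0)  # fixed seed so the start states are reproducible
start_states = [rng.uniform(-1.0, 1.0) for _ in range(50)]

# Two candidate policies: one steers the state toward zero, one does nothing.
steering_fitness = estimate_fitness(toy_model, lambda s: -0.5 * s, start_states)
idle_fitness = estimate_fitness(toy_model, lambda s: 0.0, start_states)
```

An evolutionary search would call something like `estimate_fitness` once per candidate and keep the higher-scoring individuals; here the steering policy scores higher than the idle one because it shrinks the penalised distance at every step.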