 Agents


PACMAN: A Planner-Actor-Critic Architecture for Human-Centered Planning and Learning

arXiv.org Artificial Intelligence

Conventional reinforcement learning (RL) allows an agent to learn policies via environmental rewards only, with a long and slow learning curve at the beginning stage. In contrast, human learning is usually much faster because prior and general knowledge and multiple information resources are utilized. In this paper, we propose a Planner-Actor-Critic architecture for huMAN-centered planning and learning (PACMAN), where an agent uses its prior, high-level, deterministic symbolic knowledge to plan for goal-directed actions, while integrating the Actor-Critic algorithm of RL to fine-tune its behaviors towards both environmental rewards and human feedback. This is the first unified framework where knowledge-based planning, RL, and human teaching jointly contribute to the policy learning of an agent. Our experiments demonstrate that PACMAN leads to a significant jump start at the early stage of learning, converges rapidly and with small variance, and is robust to inconsistent, infrequent, and misleading feedback.


Automatic Algorithm Selection In Multi-agent Pathfinding

arXiv.org Artificial Intelligence

In a multi-agent pathfinding (MAPF) problem, agents need to navigate from their start to their goal locations without colliding with each other. There are various MAPF algorithms, including Windowed Hierarchical Cooperative A*, Flow Annotated Replanning, and Bounded Multi-Agent A*. It is often the case that no single algorithm dominates on all MAPF instances. Therefore, in this paper, we investigate the use of deep learning to automatically select the best MAPF algorithm from a portfolio of algorithms for a given MAPF problem instance. Empirical results show that our automatic algorithm selection approach, which uses an off-the-shelf convolutional neural network, is able to outperform any individual MAPF algorithm in our portfolio.


SQIL: Imitation Learning via Regularized Behavioral Cloning

arXiv.org Machine Learning

Learning to imitate expert behavior given action demonstrations containing high-dimensional, continuous observations and unknown dynamics is a difficult problem in robotic control. Simple approaches based on behavioral cloning (BC) suffer from state distribution shift, while more complex methods that generalize to out-of-distribution states can be difficult to use, since they typically involve adversarial optimization. We propose an alternative that combines the simplicity of BC with the robustness of adversarial imitation learning. The key insight is that under the maximum entropy model of expert behavior, BC corresponds to fitting a soft Q function that maximizes the likelihood of observed actions. This perspective suggests a way to regularize BC so that it generalizes to out-of-distribution states: combine the standard maximum-likelihood objective with a penalty on the soft Bellman error of the soft Q function. We show that this penalty term gives the agent an incentive to take actions that lead it back to demonstrated states when it encounters new states. Experiments show that our method outperforms BC and GAIL on a variety of image-based and low-dimensional environments in Box2D, Atari, and MuJoCo.
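The objective described in this abstract can be illustrated with a minimal tabular sketch. It is not the authors' implementation: the tabular Q array, the function names, the weight `lam`, the discount `gamma`, and the constant `reward` used in the Bellman target are all assumptions made for illustration. The first term is the behavioral cloning negative log-likelihood under a softmax policy over Q, and the second is a squared soft Bellman error penalty.

```python
import numpy as np

def soft_value(q_row):
    """Soft value V(s) = log sum_a exp Q(s, a), computed stably."""
    m = np.max(q_row)
    return m + np.log(np.sum(np.exp(q_row - m)))

def regularized_bc_loss(q, demos, gamma=0.99, lam=1.0, reward=0.0):
    """BC negative log-likelihood plus a soft Bellman error penalty.

    q      : array of shape (num_states, num_actions), the soft Q table
    demos  : list of (state, action, next_state) demonstration transitions
    lam    : hypothetical penalty weight (assumption)
    reward : constant reward used in the Bellman target (assumption)
    """
    nll = 0.0
    bellman = 0.0
    for s, a, s_next in demos:
        # log pi(a|s) = Q(s, a) - V(s) under the maximum entropy model
        nll += -(q[s, a] - soft_value(q[s]))
        # squared soft Bellman error on the same transition
        target = reward + gamma * soft_value(q[s_next])
        bellman += (q[s, a] - target) ** 2
    n = len(demos)
    return nll / n + lam * bellman / n

q = np.zeros((2, 2))
loss = regularized_bc_loss(q, [(0, 0, 1)], gamma=0.5, lam=1.0)
```

Setting `lam=0.0` recovers plain behavioral cloning, so the penalty weight controls how strongly the Bellman consistency term shapes Q beyond the demonstrated actions.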


Towards Empathetic Planning

arXiv.org Artificial Intelligence

Critical to successful human interaction is a capacity for empathy - the ability to understand and share the thoughts and feelings of another. As Artificial Intelligence (AI) systems are increasingly required to interact with humans in a myriad of settings, it is important to enable AI to wield empathy as a tool to benefit those it interacts with. In this paper, we work towards this goal by bringing together a number of important concepts: empathy, AI planning, and reasoning in the presence of knowledge and belief. We formalize the notion of Empathetic Planning which is informed by the beliefs and affective state of the empathizee. We appeal to an epistemic logic framework to represent the beliefs of the empathizee and propose AI planning-based computational approaches to compute empathetic solutions. We illustrate the potential benefits of our approach by conducting a study where we evaluate participants' perceptions of the agent's empathetic abilities and assistive capabilities.


Empowering swarm-based optimizers by multi-scale search to enhance Gradient Descent initialization performance

arXiv.org Machine Learning

Swarm-based optimizers such as Particle Swarm Optimization (PSO) or the Imperialistic Competitive Algorithm (ICA), which act under the influence of cooperation or competition among groups, are unable to search across multiple scales of locality or globality and do not have nested localities. As hybrid optimizers, they may not give satisfactory results as initializers for Gradient Descent (GD) approximators used in many multimodal problems such as nonlinear subspace learning and neural network training, which have hierarchies of convex spaces due to the nonlinearity and multi-layer nature of these models. To search at various levels of scale in a homogeneous way, a framework is proposed to equip PSO and ICA with a multi-scale search capability. The resulting optimizers are then evaluated in single and GD-hybridized modes. The hybrid evaluation, as a GD randomizer, is implemented with a nonlinear subspace filtering objective function over EEG data, and the optimization loss and validation accuracy are compared with other hybrids containing GD. A single evaluation is also carried out among the proposed optimizers, PSO, ICA, CLPSO, and CICA, which are more commonly used in hybrid learning-based approaches. Evaluations are with respect to solution error. Finally, it is shown and analyzed that the proposed optimizers outperform algorithms of related context in both single and hybrid-GD modes.


Multi-type Resource Allocation with Partial Preferences

arXiv.org Artificial Intelligence

We propose multi-type probabilistic serial (MPS) and multi-type random priority (MRP) as extensions of the well-known PS and RP mechanisms to the multi-type resource allocation problem (MTRA) with partial preferences. In our setting, there are multiple types of divisible items, and a group of agents who have partial order preferences over bundles consisting of one item of each type. We show that for the unrestricted domain of partial order preferences, no mechanism satisfies both sd-efficiency and sd-envy-freeness. Notwithstanding this impossibility result, our main message is positive: when agents' preferences are represented by acyclic CP-nets, MPS satisfies sd-efficiency, sd-envy-freeness, ordinal fairness, and upper invariance, while MRP satisfies ex-post-efficiency, sd-strategy-proofness, and upper invariance, recovering the properties of PS and RP.
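For background, the classic single-type PS mechanism that MPS extends can be sketched as a "simultaneous eating" procedure. This is not the multi-type mechanism from the paper: the sketch assumes strict linear preferences, one unit of supply per item, and unit eating speed, and the function and variable names are illustrative.

```python
from fractions import Fraction

def probabilistic_serial(prefs):
    """Classic single-type PS via simultaneous eating.

    prefs : dict mapping agent -> list of items, most preferred first
    Returns dict agent -> {item: share}, with exact Fraction shares.
    Assumes each item has one unit of supply (assumption).
    """
    items = {i for ranking in prefs.values() for i in ranking}
    remaining = {i: Fraction(1) for i in items}
    shares = {agent: {} for agent in prefs}
    time = Fraction(0)
    while time < 1:
        # each agent targets its most preferred item that still has supply
        target = {}
        for agent, ranking in prefs.items():
            choice = next((i for i in ranking if remaining[i] > 0), None)
            if choice is not None:
                target[agent] = choice
        if not target:
            break
        # count how many agents are eating each targeted item
        eaters = {}
        for item in target.values():
            eaters[item] = eaters.get(item, 0) + 1
        # advance until some item runs out or the unit of time is used up
        dt = min(min(remaining[i] / n for i, n in eaters.items()), 1 - time)
        for agent, item in target.items():
            shares[agent][item] = shares[agent].get(item, Fraction(0)) + dt
            remaining[item] -= dt
        time += dt
    return shares

result = probabilistic_serial({"a1": ["x", "y"], "a2": ["x", "y"]})
```

With two agents who both prefer `x` to `y`, each agent ends up with half of each item, which is the envy-free split that PS is known for.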


Curriculum Learning for Cumulative Return Maximization

arXiv.org Artificial Intelligence

Curriculum learning has been successfully used in reinforcement learning to accelerate the learning process, through knowledge transfer between tasks of increasing complexity. Critical tasks, in which suboptimal exploratory actions must be minimized, can benefit from curriculum learning, and its ability to shape exploration through transfer. We propose a task sequencing algorithm maximizing the cumulative return, that is, the return obtained by the agent across all the learning episodes. By maximizing the cumulative return, the agent not only aims at achieving high rewards as fast as possible, but also at doing so while limiting suboptimal actions. We experimentally compare our task sequencing algorithm to several popular metaheuristic algorithms for combinatorial optimization, and show that it achieves significantly better performance on the problem of cumulative return maximization. Furthermore, we validate our algorithm on a critical task, optimizing a home controller for a micro energy grid.


Autonomous systems - what kind of potential do they hold?

#artificialintelligence

And what kind of future will they bring with them? A future that is more efficient. A future that is safer. A future that is full of low-emission and energy-efficient solutions. A future that is abundant with this kind of business.


Competing Bandits in Matching Markets

arXiv.org Machine Learning

Stable matching, a classical model for two-sided markets, has long been studied with little consideration for how each side's preferences are learned. With the advent of massive online markets powered by data-driven matching platforms, it has become necessary to better understand the interplay between learning and market objectives. We propose a statistical learning model in which one side of the market does not have a priori knowledge about its preferences for the other side and is required to learn these from stochastic rewards. Our model extends the standard multi-armed bandits framework to multiple players, with the added feature that arms have preferences over players. We study both centralized and decentralized approaches to this problem and show surprising exploration-exploitation trade-offs compared to the single player multi-armed bandits setting.
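One way to picture the centralized setting is the following sketch, which is an illustration under stated assumptions rather than the paper's exact algorithm: each player ranks arms by an upper confidence bound on its estimated reward, and a platform then runs player-proposing deferred acceptance against the arms' known preferences. The function names and the specific UCB bonus `sqrt(2 log t / n)` are assumptions.

```python
import math

def ucb_ranking(means, counts, t):
    """Rank arms by UCB index, best first. Unpulled arms rank first."""
    def index(arm):
        if counts[arm] == 0:
            return math.inf
        return means[arm] + math.sqrt(2 * math.log(t) / counts[arm])
    return sorted(means, key=index, reverse=True)

def deferred_acceptance(player_rankings, arm_prefs):
    """Player-proposing Gale-Shapley matching.

    player_rankings : dict player -> list of arms, best first
    arm_prefs       : dict arm -> list of players, best first
    Returns dict player -> matched arm.
    """
    arm_rank = {a: {p: i for i, p in enumerate(prefs)}
                for a, prefs in arm_prefs.items()}
    next_choice = {p: 0 for p in player_rankings}
    match = {}  # arm -> player currently held
    free = list(player_rankings)
    while free:
        p = free.pop()
        if next_choice[p] >= len(player_rankings[p]):
            continue  # player has proposed to every arm and stays unmatched
        a = player_rankings[p][next_choice[p]]
        next_choice[p] += 1
        holder = match.get(a)
        if holder is None:
            match[a] = p
        elif arm_rank[a][p] < arm_rank[a][holder]:
            match[a] = p          # arm prefers the new proposer
            free.append(holder)
        else:
            free.append(p)        # arm rejects the proposer
    return {p: a for a, p in match.items()}

rankings = {0: ["x", "y"], 1: ["x", "y"]}
arms = {"x": [1, 0], "y": [0, 1]}
matching = deferred_acceptance(rankings, arms)
```

Here both players rank arm `x` first, but `x` prefers player 1, so player 0 is pushed to `y`. In a bandit loop, `player_rankings` would be rebuilt each round from `ucb_ranking`, so a player's exploration can be blocked by competitors that the arms prefer.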


Dealing with Non-Stationarity in Multi-Agent Deep Reinforcement Learning

arXiv.org Artificial Intelligence

Recent developments in deep reinforcement learning are concerned with creating decision-making agents which can perform well in various complex domains. A particular approach which has received increasing attention is multi-agent reinforcement learning, in which multiple agents learn concurrently to coordinate their actions. In such multi-agent environments, additional learning problems arise due to the continually changing decision-making policies of agents. This paper surveys recent works that address the non-stationarity problem in multi-agent deep reinforcement learning. The surveyed methods range from modifications in the training procedure, such as centralized training, to learning representations of the opponent's policy, meta-learning, communication, and decentralized learning. The survey concludes with a list of open problems and possible lines of future research.