In this paper, we propose an AI-FML robotic agent for constructing a student learning behavior ontology, which can be applied in the English speaking and listening domain. Together with the ontology, the AI-FML robotic agent provides perception intelligence, computational intelligence, and cognition intelligence for analyzing student learning behavior. In addition, the AI-FML robotic agent comprises three intelligent agents: a perception agent, a computational agent, and a cognition agent. We deploy the perception agent and the cognition agent on the robot Kebbi Air, while the computational agent, which uses a Deep Neural Network (DNN) model, runs in the cloud and communicates with the perception and cognition agents via the Internet. The proposed AI-FML robotic agent was applied in Taiwan and tested in Japan. The experimental results show that the agents can be utilized in a human and machine co-learning model for future education.
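As a rough illustration of this deployment, the sketch wires the three agents together as a message pipeline: the perception agent (on the robot) packages observations, the computational agent (in the cloud) scores them, and the cognition agent (on the robot) turns the score into feedback. Everything beyond the three-agent split is an assumption: the observation fields, the JSON message format, and the feedback thresholds are hypothetical, and the computational agent uses a hand-written score as a stand-in for the DNN model rather than any actual network.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Observation:
    # Hypothetical features a perception agent might extract from one spoken answer
    student_id: str
    words_correct: int
    words_total: int
    response_seconds: float


def perception_agent(student_id, words_correct, words_total, response_seconds):
    """On the robot: package sensed data as a JSON message to send to the cloud."""
    obs = Observation(student_id, words_correct, words_total, response_seconds)
    return json.dumps(asdict(obs))


def computational_agent(message):
    """In the cloud. Stand-in for the DNN model: a hand-written score in [0, 1]."""
    obs = json.loads(message)
    accuracy = obs["words_correct"] / max(obs["words_total"], 1)
    speed = 1.0 / (1.0 + obs["response_seconds"] / 10.0)  # faster answers score higher
    score = 0.8 * accuracy + 0.2 * speed  # hypothetical weighting
    return json.dumps({"student_id": obs["student_id"], "score": round(score, 3)})


def cognition_agent(message):
    """On the robot: map the cloud's score to a hypothetical feedback level."""
    result = json.loads(message)
    if result["score"] >= 0.75:
        level = "advance"
    elif result["score"] >= 0.5:
        level = "practice"
    else:
        level = "review"
    return result["student_id"], level


if __name__ == "__main__":
    msg = perception_agent("s1", 9, 10, 2.0)
    print(cognition_agent(computational_agent(msg)))
```

Keeping each hop as a serialized message mirrors the robot/cloud split: only the computational agent would need to change if the stand-in score were replaced by a real model behind a network call.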
Multi-agent reinforcement learning has received significant interest in recent years, largely owing to advances in deep reinforcement learning that have enabled new architectures and learning algorithms. Using social dilemmas as the training ground, we present a novel learning architecture, Learning through Probing (LTP), in which agents use a probing mechanism to account for how an opponent's behavior changes when the agent takes an action. We use distinct training phases and adjust rewards according to the overall outcome of the experiences, accounting for changes in the opponent's behavior. We introduce a parameter η that determines the weight given to these future changes in opponent behavior. When applied to the Iterated Prisoner's Dilemma, LTP agents learn to cooperate with each other, achieving higher average cumulative rewards than other reinforcement learning methods while also performing well against the static agents found in Axelrod tournaments. We compare this method with traditional reinforcement learning algorithms and agent-tracking techniques to highlight key differences and potential applications. We also draw attention to the differences between solving games and societal-like interactions, and we analyze the training of Q-learning agents in makeshift societies. This highlights how cooperation may emerge in societies, which we demonstrate using environments where opponents are determined through a random-encounter format of the iterated prisoner's dilemma.
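The random-encounter setting for Q-learning agents can be sketched as follows. This is a minimal sketch under stated assumptions, not the LTP method or the authors' environment: it uses the standard Axelrod payoffs (T=5, R=3, P=1, S=0), tabular Q-learners whose state is the previous joint action within an encounter, and hypothetical names (`QAgent`, `play_encounter`, `run_society`) and hyperparameters chosen for illustration.

```python
import random
from collections import defaultdict

C, D = 0, 1  # cooperate, defect
# Standard Axelrod payoffs for the row player: T=5, R=3, P=1, S=0
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


class QAgent:
    """Tabular Q-learner whose state is the previous joint action in the current encounter."""

    def __init__(self, rng, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(lambda: [0.0, 0.0])  # state -> [Q(C), Q(D)]
        self.rng, self.alpha, self.gamma, self.epsilon = rng, alpha, gamma, epsilon

    def act(self, state):
        if self.rng.random() < self.epsilon:  # epsilon-greedy exploration
            return self.rng.choice((C, D))
        values = self.q[state]
        return C if values[C] >= values[D] else D

    def update(self, state, action, reward, next_state):
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


def play_encounter(a, b, rounds):
    """One iterated game between two agents; returns their summed payoffs."""
    state_a = state_b = "start"
    total_a = total_b = 0
    for _ in range(rounds):
        move_a, move_b = a.act(state_a), b.act(state_b)
        reward_a, reward_b = PAYOFF[(move_a, move_b)], PAYOFF[(move_b, move_a)]
        # Each agent observes the joint action from its own perspective: (own, opponent)
        next_a, next_b = (move_a, move_b), (move_b, move_a)
        a.update(state_a, move_a, reward_a, next_a)
        b.update(state_b, move_b, reward_b, next_b)
        state_a, state_b = next_a, next_b
        total_a += reward_a
        total_b += reward_b
    return total_a, total_b


def run_society(n_agents=10, encounters=2000, rounds=10, seed=0):
    """Random-encounter format: each encounter pairs two distinct agents at random."""
    rng = random.Random(seed)
    agents = [QAgent(random.Random(seed + i + 1)) for i in range(n_agents)]
    total = 0
    for _ in range(encounters):
        a, b = rng.sample(agents, 2)
        pa, pb = play_encounter(a, b, rounds)
        total += pa + pb
    # Average payoff per agent per round: 1 under mutual defection, 3 under mutual cooperation
    return total / (2 * encounters * rounds)


if __name__ == "__main__":
    print(run_society())
```

Because every agent keeps one Q-table across all partners, what it learns in one encounter carries into the next, which is what makes the population behave like a small society rather than a set of isolated two-player games.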
We discuss the role of coordination as a direct learning objective in multi-agent reinforcement learning (MARL) domains. To this end, we present a novel means of quantifying coordination in multi-agent systems, and discuss the implications of using such a measure to optimize coordinated agent policies. This concept has important implications for adversary-aware RL, which we take to be a sub-domain of multi-agent learning.
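The abstract does not specify its coordination measure, so the following is only a stand-in: one common way to quantify coordination between two agents is the empirical mutual information between their simultaneous actions, which is 0 when the agents act independently and grows as one agent's action becomes predictable from the other's. The function name `action_mutual_information` is hypothetical, and this is not claimed to be the measure proposed here.

```python
import math
from collections import Counter


def action_mutual_information(actions_a, actions_b):
    """Empirical mutual information (in bits) between two agents' simultaneous actions."""
    if len(actions_a) != len(actions_b) or not actions_a:
        raise ValueError("need two non-empty action sequences of equal length")
    n = len(actions_a)
    joint = Counter(zip(actions_a, actions_b))
    marg_a = Counter(actions_a)
    marg_b = Counter(actions_b)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed action pairs
        mi += p_xy * math.log2(p_xy / ((marg_a[x] / n) * (marg_b[y] / n)))
    return mi


if __name__ == "__main__":
    print(action_mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))
    print(action_mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

For example, two agents that always choose the same of two equally frequent actions score 1 bit, while two agents whose actions are statistically independent score 0.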
When DeepMind's AlphaGo defeated South Korean master Lee Se-dol, it was a historic step for AI. The significance of this development, coupled with greater computing power and cheaper data storage, is moving AI into the mainstream. Perhaps the most popular application of AI today comes in the form of virtual assistants and bots, or "agents" as my good friend Shivon defines them. An agent can schedule your meetings, manage your finances, book your travel, order your meals, and more. And even though these agents are typically focused on one specific task, it's remarkable how much progress we have made in outsourcing mundane work at a fraction of the cost.
Potential-based reward shaping (PBRS) is a category of machine learning methods that aims to speed up the learning of a reinforcement learning agent by extracting and using extra knowledge while the agent performs a task. Transfer learning involves two steps: extracting knowledge from previously learned tasks and transferring that knowledge for use in a target task. The latter step is well covered in the literature, with various methods proposed for it, while the former has been explored less. Yet the type of knowledge that is transferred is very important and can lead to considerable improvement. One source of knowledge that has never been addressed in either the transfer learning or the PBRS literature is the knowledge gathered during the learning process itself. In this paper, we present a novel potential-based reward shaping method that extracts knowledge from the learning process, specifically from episodes' cumulative rewards. We evaluated the proposed method in the Arcade Learning Environment, and the results indicate an improvement in the learning process for both single-task and multi-task reinforcement learning agents.
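The standard PBRS shaping term is F(s, s') = γΦ(s') − Φ(s), added to the environment reward; with a fixed potential Φ it leaves the optimal policy unchanged. The sketch adds F to a tabular Q-learning update. How Φ is built from episodes' cumulative rewards is our assumption: the hypothetical `ReturnBasedPotential` class sets Φ(s) to the average return of past episodes that visited s, which is one simple reading of the abstract, not the paper's actual method.

```python
from collections import defaultdict


class ReturnBasedPotential:
    """Hypothetical potential: Phi(s) = mean cumulative reward of past episodes that visited s."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def record_episode(self, visited_states, episode_return):
        for s in set(visited_states):  # credit each visited state once per episode
            self.total[s] += episode_return
            self.count[s] += 1

    def __call__(self, state):
        n = self.count.get(state, 0)
        return self.total[state] / n if n else 0.0  # unseen states get potential 0


def shaping_reward(phi, state, next_state, gamma, terminal=False):
    """PBRS term F(s, s') = gamma * Phi(s') - Phi(s); terminal potential fixed to 0."""
    next_value = 0.0 if terminal else phi(next_state)
    return gamma * next_value - phi(state)


def shaped_q_update(q, phi, s, a, r, s_next, actions, alpha, gamma, terminal=False):
    """One Q-learning step on the shaped reward r + F(s, s')."""
    f = shaping_reward(phi, s, s_next, gamma, terminal)
    best_next = 0.0 if terminal else max(q[(s_next, b)] for b in actions)
    target = r + f + gamma * best_next
    q[(s, a)] += alpha * (target - q[(s, a)])


if __name__ == "__main__":
    phi = ReturnBasedPotential()
    phi.record_episode(["a", "b"], 10.0)
    phi.record_episode(["b", "c"], 4.0)
    q = defaultdict(float)
    shaped_q_update(q, phi, "a", 0, 1.0, "b", [0, 1], alpha=0.5, gamma=0.9)
    print(q[("a", 0)])
```

With γ = 1, summing F along a trajectory telescopes to Φ(s_T) − Φ(s_0), which is why potential-based shaping cannot change which policy is optimal. Because this potential keeps changing as new episode returns arrive, the usual fixed-potential guarantee applies only approximately here; that is a further assumption of the sketch.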