Weakly-Supervised Neural Response Selection from an Ensemble of Task-Specialised Dialogue Agents

arXiv.org Artificial Intelligence

Dialogue engines that incorporate different types of agents to converse with humans are popular. However, conversations are dynamic in the sense that a selected response will change the conversation on-the-fly, influencing the subsequent utterances in the conversation, which makes the response selection a challenging problem. We model the problem of selecting the best response from a set of responses generated by a heterogeneous set of dialogue agents by taking into account the conversational history, and propose a Neural Response Selection method. The proposed method is trained to predict a coherent set of responses within a single conversation, considering its own predictions via a curriculum training mechanism. Our experimental results show that the proposed method can accurately select the most appropriate responses, thereby significantly improving the user experience in dialogue systems.


Discrete-to-Deep Supervised Policy Learning

arXiv.org Machine Learning

Neural networks are effective function approximators, but they are hard to train in the reinforcement learning (RL) context, mainly because samples are correlated. For years, scholars have got around this by employing experience replay or an asynchronous parallel-agent system. This paper proposes Discrete-to-Deep Supervised Policy Learning (D2D-SPL) for training neural networks in RL. D2D-SPL discretises the continuous state space into discrete states and uses actor-critic to learn a policy. It then selects from each discrete state an input value and the action with the highest numerical preference as an input/target pair. Finally, it uses input/target pairs from all discrete states to train a classifier. D2D-SPL uses a single agent, needs no experience replay and learns much faster than state-of-the-art methods. We test our method with two RL environments, the Cartpole and an aircraft manoeuvring simulator.
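The pair-selection step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `build_training_pairs`, the dictionary layout, and the choice of one stored "representative" input per discrete state are all assumptions, since the abstract does not say how the input value is chosen.

```python
def build_training_pairs(preferences, representatives):
    """Turn a tabular actor's preferences into supervised input/target pairs.

    preferences:     dict mapping discrete state id -> list of action preferences
    representatives: dict mapping discrete state id -> one continuous input
                     observed in that state (hypothetical selection rule)
    """
    inputs, targets = [], []
    for state_id, prefs in preferences.items():
        if state_id not in representatives:
            continue  # skip discrete states that were never visited
        # Target label: the action with the highest numerical preference.
        best_action = max(range(len(prefs)), key=lambda a: prefs[a])
        inputs.append(representatives[state_id])
        targets.append(best_action)
    return inputs, targets


# Toy data: three discrete states, two actions; state 2 was never visited.
preferences = {0: [0.2, 1.5], 1: [0.9, -0.3], 2: [0.0, 0.0]}
representatives = {0: (-0.1, 0.02), 1: (0.3, -0.5)}
inputs, targets = build_training_pairs(preferences, representatives)
```

The resulting `inputs` and `targets` lists could then be passed to any classifier; which classifier and architecture the paper uses is not stated in the abstract.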


Dynamic Federated Learning

arXiv.org Machine Learning

Federated learning has emerged as an umbrella term for centralized coordination strategies in multi-agent environments. While many federated learning architectures process data in an online manner, and are hence adaptive by nature, most performance analyses assume static optimization problems and offer no guarantees in the presence of drifts in the problem solution or data characteristics. We consider a federated learning model where at every iteration, a random subset of available agents perform local updates based on their data. Under a non-stationary random walk model on the true minimizer for the aggregate optimization problem, we establish that the performance of the architecture is determined by three factors, namely, the data variability at each agent, the model variability across all agents, and a tracking term that is inversely proportional to the learning rate of the algorithm. The results clarify the trade-off between convergence and tracking performance.
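A toy version of the setting in this abstract can be sketched as follows, assuming a scalar model, a quadratic local loss, and Gaussian drift and noise. All names and parameter values (`federated_round`, `simulate`, `drift_std`, `noise_std`) are hypothetical, and the sketch does not reproduce the paper's analysis.

```python
import random


def federated_round(w, targets, participants, step_size):
    """One round: each participating agent takes a local gradient step on
    0.5 * (w - targets[k])**2, then the server averages the local models."""
    local_models = []
    for k in participants:
        gradient = w - targets[k]
        local_models.append(w - step_size * gradient)
    return sum(local_models) / len(local_models)


def simulate(num_agents=10, subset_size=3, steps=200, step_size=0.1,
             drift_std=0.05, noise_std=0.5, seed=0):
    """Track a minimizer that follows a random walk; return the mean squared
    deviation between the shared model and the true minimizer."""
    rng = random.Random(seed)
    w, minimizer = 0.0, 0.0
    squared_errors = []
    for _ in range(steps):
        minimizer += rng.gauss(0.0, drift_std)           # non-stationary drift
        targets = [minimizer + rng.gauss(0.0, noise_std)  # noisy local data
                   for _ in range(num_agents)]
        participants = rng.sample(range(num_agents), subset_size)
        w = federated_round(w, targets, participants, step_size)
        squared_errors.append((w - minimizer) ** 2)
    return sum(squared_errors) / len(squared_errors)
```

Calling `simulate` with different `step_size` values is one way to explore, informally, the trade-off between noise sensitivity and tracking that the abstract describes.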


Demand-Side Scheduling Based on Deep Actor-Critic Learning for Smart Grids

arXiv.org Machine Learning

We consider the problem of demand-side energy management, where each household is equipped with a smart meter that is able to schedule home appliances online. The goal is to minimise the overall cost under a real-time pricing scheme. While previous works have introduced centralised approaches, we formulate the smart grid environment as a Markov game, where each household is a decentralised agent, and the grid operator produces a price signal that adapts to the energy demand. The main challenge addressed in our approach is partial observability and perceived non-stationarity of the environment from the viewpoint of each agent. We propose a multi-agent extension of a deep actor-critic algorithm that shows success in learning in this environment. This algorithm learns a centralised critic that coordinates training of all agents. Our approach thus uses centralised learning but decentralised execution. Simulation results show that our online deep reinforcement learning method can reduce both the peak-to-average ratio of total energy consumed and the cost of electricity for all households based purely on instantaneous observations and a price signal.


Multiagent Value Iteration Algorithms in Dynamic Programming and Reinforcement Learning

arXiv.org Artificial Intelligence

We consider infinite horizon dynamic programming problems, where the control at each stage consists of several distinct decisions, each one made by one of several agents. In an earlier work we introduced a policy iteration algorithm, where the policy improvement is done one-agent-at-a-time in a given order, with knowledge of the choices of the preceding agents in the order. As a result, the amount of computation for each policy improvement grows linearly with the number of agents, as opposed to exponentially for the standard all-agents-at-once method. For the case of a finite-state discounted problem, we showed convergence to an agent-by-agent optimal policy. In this paper, this result is extended to value iteration and optimistic versions of policy iteration, as well as to more general DP problems where the Bellman operator is a contraction mapping, such as stochastic shortest path problems with all policies being proper.
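The computational contrast between all-agents-at-once and one-agent-at-a-time improvement can be illustrated with a small sketch. It assumes a generic cost function `q` over joint controls and assumes that agents later in the order keep their base-policy controls while an earlier agent optimises; the function names and the toy cost are hypothetical.

```python
from itertools import product


def all_agents_at_once(q, choices, num_agents):
    """Minimise q over every joint control: len(choices) ** num_agents evaluations."""
    candidates = list(product(choices, repeat=num_agents))
    return min(candidates, key=q), len(candidates)


def one_agent_at_a_time(q, choices, base_controls):
    """Improve controls agent by agent in a fixed order. Each agent sees the new
    choices of preceding agents; later agents keep their base controls
    (an assumption of this sketch)."""
    controls = list(base_controls)
    evaluations = 0
    for i in range(len(controls)):
        best_choice, best_cost = None, float("inf")
        for c in choices:
            trial = controls[:i] + [c] + controls[i + 1:]
            cost = q(tuple(trial))
            evaluations += 1
            if cost < best_cost:
                best_choice, best_cost = c, cost
        controls[i] = best_choice
    return tuple(controls), evaluations


def toy_cost(u):
    # Separable toy cost: agent i would like to choose control i.
    return sum((u_i - i) ** 2 for i, u_i in enumerate(u))


joint_best, joint_evals = all_agents_at_once(toy_cost, [0, 1, 2], 3)
seq_best, seq_evals = one_agent_at_a_time(toy_cost, [0, 1, 2], (2, 2, 2))
```

With three agents and three choices each, the joint search evaluates 27 candidates while the sequential scheme evaluates 9. The two results coincide here only because the toy cost is separable; in general the sequential scheme yields an agent-by-agent optimum rather than a joint optimum.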


LIMEtree: Interactively Customisable Explanations Based on Local Surrogate Multi-output Regression Trees

arXiv.org Artificial Intelligence

Systems based on artificial intelligence and machine learning models should be transparent, in the sense of being capable of explaining their decisions to gain humans' approval and trust. While there are a number of explainability techniques that can be used to this end, many of them are only capable of outputting a single one-size-fits-all explanation that simply cannot address all of the explainees' diverse needs. In this work we introduce a model-agnostic and post-hoc local explainability technique for black-box predictions called LIMEtree, which employs surrogate multi-output regression trees. We validate our algorithm on a deep neural network trained for object detection in images and compare it against Local Interpretable Model-agnostic Explanations (LIME). Our method comes with local fidelity guarantees and can produce a range of diverse explanation types, including contrastive and counterfactual explanations praised in the literature. Some of these explanations can be interactively personalised to create bespoke, meaningful and actionable insights into the model's behaviour. While other methods may give an illusion of customisability by wrapping, otherwise static, explanations in an interactive interface, our explanations are truly interactive, in the sense of allowing the user to "interrogate" a black-box model. LIMEtree can therefore produce consistent explanations on which an interactive exploratory process can be built.


Vocabulary Alignment in Openly Specified Interactions

Journal of Artificial Intelligence Research

The problem of achieving common understanding between agents that use different vocabularies has been mainly addressed by techniques that assume the existence of shared external elements, such as a meta-language or a physical environment. In this article, we consider agents that use different vocabularies and only share knowledge of how to perform a task, given by the specification of an interaction protocol. We present a framework that lets agents learn a vocabulary alignment from the experience of interacting. Unlike previous work in this direction, we use open protocols that constrain possible actions instead of defining procedures, making our approach more general. We present two techniques that can be used either to learn an alignment from scratch or to repair an existent one, and we evaluate their performance experimentally.


Reinforcement Learning for Decentralized Stable Matching

arXiv.org Artificial Intelligence

Finding a match or partner in the real world is usually an independent and autonomous task performed by people or entities. For a person, a match can be several things, such as a romantic partner, business partner, school, roommate, etc. Our purpose in this paper is to train autonomous agents to find suitable matches for themselves using reinforcement learning. We consider the decentralized two-sided stable matching problem, where an agent is allowed to have at most one partner at a time from the opposite set. Each agent receives some utility for being in a match with a member of the opposite set. We formulate the problem spatially as a grid world environment; because autonomous agents act independently, the environment is highly uncertain and dynamic. We run experiments with various instances of both complete and incomplete weighted preference lists for agents. Agents learn their policies separately, using separate training modules. Our goal is to train agents to find partners such that the outcome is a stable matching if one exists and also a matching with set-equality, meaning the outcome is approximately equally liked by agents from both sets.
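The stability criterion mentioned in this abstract can be checked with a short sketch. It assumes weighted preferences stored as nested dictionaries, treats an unmatched agent as having utility 0, and treats a partner missing from an agent's list as unacceptable; the function name and data layout are hypothetical and the reinforcement learning part is not shown.

```python
def blocking_pairs(matching, utility_a, utility_b):
    """Return pairs (a, b) that both strictly prefer each other to their
    current situation. An empty result means the matching is stable."""
    partner_of_b = {b: a for a, b in matching.items()}
    blocking = []
    for a, prefs in utility_a.items():
        current_a = prefs.get(matching.get(a), 0.0)
        for b, value_ab in prefs.items():
            if b == matching.get(a):
                continue
            if a not in utility_b.get(b, {}):
                continue  # incomplete list: b does not accept a
            current_b = utility_b[b].get(partner_of_b.get(b), 0.0)
            if value_ab > current_a and utility_b[b][a] > current_b:
                blocking.append((a, b))
    return blocking


utility_a = {"a1": {"b1": 3, "b2": 1}, "a2": {"b1": 2, "b2": 1}}
utility_b = {"b1": {"a1": 3, "a2": 1}, "b2": {"a1": 2, "a2": 1}}

stable = blocking_pairs({"a1": "b1", "a2": "b2"}, utility_a, utility_b)
unstable = blocking_pairs({"a1": "b2", "a2": "b1"}, utility_a, utility_b)
```

In the second matching, `a1` and `b1` each gain by leaving their partners for one another, so that pair is reported as blocking.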


AI scientific Policies in China – Idees

#artificialintelligence

Artificial intelligence (AI) has evolved into a new era, and its rapid development will profoundly affect the everyday life of citizens worldwide. Countries around the world are establishing governmental strategies and initiatives to guide the development of AI. The Chinese government is using the development of AI as a major strategy to enhance national competitiveness and protect national security. In January 2016, the Chinese State Council released the 13th Five-year Plan on National Science and Technology Innovation, explicitly putting forward the guidance, general requirements, strategic mission and reform measures for Chinese science and technology innovation. Over the next five years, smart manufacturing will be one of the major missions of the "Science and Technology Innovation 2030 Project" and there will be a focus on the development of AI technology.


Type-2 fuzzy reliability redundancy allocation problem and its solution using particle swarm optimization algorithm

arXiv.org Artificial Intelligence

In this paper, the fuzzy multi-objective reliability redundancy allocation problem (FMORRAP) is proposed, which maximizes the system reliability while simultaneously minimizing the system cost under type-2 fuzzy uncertainty. In the proposed formulation, the higher-order uncertainties (such as parametric, manufacturing, environmental, and designer uncertainty) associated with the system are modeled with interval type-2 fuzzy sets (IT2 FS). The footprint of uncertainty of the interval type-2 membership functions (IT2 MFs) accommodates these uncertainties by capturing the multiple opinions of several system experts. We consider IT2 MFs to represent the subsystem reliability and cost, which are further aggregated using the extension principle to evaluate the total system reliability and cost according to their configurations, i.e., series-parallel and parallel-series. We propose a novel solution approach based on particle swarm optimization (PSO) to solve the FMORRAP. To demonstrate the applicability of the two formulations, namely, series-parallel FMORRAP and parallel-series FMORRAP, we performed experimental simulations on various numerical data sets. The decision makers/system experts assign different importance to the objectives (system reliability and cost), and these preferences are represented by sets of weights. The optimal results are obtained from our solution approach, and the Pareto optimal front is established using these different weight sets. A genetic algorithm (GA) was implemented to compare against the results obtained from our proposed solution approach. A statistical analysis was conducted between PSO and GA, and it was found that the PSO-based Pareto solution outperforms the GA.
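For orientation, the two system configurations named in this abstract can be evaluated in the crisp (non-fuzzy) case with the standard reliability formulas, assuming independent components. This sketch deliberately omits the type-2 fuzzy modelling, the cost objective, and the PSO search; the function names are hypothetical.

```python
from math import prod


def series_parallel_reliability(reliabilities, redundancies):
    """Subsystems in series; subsystem i has redundancies[i] identical
    components in parallel, each with reliability reliabilities[i]."""
    return prod(1.0 - (1.0 - r) ** n
                for r, n in zip(reliabilities, redundancies))


def parallel_series_reliability(paths):
    """Paths in parallel; each path is a series chain of component
    reliabilities, so the system fails only if every path fails."""
    return 1.0 - prod(1.0 - prod(path) for path in paths)


sp = series_parallel_reliability([0.9, 0.8], [2, 1])
ps = parallel_series_reliability([[0.9, 0.9], [0.8]])
```

In the first example, duplicating the 0.9 component raises that subsystem's reliability to 0.99, giving 0.99 × 0.8 for the system. A redundancy allocation method searches over the redundancy levels to balance such gains against cost.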