Undirected Networks
Monte Carlo Markov Chain (MCMC) explained
MCMC methods are a family of algorithms that uses Markov Chains to perform Monte Carlo estimate. The name gives us a hint, that it is composed of two components -- Monte Carlo and Markov Chain. Let us understand them separately and in their combined form. Monte Carlo method derives its name from a Monte Carlo casino in Monaco. It is a technique for sampling from a probability distribution and using those samples to approximate desired quantity. In other words, it uses randomness to estimate some deterministic quantity of interest.
SONG: Self-Organizing Neural Graphs
Struski, Łukasz, Danel, Tomasz, Śmieja, Marek, Tabor, Jacek, Zieliński, Bartosz
Recent years have seen a surge in research on deep interpretable neural networks with decision trees as one of the most commonly incorporated tools. There are at least three advantages of using decision trees over logistic regression classification models: they are easy to interpret since they are based on binary decisions, they can make decisions faster, and they provide a hierarchy of classes. However, one of the well-known drawbacks of decision trees, as compared to decision graphs, is that decision trees cannot reuse the decision nodes. Nevertheless, decision graphs were not commonly used in deep learning due to the lack of efficient gradient-based training techniques. In this paper, we fill this gap and provide a general paradigm based on Markov processes, which allows for efficient training of the special type of decision graphs, which we call Self-Organizing Neural Graphs (SONG). We provide an extensive theoretical study of SONG, complemented by experiments conducted on Letter, Connect4, MNIST, CIFAR, and TinyImageNet datasets, showing that our method performs on par or better than existing decision models.
Packet Routing with Graph Attention Multi-agent Reinforcement Learning
Mai, Xuan, Fu, Quanzhi, Chen, Yi
Packet routing is a fundamental problem in communication networks that decides how the packets are directed from their source nodes to their destination nodes through some intermediate nodes. With the increasing complexity of network topology and highly dynamic traffic demand, conventional model-based and rule-based routing schemes show significant limitations, due to the simplified and unrealistic model assumptions, and lack of flexibility and adaption. Adding intelligence to the network control is becoming a trend and the key to achieving high-efficiency network operation. In this paper, we develop a model-free and data-driven routing strategy by leveraging reinforcement learning (RL), where routers interact with the network and learn from the experience to make some good routing configurations for the future. Considering the graph nature of the network topology, we design a multi-agent RL framework in combination with Graph Neural Network (GNN), tailored to the routing problem. Three deployment paradigms, centralized, federated, and cooperated learning, are explored respectively. Simulation results demonstrate that our algorithm outperforms some existing benchmark algorithms in terms of packet transmission delay and affordable load.
Reinforcement Learning with Formal Performance Metrics for Quadcopter Attitude Control under Non-nominal Contexts
Bernini, Nicola, Bessa, Mikhail, Delmas, Rémi, Gold, Arthur, Goubault, Eric, Pennec, Romain, Putot, Sylvie, Sillion, François
We explore the reinforcement learning approach to designing controllers by extensively discussing the case of a quadcopter attitude controller. We provide all details allowing to reproduce our approach, starting with a model of the dynamics of a crazyflie 2.0 under various nominal and non-nominal conditions, including partial motor failures and wind gusts. We develop a robust form of a signal temporal logic to quantitatively evaluate the vehicle's behavior and measure the performance of controllers. The paper thoroughly describes the choices in training algorithms, neural net architecture, hyperparameters, observation space in view of the different performance metrics we have introduced. We discuss the robustness of the obtained controllers, both to partial loss of power for one rotor and to wind gusts and finish by drawing conclusions on practical controller design by reinforcement learning.
Dual Slot Selector via Local Reliability Verification for Dialogue State Tracking
Guo, Jinyu, Shuang, Kai, Li, Jijie, Wang, Zihan
The goal of dialogue state tracking (DST) is to predict the current dialogue state given all previous dialogue contexts. Existing approaches generally predict the dialogue state at every turn from scratch. However, the overwhelming majority of the slots in each turn should simply inherit the slot values from the previous turn. Therefore, the mechanism of treating slots equally in each turn not only is inefficient but also may lead to additional errors because of the redundant slot value generation. To address this problem, we devise the two-stage DSS-DST which consists of the Dual Slot Selector based on the current turn dialogue, and the Slot Value Generator based on the dialogue history. The Dual Slot Selector determines each slot whether to update slot value or to inherit the slot value from the previous turn from two aspects: (1) if there is a strong relationship between it and the current turn dialogue utterances; (2) if a slot value with high reliability can be obtained for it through the current turn dialogue. The slots selected to be updated are permitted to enter the Slot Value Generator to update values by a hybrid method, while the other slots directly inherit the values from the previous turn. Empirical results show that our method achieves 56.93%, 60.73%, and 58.04% joint accuracy on MultiWOZ 2.0, MultiWOZ 2.1, and MultiWOZ 2.2 datasets respectively and achieves a new state-of-the-art performance with significant improvements.
Rational Verification for Probabilistic Systems
Gutierrez, Julian, Hammond, Lewis, Lin, Anthony W., Najib, Muhammad, Wooldridge, Michael
Rational verification is the problem of determining which temporal logic properties will hold in a multi-agent system, under the assumption that agents in the system act rationally, by choosing strategies that collectively form a game-theoretic equilibrium. Previous work in this area has largely focussed on deterministic systems. In this paper, we develop the theory and algorithms for rational verification in probabilistic systems. We focus on concurrent stochastic games (CSGs), which can be used to model uncertainty and randomness in complex multi-agent environments. We study the rational verification problem for both non-cooperative games and cooperative games in the qualitative probabilistic setting. In the former case, we consider LTL properties satisfied by the Nash equilibria of the game and in the latter case LTL properties satisfied by the core. In both cases, we show that the problem is 2EXPTIME-complete, thus not harder than the much simpler verification problem of model checking LTL properties of systems modelled as Markov decision processes (MDPs).
Ideas Lab's P.U.N.C.H. Model
Over the last several years, Ideas Labs has been building a suite of advanced markerless AI to analyze various aspects of human motion, from biomechanics to event tracking to a bird's eye view of an entire rink with player identification. While our technology is more sport-agonstic, we've been focusing on golf and baseball -- two stick-based sports with discrete arc of motions (sorry, Gumby). More recently, we've begun looking at martial arts with a specific relevance of our motion-based analytics of play (in this specific case of boxing, defense, stepping and attack, and threading). As in other sports, there is a broad base of literature around applying various sensor-based and motion capture-based analytics in boxing. One paper, by Khasanshin (2021), leveraged sensors applied on the wrist to analyze the speed of punches of boxes while shadow boxing in specific types of punches (jab, cross, hook, and uppercut) and type of activity (shadow boxing, single punch, or multiple punches).
Reinforced Imitation Learning by Free Energy Principle
Ogishima, Ryoya, Karino, Izumi, Kuniyoshi, Yasuo
Reinforcement Learning (RL) requires a large amount of exploration especially in sparse-reward settings. Imitation Learning (IL) can learn from expert demonstrations without exploration, but it never exceeds the expert's performance and is also vulnerable to distributional shift between demonstration and execution. In this paper, we radically unify RL and IL based on Free Energy Principle (FEP). FEP is a unified Bayesian theory of the brain that explains perception, action and model learning by a common fundamental principle. We present a theoretical extension of FEP and derive an algorithm in which an agent learns the world model that internalizes expert demonstrations and at the same time uses the model to infer the current and future states and actions that maximize rewards. The algorithm thus reduces exploration costs by partially imitating experts as well as maximizing its return in a seamless way, resulting in a higher performance than the suboptimal expert. Our experimental results show that this approach is promising in visual control tasks especially in sparse-reward environments.
A Survey of Monte Carlo Methods for Parameter Estimation
Luengo, D., Martino, L., Bugallo, M., Elvira, V., Särkkä, S.
Statistical signal processing applications usually require the estimation of some parameters of interest given a set of observed data. These estimates are typically obtained either by solving a multi-variate optimization problem, as in the maximum likelihood (ML) or maximum a posteriori (MAP) estimators, or by performing a multi-dimensional integration, as in the minimum mean squared error (MMSE) estimators. Unfortunately, analytical expressions for these estimators cannot be found in most real-world applications, and the Monte Carlo (MC) methodology is one feasible approach. MC methods proceed by drawing random samples, either from the desired distribution or from a simpler one, and using them to compute consistent estimators. The most important families of MC algorithms are Markov chain MC (MCMC) and importance sampling (IS). On the one hand, MCMC methods draw samples from a proposal density, building then an ergodic Markov chain whose stationary distribution is the desired distribution by accepting or rejecting those candidate samples as the new state of the chain. On the other hand, IS techniques draw samples from a simple proposal density, and then assign them suitable weights that measure their quality in some appropriate way. In this paper, we perform a thorough review of MC methods for the estimation of static parameters in signal processing applications. A historical note on the development of MC schemes is also provided, followed by the basic MC method and a brief description of the rejection sampling (RS) algorithm, as well as three sections describing many of the most relevant MCMC and IS algorithms, and their combined use.
Bangla sign language recognition using concatenated BdSL network
Abedin, Thasin, Prottoy, Khondokar S. S., Moshruba, Ayana, Hakim, Safayat Bin
Sign language is the only medium of communication for the hearing impaired and the deaf and dumb community. Communication with the general mass is thus always a challenge for this minority group. Especially in Bangla sign language (BdSL), there are 38 alphabets with some having nearly identical symbols. As a result, in BdSL recognition, the posture of hand is an important factor in addition to visual features extracted from traditional Convolutional Neural Network (CNN). In this paper, a novel architecture "Concatenated BdSL Network" is proposed which consists of a CNN based image network and a pose estimation network. While the image network gets the visual features, the relative positions of hand keypoints are taken by the pose estimation network to obtain the additional features to deal with the complexity of the BdSL symbols. A score of 91.51% was achieved by this novel approach in test set and the effectiveness of the additional pose estimation network is suggested by the experimental results.