Röhm, André
Blending Optimal Control and Biologically Plausible Learning for Noise-Robust Physical Neural Networks
Sunada, Satoshi, Niiyama, Tomoaki, Kanno, Kazutaka, Nogami, Rin, Röhm, André, Awano, Takato, Uchida, Atsushi
The rapidly increasing computational demands of artificial intelligence (AI) have spurred the exploration of computing principles beyond conventional digital computers. Physical neural networks (PNNs) offer efficient neuromorphic information processing by harnessing the innate computational power of physical processes; however, training their weight parameters is computationally expensive. We propose a training approach that substantially reduces this cost by merging an optimal control method for continuous-time dynamical systems with a biologically plausible training method, direct feedback alignment. In addition to reducing training time, this approach achieves robust processing even under measurement errors and noise, without requiring detailed system information. Its effectiveness was verified numerically and experimentally in an optoelectronic delay system. Our approach significantly extends the range of physical systems practically usable as PNNs.
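Direct feedback alignment replaces the backpropagated error signal with a projection of the output error through a fixed random matrix, so no transposed weight information is needed. The following minimal NumPy sketch illustrates that update rule on a toy two-layer network; the network size, learning rate, and regression target are our own illustrative choices, not the paper's optoelectronic system.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer network x -> h -> y, trained with direct feedback alignment (DFA).
n_in, n_hid, n_out = 4, 32, 2
W1 = rng.normal(0, 0.5, (n_hid, n_in))
W2 = rng.normal(0, 0.5, (n_out, n_hid))
B1 = rng.normal(0, 0.5, (n_hid, n_out))   # fixed random feedback matrix

def tanh_d(a):
    return 1.0 - np.tanh(a) ** 2

lr = 0.01
for step in range(1000):
    x = rng.normal(size=n_in)
    target = np.array([np.sin(x.sum()), np.cos(x.sum())])  # toy regression target
    a1 = W1 @ x
    h = np.tanh(a1)
    y = W2 @ h
    e = y - target                     # output error
    # DFA core idea: project the error through fixed random B1 instead of W2.T
    dh = (B1 @ e) * tanh_d(a1)
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(dh, x)
```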
Decentralized multi-agent reinforcement learning algorithm using a cluster-synchronized laser network
Kotoku, Shun, Mihana, Takatomo, Röhm, André, Horisaki, Ryoichi
Multi-agent reinforcement learning (MARL) studies crucial principles that are applicable to a variety of fields, including wireless networking and autonomous driving. We propose a photonics-based decision-making algorithm to address one of the most fundamental problems in MARL, the competitive multi-armed bandit (CMAB) problem. Our numerical simulations demonstrate that chaotic oscillations and cluster synchronization of optically coupled lasers, together with our proposed decentralized coupling adjustment, efficiently balance exploration and exploitation while facilitating cooperative decision-making without explicit information sharing among agents. Our study demonstrates how decentralized reinforcement learning can be achieved by exploiting complex physical processes controlled by simple algorithms.
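In the CMAB problem, several agents repeatedly pull arms of the same set of slot machines, and simultaneous pulls of the same arm degrade the reward. The toy simulation below captures only that problem structure: a logistic map stands in for the chaotic laser signal that gates exploration, and the arm count, hit probabilities, and collision penalty are illustrative assumptions, not the paper's cluster-synchronized laser network.

```python
import numpy as np

rng = np.random.default_rng(1)

# Competitive multi-armed bandit: two agents, three arms.
n_arms, n_steps = 3, 5000
p_true = np.array([0.8, 0.5, 0.2])       # hit probabilities of the arms
Q = np.zeros((2, n_arms))                # per-agent reward estimates
counts = np.ones((2, n_arms))
z = np.array([0.123, 0.456])             # one chaotic state per agent

total = 0.0
for t in range(n_steps):
    z = 3.9 * z * (1.0 - z)              # logistic map in its chaotic regime
    choices = []
    for agent in range(2):
        if z[agent] < 0.1:               # chaos-gated exploration
            choices.append(int(rng.integers(n_arms)))
        else:
            choices.append(int(np.argmax(Q[agent])))
    for agent, arm in enumerate(choices):
        reward = float(rng.random() < p_true[arm])
        if choices[0] == choices[1]:     # conflict: both pull the same arm
            reward *= 0.5                # shared payout penalizes collisions
        counts[agent, arm] += 1
        Q[agent, arm] += (reward - Q[agent, arm]) / counts[agent, arm]
        total += reward

print("average reward per pull:", total / (2 * n_steps))
```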
Asymmetric leader-laggard cluster synchronization for collective decision-making with laser network
Kotoku, Shun, Mihana, Takatomo, Röhm, André, Horisaki, Ryoichi, Naruse, Makoto
Photonic accelerators [1] have been gaining attention in recent years, and a variety of implementations and applications have been explored [2-9]. These advancements can be attributed to a growing awareness of the saturating speed of performance improvements in conventional computational systems [10], despite soaring demands for information processing in an extensive range of applications, especially in machine learning. Reinforcement learning [11] is a subfield of machine learning that optimizes computer outputs or actions to maximize a reward function. Its applications are now essential to our daily lives, ranging from self-driving vehicles [12] and targeted advertising [13] to wireless networking [14], and there is a strong demand for computational acceleration. Here, we focus specifically on decision-making.
Bandit Algorithm Driven by a Classical Random Walk and a Quantum Walk
Yamagami, Tomoki, Segawa, Etsuo, Mihana, Takatomo, Röhm, André, Horisaki, Ryoichi, Naruse, Makoto
A random walk (RW) is one of the most ubiquitous stochastic processes and is employed both for mathematical analyses and for applications such as describing real-world phenomena and constructing various algorithms. Meanwhile, along with the increasing interest in quantum mechanics from both theoretical and applied perspectives, the quantum counterpart of a RW, known as a quantum walk (QW), is also attracting attention [1-4]. A QW incorporates the effects of quantum superposition into its time evolution. In a classical RW, a random walker (RWer) probabilistically selects which direction to go at each time step, so one can track where the RWer is at any time. In a QW, by contrast, one cannot tell where a quantum walker (QWer) is during the time evolution; its location is determined only upon measurement. QWs have a property that classical RWs do not possess: the coexistence of linear spreading and localization [5, 6]. As a result, QWs show probability distributions that are totally different from those of RWs, which weakly converge to normal distributions. The former behavior, linear spreading, means that the standard deviation of the QWer's measured position grows in proportion to the run time t.
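The contrast between diffusive and linear spreading is easy to verify numerically. The sketch below simulates a standard discrete-time Hadamard walk on the line alongside a classical RW (a textbook model chosen for illustration; the run time and initial coin state are our assumptions): the classical standard deviation grows like sqrt(t), the quantum one in proportion to t.

```python
import numpy as np

T = 200
n = 2 * T + 1                       # positions -T..T
pos = np.arange(-T, T + 1)

# Classical random walk: evolve the probability distribution directly.
p = np.zeros(n); p[T] = 1.0
for _ in range(T):
    p = 0.5 * (np.roll(p, 1) + np.roll(p, -1))

# Hadamard quantum walk: two coin components of the amplitude at each site.
L = np.zeros(n, dtype=complex); R = np.zeros(n, dtype=complex)
L[T] = 1 / np.sqrt(2); R[T] = 1j / np.sqrt(2)   # symmetric initial coin state
for _ in range(T):
    newL = (L + R) / np.sqrt(2)     # Hadamard coin operation
    newR = (L - R) / np.sqrt(2)
    L = np.roll(newL, -1)           # left-moving component shifts left
    R = np.roll(newR, 1)            # right-moving component shifts right
q = np.abs(L) ** 2 + np.abs(R) ** 2

std_rw = np.sqrt((p * pos ** 2).sum())                          # ~ sqrt(T)
std_qw = np.sqrt((q * pos ** 2).sum() - (q * pos).sum() ** 2)   # ~ T
print(f"classical std: {std_rw:.1f}, quantum std: {std_qw:.1f}")
```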
Asymmetric quantum decision-making
Shiratori, Honoka, Shinkawa, Hiroaki, Röhm, André, Chauvet, Nicolas, Segawa, Etsuo, Laurent, Jonathan, Bachelier, Guillaume, Yamagami, Tomoki, Horisaki, Ryoichi, Naruse, Makoto
Collective decision-making is crucial to information and communication systems. Decision conflicts among agents hinder the maximization of the potential utility of the entire system. Quantum processes can realize conflict-free joint decisions between two agents using the entanglement of photons or the quantum interference of orbital angular momentum (OAM). However, previous studies have always yielded symmetric joint decisions. Although this property helps maintain and preserve equality, it cannot resolve disparities. Global challenges such as ethics and equity are recognized in the field of responsible artificial intelligence as part of the responsible research and innovation paradigm. Thus, decision-making systems must not only preserve existing equality but also tackle disparities. This study theoretically and numerically investigates asymmetric collective decision-making using the quantum interference of photons carrying OAM or entangled photons. Although asymmetry is successfully realized, photon loss is inevitable in the proposed models. The available range of asymmetry and the method for obtaining a desired degree of asymmetry are analytically formulated.
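At the level of outcome statistics, an asymmetric conflict-free joint decision for two agents and two options is a joint selection probability matrix with zero diagonal and unequal off-diagonal entries. The Monte Carlo sketch below illustrates only that statistical target (the asymmetry parameter is an arbitrary choice); it does not model the photonic interference or the photon loss analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Conflict-free joint decision for two agents and two options (A=0, B=1).
# Zero diagonal (never the same option); unequal off-diagonal weights
# make the joint decision asymmetric. Classical stand-in for the
# quantum-optical sampling, with an arbitrary asymmetry parameter.
alpha = 0.7
P = np.array([[0.0, alpha],
              [1.0 - alpha, 0.0]])   # P[i, j] = Pr(agent1 = i, agent2 = j)

samples = rng.choice(4, size=100_000, p=P.flatten())
a1, a2 = np.unravel_index(samples, P.shape)
assert not np.any(a1 == a2)          # never a decision conflict
print("Pr(agent1 gets option A):", np.mean(a1 == 0))   # ~ alpha, not 0.5
```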
Effect of temporal resolution on the reproduction of chaotic dynamics via reservoir computing
Tsuchiyama, Kohei, Röhm, André, Mihana, Takatomo, Horisaki, Ryoichi, Naruse, Makoto
Reservoir computing is a machine learning paradigm that uses a structure called a reservoir, which has nonlinearities and short-term memory. In recent years, reservoir computing has expanded to new functions, such as the autonomous generation of chaotic time series, in addition to time-series prediction and classification. Furthermore, novel possibilities have been demonstrated, such as inferring the existence of previously unseen attractors. Sampling, in particular, has a strong influence on such functions. Sampling is indispensable in a physical reservoir computer that uses an existing physical system as a reservoir, because the use of an external digital system for data input is usually unavoidable. This study analyzes the effect of sampling on the ability of reservoir computing to autonomously regenerate chaotic time series. We found, as expected, that excessively coarse sampling degrades the system performance, but also that excessively dense sampling is unsuitable. Based on quantitative indicators that capture the local and global characteristics of attractors, we identify a suitable window of sampling frequencies and discuss its underlying mechanisms.
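The autonomous-generation task works as follows: an echo state network is first trained to predict a chaotic signal one step ahead, and its output is then fed back as input so that it free-runs. The sketch below shows this pipeline with a stride parameter as a crude stand-in for the sampling interval studied in the paper; the Lorenz signal, reservoir size, and all hyperparameters are toy choices of ours.

```python
import numpy as np

rng = np.random.default_rng(3)

def lorenz_x(n, dt=0.002):
    # Euler-integrated Lorenz system; return the (roughly normalized) x component.
    s = np.array([1.0, 1.0, 1.0]); out = np.empty(n)
    for i in range(n):
        x, y, z = s
        s = s + dt * np.array([10 * (y - x), x * (28 - z) - y, x * y - 8 / 3 * z])
        out[i] = s[0]
    return out / 20.0

stride = 10                               # temporal-resolution knob
u = lorenz_x(60_000)[::stride]

N = 300
Win = rng.uniform(-0.5, 0.5, N)
W = rng.normal(0, 1, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9

r = np.zeros(N); states = []
for t in range(len(u) - 1):
    r = np.tanh(W @ r + Win * u[t])
    states.append(r)
X = np.array(states[200:])                # drop washout
Y = u[201:]                               # one-step-ahead targets
Wout = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ Y)

# Closed loop: feed predictions back as input and let the network free-run.
y = u[-1]
for _ in range(1000):
    r = np.tanh(W @ r + Win * y)
    y = Wout @ r
```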
Bandit approach to conflict-free multi-agent Q-learning in view of photonic implementation
Shinkawa, Hiroaki, Chauvet, Nicolas, Röhm, André, Mihana, Takatomo, Horisaki, Ryoichi, Bachelier, Guillaume, Naruse, Makoto
Photonic reinforcement learning, which exploits the physical nature of light to accelerate computation, has recently been studied extensively. Previous studies utilized the quantum interference of photons to achieve collective decision-making without choice conflicts when solving the competitive multi-armed bandit problem, a fundamental example of reinforcement learning. However, the bandit problem deals with a static environment, where the agent's actions do not influence the reward probabilities. This study extends the conventional approach to more general multi-agent reinforcement learning, targeting the grid-world problem. Unlike the conventional approach, the proposed scheme deals with a dynamic environment, where the reward changes because of the agents' actions. A successful photonic reinforcement learning scheme requires both a photonic system that contributes to the quality of learning and a suitable algorithm. This study proposes a novel learning algorithm, discontinuous bandit Q-learning, in view of a potential photonic implementation. Here, state-action pairs in the environment are regarded as slot machines in the context of the bandit problem, and the update amount of the Q-value is regarded as the reward of the bandit problem. We perform numerical simulations to validate the effectiveness of the bandit algorithm. In addition, we propose a multi-agent architecture in which agents are indirectly connected through the quantum interference of light, and quantum principles ensure the conflict-free property of state-action pair selections among agents. We demonstrate that multi-agent reinforcement learning can be accelerated owing to conflict avoidance among multiple agents.
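To make the "state-action pairs as slot machines" framing concrete, the sketch below runs plain tabular Q-learning on a one-dimensional corridor, with per-state action selection treated as a softmax bandit over Q-values. It is a generic stand-in under our own assumptions (environment, hyperparameters, softmax policy); it does not reproduce the paper's discontinuous bandit Q-learning rule or its photonic sampling.

```python
import numpy as np

rng = np.random.default_rng(4)

# Tabular Q-learning on a 1-D corridor grid world.
n_states, goal = 6, 5
Q = np.zeros((n_states, 2))              # actions: 0 = left, 1 = right
alpha, gamma, beta = 0.1, 0.95, 3.0

for episode in range(500):
    s = 0
    while s != goal:
        prefs = np.exp(beta * Q[s])
        a = rng.choice(2, p=prefs / prefs.sum())   # bandit-style selection
        s_next = max(0, s - 1) if a == 0 else min(goal, s + 1)
        r = 1.0 if s_next == goal else 0.0
        # TD update; its magnitude plays the role of the bandit reward
        # in the paper's framing of state-action pairs as slot machines.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
```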
Conflict-free joint sampling for preference satisfaction through quantum interference
Shinkawa, Hiroaki, Chauvet, Nicolas, Röhm, André, Mihana, Takatomo, Horisaki, Ryoichi, Bachelier, Guillaume, Naruse, Makoto
Collective decision-making is vital for recent information and communications technologies. In our previous research, we mathematically derived conflict-free joint decision-making that optimally satisfies players' probabilistic preference profiles. However, the optimal joint decision-making method has two problems. First, as the number of choices increases, the computational cost of calculating the optimal joint selection probability matrix explodes. Second, to derive the optimal joint selection probability matrix, all players must disclose their probabilistic preferences. Notably, however, an explicit calculation of the joint probability distribution is not necessarily needed; what is necessary for collective decisions is sampling. This study examines several sampling methods that converge to heuristic joint selection probability matrices satisfying players' preferences. We show that they can significantly mitigate the above problems of computational cost and confidentiality. We analyze the probability distribution to which each of the sampling methods converges, as well as the computational cost required and the confidentiality secured. In particular, we introduce two conflict-free joint sampling methods based on the quantum interference of photons. The first system allows the players to hide their choices while satisfying their preferences almost perfectly when the preferences coincide. The second system, where the physical nature of light replaces the expensive computation, also conceals the players' choices under the assumption of a trusted third party. This paper has been published in Phys. Rev. Applied 18, 064018 (2022) (DOI: 10.1103/PhysRevApplied.18.064018).
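The simplest classical baseline for conflict-free joint sampling is sequential: the first player samples from its own preference, and the second samples from its preference renormalized over the remaining choices. The sketch below (with preference vectors invented for illustration) shows this baseline; unlike the paper's quantum-interference systems, it neither hides the choices nor treats the players symmetrically.

```python
import numpy as np

rng = np.random.default_rng(5)

pref1 = np.array([0.5, 0.3, 0.2])        # illustrative preference profiles
pref2 = np.array([0.2, 0.5, 0.3])

def joint_sample():
    # Player 1 samples freely; player 2 renormalizes over remaining choices.
    c1 = rng.choice(3, p=pref1)
    p2 = pref2.copy(); p2[c1] = 0.0
    c2 = rng.choice(3, p=p2 / p2.sum())
    return c1, c2

draws = [joint_sample() for _ in range(100_000)]
assert all(c1 != c2 for c1, c2 in draws)  # conflict-free by construction
```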
Learning unseen coexisting attractors
Gauthier, Daniel J., Fischer, Ingo, Röhm, André
Reservoir computing is a machine learning approach that can generate a surrogate model of a dynamical system. It can learn the underlying dynamics using fewer trainable parameters, and hence smaller training data sets, than competing approaches. Recently, a simpler formulation, known as next-generation reservoir computing, removes many algorithmic metaparameters and identifies a well-performing traditional reservoir computer, thus simplifying training even further. Here, we study a particularly challenging problem: learning a dynamical system that has both disparate time scales and multiple coexisting dynamical states (attractors). We compare the next-generation and traditional reservoir computers using metrics that quantify the geometry of the ground-truth and forecasted attractors. For the studied four-dimensional system, the next-generation reservoir computing approach uses $\sim 1.7 \times$ less training data, requires $10^3 \times$ shorter `warm up' time, has fewer metaparameters, and achieves $\sim 100\times$ higher accuracy in predicting the coexisting attractor characteristics than a traditional reservoir computer. Furthermore, we demonstrate that it predicts the basin of attraction with high accuracy. This work lends further support to the superior learning ability of this new machine learning algorithm for dynamical systems.
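In next-generation reservoir computing, the recurrent reservoir is replaced by an explicit feature vector of time-delayed inputs and their polynomial products, with a linear readout trained by ridge regression. The sketch below shows that construction for one-step-ahead prediction of a sine wave; the delay depth, quadratic features, and regularization are illustrative choices, and the target is deliberately simpler than the paper's four-dimensional system.

```python
import numpy as np
from itertools import combinations_with_replacement

def nvar_features(u, k=2):
    # u: (T, d) input series; stack k delayed copies, then add quadratic monomials.
    T, d = u.shape
    lin = np.hstack([u[k - 1 - i : T - i] for i in range(k)])    # (T-k+1, k*d)
    quad = np.stack([lin[:, i] * lin[:, j]
                     for i, j in combinations_with_replacement(range(k * d), 2)],
                    axis=1)
    return np.hstack([np.ones((T - k + 1, 1)), lin, quad])

# Train a one-step-ahead predictor by ridge regression on a sine wave.
t = np.linspace(0, 60, 3000)
u = np.sin(t)[:, None]
X = nvar_features(u[:-1]); Y = u[2:]       # targets aligned one step ahead
W = np.linalg.solve(X.T @ X + 1e-6 * np.eye(X.shape[1]), X.T @ Y)
pred = nvar_features(u) @ W                # one-step predictions
```

For forecasting, the same readout can be iterated in closed loop by feeding each prediction back into the delay buffer, exactly as in the traditional reservoir case.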
Optimal preference satisfaction for conflict-free joint decisions
Shinkawa, Hiroaki, Chauvet, Nicolas, Bachelier, Guillaume, Röhm, André, Horisaki, Ryoichi, Naruse, Makoto
We all have preferences when multiple choices are available. If we insist only on satisfying our own preferences, we may suffer a loss due to conflicts with other people's identical selections. Such a case arises when a choice cannot be divided into multiple pieces due to the intrinsic nature of the resource. Previous studies, such as the top trading cycle, examined from a game-theoretic perspective how to conduct fair joint decision-making while avoiding decision conflicts when multiple players have their own deterministic preference profiles. In reality, however, probabilistic preferences arise naturally in relation to the stochastic decision-making of humans. Here, we theoretically derive conflict-free joint decision-making that can satisfy the probabilistic preferences of all individual players. More specifically, we mathematically prove the conditions under which the deviation of the resultant chance of obtaining each choice from the individual preference profile, which we call the loss, becomes zero, meaning that all players' preferences are perfectly satisfied while decision conflicts are avoided. Furthermore, even in situations where zero-loss conflict-free joint decision-making is unachievable, we show how to derive joint decision-making that attains the theoretical minimum loss while ensuring conflict-free choices. Numerical demonstrations on several benchmarks are also presented.
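One way to make the zero-loss condition concrete is the following formalization in our own notation (the paper's exact definitions may differ): with a joint selection probability matrix whose diagonal vanishes, the loss measures how far the players' marginals deviate from their preference profiles.

```latex
% Hedged formalization in our notation; the paper's definitions may differ.
% p_{ij} = Pr(player 1 obtains choice i, player 2 obtains choice j),
% with the conflict-free constraint p_{ii} = 0 and \sum_{i,j} p_{ij} = 1.
% Given preference profiles \hat{p}^{(1)} and \hat{p}^{(2)}, define the loss
\[
  L = \sum_i \Bigl| \sum_j p_{ij} - \hat{p}^{(1)}_i \Bigr|
    + \sum_j \Bigl| \sum_i p_{ij} - \hat{p}^{(2)}_j \Bigr| .
\]
% Zero-loss conflict-free decision-making exists exactly when some such
% matrix reproduces both preference profiles as its marginals.
```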