Paul, Aswin
Biological Neurons Compete with Deep Reinforcement Learning in Sample Efficiency in a Simulated Gameworld
Khajehnejad, Moein, Habibollahi, Forough, Paul, Aswin, Razi, Adeel, Kagan, Brett J.
How do biological systems and machine learning algorithms compare in the number of samples required to show significant improvement at a task? We compared the learning efficiency of in vitro biological neural networks to state-of-the-art deep reinforcement learning (RL) algorithms in a simplified simulation of the game 'Pong'. Using DishBrain, a system that embodies in vitro neural networks with in silico computation via a high-density multi-electrode array, we contrasted the learning rate and performance of these biological systems against time-matched learning from three state-of-the-art deep RL algorithms (DQN, A2C, and PPO) in the same game environment. This allowed a meaningful comparison between biological neural systems and deep RL. We found that when samples are limited to a real-world time course, even these very simple biological cultures outperformed the deep RL algorithms across various game performance characteristics, implying higher sample efficiency. Ultimately, even when tested across multiple types of information input, to assess the impact of higher-dimensional data input, biological neurons demonstrated faster learning than all deep reinforcement learning agents.
On Predictive planning and counterfactual learning in active inference
Paul, Aswin, Isomura, Takuya, Razi, Adeel
Defining and thereby separating the intelligent "agent" from its embodied "environment", which then provides feedback to the agent, is crucial to modelling intelligent behaviour. Popular approaches, like reinforcement learning (RL), rely heavily on such agent-environment loops, which reduce the problem to agent(s) trying to maximise reward in a given uncertain environment Sutton and Barto [2018]. Active inference has emerged in neuroscience as a biologically plausible framework Friston [2010] that adopts a different approach to modelling intelligent behaviour compared with other contemporary methods like RL. In the active inference framework, an agent accumulates and maximises model evidence during its lifetime to perceive, learn, and make decisions Da Costa et al. [2020], Sajid et al. [2021], Millidge et al. [2020]. However, maximising model evidence becomes challenging when the agent encounters a highly 'entropic' observation (i.e., an unexpected observation) with respect to the agent's generative (world) model Da Costa et al. [2020], Sajid et al. [2021], Millidge et al. [2020]. This seemingly intractable objective of maximising model evidence (or minimising the entropy of encountered observations) is achievable by minimising an upper bound on the entropy of observations, called the variational free energy Da Costa et al. [2020], Sajid et al. [2021]. Given this general foundation, active inference Friston et al. [2017] offers excellent flexibility in defining the generative model structure for a given problem and has attracted much attention in various domains Kuchling et al. [2020], Deane et al. [2020]. In this work, we develop an efficient decision-making scheme based on active inference by combining 'planning' and 'learning from experience'.
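For context, the variational free energy invoked above can be written in a standard form (the notation here is assumed for illustration and is not taken from the paper): for an observation o, latent states s, a generative model p(o, s), and a variational posterior q(s),

\[
F[q, o] \;=\; \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
       \;=\; D_{\mathrm{KL}}\!\left[\,q(s) \,\|\, p(s \mid o)\,\right] \;-\; \ln p(o)
       \;\geq\; -\ln p(o),
\]

so minimising F with respect to q(s) drives it towards the true posterior while placing an upper bound on the surprise −ln p(o); averaged over observations, this bounds their entropy, which is the sense in which free energy minimisation underwrites the otherwise intractable objective described above.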
Active Inference and Intentional Behaviour
Friston, Karl J., Salvatori, Tommaso, Isomura, Takuya, Tschantz, Alexander, Kiefer, Alex, Verbelen, Tim, Koudahl, Magnus, Paul, Aswin, Parr, Thomas, Razi, Adeel, Kagan, Brett, Buckley, Christopher L., Ramstead, Maxwell J. D.
Recent advances in theoretical biology suggest that basal cognition and sentient behaviour are emergent properties of in vitro cell cultures and neuronal networks, respectively. Such neuronal networks spontaneously learn structured behaviours in the absence of reward or reinforcement. In this paper, we characterise this kind of self-organisation through the lens of the free energy principle, i.e., as self-evidencing. We do this by first discussing the definitions of reactive and sentient behaviour in the setting of active inference, which describes the behaviour of agents that model the consequences of their actions. We then introduce a formal account of intentional behaviour that describes agents as driven by a preferred endpoint or goal in latent state-spaces. We then investigate these forms of (reactive, sentient, and intentional) behaviour using simulations. First, we simulate the aforementioned in vitro experiments, in which neuronal cultures spontaneously learn to play Pong, by implementing nested, free energy minimising processes. The simulations are then used to deconstruct the ensuing predictive behaviour, leading to the distinction between merely reactive, sentient, and intentional behaviour, with the latter formalised in terms of inductive planning. This distinction is further studied using simple machine learning benchmarks (navigation in a grid world and the Tower of Hanoi problem) that show how quickly and efficiently adaptive behaviour emerges under an inductive form of active inference.
On efficient computation in active inference
Paul, Aswin, Sajid, Noor, Da Costa, Lancelot, Razi, Adeel
Despite being recognised as neurobiologically plausible, active inference faces difficulties when employed to simulate intelligent behaviour in complex environments, due to its computational cost and the difficulty of specifying an appropriate target distribution for the agent. This paper introduces two solutions that work in concert to address these limitations. First, we present a novel planning algorithm for finite temporal horizons with drastically lower computational complexity. Second, inspired by Z-learning from the control theory literature, we simplify the process of setting an appropriate target distribution for new and existing active inference planning schemes. Our first approach leverages dynamic programming, known for its computational efficiency, to minimise the cost function used in planning via the Bellman-optimality principle. Accordingly, our algorithm recursively assesses the expected free energy of actions in reverse temporal order. This improves computational efficiency by orders of magnitude and allows precise model learning and planning, even under uncertain conditions. Our method simplifies the planning process and shows meaningful behaviour even when only the agent's final goal state is specified. The proposed solutions make it straightforward to define a target distribution from a goal state, compared with the more complicated task of defining a temporally informed target distribution. The effectiveness of these methods is tested and demonstrated through simulations in standard grid-world tasks. These advances create new opportunities for various applications.
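To make the reverse-temporal-order recursion concrete, below is a minimal sketch of finite-horizon planning by backward dynamic programming over a discrete state-action model; the array names (G_step, B, T), the greedy read-out, and the use of NumPy are illustrative assumptions, not the paper's notation or implementation.

```python
import numpy as np

def backward_efe_planning(G_step, B, T):
    """Sketch: finite-horizon planning by backward recursion (illustrative only).

    G_step : (num_actions, num_states) array
        Assumed per-step expected free energy of taking action a in state s.
    B : (num_actions, num_states, num_states) array
        Assumed transition model, B[a, s_next, s] = p(s_next | s, a).
    T : int
        Planning horizon.
    """
    num_actions, num_states = G_step.shape
    G = np.zeros((T, num_actions, num_states))

    # Final step: only the immediate expected free energy matters.
    G[T - 1] = G_step

    # Bellman-style backward pass: each action's cost at time t includes the
    # best achievable (minimal) expected free energy over its successor states.
    for t in range(T - 2, -1, -1):
        V_next = G[t + 1].min(axis=0)               # best value per state at t+1
        future = np.einsum('ans,n->as', B, V_next)  # expectation over successors
        G[t] = G_step + future

    policy = G.argmin(axis=1)  # greedy action for each time step and state
    return G, policy
```

Because the backward pass reuses the values computed for the following time step, the cost of evaluating all T-step plans grows linearly with the horizon rather than exponentially with the number of enumerated policies, which is the kind of saving the abstract refers to.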