Markov Models
Markov Chain Monte Carlo with Gaussian Process Emulation for a 1D Hemodynamics Model of CTEPH
Kachabi, Amirreza, Colebank, Mitchel J., Correa, Sofia Altieri, Chesler, Naomi C.
SUMMARY Microvascular disease is a contributor to persistent pulmonary hypertension in those with chronic thromboembolic pulmonary hypertension (CTEPH). The heterogenous nature of the micro and macrovascular defects motivates the use of personalized computational models, which can predict flow dynamics within multiple generations of the arterial tree and into the microvasculature. Our study uses computational hemodynamics models and Gaussian processes for rapid, subject-specific calibration using retrospective data from a large animal model of CTEPH. Our subject-specific predictions shed light on microvascular dysfunction and arterial wall shear stress changes in CTEPH. Key words: 1D fluid dynamics, gaussian processes, parameter inference, emulation 1 INTRODUCTION Chronic thromboembolic pulmonary hypertension (CTEPH) is a subgroup of pulmonary hypertension (PH) caused by blood clots that lodge in the blood vessels of the lungs.
Learning to Beat ByteRL: Exploitability of Collectible Card Game Agents
Haluska, Radovan, Schmid, Martin
The goal of the game is to decrease While Poker, as a family of games, has been studied extensively in the opponent's health to zero. There are many popular collectible the last decades, collectible card games have seen relatively little card games, such as Magic: The Gathering [24], Hearthstone [3], attention. Only recently have we seen an agent that can compete The Elder Scrolls: Legends [20] and many others. A trait that makes with professional human players in Hearthstone, one of the most collectible card games appealing to human players and challenging popular collectible card games. Although artificial agents must be for AI agents is the broad range of ways to mix and match available able to work with imperfect information in both of these genres, cards into decks. Even small collectible card games with tens of collectible card games pose another set of distinct challenges. Unlike available cards can offer more potential decks than the total number in many poker variants, agents must deal with state space so vast of atoms in the universe [17].
Developing Acoustic Models for Automatic Speech Recognition in Swedish
This paper is concerned with automatic continuous speech recognition using trainable systems. The aim of this work is to build acoustic models for spoken Swedish. This is done employing hidden Markov models and using the SpeechDat database to train their parameters. Acoustic modeling has been worked out at a phonetic level, allowing general speech recognition applications, even though a simplified task (digits and natural number recognition) has been considered for model evaluation. Different kinds of phone models have been tested, including context independent models and two variations of context dependent models. Furthermore many experiments have been done with bigram language models to tune some of the system parameters. System performance over various speaker subsets with different sex, age and dialect has also been examined. Results are compared to previous similar studies showing a remarkable improvement.
The Trembling-Hand Problem for LTLf Planning
Yu, Pian, Zhu, Shufang, De Giacomo, Giuseppe, Kwiatkowska, Marta, Vardi, Moshe
Consider an agent acting to achieve its temporal goal, but with a "trembling hand". In this case, the agent may mistakenly instruct, with a certain (typically small) probability, actions that are not intended due to faults or imprecision in its action selection mechanism, thereby leading to possible goal failure. We study the trembling-hand problem in the context of reasoning about actions and planning for temporally extended goals expressed in Linear Temporal Logic on finite traces (LTLf), where we want to synthesize a strategy (aka plan) that maximizes the probability of satisfying the LTLf goal in spite of the trembling hand. We consider both deterministic and nondeterministic (adversarial) domains. We propose solution techniques for both cases by relying respectively on Markov Decision Processes and on Markov Decision Processes with Set-valued Transitions with LTLf objectives, where the set-valued probabilistic transitions capture both the nondeterminism from the environment and the possible action instruction errors from the agent. We formally show the correctness of our solution techniques and demonstrate their effectiveness experimentally through a proof-of-concept implementation.
Sequential Inference of Hospitalization Electronic Health Records Using Probabilistic Models
Kaplan, Alan D., Ray, Priyadip, Greene, John D., Liu, Vincent X.
In the dynamic hospital setting, decision support can be a valuable tool for improving patient outcomes. Data-driven inference of future outcomes is challenging in this dynamic setting, where long sequences such as laboratory tests and medications are updated frequently. This is due in part to heterogeneity of data types and mixed-sequence types contained in variable length sequences. In this work we design a probabilistic unsupervised model for multiple arbitrary-length sequences contained in hospitalization Electronic Health Record (EHR) data. The model uses a latent variable structure and captures complex relationships between medications, diagnoses, laboratory tests, neurological assessments, and medications. It can be trained on original data, without requiring any lossy transformations or time binning. Inference algorithms are derived that use partial data to infer properties of the complete sequences, including their length and presence of specific values. We train this model on data from subjects receiving medical care in the Kaiser Permanente Northern California integrated healthcare delivery system. The results are evaluated against held-out data for predicting the length of sequences and presence of Intensive Care Unit (ICU) in hospitalization bed sequences. Our method outperforms a baseline approach, showing that in these experiments the trained model captures information in the sequences that is informative of their future values.
Supercompiler Code Optimization with Zero-Shot Reinforcement Learning
Wu, Jialong, Deng, Chaoyi, Wang, Jianmin, Long, Mingsheng
Effective code optimization in compilers plays a central role in computer and software engineering. While compilers can be made to automatically search the optimization space without the need for user interventions, this is not a standard practice since the search is slow and cumbersome. Here we present CodeZero, an artificial intelligence agent trained extensively on large data to produce effective optimization strategies instantly for each program in a single trial of the agent. To overcome the huge range of possible test programs, we prepare a large dataset of training programs that emphasize quality, naturalness, and diversity. To tackle the vast space of possible optimizations, we adapt deep reinforcement learning to train the agent in a sample-efficient manner through interacting with a world model of the compiler environment. Evaluation on both benchmark suites and production-level code optimization problems demonstrates our agent's supercompiler performances and zero-shot generalization abilities, outperforming built-in optimization options designed by compiler experts. Our methodology kindles the great potential of artificial intelligence for engineering and paves the way for scaling machine learning techniques in the realm of code optimization.
Human-in-the-loop Learning for Dynamic Congestion Games
Today mobile users learn and share their traffic observations via crowdsourcing platforms (e.g., Waze). Yet such platforms simply cater to selfish users' myopic interests to recommend the shortest path, and do not encourage enough users to travel and learn other paths for future others. Prior studies focus on one-shot congestion games without considering users' information learning, while our work studies how users learn and alter traffic conditions on stochastic paths in a human-in-the-loop manner. Our analysis shows that the myopic routing policy leads to severe under-exploration of stochastic paths. This results in a price of anarchy (PoA) greater than $2$, as compared to the socially optimal policy in minimizing the long-term social cost. Besides, the myopic policy fails to ensure the correct learning convergence about users' traffic hazard beliefs. To address this, we focus on informational (non-monetary) mechanisms as they are easier to implement than pricing. We first show that existing information-hiding mechanisms and deterministic path-recommendation mechanisms in Bayesian persuasion literature do not work with even (\text{PoA}=\infty). Accordingly, we propose a new combined hiding and probabilistic recommendation (CHAR) mechanism to hide all information from a selected user group and provide state-dependent probabilistic recommendations to the other user group. Our CHAR successfully ensures PoA less than (\frac{5}{4}), which cannot be further reduced by any other informational (non-monetary) mechanism. Besides the parallel network, we further extend our analysis and CHAR to more general linear path graphs with multiple intermediate nodes, and we prove that the PoA results remain unchanged. Additionally, we carry out experiments with real-world datasets to further extend our routing graphs and verify the close-to-optimal performance of our CHAR.
Nested Inheritance Dynamics
The idea of the inheritance of biological processes, such as the developmental process or the life cycle of an organism, has been discussed in the biology literature, but formal mathematical descriptions and plausible data analysis frameworks are lacking. We introduce an extension of the nested Dirichlet Process (nDP) to a multiscale model to aid in understanding the mechanisms by which biological processes are inherited, remain stable, and are modified across generations. To address these issues, we introduce Nested Inheritance Dynamics Algorithm (NIDA). At its primary level, NIDA encompasses all processes unfolding within an individual organism's lifespan. The secondary level delineates the dynamics through which these processes evolve or remain stable over time. This framework allows for the specification of a physical system model at either scale, thus promoting seamless integration with established models of development and heredity.
Safe POMDP Online Planning among Dynamic Agents via Adaptive Conformal Prediction
Sheng, Shili, Yu, Pian, Parker, David, Kwiatkowska, Marta, Feng, Lu
Online planning for partially observable Markov decision processes (POMDPs) provides efficient techniques for robot decision-making under uncertainty. However, existing methods fall short of preventing safety violations in dynamic environments. This work presents a novel safe POMDP online planning approach that offers probabilistic safety guarantees amidst environments populated by multiple dynamic agents. Our approach utilizes data-driven trajectory prediction models of dynamic agents and applies Adaptive Conformal Prediction (ACP) for assessing the uncertainties in these predictions. Leveraging the obtained ACP-based trajectory predictions, our approach constructs safety shields on-the-fly to prevent unsafe actions within POMDP online planning. Through experimental evaluation in various dynamic environments using real-world pedestrian trajectory data, the proposed approach has been shown to effectively maintain probabilistic safety guarantees while accommodating up to hundreds of dynamic agents.
GRSN: Gated Recurrent Spiking Neurons for POMDPs and MARL
Qin, Lang, Wang, Ziming, Jiang, Runhao, Yan, Rui, Tang, Huajin
Spiking neural networks (SNNs) are widely applied in various fields due to their energy-efficient and fast-inference capabilities. Applying SNNs to reinforcement learning (RL) can significantly reduce the computational resource requirements for agents and improve the algorithm's performance under resource-constrained conditions. However, in current spiking reinforcement learning (SRL) algorithms, the simulation results of multiple time steps can only correspond to a single-step decision in RL. This is quite different from the real temporal dynamics in the brain and also fails to fully exploit the capacity of SNNs to process temporal data. In order to address this temporal mismatch issue and further take advantage of the inherent temporal dynamics of spiking neurons, we propose a novel temporal alignment paradigm (T AP) that leverages the single-step update of spiking neurons to accumulate historical state information in RL and introduces gated units to enhance the memory capacity of spiking neurons. Experimental results show that our method can solve partially observable Markov decision processes (POMDPs) and multi-agent cooperation problems with similar performance as recurrent neural networks (RNNs) but with about 50% power consumption.