 Undirected Networks


Protein Language Models and Structure Prediction: Connection and Progression

arXiv.org Artificial Intelligence

The prediction of protein structures from sequences is an important task for function prediction, drug design, and understanding related biological processes. Recent advances have demonstrated the power of language models (LMs) in processing protein sequence databases; these models inherit the advantages of attention networks and capture useful information when learning representations for proteins. The past two years have witnessed remarkable success in tertiary protein structure prediction (PSP), including evolution-based and single-sequence-based PSP. Protein language model (pLM)-based pipelines appear to have replaced energy-based models and sampling procedures as the mainstream paradigm in PSP. Despite this progress, the PSP community needs a systematic and up-to-date survey to help bridge the gap between LMs in the natural language processing (NLP) and PSP domains and to introduce their methodologies, advancements, and practical applications. To this end, in this paper, we first introduce the similarities between protein and human languages that allow LMs to be extended to pLMs and applied to protein databases. Then, we systematically review recent advances in LMs and pLMs from the perspectives of network architectures, pre-training strategies, applications, and commonly used protein databases. Next, different types of methods for PSP are discussed, particularly how pLM-based architectures function in the process of protein folding. Finally, we identify challenges faced by the PSP community and foresee promising research directions along with the advances of pLMs. This survey aims to be a hands-on guide for researchers to understand PSP methods, develop pLMs, and tackle challenging problems in this field for practical purposes.


Reinforcement Learning Methods for Wordle: A POMDP/Adaptive Control Approach

arXiv.org Artificial Intelligence

In this paper, we discuss a Reinforcement Learning (RL) approach to a class of sequential decision problems, exemplified by the popular Wordle puzzle that appears daily in the New York Times. Wordle involves a list of 5-letter mystery words, which is a subset of a larger list of guess words. A word is selected at random from the mystery list, and the objective is to find that word by sequentially selecting no more than six words from the guess list. Each guess word selection provides information about the letters contained in the hidden mystery word according to a given set of rules, which involve color coding of the letters shared by the guess word and the mystery word. We adopt a more general point of view by considering a broad class of problems that includes Wordle as a special case. In particular, the problems that we consider include sequential search situations, where the objective is to correctly guess an unknown object from a given finite set of objects (the set of mystery words in the Wordle context) by using a sequence of decisions from a finite set (the set of guess words in Wordle), which result in a sequence of corresponding observations (the information outcomes of the guesses in Wordle). We aim to minimize some cost function, such as the expected number of observations required to determine the unknown object. Within the search context just described, some basic information theory concepts are relevant; they have already been applied to Wordle and are important for our methodology.
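The color-coding rule and the information-theoretic view mentioned in this abstract can be illustrated with a small sketch. It is not the paper's RL method: it only scores a guess by the Shannon entropy of the feedback patterns it induces, assuming a uniform prior over the remaining mystery words. The function names `feedback` and `expected_information` and the four-word candidate list are hypothetical, chosen for illustration.

```python
import math
from collections import Counter


def feedback(guess: str, mystery: str) -> str:
    """Return the Wordle pattern: 'G' green, 'Y' yellow, '-' gray."""
    pattern = ["-"] * len(guess)
    remaining = Counter()  # mystery letters not matched in place
    # First pass: mark exact matches and count the unmatched mystery letters.
    for i, (g, m) in enumerate(zip(guess, mystery)):
        if g == m:
            pattern[i] = "G"
        else:
            remaining[m] += 1
    # Second pass: mark misplaced letters, consuming each mystery letter once.
    for i, g in enumerate(guess):
        if pattern[i] == "-" and remaining[g] > 0:
            pattern[i] = "Y"
            remaining[g] -= 1
    return "".join(pattern)


def expected_information(guess: str, candidates: list[str]) -> float:
    """Entropy (in bits) of the feedback pattern, assuming a uniform prior
    over the candidate mystery words (an assumption of this sketch)."""
    counts = Counter(feedback(guess, m) for m in candidates)
    n = len(candidates)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    words = ["crane", "crate", "brake", "slate"]  # hypothetical mystery list
    for w in words:
        print(w, round(expected_information(w, words), 3))
```

A guess whose feedback splits the candidates into many equally sized groups has high entropy, so on average it removes more candidates per observation.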


Parameterisation of Reasoning on Temporal Markov Logic Networks

arXiv.org Artificial Intelligence

We aim to improve reasoning on inconsistent and uncertain data. We focus on knowledge-graph data, extended with time intervals to specify their validity, as regularly found in historical sciences. We propose principles on semantics for efficient Maximum A-Posteriori inference on the new Temporal Markov Logic Networks (TMLN), which extend Markov Logic Networks (MLN) with uncertain temporal facts and rules. We examine total and partial temporal (in)consistency relations between sets of temporal formulae. Then we propose a new Temporal Parametric Semantics, which may combine several sub-functions, allowing the use of different assessment strategies. Finally, we set out the constraints that semantics must respect to satisfy our principles.


Learning and Understanding a Disentangled Feature Representation for Hidden Parameters in Reinforcement Learning

arXiv.org Artificial Intelligence

Hidden parameters are latent variables in reinforcement learning (RL) environments that are constant over the course of a trajectory. Understanding what, if any, hidden parameters affect a particular environment can aid both the development and appropriate usage of RL systems. We present an unsupervised method to map RL trajectories into a feature space where distance represents the relative difference in system behavior due to hidden parameters. Our approach disentangles the effects of hidden parameters by leveraging a recurrent neural network (RNN) world model as used in model-based RL. First, we alter the standard world model training algorithm to isolate the hidden parameter information in the world model memory. Then, we use a metric learning approach to map the RNN memory into a space with a distance metric approximating a bisimulation metric with respect to the hidden parameters. The resulting disentangled feature space can be used to meaningfully relate trajectories to each other and analyze the hidden parameters. We demonstrate our approach on four hidden parameters across three RL environments. Finally, we present two methods to help identify and understand the effects of hidden parameters on systems.


Safer Motion Planning of Steerable Needles via a Shaft-to-Tissue Force Model

arXiv.org Artificial Intelligence

Steerable needles are capable of accurately targeting difficult-to-reach clinical sites in the body. By bending around sensitive anatomical structures, steerable needles have the potential to reduce the invasiveness of many medical procedures. However, inserting these needles with curved trajectories increases the risk of tissue damage due to perpendicular forces exerted on the surrounding tissue by the needle's shaft, potentially resulting in lateral shearing through tissue. Such forces can cause significant damage to surrounding tissue, negatively affecting patient outcomes. In this work, we derive a tissue and needle force model based on a Cosserat string formulation, which describes the normal forces and frictional forces along the shaft as a function of the planned needle path, friction model and parameters, and tip piercing force. We propose this new force model and associated cost function as a safer and more clinically relevant metric than those currently used in motion planning for steerable needles. We fit and validate our model through physical needle robot experiments in a gel phantom. We use this force model to define a bottleneck cost function for motion planning and evaluate it against the commonly used path-length cost function in hundreds of randomly generated 3-D environments. Plans generated with our force-based cost show a 62% reduction in the peak modeled tissue force with only a 0.07% increase in length on average compared to using the path-length cost in planning. Additionally, we demonstrate the ability to plan motions with our force-based cost function in a lung tumor biopsy scenario from a segmented computed tomography (CT) scan. By planning motions for the needle that aim to minimize the modeled needle-to-tissue force explicitly, our method plans needle paths that may reduce the risk of significant tissue damage while still reaching desired targets in the body.


Will My Robot Achieve My Goals? Predicting the Probability that an MDP Policy Reaches a User-Specified Behavior Target

arXiv.org Artificial Intelligence

As an autonomous system performs a task, it should maintain a calibrated estimate of the probability that it will achieve the user's goal. If that probability falls below some desired level, it should alert the user so that appropriate interventions can be made. This paper considers settings where the user's goal is specified as a target interval for a real-valued performance summary, such as the cumulative reward, measured at a fixed horizon $H$. At each time $t \in \{0, \ldots, H-1\}$, our method produces a calibrated estimate of the probability that the final cumulative reward will fall within a user-specified target interval $[y^-, y^+]$. Using this estimate, the autonomous system can raise an alarm if the probability drops below a specified threshold. We compute the probability estimates by inverting conformal prediction. Our starting point is the Conformalized Quantile Regression (CQR) method of Romano et al., which applies split-conformal prediction to the results of quantile regression. CQR is not invertible, but by using the conditional cumulative distribution function (CDF) as the non-conformity measure, we show how to obtain an invertible modification that we call Probability-space Conformalized Quantile Regression (PCQR). Like CQR, PCQR produces well-calibrated conditional prediction intervals with finite-sample marginal guarantees. By inverting PCQR, we obtain marginal guarantees for the probability that the cumulative reward of an autonomous system will fall within an arbitrary user-specified target interval. Experiments on two domains confirm that these probabilities are well-calibrated.
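For readers unfamiliar with the starting point named in this abstract, the split-conformal calibration step of standard CQR can be sketched as follows. This shows CQR only, not the paper's PCQR modification or its inversion. The function names `cqr_margin` and `cqr_interval`, the constant quantile predictions, and the calibration data are hypothetical illustrations.

```python
import math


def cqr_margin(lo_preds, hi_preds, y_calib, alpha):
    """Split-conformal correction for quantile-regression intervals.

    The non-conformity score of each calibration point is
    max(lo - y, y - hi): positive when y falls outside [lo, hi].
    The margin is the ceil((n + 1) * (1 - alpha))-th smallest score.
    """
    scores = sorted(
        max(lo - y, y - hi) for lo, hi, y in zip(lo_preds, hi_preds, y_calib)
    )
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: only the trivial
        # (infinite) interval carries the coverage guarantee.
        return math.inf
    return scores[k - 1]


def cqr_interval(lo, hi, margin):
    """Widen (or shrink, if margin < 0) a predicted quantile interval."""
    return lo - margin, hi + margin


if __name__ == "__main__":
    # Hypothetical calibration set: the quantile model predicts [0, 1] everywhere.
    y = [0.5, 1.2, -0.1, 0.9, 1.5, 0.0, 1.1, 0.3, 2.0]
    margin = cqr_margin([0.0] * len(y), [1.0] * len(y), y, alpha=0.2)
    print(margin, cqr_interval(0.0, 1.0, margin))
```

Under exchangeability of calibration and test points, the adjusted interval covers a new outcome with marginal probability at least 1 - alpha, which is the finite-sample guarantee the abstract refers to.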


Reinforcement Learning Basics With Examples (Markov Chain and Tree Search) - neptune.ai

#artificialintelligence

Have you ever played against the computer in a video game, and wondered how it gets so good? Well, a big part of it is reinforcement learning. Reinforcement Learning (RL) is a machine learning domain that focuses on building self-improving systems that learn from their own actions and experiences in an interactive environment. In RL, the system (learner) learns what to do and how to do it based on rewards. Unlike with other machine learning algorithms, we don't tell the system what to do. It autonomously explores and discovers which actions yield the most reward. Reinforcement learning problems are considered closed-loop because the system's present actions influence its later inputs. "Reinforcement Learning, in the context of machine learning and artificial intelligence, is a type of dynamic programming that trains algorithms using a system of reward and punishment." In this article, we're going to explore reinforcement learning in depth along with some practical examples.


A generalized stacked reinforcement learning method for sampled systems

arXiv.org Artificial Intelligence

A common setting of reinforcement learning (RL) is a Markov decision process (MDP) in which the environment is a stochastic discrete-time dynamical system. Whereas MDPs are suitable in such applications as video-games or puzzles, physical systems are time-continuous. A general variant of RL is of digital format, where updates of the value (or cost) and policy are performed at discrete moments in time. The agent-environment loop then amounts to a sampled system, whereby sample-and-hold is a specific case. In this paper, we propose and benchmark two RL methods suitable for sampled systems. Specifically, we hybridize model-predictive control (MPC) with critics learning the optimal Q- and value (or cost-to-go) function. Optimality is analyzed and performance comparison is done in an experimental case study with a mobile robot.


Tackling Visual Control via Multi-View Exploration Maximization

arXiv.org Artificial Intelligence

We present MEM: Multi-view Exploration Maximization for tackling complex visual control tasks. To the best of our knowledge, MEM is the first approach that combines multi-view representation learning and intrinsic reward-driven exploration in reinforcement learning (RL). More specifically, MEM first extracts the specific and shared information of multi-view observations to form high-quality features before performing RL on the learned features, enabling the agent to fully comprehend the environment and yield better actions. Furthermore, MEM transforms the multi-view features into intrinsic rewards based on entropy maximization to encourage exploration. As a result, MEM can significantly improve the sample efficiency and generalization ability of the RL agent, facilitating the solution of real-world problems with high-dimensional observations and sparse-reward spaces. We evaluate MEM on various tasks from DeepMind Control Suite and Procgen games. Extensive simulation results demonstrate that MEM can achieve superior performance and outperform the benchmarking schemes with simple architecture and higher efficiency.


Performance Evaluation, Optimization and Dynamic Decision in Blockchain Systems: A Recent Overview

arXiv.org Artificial Intelligence

With the rapid development of blockchain technology and its integration into various application areas, performance evaluation, performance optimization, and dynamic decision in blockchain systems are playing an increasingly important role in developing new blockchain technology. This paper provides a recent systematic overview of this class of research, with particular attention to the mathematical modeling and basic theory of blockchain systems. Important examples include (a) performance evaluation: Markov processes, queuing theory, Markov reward processes, random walks, fluid and diffusion approximations, and martingale theory; (b) performance optimization: linear programming, nonlinear programming, integer programming, and multi-objective programming; (c) optimal control and dynamic decision: Markov decision processes and stochastic optimal control; and (d) artificial intelligence: machine learning, deep reinforcement learning, and federated learning. So far, little research has focused on these research lines. We believe that the basic theory, together with the mathematical methods, algorithms, and simulations of blockchain systems discussed in this paper, will strongly support future development and continuous innovation of blockchain technology.