AITopics

2009.0495

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.36)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Morato, P. G., Papakonstantinou, K. G., Andriotis, C. P., Nielsen, J. S., Rigo, P.

Optimal Inspection and Maintenance Planning for Deteriorating Structures through Dynamic Bayesian Networks and Markov Decision Processes

arXiv.org Artificial IntelligenceSep-9-2020

Civil and maritime engineering systems, among others, from bridges to offshore platforms and wind turbines, must be efficiently managed as they are exposed to deterioration mechanisms throughout their operational life, such as fatigue or corrosion. Identifying optimal inspection and maintenance policies demands the solution of a complex sequential decision-making problem under uncertainty, with the main objective of efficiently controlling the risk associated with structural failures. Addressing this complexity, risk-based inspection planning methodologies, supported often by dynamic Bayesian networks, evaluate a set of pre-defined heuristic decision rules to reasonably simplify the decision problem. However, the resulting policies may be compromised by the limited space considered in the definition of the decision rules. Avoiding this limitation, Partially Observable Markov Decision Processes (POMDPs) provide a principled mathematical methodology for stochastic optimal control under uncertain action outcomes and observations, in which the optimal actions are prescribed as a function of the entire, dynamically updated, state probability distribution. In this paper, we combine dynamic Bayesian networks with POMDPs in a joint framework for optimal inspection and maintenance planning, and we provide the formulation for developing both infinite and finite horizon POMDPs in a structural reliability context. The proposed methodology is implemented and tested for the case of a structural component subject to fatigue deterioration, demonstrating the capability of state-of-the-art point-based POMDP solvers for solving the underlying planning optimization problem. Within the numerical experiments, POMDP and heuristic-based policies are thoroughly compared, and results showcase that POMDPs achieve substantially lower costs as compared to their counterparts, even for traditional problem settings.

pomdp, renewable energy, upstream oil & gas, (19 more...)

2009.04547

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (1.00)

Industry:

Energy > Renewable > Wind (0.48)
Materials > Construction Materials (0.46)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Schamberg, Gabe, Badgeley, Marcus, Brown, Emery N.

Controlling Level of Unconsciousness by Titrating Propofol with Deep Reinforcement Learning

arXiv.org Artificial IntelligenceSep-9-2020

Reinforcement Learning (RL) can be used to fit a mapping from patient state to a medication regimen. Prior studies have used deterministic and value-based tabular learning to learn a propofol dose from an observed anesthetic state. Deep RL replaces the table with a deep neural network and has been used to learn medication regimens from registry databases. Here we perform the first application of deep RL to closed-loop control of anesthetic dosing in a simulated environment. We use the cross-entropy method to train a deep neural network to map an observed anesthetic state to a probability of infusing a fixed propofol dosage. During testing, we implement a deterministic policy that transforms the probability of infusion to a continuous infusion rate. The model is trained and tested on simulated pharmacokinetic/pharmacodynamic models with randomized parameters to ensure robustness to patient variability. The deep RL agent significantly outperformed a proportional-integral-derivative controller (median absolute performance error 1.7% +/- 0.6 and 3.4% +/- 1.2). Modeling continuous input variables instead of a table affords more robust pattern recognition and utilizes our prior domain knowledge. Deep RL learned a smooth policy with a natural interpretation to data scientists and anesthesia care providers alike.

controller, machine learning, reinforcement learning, (18 more...)

2008.12333

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area (0.68)
Health & Medicine > Health Care Providers & Services (0.68)
Health & Medicine > Surgery (0.52)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Talebi, Mohammad Sadegh, Jonsson, Anders, Maillard, Odalric-Ambrym

Improved Exploration in Factored Average-Reward MDPs

arXiv.org Machine LearningSep-9-2020

We consider a regret minimization task under the average-reward criterion in an unknown Factored Markov Decision Process (FMDP). More specifically, we consider an FMDP where the state-action space $\mathcal X$ and the state-space $\mathcal S$ admit the respective factored forms of $\mathcal X = \otimes_{i=1}^n \mathcal X_i$ and $\mathcal S=\otimes_{i=1}^m \mathcal S_i$, and the transition and reward functions are factored over $\mathcal X$ and $\mathcal S$. Assuming known factorization structure, we introduce a novel regret minimization strategy inspired by the popular UCRL2 strategy, called DBN-UCRL, which relies on Bernstein-type confidence sets defined for individual elements of the transition function. We show that for a generic factorization structure, DBN-UCRL achieves a regret bound, whose leading term strictly improves over existing regret bounds in terms of the dependencies on the size of $\mathcal S_i$'s and the involved diameter-related terms. We further show that when the factorization structure corresponds to the Cartesian product of some base MDPs, the regret of DBN-UCRL is upper bounded by the sum of regret of the base MDPs. We demonstrate, through numerical experiments on standard environments, that DBN-UCRL enjoys a substantially improved regret empirically over existing algorithms.

algorithm, dbn-ucrl, mdp, (17 more...)

2009.04575

Country:

Europe > France (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

arXiv.org Machine LearningSep-9-2020

Non-convex Learning via Replica Exchange Stochastic Gradient MCMC

Deng, Wei, Feng, Qi, Gao, Liyao, Liang, Faming, Lin, Guang

Replica exchange Monte Carlo (reMC), also known as parallel tempering, is an important technique for accelerating the convergence of the conventional Markov Chain Monte Carlo (MCMC) algorithms. However, such a method requires the evaluation of the energy function based on the full dataset and is not scalable to big data. The na\"ive implementation of reMC in mini-batch settings introduces large biases, which cannot be directly extended to the stochastic gradient MCMC (SGMCMC), the standard sampling method for simulating from deep neural networks (DNNs). In this paper, we propose an adaptive replica exchange SGMCMC (reSGMCMC) to automatically correct the bias and study the corresponding properties. The analysis implies an acceleration-accuracy trade-off in the numerical discretization of a Markov jump process in a stochastic environment. Empirically, we test the algorithm through extensive experiments on various setups and obtain the state-of-the-art results on CIFAR10, CIFAR100, and SVHN in both supervised learning and semi-supervised learning tasks.

artificial intelligence, machine learning, non-convex learning, (14 more...)

2008.05367

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

#artificialintelligenceSep-8-2020, 18:17:03 GMT

Markov chain - Wikipedia

A Markov chain is a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event.[1][2][3] A countably infinite sequence, in which the chain moves state at discrete time steps, gives a discrete-time Markov chain (DTMC). A continuous-time process is called a continuous-time Markov chain (CTMC). It is named after the Russian mathematician Andrey Markov. Markov chains have many applications as statistical models of real-world processes,[1][4][5][6] such as studying cruise control systems in motor vehicles, queues or lines of customers arriving at an airport, currency exchange rates and animal population dynamics.[7] Markov processes are the basis for general stochastic simulation methods known as Markov chain Monte Carlo, which are used for simulating sampling from complex probability distributions, and have found application in Bayesian statistics and artificial intelligence.[7][8][9] The adjective Markovian is used to describe something that is related to a Markov process.[1][10] A Markov process is a stochastic process that satisfies the Markov property[1] (sometimes characterized as "memorylessness"). In simpler terms, it is a process for which predictions can be made regarding future outcomes based solely on its present state and--most importantly--such predictions are just as good as the ones that could be made knowing the process's full history.[11] In other words, conditional on the present state of the system, its future and past states are independent. A Markov chain is a type of Markov process that has either a discrete state space or a discrete index set (often representing time), but the precise definition of a Markov chain varies.[12]

artificial intelligence, machine learning, markov chain, (18 more...)

#artificialintelligence

Country: Asia > Middle East > Jordan (0.04)

Industry:

Banking & Finance (1.00)
Leisure & Entertainment > Games (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Furelos-Blanco, Daniel, Law, Mark, Jonsson, Anders, Broda, Krysia, Russo, Alessandra

Induction and Exploitation of Subgoal Automata for Reinforcement Learning

arXiv.org Artificial IntelligenceSep-8-2020

In this paper we present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks. ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals expressed as propositional logic formulas over a set of high-level events. A subgoal automaton also consists of two special states: a state indicating the successful completion of the task, and a state indicating that the task has finished without succeeding. A state-of-the-art inductive logic programming system is used to learn a subgoal automaton that covers the traces of high-level events observed by the RL agent. When the currently exploited automaton does not correctly recognize a trace, the automaton learner induces a new automaton that covers that trace. The interleaving process guarantees the induction of automata with the minimum number of states, and applies a symmetry breaking mechanism to shrink the search space whilst remaining complete. We evaluate ISA in several grid-world and continuous state space problems using different RL algorithms that leverage the automaton structures. We provide an in-depth empirical analysis of the automaton learning process performance in terms of the traces, the symmetric breaking and specific restrictions imposed on the final learnable automaton. For each class of RL problem, we show that the learned automata can be successfully exploited to learn policies that reach the goal, achieving an average reward comparable to the case where automata are not learned but handcrafted and given beforehand.

logic & formal reasoning, machine learning, reinforcement learning, (18 more...)

2009.03855

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > Spain (0.04)

Genre:

Research Report (0.50)
Workflow (0.45)

Industry: Leisure & Entertainment (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

De Bois, Maxime, Amroun, Hamdi, Ammi, Mehdi

Energy Expenditure Estimation Through Daily Activity Recognition Using a Smart-phone

arXiv.org Artificial IntelligenceSep-8-2020

This paper presents a 3-step system that estimates the real-time energy expenditure of an individual in a non-intrusive way. First, using the user's smart-phone's sensors, we build a Decision Tree model to recognize his physical activity (\textit{running}, \textit{standing}, ...). Then, we use the detected physical activity, the time and the user's speed to infer his daily activity (\textit{watching TV}, \textit{going to the bathroom}, ...) through the use of a reinforcement learning environment, the Partially Observable Markov Decision Process framework. Once the daily activities are recognized, we translate this information into energy expenditure using the compendium of physical activities. By successfully detecting 8 physical activities at 90\%, we reached an overall accuracy of 80\% in recognizing 17 different daily activities. This result leads us to estimate the energy expenditure of the user with a mean error of 26\% of the expected estimation.

artificial intelligence, machine learning, physical activity, (15 more...)

doi: 10.1109/WF-IoT.2018.8355097

2009.03681

Country: Europe > France > Île-de-France > Paris > Paris (0.05)

Genre:

Research Report (0.65)
Workflow (0.46)

Industry:

Health & Medicine > Consumer Health (1.00)
Education > Health & Safety > School Nutrition (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

van der Himst, Otto, Lanillos, Pablo

Deep Active Inference for Partially Observable MDPs

arXiv.org Artificial IntelligenceSep-8-2020

Deep active inference has been proposed as a scalable approach to perception and action that deals with large policy and state spaces. However, current models are limited to fully observable domains. In this paper, we describe a deep active inference model that can learn successful policies directly from high-dimensional sensory inputs. The deep learning architecture optimizes a variant of the expected free energy and encodes the continuous state representation by means of a variational autoencoder. We show, in the OpenAI benchmark, that our approach has comparable or better performance than deep Q-learning, a state-of-the-art deep reinforcement learning algorithm.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2009.03622

Country: Europe > Netherlands > Gelderland > Nijmegen (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.49)

De Simone, Andrea, Morandini, Alessandro

Nonparametric Density Estimation from Markov Chains

arXiv.org Machine LearningSep-8-2020

We introduce a new nonparametric density estimator inspired by Markov Chains, and generalizing the well-known Kernel Density Estimator (KDE). Our estimator presents several benefits with respect to the usual ones and can be used straightforwardly as a foundation in all density-based algorithms. We prove the consistency of our estimator and we find it typically outperforms KDE in situations of large sample size and high dimensionality. We also employ our density estimator to build a local outlier detector, showing very promising results when applied to some realistic datasets.

artificial intelligence, data mining, machine learning, (20 more...)

2009.03937

Country:

Europe > Italy > Friuli Venezia Giulia > Trieste Province > Trieste (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.71)