 Markov Models


67496dfa96afddab795530cc7c69b57a-Supplemental-Conference.pdf

Neural Information Processing Systems

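The optimal baseline, however, is rarely used in practice (Sutton & Barto, 2018; for an exception, see Peters & Schaal, 2008). Equation (1) then takes the following form: $\nabla_\theta \mathbb{E}_{x \sim \pi_\theta} R(x) = \mathbb{E}_{x \sim \pi_\theta} (R(x) - B) \nabla_\theta \log \pi_\theta(x)$.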
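A minimal sketch of why subtracting a baseline B leaves the gradient estimate unbiased, assuming a toy softmax policy over three outcomes. The parameters `theta`, the reward vector `rewards`, and the helper `score_function_stats` are hypothetical and not taken from the paper. The baseline used here is the mean reward, a common practical choice rather than the optimal baseline mentioned in the snippet.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax over the logits theta.
    z = np.exp(theta - theta.max())
    return z / z.sum()

def score_function_stats(theta, rewards, baseline):
    """Exact mean and total variance of (R(x) - B) * grad log pi(x), by enumeration."""
    pi = softmax(theta)
    k = len(theta)
    # Row x holds grad_theta log pi(x) = onehot(x) - pi for a softmax policy.
    scores = np.eye(k) - pi
    # Row x holds the single-sample estimator evaluated at outcome x.
    samples = (rewards - baseline)[:, None] * scores
    mean = pi @ samples
    total_var = pi @ ((samples - mean) ** 2).sum(axis=1)
    return mean, total_var

# Hypothetical toy values, chosen only for illustration.
theta = np.array([0.2, -0.5, 1.0])
rewards = np.array([1.0, 3.0, 2.0])
pi = softmax(theta)

grad_no_baseline, var_no_baseline = score_function_stats(theta, rewards, 0.0)
grad_mean_baseline, var_mean_baseline = score_function_stats(theta, rewards, pi @ rewards)

print("gradient, B = 0:      ", grad_no_baseline)
print("gradient, B = E[R]:   ", grad_mean_baseline)
print("total variance, B = 0:   ", var_no_baseline)
print("total variance, B = E[R]:", var_mean_baseline)
```

Both gradients agree because the expected score is zero, so the term involving B vanishes in expectation. In this toy setting the mean-reward baseline also lowers the estimator's variance, although that is not guaranteed for every choice of rewards.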



Outline

Neural Information Processing Systems

We first prove the direction that efficiency ordering implies Loewner ordering. Next we want to show $\lim_{t \to \infty} (I - \gamma A)^t = 0$. Since we assume $0 < \gamma < 2/\|A\|_2$, we have $\|I - \gamma A\|_2 = \max_{i=1,2,\dots,n} |1 - \gamma \lambda_i(A)| < 1$, where $\lambda_i(A) > 0$ is the $i$-th eigenvalue of the positive definite matrix $A$. For the original function $G: \mathbb{R}^d \times V \to \mathbb{R}^d$, we define another function $\Phi: \mathbb{R}^d \times E \to \mathbb{R}^d$ such that $\Phi(\theta, e_{ij}) = G(\theta, j)$. This is true for a periodic Markov chain, and is shown in the following lemma. Due to its random nature across each epoch, random shuffling is not a Markov chain on state space $[n]$.
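The step-size argument in this snippet can be checked numerically. The sketch below assumes a randomly generated symmetric positive definite matrix `A` and an arbitrary step size inside the admissible range. Both are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)  # symmetric positive definite by construction

spec_norm = np.linalg.norm(A, 2)  # spectral norm; the largest eigenvalue for SPD A
gamma = 1.5 / spec_norm           # an arbitrary step size in (0, 2 / ||A||_2)

# For symmetric A, ||I - gamma A||_2 equals max_i |1 - gamma * lambda_i(A)|.
eigvals = np.linalg.eigvalsh(A)
contraction = np.max(np.abs(1.0 - gamma * eigvals))

B = np.eye(4) - gamma * A
power_norm = np.linalg.norm(np.linalg.matrix_power(B, 500), 2)

print("||I - gamma A||_2      =", np.linalg.norm(B, 2))
print("max |1 - gamma lambda| =", contraction)
print("||(I - gamma A)^500||_2 =", power_norm)
```

Because the contraction factor is strictly below 1, the norm of the t-th power is bounded by that factor raised to the power t, and therefore tends to zero.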




Scaling up Continuous-Time Markov Chains Helps Resolve Underspecification

Neural Information Processing Systems

Modeling the time evolution of discrete sets of items (e.g., genetic mutations) is a fundamental problem in many biomedical applications. We approach this problem through the lens of continuous-time Markov chains, and show that the resulting learning task is generally underspecified in the usual setting of cross-sectional data. We explore a perhaps surprising remedy: including a number of additional independent items can help determine time order, and hence resolve underspecification. This is in sharp contrast to the common practice of limiting the analysis to a small subset of relevant items, which is followed largely due to poor scaling of existing methods. To put our theoretical insight into practice, we develop an approximate likelihood maximization method for learning continuous-time Markov chains, which can scale to hundreds of items and is orders of magnitude faster than previous methods. We demonstrate the effectiveness of our approach on synthetic and real cancer data.


Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design

Neural Information Processing Systems

A wide range of reinforcement learning (RL) problems, including robustness, transfer learning, unsupervised RL, and emergent complexity, require specifying a distribution of tasks or environments in which a policy will be trained.