Efficient Resources Allocation for Markov Decision Processes

Munos, Rémi

Neural Information Processing Systems

Assume that we model a complex decision-making problem under uncertainty by a finite MDP. Because of the limited resources used, the parameters of the MDP (transition probabilities and rewards) are uncertain: we assume that we only know a belief state over their possible values. If we select the most probable values of the parameters, we can build an MDP and solve it to deduce the corresponding optimal policy. However, because of the uncertainty over the true parameters, this policy may not be the one that maximizes the expected cumulative rewards of the true (but partially unknown) decision-making problem. We can nevertheless use sampling techniques to estimate the expected loss of using this policy.
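
The sampling idea in this abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes the belief state is an independent Dirichlet distribution (parameters `alpha`, all greater than 1) over each row of the transition matrix, that rewards are known, and that the loss is the value gap averaged uniformly over states. The names `value_iteration`, `policy_value`, and `expected_loss` are hypothetical.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Solve an MDP. P has shape (A, S, S), R has shape (S, A)."""
    V = np.zeros(R.shape[0])
    while True:
        # Q[s, a] = R[s, a] + gamma * sum_n P[a, s, n] * V[n]
        Q = R + gamma * np.einsum("asn,n->sa", P, V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

def policy_value(P, R, policy, gamma=0.9):
    """Exact value of a fixed deterministic policy via a linear solve."""
    states = np.arange(R.shape[0])
    P_pi = P[policy, states, :]   # row s is the transition row of action policy[s]
    R_pi = R[states, policy]
    return np.linalg.solve(np.eye(len(states)) - gamma * P_pi, R_pi)

def expected_loss(alpha, R, gamma=0.9, n_samples=200, seed=0):
    """Monte Carlo estimate of the loss of the most-probable-MDP policy.

    Assumption: alpha (shape (A, S, S), entries > 1) holds Dirichlet
    parameters of the belief over each transition row.
    """
    rng = np.random.default_rng(seed)
    K = alpha.shape[-1]
    # Mode of a Dirichlet with all parameters > 1: the "most probable" model.
    P_mode = (alpha - 1.0) / (alpha.sum(axis=-1, keepdims=True) - K)
    _, nominal_policy = value_iteration(P_mode, R, gamma)
    losses = []
    for _ in range(n_samples):
        # Normalised gamma draws give one Dirichlet sample per row.
        G = rng.standard_gamma(alpha)
        P = G / G.sum(axis=-1, keepdims=True)
        V_opt, _ = value_iteration(P, R, gamma)
        V_nom = policy_value(P, R, nominal_policy, gamma)
        losses.append(np.mean(V_opt - V_nom))
    return float(np.mean(losses)), nominal_policy
```

Calling `expected_loss(alpha, R)` returns the estimated loss together with the nominal policy. Each sampled loss is non-negative up to the value-iteration tolerance, because no policy can outperform the optimal policy of the sampled MDP.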


A Gradient-Based Boosting Algorithm for Regression Problems

Zemel, Richard S., Pitassi, Toniann

Neural Information Processing Systems

Adaptive boosting methods are simple modular algorithms that operate as follows. Let g: X → Y be the function to be learned, where the label set Y is finite, typically binary-valued. The algorithm uses a learning procedure, which has access to n training examples, {(x_1, y_1), ..., (x_n, y_n)}, drawn randomly from X × Y according to distribution D; it outputs a hypothesis f:


Maximum Conditional Likelihood via Bound Maximization and the CEM Algorithm

Jebara, Tony, Pentland, Alex

Neural Information Processing Systems

Advantages in feature selection, robustness and limited resource allocation have been studied. Ultimately, tasks such as regression and classification reduce to the evaluation of a conditional density. However, popularity of maximum joint likelihood and EM techniques remains strong in part due to their elegance and convergence properties. Thus, many conditional problems are solved by first estimating joint models then conditioning them.


Convergence of the Wake-Sleep Algorithm

Ikeda, Shiro, Amari, Shun-ichi, Nakahara, Hiroyuki

Neural Information Processing Systems

The WS (Wake-Sleep) algorithm is a simple learning rule for models with hidden variables. It is shown that this algorithm can be applied to a factor analysis model which is a linear version of the Helmholtz machine. But even for a factor analysis model, the general convergence is not proved theoretically. In this article, we describe the geometrical understanding of the WS algorithm in contrast with the EM (Expectation Maximization) algorithm and the em algorithm. As a result, we prove the convergence of the WS algorithm for the factor analysis model. We also show the condition for the convergence in general models.


Maximum Conditional Likelihood via Bound Maximization and the CEM Algorithm

Jebara, Tony, Pentland, Alex

Neural Information Processing Systems

We present the CEM (Conditional Expectation Maximization) algorithm as an extension of the EM (Expectation Maximization) algorithm to conditional density estimation under missing data. A bounding and maximization process is given to specifically optimize conditional likelihood instead of the usual joint likelihood. We apply the method to conditioned mixture models and use bounding techniques to derive the model's update rules. Monotonic convergence, computational efficiency and regression results superior to EM are demonstrated.

