Goto

Appendix for Data Diversification: A Simple Strategy For Neural Machine Translation Xuan-Phi Nguyen

Neural Information Processing Systems

Finally, we describe the training setup for our back-translation experiments. We continue to differentiate our method from other existing works. Our method does not train multiple peer models with EM training either. In each round, a forward (or backward) model takes its turn playing the "back-translation" role during training; the role is switched in the next round. In other words, source and target are identical.
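
Read as a sketch, the alternation described above could look like the following Python pseudo-implementation. It is not the authors' code: train_model and translate are hypothetical stand-ins for full NMT training and decoding, and the data is abstracted to lists of sentence pairs.

```python
def alternate_backtranslation_rounds(parallel_data, train_model, translate, num_rounds=2):
    """Alternate which direction plays the "back-translation" role each round.

    parallel_data: list of (src, tgt) sentence pairs.
    train_model:   callable taking a list of (input, output) pairs, returning a model.
    translate:     callable taking (model, sentence), returning a translation.
    """
    forward_data = list(parallel_data)                    # trains src -> tgt
    backward_data = [(t, s) for s, t in parallel_data]    # trains tgt -> src

    for rnd in range(num_rounds):
        if rnd % 2 == 0:
            # The backward model plays the "back-translation" role: it produces
            # synthetic sources that augment the forward model's training data.
            backward_model = train_model(backward_data)
            forward_data += [(translate(backward_model, t), t) for _, t in parallel_data]
        else:
            # Roles are switched: the forward model back-translates sources to
            # produce synthetic targets for the backward direction.
            forward_model = train_model(forward_data)
            backward_data += [(translate(forward_model, s), s) for s, _ in parallel_data]

    return train_model(forward_data), train_model(backward_data)
```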





Stochastic Optimal Control Matching

Neural Information Processing Systems

Stochastic optimal control, which has the goal of driving the behavior of noisy systems, is broadly applicable in science, engineering and artificial intelligence. Our work introduces Stochastic Optimal Control Matching (SOCM), a novel Iterative Diffusion Optimization (IDO) technique for stochastic optimal control that stems from the same philosophy as the conditional score matching loss for diffusion models.



A Proofs

Neural Information Processing Systems

To prove the main results, we need the following Lemma and Propositions 1 and 2. A.4 Time complexity of gradient calculation in ML-CPC: suppose g is a neural network parametrized by θ; then the time complexity of computing the gradient of the ML-CPC objective is O(nm). We include a PyTorch implementation of α-ML-CPC as follows. Alternatively, one can use kl_div() to ensure that the loss is non-negative. C.2 Mutual information estimation: the general procedure follows that in [40] and [44]. We consider two types of architectures, joint and separable.
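
Since the PyTorch listing itself is not reproduced in this excerpt, here is a minimal illustrative sketch of a multi-label contrastive objective of the kind described: all n*m critic scores share a single normalizer, which is what makes the gradient cost O(nm). This is not the paper's listing; the α re-weighting of α-ML-CPC is omitted, and the (n, m) score layout with the positive pair in column 0 is an assumption made for the example.

```python
import torch
import torch.nn.functional as F

def multilabel_contrastive_loss(scores: torch.Tensor) -> torch.Tensor:
    """scores: (n, m) critic outputs g(x, y); scores[i, 0] is the positive
    pair for example i, and the other m - 1 entries in each row are negatives.
    A single softmax over all n * m scores gives the multi-label objective."""
    log_probs = F.log_softmax(scores.flatten(), dim=0).view_as(scores)
    return -log_probs[:, 0].mean()

# Example usage with random scores.
scores = torch.randn(64, 128, requires_grad=True)
multilabel_contrastive_loss(scores).backward()
```

The kl_div() alternative mentioned above would phrase the same quantity as a KL divergence so that the reported loss stays non-negative; this sketch uses a plain log-softmax instead.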


A Statistical Boosting via Improper Game Playing

Neural Information Processing Systems

In this section we first give a game-theoretic perspective of our method when applied to the statistical setting (Subsection A.1). A formal description is provided in Algorithm 4 (Boosting with OCO). Player A's goal is to minimize the payoff, while player B's goal is to maximize it. There are several ways to circumvent this. If players A and B play according to Algorithm 5, then, because player B's average strategies depend on the sequence of q's, they are random variables, as is p.
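
To make the game concrete, the toy sketch below plays it with a finite pool of weak hypotheses summarized by a ±1 margin matrix. This is an illustration, not the paper's Algorithm 4: multiplicative weights stands in for player A's OCO strategy, player B simply best-responds, and all names are invented for the example.

```python
import numpy as np

def boosting_game_sketch(margins: np.ndarray, rounds: int = 100, eta: float = 0.1):
    """margins[h, i] is +1 if weak hypothesis h labels example i correctly
    and -1 otherwise. Player A (minimizing the payoff) keeps a distribution
    p over examples updated with multiplicative weights, a no-regret OCO
    strategy; player B (maximizing) picks the hypothesis with the largest
    weighted margin each round."""
    num_hypotheses, num_examples = margins.shape
    p = np.full(num_examples, 1.0 / num_examples)
    chosen = []
    for _ in range(rounds):
        h = int(np.argmax(margins @ p))      # player B's best response
        chosen.append(h)
        p = p * np.exp(-eta * margins[h])    # player A down-weights easy examples
        p = p / p.sum()
    # The boosted predictor aggregates player B's (random) sequence of plays.
    return np.sign(margins[chosen].mean(axis=0))
```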