Notes
–Neural Information Processing Systems
In practice, it is more convenient to maximize the expected sum over t in a sequence drawn uniformly from the set of sequences in the training dataset. This scales the objective up by the average sequence length, preserving the property that longer sequences have more weight.

While this paper's speedup over the MLE objective (2) comes from avoiding the integral, an alternative would be to estimate the integral more efficiently. One might try randomized adaptive quadrature (Baran et al., 2008) modified for our discontinuous intensity functions and GPU hardware; or importance sampling of (t, k) pairs where the proposal distribution is roughly proportional to λ

A density must be integrated to yield a probability.

This is not essential to the NCE approach, since in principle the M + 1 elements of the bag could all be drawn from different distributions.
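As a rough illustration of the importance-sampling alternative mentioned in the notes above, the sketch below estimates the integral of the total intensity over [0, T] by sampling (t, k) pairs, once uniformly and once from a proposal that is roughly proportional to the intensity. Everything here is an assumption made for illustration and not the paper's method: `toy_intensity` is a hypothetical intensity with one discontinuity, the proposal is a simple piecewise-constant grid, and the function names are invented.

```python
import math
import random


def toy_intensity(t, k):
    """Hypothetical intensity lambda_k(t) > 0 with a jump at t = 1 (illustrative only)."""
    base = 0.5 + 0.25 * k
    bump = 2.0 * math.exp(-(t - 1.0)) if t >= 1.0 else 0.0
    return base + bump


def exact_total(T, K):
    """Closed-form integral of sum_k lambda_k(t) over [0, T]; valid for T >= 1."""
    base_part = sum((0.5 + 0.25 * k) * T for k in range(K))
    bump_part = K * 2.0 * (1.0 - math.exp(-(T - 1.0)))
    return base_part + bump_part


def uniform_estimate(T, K, n, rng):
    """Plain Monte Carlo: t ~ Uniform(0, T), k ~ Uniform{0..K-1}."""
    total = 0.0
    for _ in range(n):
        t = rng.uniform(0.0, T)
        k = rng.randrange(K)
        total += toy_intensity(t, k)
    # Each sample has proposal density 1 / (T * K).
    return T * K * total / n


def importance_estimate(T, K, n, rng, bins=30):
    """Importance sampling with a piecewise-constant proposal over (t, k).

    Each (bin, k) cell gets weight lambda_k(bin midpoint), so the proposal
    is only roughly proportional to the intensity.
    """
    width = T / bins
    cells = [(b, k) for b in range(bins) for k in range(K)]
    weights = [toy_intensity((b + 0.5) * width, k) for b, k in cells]
    norm = sum(weights)
    draws = rng.choices(range(len(cells)), weights=weights, k=n)
    total = 0.0
    for i in draws:
        b, k = cells[i]
        t = (b + rng.random()) * width      # uniform within the chosen bin
        q = (weights[i] / norm) / width     # proposal density at (t, k)
        total += toy_intensity(t, k) / q    # importance-weighted sample
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    T, K, n = 3.0, 2, 20000
    print("exact     :", exact_total(T, K))
    print("uniform   :", uniform_estimate(T, K, n, rng))
    print("importance:", importance_estimate(T, K, n, rng))
```

Both estimators are unbiased for this toy integrand. The importance-sampled one should typically have lower variance, because the ratio of intensity to proposal density is nearly constant within each grid cell. A real model would have no closed-form check like `exact_total` and would need a proposal that is cheap to sample and evaluate.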
Jun-2-2025, 12:01:39 GMT