


Optimizing Data Collection for Machine Learning

Neural Information Processing Systems

For each of the D_k subsets, we follow the same subsampling procedure used in the single-variate case. That is, we let q10 = 10% of the first data subset and q20 = 10% of the second data subset.
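As an illustration, here is a minimal sketch of this two-subset subsampling step. The variable names, data shapes, and use of NumPy are assumptions for illustration only, not taken from the paper.

```python
# Minimal sketch of the two-subset subsampling step, assuming each subset
# is a NumPy array of examples; names (subset_1, q10, ...) are illustrative.
import numpy as np

rng = np.random.default_rng(seed=0)

def subsample(subset: np.ndarray, fraction: float) -> np.ndarray:
    """Draw a uniform random sample containing `fraction` of the rows."""
    n = max(1, int(round(fraction * len(subset))))
    idx = rng.choice(len(subset), size=n, replace=False)
    return subset[idx]

subset_1 = rng.normal(size=(1000, 4))  # placeholder data subsets
subset_2 = rng.normal(size=(1000, 4))

q10 = subsample(subset_1, 0.10)  # 10% of the first data subset
q20 = subsample(subset_2, 0.10)  # 10% of the second data subset
```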


Product Ranking for Revenue Maximization with Multiple Purchases

Neural Information Processing Systems

Online retailing has become increasingly popular over the last decades [17, 28, 52]. Product ranking is the crux for online retailers because it determines consumers' shopping behavior [17] and thus influences retailers' revenue [20, 49]. For instance, the probability of a consumer purchasing from a firm or clicking an advertisement is strongly related to the display order [8, 3, 33].


Supplement to "Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model"

Neural Information Processing Systems

In addition, we define 1 to be the all-ones vector and I to be the identity matrix. Suppose that δ > 0 and ε ∈ (0, (1 − γ)^(−1/2)]. The remainder of this section is devoted to proving Theorem 3. We define π_T (resp. V_T) to be the policy (resp. value function) under consideration. The remainder of this section is devoted to proving Theorem 4.



Optimizing over Multiple Distributions under Generalized Quasar-Convexity Condition

Neural Information Processing Systems

When f is convex with respect to x, many efficient algorithms can serve as powerful tools for solving Problem (1). One well-known algorithm is mirror descent (MD) [5], which is based on the Bregman divergence.
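For concreteness, here is a minimal sketch of mirror descent with the negative-entropy Bregman divergence (the exponentiated-gradient update) on the probability simplex. This illustrates the generic MD template only; the function names and the toy linear objective are hypothetical, not the paper's specific problem.

```python
# Sketch of mirror descent with the negative-entropy Bregman divergence
# (exponentiated gradient) for minimizing a convex f over the simplex.
import numpy as np

def mirror_descent(grad_f, x0, step=0.1, iters=200):
    """Entropic MD: x_{t+1} proportional to x_t * exp(-step * grad f(x_t))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x * np.exp(-step * grad_f(x))
        x /= x.sum()  # normalization = Bregman projection onto the simplex
    return x

# Toy example: minimize f(x) = <c, x> over the probability simplex;
# the iterates concentrate on the coordinate with the smallest cost.
c = np.array([0.3, 0.1, 0.6])
x_star = mirror_descent(lambda x: c, np.full(3, 1 / 3))
```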


1325cdae3b6f0f91a1b629307bf2d498-Supplemental.pdf

Neural Information Processing Systems

C.1 Dataset description

For the WMT'16 English-German experiment, we used the same preprocessed data provided by [31], including the same validation (newstest2013) and test (newstest2014) splits. The train, validation, and test splits contain 4,500,966, 3,000, and 3,003 sentence pairs, respectively. When using LayerDrop, we use a 50% dropout probability. Similarly, we use beam search with beam size 5 and length penalty 1.0 for decoding. First, we show that adding the auxiliary loss L_K discretizes the samples and achieves the pruning purpose by enforcing sparsity in the resulting model.
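For reference, here is a hedged sketch of how a length penalty typically enters the beam-search score. Whether the paper uses exactly this GNMT-style penalty form is an assumption; the text only specifies beam size 5 and a length penalty of 1.0.

```python
# Hedged sketch of length-penalized beam scoring (GNMT-style penalty);
# the exact penalty form used in the paper is an assumption.
def length_penalty(length: int, alpha: float = 1.0) -> float:
    return ((5.0 + length) / 6.0) ** alpha

def beam_score(log_prob_sum: float, length: int, alpha: float = 1.0) -> float:
    """Rank beam hypotheses by length-normalized log-probability."""
    return log_prob_sum / length_penalty(length, alpha)

# With beam size 5, decoding keeps the 5 highest-scoring hypotheses per step.
```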


Discriminative Viewer Identification using Generative Models of Eye Gaze

Makowski, Silvia, Jäger, Lena A., Schwetlick, Lisa, Trukenbrod, Hans, Engbert, Ralf, Scheffer, Tobias

arXiv.org Machine Learning

We study the problem of identifying viewers of arbitrary images based on their eye gaze. Psychological research has derived generative stochastic models of eye movements. In order to exploit this background knowledge within a discriminatively trained classification model, we derive Fisher kernels from different generative models of eye gaze. Experimentally, we find that the performance of the classifier strongly depends on the underlying generative model. Using an SVM with Fisher kernel improves the classification performance over the underlying generative model.
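To make the recipe concrete, here is a minimal sketch of the Fisher-kernel construction in the Jaakkola-Haussler style, with a toy Gaussian standing in for the paper's generative gaze models. All names, the toy model, and the identity approximation of the Fisher information matrix are illustrative assumptions.

```python
# Sketch: derive Fisher score vectors from a toy generative model and
# plug the resulting kernel into an SVM (scikit-learn, precomputed kernel).
import numpy as np
from sklearn.svm import SVC

def fisher_score(x, mu, sigma2):
    """Gradient of sum_i log N(x_i; mu, sigma2) w.r.t. (mu, sigma2)."""
    d_mu = np.sum((x - mu) / sigma2)
    d_s2 = np.sum((x - mu) ** 2 / (2 * sigma2 ** 2) - 1 / (2 * sigma2))
    return np.array([d_mu, d_s2])

def fisher_kernel(scores):
    """Linear kernel on Fisher scores (identity approximation of F^-1)."""
    return scores @ scores.T

rng = np.random.default_rng(0)
seqs = [rng.normal(loc=l, size=50) for l in (0.0, 0.0, 1.0, 1.0)]
labels = [0, 0, 1, 1]
scores = np.array([fisher_score(s, mu=0.5, sigma2=1.0) for s in seqs])

svm = SVC(kernel="precomputed").fit(fisher_kernel(scores), labels)
```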


Demand forecasting techniques for build-to-order lean manufacturing supply chains

Rivera-Castro, Rodrigo, Nazarov, Ivan, Xiang, Yuke, Pletneev, Alexander, Maksimov, Ivan, Burnaev, Evgeny

arXiv.org Machine Learning

Build-to-order (BTO) supply chains have become commonplace in industries such as electronics, automotive and fashion. They enable building products based on individual requirements with a short lead time and minimum inventory and production costs. Due to their nature, they differ significantly from traditional supply chains. However, there have not been studies dedicated to demand forecasting methods for this type of setting. This work makes two contributions. First, it presents a new and unique data set from a manufacturer in the BTO sector. Second, it proposes a novel data transformation technique for demand forecasting of BTO products. Results from thirteen forecasting methods show that the approach compares well to the state-of-the-art while being easy to implement and to explain to decision-makers.


Minimal model of permutation symmetry in unsupervised learning

Hou, Tianqi, Wong, K. Y. Michael, Huang, Haiping

arXiv.org Machine Learning

Permutation of any two hidden units yields invariant properties in typical deep generative neural networks. This permutation symmetry plays an important role in understanding the computational performance of a broad class of neural networks with two or more hidden units. However, a theoretical study of the permutation symmetry is still lacking. Here, we propose a minimal model with only two hidden units in a restricted Boltzmann machine, which aims to address how the permutation symmetry affects the critical learning data size at which concept formation (or spontaneous symmetry breaking, in physics language) starts; moreover, we semi-rigorously prove a conjecture that the critical data size is independent of the number of hidden units once this number is finite. Remarkably, we find that the embedded correlation between the two receptive fields of the hidden units reduces the critical data size. In particular, weakly-correlated receptive fields have the benefit of significantly reducing the minimal data size that triggers the transition, given less noisy data. Inspired by the theory, we also propose an efficient fully-distributed algorithm to infer the receptive fields of hidden units. Overall, our results demonstrate that the permutation symmetry is an interesting property that affects the critical data size for the computational performance of related learning algorithms. All these effects can be analytically probed based on the minimal model, providing theoretical insights towards understanding unsupervised learning in a more general context.
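As a concrete illustration of the setting, here is a minimal sketch of a two-hidden-unit RBM and its permutation symmetry. The ±1 binary units, the 1/sqrt(N) energy scaling, and all names are assumptions for illustration, not the paper's exact formulation.

```python
# Illustrative sketch of the minimal model: an RBM with N visible units
# and only two hidden units, energy E(v, h) = -(1/sqrt(N)) * h^T W v.
import numpy as np

rng = np.random.default_rng(0)
N = 100                                   # number of visible units
W = rng.choice([-1.0, 1.0], size=(2, N))  # two receptive fields (hidden units)

def energy(v, h, W):
    return -(h @ W @ v) / np.sqrt(len(v))

# Permutation symmetry: swapping the two hidden units (entries of h together
# with the corresponding rows of W) leaves the energy, and hence the model
# distribution, unchanged.
v = rng.choice([-1.0, 1.0], size=N)
h = np.array([1.0, -1.0])
assert np.isclose(energy(v, h, W), energy(v, h[::-1], W[::-1]))
```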