
Optimizing Data Collection for Machine Learning

Neural Information Processing Systems

For each of the Dk subsets, respectively, we follow the same subsampling procedure used in the single-variate case. That is, we let q10 = 10% of the first data subset and q20 = 10% of the second data subset.
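The subsampling step above can be sketched as follows; this is a minimal NumPy illustration, where the subset names D1 and D2 and the uniform-without-replacement sampling are assumptions, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def subsample(D, frac=0.10):
    # Draw frac of the rows of D uniformly without replacement
    # (assumed sampling scheme; the paper only states the 10% fraction).
    n = int(len(D) * frac)
    idx = rng.choice(len(D), size=n, replace=False)
    return D[idx]

# Hypothetical data subsets standing in for the paper's Dk subsets.
D1 = rng.standard_normal(1000)
D2 = rng.standard_normal(500)
q10 = subsample(D1)  # 10% of the first subset  -> 100 samples
q20 = subsample(D2)  # 10% of the second subset -> 50 samples
```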


Product Ranking for Revenue Maximization with Multiple Purchases

Neural Information Processing Systems

Online retailing has become increasingly popular over the last decades [17, 28, 52]. Product ranking is the crux for online retailers because it determines consumers' shopping behavior [17] and thus influences retailers' revenue [20, 49]. For instance, the probability of a consumer purchasing from a firm or clicking an advertisement is strongly related to the display order [8, 3, 33].


Supplement to "Sample-Efficient Reinforcement Learning for Linearly-Parameterized MDPs with a Generative Model"

Neural Information Processing Systems

In addition, we define 1 to be the vector with all entries equal to 1, and I to be the identity matrix. Suppose that δ > 0 and ε ∈ (0, (1 − γ)^{−1/2}]. The remainder of this section is devoted to proving Theorem 3, and subsequently Theorem 4.



40bb79c081828bebdc39d65a82367246-Supplemental-Conference.pdf

Neural Information Processing Systems

Table 1: Linear network

  Layer #  Name  Layer                 In shape     Out shape
  1        -     Flatten()             (3, 32, 32)  3072
  2        fc1   nn.Linear(3072, 200)  3072         200
  3        fc2   nn.Linear(200, 1)     200          1

Fully-connected network: we conduct further experiments on several fully-connected networks with 4 hidden layers and various activation functions. Our subset is smaller because of the computational cost of calculating the Gram matrix. We consider simple linear networks, fully-connected networks, and convolutional networks in this appendix. Experiments show that the properties along the GD trajectory (e.g. the relationship between the sharpness and the A-norm) also hold in these settings. Figure 4 illustrates the positive correlation between the sharpness and the A-norm, and the relationship between the loss ‖D(t)‖² and ‖R(t)‖² along the trajectory.
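The three-layer architecture in Table 1 can be sketched in a few lines; this is a minimal NumPy stand-in for the paper's torch.nn.Linear model, with hypothetical random weights (the paper's actual initialization and training are not reproduced here).

```python
import numpy as np

rng = np.random.default_rng(0)

# Weights for fc1 (3072 -> 200) and fc2 (200 -> 1); values are placeholders.
W1 = rng.standard_normal((3072, 200)) * 0.01
b1 = np.zeros(200)
W2 = rng.standard_normal((200, 1)) * 0.01
b2 = np.zeros(1)

def forward(x):
    h = x.reshape(-1)      # Flatten(): (3, 32, 32) -> 3072
    h = h @ W1 + b1        # fc1; no activation, since the network is linear
    return h @ W2 + b2     # fc2: output shape (1,)

x = rng.standard_normal((3, 32, 32))
y = forward(x)             # y.shape == (1,)
```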


Optimizing over Multiple Distributions under Generalized Quasar-Convexity Condition

Neural Information Processing Systems

When f is convex with respect to x, many efficient algorithms are available for solving Problem (1). One well-known algorithm is mirror descent (MD) [5], which is based on the Bregman divergence.
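As a concrete illustration of mirror descent, here is a minimal sketch of the entropic variant (negative-entropy mirror map on the probability simplex, where the update is multiplicative); the linear objective f(x) = <c, x> is a hypothetical stand-in, not Problem (1) from the paper.

```python
import numpy as np

def mirror_descent(grad_f, x0, step=0.1, iters=200):
    # Entropic mirror descent: x <- x * exp(-step * grad), renormalized.
    # This is the Bregman update for the negative-entropy mirror map.
    x = x0.copy()
    for _ in range(iters):
        x = x * np.exp(-step * grad_f(x))
        x /= x.sum()
    return x

# Toy example: minimize <c, x> over the simplex; the minimizer puts all
# mass on the coordinate with the smallest cost (index 1 here).
c = np.array([3.0, 1.0, 2.0])
x = mirror_descent(lambda x: c, np.ones(3) / 3)
```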


1325cdae3b6f0f91a1b629307bf2d498-Supplemental.pdf

Neural Information Processing Systems

C.1 Dataset description
For the WMT'16 English-German experiment, we used the same preprocessed data provided by [31], including the same validation (newstest2013) and test (newstest2014) splits. The train, validation, and test splits contain 4,500,966, 3,000, and 3,003 sentence pairs, respectively. When using LayerDrop, we use a 50% drop probability. Similarly, we use beam search with beam size 5 and length penalty 1.0 for decoding. First, we show that adding the auxiliary loss L_K discretizes the samples and achieves the pruning purpose by enforcing sparsity in the resulting model.
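The LayerDrop behavior mentioned above can be sketched as follows; this is a hedged toy illustration of training-time layer dropping, where `layers` is a hypothetical list of callables standing in for transformer blocks (the actual model is not reproduced here).

```python
import random

def forward_with_layerdrop(layers, x, p=0.5, rng=random.Random(0)):
    # Each layer is skipped independently with probability p; at p=0.5
    # this matches the 50% drop probability used in the experiment.
    for layer in layers:
        if rng.random() >= p:  # keep the layer with probability 1 - p
            x = layer(x)
    return x

# Toy stack of 6 "layers" that each add 1, so the output counts how many
# layers were actually applied.
layers = [lambda x: x + 1 for _ in range(6)]
full = forward_with_layerdrop(layers, 0, p=0.0)  # all 6 layers applied
```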


Discriminative Viewer Identification using Generative Models of Eye Gaze

Makowski, Silvia, Jäger, Lena A., Schwetlick, Lisa, Trukenbrod, Hans, Engbert, Ralf, Scheffer, Tobias

arXiv.org Machine Learning

We study the problem of identifying viewers of arbitrary images based on their eye gaze. Psychological research has derived generative stochastic models of eye movements. In order to exploit this background knowledge within a discriminatively trained classification model, we derive Fisher kernels from different generative models of eye gaze. Experimentally, we find that the performance of the classifier strongly depends on the underlying generative model. Using an SVM with Fisher kernel improves the classification performance over the underlying generative model.
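The Fisher-kernel construction described above can be sketched compactly: each observation is mapped to the gradient of the generative model's log-likelihood with respect to its parameters (the Fisher score), and the inner product of scores serves as the SVM kernel. The i.i.d. Gaussian model below is a hypothetical stand-in for the paper's stochastic eye-movement models, and the unit Fisher-information normalization is an assumption.

```python
import numpy as np

def fisher_score(x, mu=0.0, sigma2=1.0):
    # Gradient of sum_t log N(x_t; mu, sigma2) w.r.t. (mu, sigma2).
    g_mu = np.sum(x - mu) / sigma2
    g_s2 = np.sum((x - mu) ** 2 - sigma2) / (2.0 * sigma2 ** 2)
    return np.array([g_mu, g_s2])

def fisher_kernel(x, y, mu=0.0, sigma2=1.0):
    # Inner product of Fisher scores; plug into any kernel SVM.
    return fisher_score(x, mu, sigma2) @ fisher_score(y, mu, sigma2)
```

By construction the kernel is symmetric, and k(x, x) is the squared norm of the score, hence non-negative.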


Demand forecasting techniques for build-to-order lean manufacturing supply chains

Rivera-Castro, Rodrigo, Nazarov, Ivan, Xiang, Yuke, Pletneev, Alexander, Maksimov, Ivan, Burnaev, Evgeny

arXiv.org Machine Learning

Build-to-order (BTO) supply chains have become commonplace in industries such as electronics, automotive, and fashion. They enable building products based on individual requirements with a short lead time and minimal inventory and production costs. Due to their nature, they differ significantly from traditional supply chains. However, there have been no studies dedicated to demand forecasting methods for this type of setting. This work makes two contributions. First, it presents a new and unique data set from a manufacturer in the BTO sector. Second, it proposes a novel data transformation technique for demand forecasting of BTO products. Results from thirteen forecasting methods show that the approach compares well to the state of the art while being easy to implement and to explain to decision-makers.