
We thank the reviewers for the valuable time they have invested during this difficult period to review the paper …

Neural Information Processing Systems

"The paper presents novel theoretical results that are highly relevant for the machine learning community" (Reviewer …). "The results are impressive, non-trivial and interesting" (Reviewer 4). The remainder of this response mostly addresses suggestions and questions raised by Reviewers 1 and 3. Both Reviewers 1 and 3 ask us to elaborate on the differences between [JO19] and this paper: by contrast, this paper utilizes the distribution's structure, or even rough proximity to a structure, to identify a much … This, as Reviewer 1 writes, requires a "non-trivial" combination of VC theory and the filtering framework, and allows us to remove … Regarding Reviewer 1's specific question of whether the technique also applies when the distributions underlying genuine …: we will add a similar explanation to the final version; this relation was explained in [JO19]. To enhance the reader's understanding of the context, in the final version of … Please note that Section 3 of the paper starts by stating that "The current results extend several long lines of …". For the reader's benefit we will follow the reviewer's advice and … We fully sympathize with the reviewer's desire to see more hard proofs in the … We also note that Reviewer 4's response to question 2 seems … We will try to accommodate Reviewer 1's request by including as much information … Finally, Reviewer 3 asks about the time complexity of the paper's two efficient algorithms (learning piecewise …); both algorithms have very reasonable complexities.


VC Theory for Inventory Policies

Xie, Yaqi, Ma, Will, Xin, Linwei

arXiv.org Machine Learning

Advances in computational power and AI have increased interest in reinforcement learning approaches to inventory management. This paper provides a theoretical foundation for these approaches and investigates the benefits of restricting to policy structures that are well-established by decades of inventory theory. In particular, we prove generalization guarantees for learning several well-known classes of inventory policies, including base-stock and (s, S) policies, by leveraging the celebrated Vapnik-Chervonenkis (VC) theory. We apply the concepts of the Pseudo-dimension and Fat-shattering dimension from VC theory to determine the generalizability of inventory policies, that is, the difference between an inventory policy's performance on training data and its expected performance on unseen data. We focus on a classical setting without contexts, but allow for an arbitrary distribution over demand sequences and do not make any assumptions such as independence over time. We corroborate our supervised learning results using numerical simulations. Managerially, our theory and simulations translate to the following insights. First, there is a principle of "learning less is more" in inventory management: depending on the amount of data available, it may be beneficial to restrict oneself to a simpler, albeit suboptimal, class of inventory policies to minimize overfitting errors. Second, the number of parameters in a policy class may not be the correct measure of overfitting error: in fact, the class of policies defined by T time-varying base-stock levels exhibits a generalization error comparable to that of the two-parameter (s, S) policy class. Finally, our research suggests situations in which it could be beneficial to incorporate the concepts of base-stock and inventory position into black-box learning machines, instead of having these machines directly learn the order quantity actions.
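As a rough illustration of the learning question this abstract studies, the sketch below fits a single base-stock level by empirical cost minimization over sampled demand sequences and then measures the train/test gap. It is a minimal sketch, not the paper's setup: the Poisson demand model, the holding and backorder costs h and b, the zero lead time, and the helper run_policy are all illustrative assumptions.

```python
import numpy as np

def run_policy(demands, order_up_to, h=1.0, b=4.0):
    """Average per-period cost of a stationary base-stock policy on one demand
    sequence (assumed costs: h per unit held, b per unit backordered)."""
    cost = 0.0
    for d in demands:
        inv = order_up_to - d  # order up to the base-stock level, then demand hits
        cost += h * max(inv, 0) + b * max(-inv, 0)
    return cost / len(demands)

rng = np.random.default_rng(0)
T, n_train, n_test = 20, 50, 5000
train = rng.poisson(10, size=(n_train, T))  # hypothetical demand distribution
test = rng.poisson(10, size=(n_test, T))

# Empirical risk minimization over a grid of candidate base-stock levels.
levels = np.arange(5, 26)
train_costs = [np.mean([run_policy(seq, s) for seq in train]) for s in levels]
best = levels[int(np.argmin(train_costs))]

test_cost = np.mean([run_policy(seq, best) for seq in test])
print(f"level {best}: train {min(train_costs):.2f}, test {test_cost:.2f}, "
      f"gap {test_cost - min(train_costs):+.2f}")
```

Repeating the experiment with a richer class, say T time-varying base-stock levels, is one way to observe the "learning less is more" effect described above: the richer class achieves lower training cost but can suffer a larger generalization gap when data is scarce.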



Multiclass Learning Approaches: A Theoretical Comparison with Implications

Daniely, Amit, Sabato, Sivan, Shwartz, Shai S.

Neural Information Processing Systems

We theoretically analyze and compare the following five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. In the first four methods, the classification is based on a reduction to binary classification. We consider the case where the binary classifier comes from a class of VC dimension $d$, and in particular from the class of halfspaces over $\mathbb{R}^d$. We analyze both the estimation error and the approximation error of these methods. Our analysis reveals interesting conclusions of practical relevance regarding the success of the different approaches under various conditions.
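To make the first of these reductions concrete, here is a minimal One vs. All sketch over halfspaces, the binary class considered above. The perceptron trainer, the toy Gaussian data, and all function names are illustrative assumptions rather than the paper's construction.

```python
import numpy as np

def train_halfspace(X, y, epochs=50):
    """Perceptron: learn a homogeneous halfspace sign(<w, x>) for labels y in {-1, +1}."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (w @ xi) <= 0:
                w += yi * xi
    return w

def one_vs_all_fit(X, y, k):
    """Reduce k-class learning to k binary problems: class c vs. the rest."""
    return np.stack([train_halfspace(X, np.where(y == c, 1, -1)) for c in range(k)])

def one_vs_all_predict(W, X):
    # Predict the class whose halfspace gives the largest margin.
    return np.argmax(X @ W.T, axis=1)

# Toy data: 3 Gaussian clusters in the plane, with a bias coordinate appended.
rng = np.random.default_rng(0)
means = np.array([[0, 4], [-4, -2], [4, -2]])
X = np.vstack([rng.normal(m, 1.0, size=(100, 2)) for m in means])
y = np.repeat(np.arange(3), 100)
Xb = np.hstack([X, np.ones((len(X), 1))])  # bias term makes the halfspaces affine

W = one_vs_all_fit(Xb, y, k=3)
print("training accuracy:", np.mean(one_vs_all_predict(W, Xb) == y))
```

All Pairs and ECOC differ only in which binary problems are generated and in how the binary predictions are decoded into a class label.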



Dynamically Adapting Kernels in Support Vector Machines

Cristianini, Nello, Campbell, Colin, Shawe-Taylor, John

Neural Information Processing Systems

The kernel parameter is one of the few tunable parameters in Support Vector Machines, controlling the complexity of the resulting hypothesis. Its choice amounts to model selection, and its value is usually found by means of a validation set. We present an algorithm which can automatically perform model selection with little additional computational cost and with no need of a validation set. In this procedure, model selection and learning are not separate: kernels are dynamically adjusted during the learning process to find the kernel parameter which provides the best possible upper bound on the generalisation error. Theoretical results motivating the approach and experimental results confirming its validity are presented.
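A minimal sketch of the underlying idea, with the caveat that the paper adapts the kernel during learning whereas this sketch simply re-solves the SVM for each candidate parameter: for the RBF kernel, K(x, x) = 1, so the feature-space images lie on the unit sphere and a radius-margin style bound reduces to ||w||^2, computable from the dual coefficients. scikit-learn, the toy dataset, and the hard-margin approximation via a large C are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)  # toy data

def radius_margin_bound(X, y, gamma):
    """Crude R^2/margin^2 surrogate for a near-hard-margin RBF SVM.

    With an RBF kernel, K(x, x) = 1, so the enclosing-ball radius satisfies
    R <= 1; we take R^2 = 1 and compute 1/margin^2 = ||w||^2 from the duals.
    """
    svm = SVC(kernel="rbf", gamma=gamma, C=1e6).fit(X, y)  # large C ~ hard margin
    a = svm.dual_coef_.ravel()  # alpha_i * y_i for the support vectors
    K = rbf_kernel(svm.support_vectors_, svm.support_vectors_, gamma=gamma)
    return float(a @ K @ a)    # ||w||^2 in feature space

gammas = np.logspace(-2, 2, 9)
bounds = [radius_margin_bound(X, y, g) for g in gammas]
print("gamma minimizing the bound:", gammas[int(np.argmin(bounds))])
```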

