Importance of Hyper-parameters in Model development


Machine Learning (ML) development is an iterative process in which the accuracy of predictions made by the models is continuously improved by repeating the training and evaluation phases. In each of these iterations, certain parameters are tweaked continuously by developers. Any parameter manually selected based on learning from previous experiments qualify to be called a model hyper-parameter. These parameters represent intuitive decisions whose value cannot be estimated from data or from ML theory. The hyper-parameters are knobs that you tweak during each iteration of training a model to improve the accuracy in the predictions made by the model.

Counting the Optimal Solutions in Graphical Models

Neural Information Processing Systems

We introduce #opt, a new inference task for graphical models which calls for counting the number of optimal solutions of the model. We describe a novel variable elimination based approach for solving this task, as well as a depth-first branch and bound algorithm that traverses the AND/OR search space of the model. The key feature of the proposed algorithms is that their complexity is exponential in the induced width of the model only. It does not depend on the actual number of optimal solutions. Our empirical evaluation on various benchmarks demonstrates the effectiveness of the proposed algorithms compared with existing depth-first and best-first search based approaches that enumerate explicitly the optimal solutions.

Theoretical Analysis of Heuristic Search Methods for Online POMDPs

Neural Information Processing Systems

Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify offline and online techniques, preserving the theoretical properties of the former, and exploiting the scalability of the latter. In this paper, we provide theoretical guarantees on an anytime algorithm for POMDPs which aims to reduce the error made by approximate offline value iteration algorithms through the use of an efficient online searching procedure. The algorithm uses search heuristics based on an error analysis of lookahead search, to guide the online search towards reachable beliefs with the most potential to reduce error.

The Epoch-Greedy Algorithm for Multi-armed Bandits with Side Information

Neural Information Processing Systems

Epoch-Greedy has the following properties: No knowledge of a time horizon $T$ is necessary. The regret incurred by Epoch-Greedy is controlled by a sample complexity bound for a hypothesis class. Here $S$ is the complexity term in a sample complexity bound for standard supervised learning. Papers published at the Neural Information Processing Systems Conference.

Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

Neural Information Processing Systems

In this work we use Branch-and-Bound (BB) to efficiently detect objects with deformable part models. Instead of evaluating the classifier score exhaustively over image locations and scales, we use BB to focus on promising image locations. The core problem is to compute bounds that accommodate part deformations; for this we adapt the Dual Trees data structure to our problem. We evaluate our approach using Mixture-of-Deformable Part Models. We obtain exactly the same results but are 10-20 times faster on average.

Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework

Neural Information Processing Systems

Regularization technique has become a principle tool for statistics and machine learning research and practice. However, in most situations, these regularization terms are not well interpreted, especially on how they are related to the loss function and data. In this paper, we propose a robust minimax framework to interpret the relationship between data and regularization terms for a large class of loss functions. We show that various regularization terms are essentially corresponding to different distortions to the original data matrix. This minimax framework includes ridge regression, lasso, elastic net, fused lasso, group lasso, local coordinate coding, multiple kernel learning, etc., as special cases.

Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models

Neural Information Processing Systems

Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial.

Bootstrapping from Game Tree Search

Neural Information Processing Systems

In this paper we introduce a new algorithm for updating the parameters of a heuristic evaluation function, by updating the heuristic towards the values computed by an alpha-beta search. Our algorithm differs from previous approaches to learning from search, such as Samuels checkers player and the TD-Leaf algorithm, in two key ways. First, we update all nodes in the search tree, rather than a single node. Second, we use the outcome of a deep search, instead of the outcome of a subsequent search, as the training signal for the evaluation function. We implemented our algorithm in a chess program Meep, using a linear heuristic function.

Variance Reduction in Monte-Carlo Tree Search

Neural Information Processing Systems

Monte-Carlo Tree Search (MCTS) has proven to be a powerful, generic planning technique for decision-making in single-agent and adversarial environments. The stochastic nature of the Monte-Carlo simulations introduces errors in the value estimates, both in terms of bias and variance. Whilst reducing bias (typically through the addition of domain knowledge) has been studied in the MCTS literature, comparatively little effort has focused on reducing variance. This is somewhat surprising, since variance reduction techniques are a well-studied area in classical statistics. In this paper, we examine the application of some standard techniques for variance reduction in MCTS, including common random numbers, antithetic variates and control variates.

Near-minimax recursive density estimation on the binary hypercube

Neural Information Processing Systems

This paper describes a recursive estimation procedure for multivariate binary densities using orthogonal expansions. For $d$ covariates, there are $2 d$ basis coefficients to estimate, which renders conventional approaches computationally prohibitive when $d$ is large. However, for a wide class of densities that satisfy a certain sparsity condition, our estimator runs in probabilistic polynomial time and adapts to the unknown sparsity of the underlying density in two key ways: (1) it attains near-minimax mean-squared error, and (2) the computational complexity is lower for sparser densities. Our method also allows for flexible control of the trade-off between mean-squared error and computational complexity. Papers published at the Neural Information Processing Systems Conference.