Importance of Hyper-parameters in Model development


Machine Learning (ML) development is an iterative process in which the accuracy of a model's predictions is improved by repeating the training and evaluation phases. In each iteration, developers tweak certain parameters. Any parameter that is selected manually, based on what was learned from previous experiments, qualifies as a model hyper-parameter. These parameters represent intuitive decisions whose values cannot be estimated from data or from ML theory. Hyper-parameters are the knobs that you tweak during each training iteration to improve the accuracy of the model's predictions.
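As a rough illustration of the tweak-train-evaluate loop described above, the sketch below sweeps two hyper-parameters for a toy one-weight model. The model, the data, the grid values, and the names `train`, `mse`, and `grid` are all assumptions made for this example; they do not come from the article.

```python
from itertools import product

def train(learning_rate, epochs, data):
    """Fit y = w * x by gradient descent; w is a parameter learned from data."""
    w = 0.0
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad
    return w

def mse(w, data):
    """Mean squared error of the model y = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

# Toy data generated from y = 2x (hypothetical).
train_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
val_data = [(4.0, 8.0), (5.0, 10.0)]

# Hyper-parameters: chosen by the developer, not learned from the data.
grid = {"learning_rate": [0.001, 0.01, 0.1], "epochs": [10, 100]}

results = []
for lr, epochs in product(grid["learning_rate"], grid["epochs"]):
    w = train(lr, epochs, train_data)
    results.append((mse(w, val_data), lr, epochs))

# Keep the combination with the lowest validation error.
best_loss, best_lr, best_epochs = min(results)
print(f"best: lr={best_lr}, epochs={best_epochs}, val MSE={best_loss:.3g}")
```

Here `w` is learned from the data, while `learning_rate` and `epochs` are set from outside the training loop and compared only through the resulting validation error.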

Will SD-WAN Solve IoT's Toughest Questions? - SDxCentral


IoT and SD-WAN might not sound like they belong together, but ask VMware's VeloCloud or managed service provider Apcela and you might be surprised by what they have to say. The two companies see SD-WAN as the key to making large IoT deployments manageable at a human scale. Sanjay Uppal, who co-founded VeloCloud and now serves as the head of VMware's SD-WAN division, said the expanding scope of SD-WAN has opened the door to several applications that the technology wouldn't normally be associated with, and IoT is one of them. "You think of IoT, it's not just IoT running on a cellular network or IoT running on Bluetooth, you could absolutely run IoT on your enterprise SD-WAN," Uppal said in an earlier interview. "Just think of that IoT traffic as a new data type that you will steer across the WAN and you can add services to it as it is steered."

Image Parsing with Stochastic Scene Grammar

Neural Information Processing Systems

In contrast to previous scene labeling work that applied discriminative classifiers to pixels (or super-pixels), we use a generative Stochastic Scene Grammar (SSG). This grammar represents the compositional structures of visual entities from scene categories, 3D foreground/background, 2D faces, to 1D lines. The grammar includes three types of production rules and two types of contextual relations. Production rules: (i) AND rules represent the decomposition of an entity into sub-parts; (ii) OR rules represent the switching among sub-types of an entity; (iii) SET rules represent an ensemble of visual entities. Contextual relations: (i) Cooperative "+" relations represent positive links between binding entities, such as hinged faces of an object or aligned boxes; (ii) Competitive "-" relations represent negative links between competing entities, such as mutually exclusive boxes. We design an efficient MCMC inference algorithm, namely Hierarchical cluster sampling, to search in the large solution space of scene configurations. The algorithm has two stages: (i) Clustering: It forms all possible higher-level structures (clusters) from lower-level entities by production rules and contextual relations.

Goal-conditioned Imitation Learning

Neural Information Processing Systems

Designing rewards for Reinforcement Learning (RL) is challenging because it needs to convey the desired task, be efficient to optimize, and be easy to compute. The latter is particularly problematic when applying RL to robotics, where detecting whether the desired configuration is reached might require considerable supervision and instrumentation. Furthermore, we are often interested in being able to reach a wide range of configurations, hence setting up a different reward every time might be impractical. Methods like Hindsight Experience Replay (HER) have recently shown promise to learn policies able to reach many goals, without the need for a reward. Unfortunately, without tricks like resetting to points along the trajectory, HER might require many samples to discover how to reach certain areas of the state-space.
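The hindsight relabeling idea usually associated with HER can be sketched as follows. This is a minimal illustration under stated assumptions, not the method of this paper: the transition format, the "final state as goal" strategy, the sparse 0/-1 reward, and the name `relabel_with_final_goal` are all chosen for the example.

```python
def relabel_with_final_goal(trajectory):
    """Relabel each transition as if the final achieved state had been the goal.

    trajectory: list of (state, action, next_state) tuples (hypothetical format).
    Returns (state, action, next_state, goal, reward) tuples, where the reward
    is 0.0 when next_state equals the relabeled goal and -1.0 otherwise.
    """
    if not trajectory:
        return []
    goal = trajectory[-1][2]  # the state the agent actually ended up in
    relabeled = []
    for state, action, next_state in trajectory:
        reward = 0.0 if next_state == goal else -1.0
        relabeled.append((state, action, next_state, goal, reward))
    return relabeled

# A toy 1-D walk: states are integers, each action moves one step right.
trajectory = [(0, +1, 1), (1, +1, 2), (2, +1, 3)]
replay = relabel_with_final_goal(trajectory)
for transition in replay:
    print(transition)
```

No hand-designed reward is needed here: the success signal comes only from comparing the achieved state with the substituted goal, so even a trajectory that missed its intended target still yields a successful example for some goal.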

Weakly-supervised Discovery of Visual Pattern Configurations

Neural Information Processing Systems

The prominence of weakly labeled data gives rise to a growing demand for object detection methods that can cope with minimal supervision. We propose an approach that automatically identifies discriminative configurations of visual patterns that are characteristic of a given object class. We formulate the problem as a constrained submodular optimization problem and demonstrate the benefits of the discovered configurations in remedying mislocalizations and finding informative positive and negative training examples. Papers published at the Neural Information Processing Systems Conference.

One ticket to win them all: generalizing lottery ticket initializations across datasets and optimizers

Neural Information Processing Systems

The success of lottery ticket initializations (Frankle and Carbin, 2019) suggests that small, sparsified networks can be trained so long as the network is initialized appropriately. Unfortunately, finding these "winning ticket" initializations is computationally expensive. One potential solution is to reuse the same winning tickets across a variety of datasets and optimizers. However, the generality of winning ticket initializations remains unclear. Here, we attempt to answer this question by generating winning tickets for one training configuration (optimizer and dataset) and evaluating their performance on another configuration.
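One common way to obtain a "winning ticket" is to keep the largest-magnitude trained weights and reset the survivors to their initial values. The sketch below assumes that procedure on a flat list of weights; the function name `winning_ticket`, the one-shot (non-iterative) pruning, and the toy values are assumptions for illustration and may differ from the exact procedure used in the paper.

```python
def winning_ticket(initial_weights, trained_weights, keep_fraction):
    """Build a sparse initialization by magnitude pruning.

    Keeps the keep_fraction of weights with the largest trained magnitude and
    resets them to their initial values; all other weights are zeroed.
    """
    n = len(trained_weights)
    n_keep = max(1, round(keep_fraction * n))
    # Indices sorted by trained magnitude, largest first.
    ranked = sorted(range(n), key=lambda i: abs(trained_weights[i]), reverse=True)
    kept = set(ranked[:n_keep])
    mask = [1 if i in kept else 0 for i in range(n)]
    ticket = [w * m for w, m in zip(initial_weights, mask)]
    return mask, ticket

initial = [0.5, -0.1, 0.3, 0.2]    # weights at initialization (toy values)
trained = [0.9, -0.05, -1.2, 0.1]  # weights after training on a source task
mask, ticket = winning_ticket(initial, trained, keep_fraction=0.5)
print(mask)  # [1, 0, 1, 0]
```

Under this reading, evaluating generality would amount to training the masked initialization `ticket` on a different dataset or with a different optimizer than the one that produced `trained`.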

Near-Optimal Edge Evaluation in Explicit Generalized Binomial Graphs

Neural Information Processing Systems

Robotic motion-planning problems, such as a UAV flying fast in a partially-known environment or a robot arm moving around cluttered objects, require finding collision-free paths quickly. Typically, this is solved by constructing a graph, where vertices represent robot configurations and edges represent potentially valid movements of the robot between these configurations. The main computational bottlenecks are expensive edge evaluations to check for collisions. State-of-the-art planning methods do not reason about the optimal sequence of edges to evaluate in order to find a collision-free path quickly. In this paper, we do so by drawing a novel equivalence between motion planning and the Bayesian active learning paradigm of decision region determination (DRD).

Offline Contextual Bayesian Optimization

Neural Information Processing Systems

In black-box optimization, an agent repeatedly chooses a configuration to test, so as to find an optimal configuration. In many practical problems of interest, one would like to optimize several systems, or "tasks", simultaneously; however, in most of these scenarios the current task is determined by nature. In this work, we explore the "offline" case in which one is able to bypass nature and choose the next task to evaluate (e.g. via a simulator). Because some tasks may be easier to optimize and others may be more critical, it is crucial to leverage algorithms that not only consider which configurations to try next, but also which tasks to make evaluations for. In this work, we describe a theoretically grounded Bayesian optimization method to tackle this problem.

Analyze sentiment using the ML.NET CLI - ML.NET


In this particular case, in only 10 seconds and with the small dataset provided, the CLI tool was able to run quite a few iterations, meaning it trained multiple times using different combinations of algorithms and configurations, with different internal data transformations and algorithm hyper-parameters. Finally, the "best quality" model found in 10 seconds is a model using a particular trainer/algorithm with a specific configuration. Depending on the exploration time, the command can produce a different result. The selection is based on the multiple metrics shown, such as Accuracy. The first and easiest metric to evaluate a binary-classification model is the accuracy, which is simple to understand.
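For reference, accuracy is the fraction of predictions that match the true labels. The sketch below computes it in plain Python; it is not ML.NET code, and the function name `accuracy` and the sample labels are made up for the example.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that equal the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Hypothetical binary sentiment labels: 1 = positive, 0 = negative.
labels = [1, 0, 1, 1]
predictions = [1, 0, 0, 1]
score = accuracy(labels, predictions)
print(score)  # 0.75
```

Because accuracy counts every example equally, it can look high on a heavily imbalanced dataset even when the minority class is mostly misclassified, which is one reason to read it alongside the other metrics the tool reports.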

A Comprehensive and Modularized Statistical Framework for Gradient Norm Equality in Deep Neural Networks

Machine Learning

In recent years, plenty of metrics have been proposed to identify networks that are free of gradient explosion and vanishing. However, due to the diversity of network components and complex serial-parallel hybrid connections in modern DNNs, the evaluation of existing metrics usually requires strong assumptions or complex statistical analysis, or has limited application fields, which constrains their spread in the community. In this paper, inspired by the Gradient Norm Equality and dynamical isometry, we first propose a novel metric called Block Dynamical Isometry, which measures the change of gradient norm in an individual block. Because our Block Dynamical Isometry is norm-based, its evaluation needs weaker assumptions compared with the original dynamical isometry. To mitigate the challenging derivation, we propose a highly modularized statistical framework based on free probability. Our framework includes several key theorems to handle complex serial-parallel hybrid connections and a library to cover the diversity of network components. Besides, several sufficient prerequisites are provided. Powered by our metric and framework, we analyze extensive initialization, normalization, and network structures. We find that Gradient Norm Equality is a universal philosophy behind them. Then, we improve some existing methods based on our analysis, including an activation function selection strategy for initialization techniques, a new configuration for weight normalization, and a depth-aware way to derive coefficients in SeLU. Moreover, we propose a novel normalization technique named second moment normalization, which is theoretically 30% faster than batch normalization without accuracy loss. Last but not least, our conclusions and methods are evidenced by extensive experiments on multiple models over CIFAR10 and ImageNet.