hyperparameter


There's No Such Thing As The Machine Learning Platform

#artificialintelligence

In the past few years, you might have noticed the increasing pace at which vendors are rolling out "platforms" that serve the AI ecosystem, namely addressing data science and machine learning (ML) needs. The "Data Science Platform" and "Machine Learning Platform" are at the front lines of the battle for the mind share and wallets of data scientists, ML project managers, and others who manage AI projects and initiatives. If you're a major technology vendor and you don't have some sort of big play in the AI space, then you risk rapidly becoming irrelevant. But what exactly are these platforms, and why is there such an intense market share grab going on? At the core of these offerings is the realization that ML and data science projects are nothing like typical application or hardware development projects.


12 Steps to Applied AI

#artificialintelligence

For those who've been looking for a 12-step program to get rid of bad data habits, here's a handy applied machine learning and artificial intelligence project roadmap. Well, it should properly be 13 steps, so we'll start counting at zero to make it work. Check that you actually need ML/AI. Can you identify many small decisions you need help with? Has the non-ML/AI approach already been shown to be worthless?


Practical Bayesian Optimization of Machine Learning Algorithms

Neural Information Processing Systems

The use of machine learning algorithms frequently involves careful tuning of learning parameters and model hyperparameters. Unfortunately, this tuning is often a "black art" requiring expert experience, rules of thumb, or sometimes brute-force search. There is therefore great appeal for automatic approaches that can optimize the performance of any given learning algorithm to the problem at hand. In this work, we consider this problem through the framework of Bayesian optimization, in which a learning algorithm's generalization performance is modeled as a sample from a Gaussian process (GP). We show that certain choices for the nature of the GP, such as the type of kernel and the treatment of its hyperparameters, can play a crucial role in obtaining a good optimizer that can achieve expert-level performance.
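
To make the framework concrete, here is a minimal sketch of GP-based Bayesian optimization of a single hyperparameter with an expected-improvement acquisition function. The objective (a fake validation-error curve over a log learning rate), the RBF kernel settings, and the search range are invented for illustration and are not taken from the paper.

```python
# Minimal GP Bayesian-optimization sketch over one hyperparameter (hypothetical objective).
import numpy as np
from scipy.stats import norm

def rbf(a, b, length=0.5, var=1.0):
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    K_inv = np.linalg.inv(K)
    mu = Ks.T @ K_inv @ y_train
    cov = rbf(x_test, x_test) - Ks.T @ K_inv @ Ks
    return mu, np.sqrt(np.clip(np.diag(cov), 1e-12, None))

def expected_improvement(mu, sigma, best):
    # Improvement over the current best observation (we are minimizing validation error).
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def validation_error(log_lr):
    # Stand-in for an expensive training run; purely illustrative.
    return (log_lr + 2.0) ** 2 + 0.05 * np.random.randn()

grid = np.linspace(-5, 0, 200)            # search space: log10 learning rate
x_obs = np.array([-4.5, -0.5])            # two initial evaluations
y_obs = np.array([validation_error(x) for x in x_obs])

for _ in range(8):
    mu, sigma = gp_posterior(x_obs, y_obs, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y_obs.min()))]
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, validation_error(x_next))

print("best log10(lr) found:", x_obs[np.argmin(y_obs)])
```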


Efficient multiple hyperparameter learning for log-linear models

Neural Information Processing Systems

Using multiple regularization hyperparameters is an effective method for managing model complexity in problems where input features have varying amounts of noise. While algorithms for choosing multiple hyperparameters are often used in neural networks and support vector machines, they are not common in structured prediction tasks, such as sequence labeling or parsing. In this paper, we consider the problem of learning regularization hyperparameters for log-linear models, a class of probabilistic models for structured prediction tasks which includes conditional random fields (CRFs). Using an implicit differentiation trick, we derive an efficient gradient-based method for learning Gaussian regularization priors with multiple hyperparameters. In both simulations and the real-world task of computational RNA secondary structure prediction, we find that multiple hyperparameter learning provides a significant boost in accuracy compared to models learned using only a single regularization hyperparameter.
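
The "implicit differentiation trick" is easiest to see in a toy quadratic problem. The sketch below uses ridge regression with one penalty per feature as a stand-in for the paper's log-linear/CRF setting: the inner solution w*(λ) has a closed form, and the implicit function theorem gives the gradient of a held-out loss with respect to each λ_j. The data, step size, and log-parameterization are invented assumptions; this is not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 80, 10
w_true = np.concatenate([rng.normal(size=5), np.zeros(5)])   # last 5 features are pure noise
X_tr, X_val = rng.normal(size=(n, d)), rng.normal(size=(n, d))
y_tr = X_tr @ w_true + 0.3 * rng.normal(size=n)
y_val = X_val @ w_true + 0.3 * rng.normal(size=n)

def inner_solve(lam):
    # w*(lam) minimizes 0.5*||X_tr w - y_tr||^2 + 0.5 * sum_j lam_j * w_j^2
    A = X_tr.T @ X_tr + np.diag(lam)
    return np.linalg.solve(A, X_tr.T @ y_tr), A

def val_grad(lam):
    w, A = inner_solve(lam)
    g_w = X_val.T @ (X_val @ w - y_val)            # dL_val/dw at the inner optimum
    # Implicit function theorem: A * dw/dlam_j = -w_j * e_j, so
    # dL_val/dlam_j = -w_j * (A^{-1} g_w)_j
    return -np.linalg.solve(A, g_w) * w

log_lam = np.zeros(d)                              # optimize log(lam) so each lam_j stays positive
for step in range(300):
    lam = np.exp(log_lam)
    log_lam -= 0.5 * val_grad(lam) * lam           # chain rule through lam = exp(log_lam)

print("learned per-feature penalties:", np.round(np.exp(log_lam), 2))
```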


Bayesian active learning with localized priors for fast receptive field characterization

Neural Information Processing Systems

Active learning can substantially improve the yield of neurophysiology experiments by adaptively selecting stimuli to probe a neuron's receptive field (RF) in real time. Bayesian active learning methods maintain a posterior distribution over the RF, and select stimuli to maximally reduce posterior entropy on each time step. However, existing methods tend to rely on simple Gaussian priors, and do not exploit uncertainty at the level of hyperparameters when determining an optimal stimulus. This uncertainty can play a substantial role in RF characterization, particularly when RFs are smooth, sparse, or local in space and time. In this paper, we describe a novel framework for active learning under hierarchical, conditionally Gaussian priors.
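
For context, the simple Gaussian-prior baseline the abstract contrasts with already has a clean closed form: with a linear-Gaussian response model, the infomax rule reduces to picking the stimulus with the largest posterior predictive variance, followed by a conjugate rank-one posterior update. The sketch below simulates that baseline with an invented smooth receptive field and random candidate stimuli; it does not implement the paper's hierarchical, localized priors.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 20                                   # RF dimensionality (e.g. pixels of a stimulus)
k_true = np.exp(-0.5 * ((np.arange(d) - 8) / 2.0) ** 2)   # smooth, localized "true" RF
noise_var = 0.25

# Simple Gaussian prior over the RF (the non-hierarchical baseline)
Sigma = np.eye(d)
mu = np.zeros(d)

candidates = rng.normal(size=(500, d))   # pool of candidate stimuli

for t in range(50):
    # Infomax rule: for a linear-Gaussian model, expected information gain is
    # monotone in the posterior predictive variance x^T Sigma x.
    scores = np.einsum('ij,jk,ik->i', candidates, Sigma, candidates)
    x = candidates[np.argmax(scores)]
    r = x @ k_true + np.sqrt(noise_var) * rng.normal()     # simulated neural response
    # Conjugate (rank-one) Gaussian posterior update
    s = x @ Sigma @ x + noise_var
    gain = Sigma @ x / s
    mu = mu + gain * (r - x @ mu)
    Sigma = Sigma - np.outer(gain, Sigma @ x)

print("posterior-mean RF error:", np.linalg.norm(mu - k_true))
```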


AutoPrune: Automatic Network Pruning by Regularizing Auxiliary Parameters

Neural Information Processing Systems

Reducing model redundancy is an important step in deploying complex deep learning models on resource-limited or time-sensitive devices. Directly regularizing or modifying weight values makes the pruning procedure less robust and more sensitive to the choice of hyperparameters, and it also requires prior knowledge to tune different hyperparameters for different models. To build a more general and easy-to-use pruning method, we propose AutoPrune, which prunes the network by optimizing a set of trainable auxiliary parameters instead of the original weights. Instability and noise during training of the auxiliary parameters do not directly affect the weight values, which makes the pruning process more robust to noise and less sensitive to hyperparameters. Moreover, we design gradient update rules for the auxiliary parameters to keep them consistent with the pruning task.
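
A rough PyTorch sketch of the central idea, pruning by optimizing auxiliary gate parameters rather than the weights themselves, is shown below. The sigmoid gates, the sparsity penalty on the gates, and the toy regression task are assumptions made for illustration; the paper's specific gradient update rules for the auxiliary parameters are not reproduced.

```python
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    """Linear layer whose weights are gated by trainable auxiliary parameters.

    Pruning pressure is applied to the auxiliary scores rather than to the
    weights themselves (the general idea behind AutoPrune; the exact update
    rules from the paper are not reproduced here).
    """
    def __init__(self, in_f, out_f):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.1)
        self.bias = nn.Parameter(torch.zeros(out_f))
        self.aux = nn.Parameter(torch.zeros(out_f, in_f))   # auxiliary scores

    def mask(self):
        return torch.sigmoid(self.aux)                      # soft gate in (0, 1)

    def forward(self, x):
        return x @ (self.weight * self.mask()).t() + self.bias

# Toy usage: regression with a sparsity penalty on the gates, not on the weights.
torch.manual_seed(0)
x = torch.randn(256, 32)
y = x[:, :4].sum(dim=1, keepdim=True)        # only 4 of 32 inputs actually matter
layer = GatedLinear(32, 1)
opt = torch.optim.Adam(layer.parameters(), lr=1e-2)

for step in range(500):
    opt.zero_grad()
    loss = ((layer(x) - y) ** 2).mean() + 1e-2 * layer.mask().sum()
    loss.backward()
    opt.step()

print("gates still open:", (layer.mask() > 0.5).sum().item(), "of", layer.aux.numel())
```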


Multi-Task Bayesian Optimization

Neural Information Processing Systems

Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up $k$-fold cross-validation.
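
A minimal sketch of the idea: place a single GP over (hyperparameter, task) pairs using a product of an input kernel and a task covariance, so that many evaluations from a previous task shape the posterior for a new task after only a couple of new evaluations. Here the 2x2 task covariance B is fixed by hand and the objectives are cheap synthetic functions; in the paper the task relationships are learned, so this illustrates the kernel structure rather than the paper's algorithm.

```python
import numpy as np
from scipy.stats import norm

def k_x(a, b, length=0.3):
    d = a.reshape(-1, 1) - b.reshape(1, -1)
    return np.exp(-0.5 * (d / length) ** 2)

# Fixed 2x2 task covariance (learned in the paper; assumed here for illustration).
B = np.array([[1.0, 0.8],
              [0.8, 1.0]])

def multitask_kernel(x1, t1, x2, t2):
    return k_x(x1, x2) * B[np.ix_(t1, t2)]

def posterior(x_obs, t_obs, y_obs, x_new, t_new, noise=1e-4):
    K = multitask_kernel(x_obs, t_obs, x_obs, t_obs) + noise * np.eye(len(x_obs))
    Ks = multitask_kernel(x_obs, t_obs, x_new, t_new)
    mu = Ks.T @ np.linalg.solve(K, y_obs)
    # Prior variance k_x(x, x) * B[t, t] = 1 for this choice of kernels.
    var = 1.0 - np.einsum('ij,ij->j', Ks, np.linalg.solve(K, Ks))
    return mu, np.sqrt(np.clip(var, 1e-12, None))

rng = np.random.default_rng(2)
f_old = lambda x: (x - 0.3) ** 2            # cheap surrogate: error on a previous task
f_new = lambda x: (x - 0.4) ** 2            # related new task we want to optimize

# Many evaluations already exist on the old task, only two on the new one.
x_old = rng.uniform(0, 1, 20)
x_new_obs = np.array([0.1, 0.9])
x_obs = np.concatenate([x_old, x_new_obs])
t_obs = np.array([0] * 20 + [1] * 2)
y_obs = np.concatenate([f_old(x_old), f_new(x_new_obs)])

grid = np.linspace(0, 1, 200)
mu, sigma = posterior(x_obs, t_obs, y_obs, grid, np.ones(200, dtype=int))
best = y_obs[t_obs == 1].min()
z = (best - mu) / sigma
ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)
print("next hyperparameter to try on the new task:", grid[np.argmax(ei)])
```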


Amazon SageMaker Autopilot – Automatically Create High-Quality Machine Learning Models With Full Control And Visibility | Amazon Web Services

#artificialintelligence

Today, we're extremely happy to launch Amazon SageMaker Autopilot to automatically create the best classification and regression machine learning models, while allowing full control and visibility. In 1959, Arthur Samuel defined machine learning as the ability for computers to learn without being explicitly programmed. In practice, this means finding an algorithm that can extract patterns from an existing data set, and use these patterns to build a predictive model that will generalize well to new data. Since then, lots of machine learning algorithms have been invented, giving scientists and engineers plenty of options to choose from, and helping them build amazing applications. However, this abundance of algorithms also creates a difficulty: which one should you pick?


Hyperparameter Learning via Distributional Transfer

Neural Information Processing Systems

Bayesian optimisation is a popular technique for hyperparameter learning but typically requires initial exploration even in cases where similar prior tasks have been solved. We propose to transfer information across tasks using learnt representations of the training datasets used in those tasks. These representations build on the framework of distribution embeddings into reproducing kernel Hilbert spaces. The developed method converges faster than existing baselines, in some cases requiring only a few evaluations of the target objective.
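
The dataset representations the abstract mentions are kernel mean embeddings, so a distance between two tasks' training sets can be computed directly as a maximum mean discrepancy (MMD) in the RKHS. The sketch below does just that for invented Gaussian datasets, using the simple biased estimator; how such representations are then combined with the Bayesian optimisation surrogate is the paper's contribution and is not reproduced here.

```python
import numpy as np

def rbf(a, b, gamma=0.5):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=0.5):
    """Squared MMD between two datasets via kernel mean embeddings (biased estimator)."""
    return rbf(X, X, gamma).mean() + rbf(Y, Y, gamma).mean() - 2 * rbf(X, Y, gamma).mean()

rng = np.random.default_rng(3)
source_a = rng.normal(0.0, 1.0, size=(300, 5))    # dataset from a previously solved task
source_b = rng.normal(2.0, 1.0, size=(300, 5))    # a less related previous task
target = rng.normal(0.1, 1.0, size=(300, 5))      # dataset for the new task

# Smaller embedding distance -> more weight on that task's past hyperparameter evaluations.
print("distance to task A:", mmd2(target, source_a))
print("distance to task B:", mmd2(target, source_b))
```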