Attention on Attention: Architectures for Visual Question Answering (VQA)

arXiv.org Artificial Intelligence

Visual Question Answering (VQA) is an increasingly popular topic in deep learning research, requiring coordination of natural language processing and computer vision modules in a single architecture. We build upon the model that placed first in the VQA Challenge by developing thirteen new attention mechanisms and introducing a simplified classifier. After 300 GPU hours of extensive hyperparameter and architecture searches, we achieve an evaluation score of 64.78%, outperforming the existing state-of-the-art single model's validation score of 63.15%.
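
To make the attention idea concrete, here is a minimal sketch of question-guided attention over image region features. This is a generic bilinear-scoring formulation for illustration only, not one of the paper's thirteen mechanisms; the feature dimensions and the weight matrix W are assumptions.

```python
import numpy as np

def question_guided_attention(image_feats, question_vec, W):
    """image_feats: (regions, d_img); question_vec: (d_q,); W: (d_img, d_q)."""
    scores = image_feats @ (W @ question_vec)  # one relevance score per region
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax over image regions
    return weights @ image_feats               # question-weighted image summary

# Illustrative shapes: 36 region features of dimension 2048, a 300-d question embedding.
rng = np.random.default_rng(0)
attended = question_guided_attention(
    rng.normal(size=(36, 2048)),
    rng.normal(size=300),
    rng.normal(size=(2048, 300)) * 0.01,
)
```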

Let's evolve a neural network with a genetic algorithm – code included


Building the perfect deep learning network involves a hefty amount of art to accompany sound science. One way to find the right hyperparameters is brute-force trial and error: try every combination of sensible parameters, send them to your Spark cluster, go about your daily jive, and come back when you have an answer. But there's gotta be a better way! Here, we try to improve on the brute-force method by applying a genetic algorithm to evolve a network, with the goal of achieving optimal hyperparameters in a fraction of the time a brute-force search would take. Let's say it takes five minutes to train and evaluate a network on your dataset; exhaustively trying even 1,000 parameter combinations would then eat up more than three days of compute.
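
A minimal sketch of what such a genetic search can look like, assuming the usual select-breed-mutate loop; the search space, the 10% mutation rate, and the toy fitness function (standing in for the five-minute train-and-evaluate step) are all illustrative, not taken from the article.

```python
import random

# Hypothetical search space; names and values are illustrative.
SEARCH_SPACE = {
    "layers":     [1, 2, 3, 4],
    "neurons":    [64, 128, 256, 512],
    "activation": ["relu", "tanh", "sigmoid"],
    "optimizer":  ["sgd", "adam", "rmsprop"],
}

def random_network():
    return {key: random.choice(values) for key, values in SEARCH_SPACE.items()}

def fitness(network):
    # Stand-in for "train for five minutes and return validation accuracy";
    # this toy score just lets the sketch run end to end.
    return network["neurons"] / 512 + network["layers"] / 4 + random.random()

def breed(mom, dad, mutate_chance=0.1):
    # Each hyperparameter ("gene") comes from one parent at random.
    child = {key: random.choice([mom[key], dad[key]]) for key in SEARCH_SPACE}
    if random.random() < mutate_chance:  # occasional mutation keeps diversity
        gene = random.choice(list(SEARCH_SPACE))
        child[gene] = random.choice(SEARCH_SPACE[gene])
    return child

def evolve(population, retain=0.4):
    # Keep the fittest networks as parents and refill the rest with children.
    graded = sorted(population, key=fitness, reverse=True)
    parents = graded[: max(2, int(len(graded) * retain))]
    children = [breed(*random.sample(parents, 2))
                for _ in range(len(population) - len(parents))]
    return parents + children

population = [random_network() for _ in range(20)]
for generation in range(10):
    population = evolve(population)
print(max(population, key=fitness))
```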

How to train and deploy deep learning at scale


In five lines you can describe what your architecture looks like, and then you can also specify which algorithms you want to use for training. There are a lot of other systems challenges associated with actually going end to end, from data to a deployed model, and existing software solutions don't really tackle a big set of them. For example, regardless of the software you're using, it can take days to weeks to train a deep learning model. There are real open challenges in how best to use parallel and distributed computing, both to train a particular model and in the context of tuning the hyperparameters of different models.
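
As a rough illustration of "five lines to describe an architecture": the speaker does not name a specific library, so this hypothetical Keras-style snippet, with assumed layer sizes, is just one way such a description could look.

```python
from tensorflow import keras

# Roughly five lines describe the architecture...
model = keras.Sequential([
    keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])
# ...and one more specifies the training algorithm (optimizer and loss).
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```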

Caret Package - A Practical Guide to Machine Learning in R


The caret package is a comprehensive framework for building machine learning models in R. In this tutorial, I explain nearly all the core features of the caret package and walk you through the step-by-step process of building predictive models. Be it a decision tree or xgboost, caret helps you find the optimal model in the shortest possible time. Caret neatly integrates all the activities associated with model development into a streamlined workflow, for nearly every major ML algorithm available in R. We will not stop with the caret package, either: we will go a step further and see how to smartly ensemble predictions from multiple best models, possibly producing an even better prediction, using caretEnsemble.

Caret is short for Classification And REgression Training. With R having so many implementations of machine learning algorithms spread across packages, it can be challenging to keep track of which algorithm resides in which package. The syntax and the way to implement an algorithm also differ across packages; combined with preprocessing and digging through help pages for the hyperparameters (the parameters that define how the algorithm learns), this can make building predictive models an involved task. Thanks to caret, though, no matter which package an algorithm resides in, caret will remember that for you and may simply prompt you to run install.packages(). Later in this tutorial I will show how to see all the ML algorithms supported by caret (it's a long list!) and which hyperparameters can be tuned.

A refresher on batch (re-)normalization – Luminovo – Medium


When the mini-batch mean (µB) and mini-batch standard deviation (σB) diverge too often from the mean and standard deviation over the entire training set, BatchNorm breaks. Remember that at inference time we use the moving averages of µB and σB (as an estimate of the statistics of the entire training set) to do the normalization step. Naturally, if your means and standard deviations during training and testing are different, so are your activations, and you can't be surprised if your results are different (read: worse), too. This can happen when your mini-batch samples are non-i.i.d.
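
A minimal NumPy sketch of the two code paths, assuming the standard BatchNorm formulation; the variable names and the momentum value are illustrative:

```python
import numpy as np

def batchnorm_train(x, gamma, beta, running_mean, running_var,
                    momentum=0.9, eps=1e-5):
    # During training, normalize with the current mini-batch statistics.
    mu, var = x.mean(axis=0), x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    # Track moving averages of µB and σB² for use at inference time.
    running_mean = momentum * running_mean + (1 - momentum) * mu
    running_var = momentum * running_var + (1 - momentum) * var
    return gamma * x_hat + beta, running_mean, running_var

def batchnorm_infer(x, gamma, beta, running_mean, running_var, eps=1e-5):
    # At inference, only the moving averages are available. If training
    # mini-batches were non-i.i.d., these estimates diverge from the true
    # training-set statistics, and the activations (and results) shift.
    x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta
```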

Deploying AI to production: 12 tips from the trenches - SC5


I'm Max, and I work on applied AI here at SC5. As a consultancy, SC5 is expected to provide our clients with services that are not only well-designed and functional, but also capable of scaling and withstanding production load. An application isn't much good unless it works in the real world. Machine learning is, in many ways, a completely different beast than "traditional" software engineering. Machine learning solutions also need to be deployed to production to be of any use, and with that comes a special set of considerations.

The Random Forest Algorithm – Towards Data Science


Random Forest is a flexible, easy-to-use machine learning algorithm that produces a great result most of the time, even without hyper-parameter tuning. It is also one of the most widely used algorithms, because of its simplicity and the fact that it can be used for both classification and regression tasks. In this post, you are going to learn how the random forest algorithm works, along with several other important things about it.
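
As a taste of how little tuning is needed, here is a minimal scikit-learn sketch; the post itself doesn't prescribe a library, so the dataset, seed, and default hyperparameters here are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Default hyperparameters, no tuning: often already a solid baseline.
model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
# For regression tasks, RandomForestRegressor offers the same interface.
```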

AI and Deep Learning in 2017 – A Year in Review


The year is coming to an end. I did not write nearly as much as I had planned to, but I'm hoping to change that next year, with more tutorials around Reinforcement Learning, Evolution, and Bayesian Methods coming to WildML! And what better way to start than with a summary of all the amazing things that happened in 2017? Looking back through my Twitter history and the WildML newsletter, the following topics repeatedly came up. I'll inevitably miss some important milestones, so please let me know in the comments!