

Don't fall for the AI hype: Here are the ingredients you need to build an actual useful thing

#artificialintelligence

Artificial intelligence these days is sold as if it were a magic trick. Data is fed into a neural net – or black box – as a stream of jumbled numbers, and voilà! It comes out the other side completely transformed, like a rabbit pulled from a hat. That's possible in a lab, or even on a personal dev machine, with carefully cleaned and tuned data. However, it takes a lot, an awful lot, of effort to scale machine-learning algorithms up to something resembling a multiuser service – something useful, in other words.


Questions To Ask When Moving Machine Learning From Practice to Production

#artificialintelligence

With growing interest in neural networks and deep learning, individuals and companies are claiming ever-increasing adoption rates of artificial intelligence into their daily workflows and product offerings. Coupled with the breakneck speed of AI research, the new wave of popularity shows a lot of promise for solving some of the harder problems out there. That said, I feel that this field suffers from a gulf between appreciating these developments and subsequently deploying them to solve "real-world" tasks. A number of frameworks, tutorials and guides have popped up to democratize machine learning, but the steps that they prescribe often don't align with the fuzzier problems that need to be solved. This post is a collection of questions (with some possibly incorrect answers) that are worth thinking about when applying machine learning in production.


Moving machine learning from practice to production

#artificialintelligence

With growing interest in neural networks and deep learning, individuals and companies are claiming ever-increasing adoption rates of artificial intelligence into their daily workflows and product offerings. Spending some time on planning your infrastructure, standardizing setup and defining workflows early on can save valuable time with each additional model that you build. After building, training and deploying your models to production, the task is still not complete unless you have monitoring systems in place. Periodically saving production statistics (data samples, predicted results, outlier specifics) has proven invaluable in performing analytics (and error postmortems) over deployments.
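
The logging half of that advice is easy to sketch. Below is a minimal, illustrative Python version, not code from the article: the sample rate, outlier rule, and `model.predict` interface are all assumptions. It persists a random sample of requests, plus any extreme scores, to an append-only JSONL file for later analytics and postmortems.

```python
import json
import random
import time

# Illustrative thresholds; tune per deployment.
SAMPLE_RATE = 0.01        # fraction of ordinary requests to persist
OUTLIER_SCORE = 0.99      # always persist unusually extreme scores
LOG_PATH = "prediction_log.jsonl"

def predict_and_log(model, features):
    # Assumes an sklearn-style predict() over a batch of one row.
    score = float(model.predict([features])[0])
    is_outlier = score >= OUTLIER_SCORE
    if is_outlier or random.random() < SAMPLE_RATE:
        record = {"ts": time.time(), "features": features,
                  "score": score, "outlier": is_outlier}
        with open(LOG_PATH, "a") as f:
            f.write(json.dumps(record) + "\n")
    return score
```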


Laplace noising versus simulated out of sample methods (cross frames)

#artificialintelligence

Please read on for my discussion of some of the limitations of the technique, how we solve the problem for impact coding (also called "effects codes"), and a worked example in R. We define a nested model as any model where the results of a sub-model are used as inputs to a later model. I now think such a theorem would actually have a fairly unsatisfying statement, as one possible "bad real-world data" situation violates the usual "no re-use" requirements of differential privacy: duplicated or related columns or variables break the Laplace noising technique. But library code needs to work in the limit (you don't know ahead of time what users will throw at it), and there are a lot of mechanisms that produce duplicate, near-duplicate, and related columns in data sources used for data science (one of the differences between data science and classical statistics is that data science tends to apply machine learning techniques to very under-curated data sets). On our artificial "each column five times" data set, the Laplace noising technique's test performance is significantly degraded (performance on held-out test data usually being a better simulation of future model performance than performance on the training set).
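
For readers who haven't seen it, here is a rough Python sketch of the out-of-sample ("cross frame") idea the article compares against Laplace noising; the original post works in R, and the column names and fold count here are illustrative assumptions. Each row's impact code is estimated only from the other folds, so its own outcome never leaks into its own encoding.

```python
import pandas as pd
from sklearn.model_selection import KFold

def cross_frame_impact_code(df, cat_col, y_col, n_splits=5, seed=0):
    """Out-of-sample impact coding of one categorical column."""
    grand_mean = df[y_col].mean()
    codes = pd.Series(grand_mean, index=df.index)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for fit_idx, code_idx in kf.split(df):
        # Per-level mean of y, learned only on the other folds.
        level_means = df.iloc[fit_idx].groupby(cat_col)[y_col].mean()
        codes.iloc[code_idx] = (
            df.iloc[code_idx][cat_col]
            .map(level_means)
            .fillna(grand_mean)   # unseen levels fall back to the grand mean
            .values
        )
    return codes - grand_mean     # impact relative to the grand mean
```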


Deploy Your Predictive Model To Production - Machine Learning Mastery

#artificialintelligence

Often the complexity of a machine learning algorithm lies in model training, not in making predictions. I also strongly recommend gathering, over time, outlier and interesting cases from operations that produce unexpected results (or break the system). Like a ratchet, consider incrementally updating performance requirements as model performance improves. If you're interested in more information on operationalizing machine learning models, check out the linked post, which focuses more on Google-scale machine learning model deployment.
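
To make that training-versus-prediction asymmetry concrete, here is a minimal scikit-learn sketch (the dataset and model choice are illustrative, not from the article): all of the expensive work happens offline, and the production side only loads and applies the fitted model.

```python
from joblib import dump, load
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Offline: the costly step. Fit on (synthetic) training data and serialize.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)
dump(model, "model.joblib")

# In production: loading and predicting are comparatively cheap.
served_model = load("model.joblib")
print(served_model.predict(X[:5]))
```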


Data-First Machine Learning - insideBIGDATA

#artificialintelligence

In this special guest feature, Victor Amin, Data Scientist at SendGrid, advises that businesses implementing machine learning systems focus on data quality first and worry about algorithms later in order to ensure accuracy and reliability in production. At SendGrid, Victor builds machine learning models to predict engagement and detect abuse in a mailstream that handles over a billion emails per day. The training set (the data your machine learning system learns from) is the most important part of any machine learning system. Build a system that samples production data, and have a mechanism for reliably labeling those samples that isn't your machine learning model.
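
One simple way to implement the "sample production data" part is reservoir sampling, sketched below in Python as an illustration (the function name and parameters are assumptions, not from the article). It keeps a fixed-size uniform sample of an unbounded stream; the labels would then come from an independent source, such as human review or an observed ground-truth signal, never from the model under evaluation.

```python
import random

def reservoir_sample(stream, k, seed=0):
    """Uniformly sample k items from a stream of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # replace with decreasing probability
            if j < k:
                sample[j] = item
    return sample
```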


Amazon Joins Tech Giants in Open Sourcing a Key Machine Learning Tool

#artificialintelligence

"DSSTNE (pronounced "Destiny") is an open source software library for training and deploying deep neural networks using GPUs. Amazon engineers built DSSTNE to solve deep learning problems at Amazon's scale. DSSTNE is built for production deployment of real-world deep learning applications, emphasizing speed and scale over experimental flexibility. "Deep Scalable Sparse Tensor Network Engine, (DSSTNE), pronounced "Destiny", is an Amazon developed library for building Deep Learning (DL) machine learning (ML) models.