Josh Willis, an engineer at Slack, spoke at our January MeetUp about testing machine learning models in production. (If you're interested in joining this MeetUp, sign up here.) Josh has worked as the Director of Data Science at Cloudera, he wrote the Java version of Google's AB testing framework, a...
Deploying models trained in your research environment is not always a simple task. Your research environment, your production programming language, and the interplay between them may affect the ease of introducing new statistical models in production. In this blog post, I'll demonstrate the complet...
I came across this interesting dataset called as SeattleWeather dataset and I decided to use it for my first post on Medium. In this tutorial, I will use this dataset to predict Rain in Seattle using a Linear SVM model. First, lets explore the dataset. In this example, I have not taken the date as a feature to predict the rain and hence eliminated the date column. The other three columns are equally important to predict the rain, precipitation, maximum and minimum temperature.
There are a number of machine learning models to choose from. We can use Linear Regression to predict a value, Logistic Regression to classify distinct outcomes, and Neural Networks to model non-linear behaviors. When we build these models, we always use a set of historical data to help our machine learning algorithms learn what is the relationship between a set of input features to a predicted output. But even if this model can accurately predict a value from historical data, how do we know it will work as well on new data? Or more plainly, how do we evaluate whether a machine learning model is actually "good"?
In this study, 96 303 head computed tomography (CT) reports were obtained. The linguistic complexity of these reports was compared with that of alternative corpora. Head CT reports were preprocessed, and machine-analyzable features were constructed by using bag-of-words (BOW), word embedding, and Latent Dirichlet allocation–based approaches. Ultimately, 1004 head CT reports were manually labeled for findings of interest by physicians, and a subset of these were deemed critical findings. Lasso logistic regression was used to train models for physician-assigned labels on 602 of 1004 head CT reports (60%) using the constructed features, and the performance of these models was validated on a held-out 402 of 1004 reports (40%).
For our API to work, we need to define the input, in our case the features of the test data. Good practice is to write the input parameter definition into you API Swagger UI, but the code would work without these annotations. We define the parameters by annotating them with name and description in our R-script using @parameter. For this purpose, I want to know the type and min/max values for each of my variables in the training data. Because categorical data has been converted to dummy variables and then scaled and centered, these values will all be numeric and between 0 and 1 in this example.
Kangaroo Kapital is the largest credit card company in Australia. Animals across the continent use Kangaroo Kapital credit cards to make all of their daily purchases, racking up points in the company's reward system. Since Australian animals have traditionally not worn much clothing, the challenges of carrying around cash are substantial. Only having to keep track of a single credit card is a big help for your average working wallaby. But, since no clothes means no pockets, even keeping track of one credit card can be problematic.
Kangaroo Kapital is the largest credit card company in Australia. Animals across the continent use Kangaroo Kapital credit cards to make all of their daily purchases, racking up points in the company's reward system. Since Australian animals have traditionally not worn much clothing, the challenges of carrying around cash are substantial. Only having to keep track of a single credit card is a big help for your average working wallaby. But since Australian animals have typically not worn much clothing, they still have a problem keeping track of even a single credit card.