You might be familiar with structured data; it is everywhere. Here I would like to focus on how we transform unstructured data into something a machine can process and then draw inferences from. Over time, people have worked out how to handle unstructured data like text, images, satellite imagery, audio, etc., which might give you something useful for making decisions in your business. As a case study, I use the Kaggle competition named What's Cooking, which asks you to classify the type of food based on its ingredients.
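To make this concrete, here is a minimal sketch of the kind of pipeline such a competition invites: join each ingredient list into a text "document", vectorize it, and fit a classifier. The tiny dataset below is invented for illustration and is not taken from the actual competition data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical mini-dataset standing in for the What's Cooking JSON:
# each recipe is a list of ingredients labeled with a cuisine.
recipes = [
    (["soy sauce", "ginger", "rice"], "chinese"),
    (["tortilla", "salsa", "cilantro"], "mexican"),
    (["olive oil", "basil", "parmesan"], "italian"),
    (["soy sauce", "scallions", "rice vinegar"], "chinese"),
    (["jalapeno", "corn tortilla", "lime"], "mexican"),
    (["mozzarella", "tomato", "olive oil"], "italian"),
]

# Join each ingredient list into one "document" so that standard
# text tools (TF-IDF, bag of words) apply directly.
docs = [" ".join(ingredients) for ingredients, _ in recipes]
labels = [cuisine for _, cuisine in recipes]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(docs, labels)

print(model.predict(["soy sauce rice ginger"])[0])
```

On the real competition data the same pipeline works unchanged; only the loading step (parsing the competition's JSON) differs.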
Microsoft today announced it has acquired Lobe, creator of a platform for building custom deep learning models using a visual interface that requires no code or technical understanding of AI. Lobe, a platform that can understand hand gestures, read handwriting, and hear music, will continue to develop as a standalone service, according to the company's website. Financial terms of the deal were not disclosed. People have only started to utilize the full potential of AI, Microsoft CTO Kevin Scott said today in a blog post announcing the acquisition. "This in large part is because AI development and building deep learning models are slow and complex processes even for experienced data scientists and developers."
When a machine learning model goes into production, it is very likely to be idle most of the time. In many use cases, a model only needs to run inference when new data is available. If we have such a use case and deploy the model on a server, it will eagerly check for new data, only to be disappointed for most of its lifetime, and meanwhile you pay for the server's entire uptime. Now that the cloud era has arrived, we can deploy a model serverlessly, meaning we only pay for the compute we need and spin up resources only when we need them.
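The pay-per-invocation idea can be sketched as a function-as-a-service handler. The skeleton below is a hypothetical, Lambda-style example: the names (`handler`, `load_model`) and the toy "model" are mine, not from any specific framework, but the pattern of loading the model once per warm container and paying only per invocation is the point.

```python
import json

_MODEL = None  # survives between invocations within one warm container


def load_model():
    # Stand-in for loading a serialized model from object storage;
    # here a toy function that sums its inputs plays the model's role.
    return lambda features: sum(features)


def handler(event, context=None):
    """Entry point invoked only when new data arrives."""
    global _MODEL
    if _MODEL is None:  # cold start: this cost is paid once per container
        _MODEL = load_model()
    features = event["features"]
    prediction = _MODEL(features)
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}
```

Between invocations no compute is billed; the trade-off is the occasional cold-start latency when a new container has to load the model.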
The machine learning roadmap is filled with trial and error. Engineers and scientists who are new to the field will constantly tweak and alter their algorithms and models. During this process, challenges will arise, especially with handling data and determining the right model. When building a machine learning model, it is important to know that real-world data is imperfect, that different types of data require different approaches and tools, and that there will always be tradeoffs when determining the right model. The following systematic workflow walkthrough describes how to develop a trained model for a cell phone health monitoring app that tracks user activity throughout the day.
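As a toy version of that workflow, the sketch below extracts summary features from hypothetical accelerometer windows, fits a deliberately simple one-feature threshold "model", and classifies a new window. Everything here — the sensor values, the feature choice, and the classifier — is illustrative, not the app's actual pipeline.

```python
from statistics import mean, stdev


def featurize(window):
    # Summary statistics are a common first feature set for raw
    # accelerometer windows; real apps use many more features.
    return [mean(window), stdev(window)]


# Invented labeled windows: low-variance readings ~ "sitting",
# high-variance readings ~ "walking".
windows = [
    ([1.0, 1.1, 0.9, 1.0], "sitting"),
    ([1.2, 0.8, 1.0, 1.1], "sitting"),
    ([0.2, 2.5, 0.1, 2.8], "walking"),
    ([0.3, 2.9, 0.2, 2.4], "walking"),
]


def train(data):
    # Toy "model": threshold on the stdev feature, set at the midpoint
    # between the two class means.
    by_label = {}
    for w, y in data:
        by_label.setdefault(y, []).append(featurize(w)[1])
    cut = (mean(by_label["sitting"]) + mean(by_label["walking"])) / 2
    return lambda w: "walking" if featurize(w)[1] > cut else "sitting"


model = train(windows)
print(model([0.1, 2.7, 0.3, 2.6]))  # walking
```

The workflow steps — featurize, split into labeled examples, fit, evaluate on unseen windows — stay the same when the toy threshold is swapped for a real classifier.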
Thanks to a machine learning model built in-house at Center City nonprofit Benefits Data Trust, call-center staffers get extra insights while enrolling users in the Supplemental Nutrition Assistance Program (SNAP). Deployed last week for Pennsylvania residents, the tool helps employees understand what level of assistance potential beneficiaries might need. BDT Director of Data Science Matt Stevens said the model is already in use, helping workers identify cases that may require a more hands-on approach during the application process, specifically in the document submission phase. "This isn't making distinctions on how much benefits people can have access to, but rather what level of guidance and support we should provide," Stevens said. The model, Stevens said, is in the initial phases of use and is expected to be deployed across all projects.
Enterprise-wide deployments of AI are constrained by the requirements of scaling any new system or technology: transparency, security, and the application's ability to work across many systems. But solving these challenges is not enough. Every organization that develops or uses AI, or hosts or processes data, must do so in ways that allow it to explain its decisions or recommendations in an easily consumable form. Much like an impressionable child, a new technology like AI is prone to influence by the nature of the information and data sets with which it is presented. AI models could, for example, be unknowingly fed biased data that affects their output.
This post is a token of appreciation for the amazing open source Data Science community, to which I owe much of what I have learned. For the last few months, I have been working on a side project to develop a machine learning application on streaming data. It was a great experience with numerous challenges and lots of learning, some of which I have tried to share here. This post focuses on how to deploy machine learning models on streaming data and covers all three areas necessary for a successful production application: infrastructure, technology, and monitoring. The first step for any successful application is to determine the technology stack it should be written in, based on the business requirements.
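As a stripped-down sketch of the inference loop on a stream, the snippet below uses a plain Python iterable in place of a real broker like Kafka or Kinesis, a placeholder scoring function, and a rolling latency average standing in for the monitoring piece. All names and the scoring rule are mine, purely for illustration.

```python
import json
import time
from collections import deque


def score(record):
    # Placeholder "model": flag large transaction amounts.
    return 1.0 if record["amount"] > 100 else 0.0


def consume(stream, window=3):
    """Score each message as it arrives and track a rolling latency metric."""
    latencies = deque(maxlen=window)  # minimal monitoring: recent latencies
    results = []
    for raw in stream:
        t0 = time.perf_counter()
        record = json.loads(raw)  # messages arrive as serialized JSON
        results.append((record["id"], score(record)))
        latencies.append(time.perf_counter() - t0)
    return results, sum(latencies) / len(latencies)


events = [json.dumps({"id": i, "amount": a}) for i, a in enumerate([50, 150, 75])]
preds, avg_latency = consume(events)
print(preds)  # [(0, 0.0), (1, 1.0), (2, 0.0)]
```

In a real deployment, the iterable would be a consumer subscribed to a topic, and the latency metric would be shipped to a monitoring system instead of returned.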
In this episode of the Data Show, I spoke with Sharad Goel, assistant professor at Stanford, and his student Sam Corbett-Davies. They recently wrote a survey paper, "A Critical Review of Fair Machine Learning," where they carefully examined the standard statistical tools used to check for fairness in machine learning models. It turns out that each of the standard approaches (anti-classification, classification parity, and calibration) has limitations, and their paper is a must-read tour through recent research in designing fair algorithms. We talked about their key findings, and, most importantly, I pressed them to list a few best practices that analysts and industrial data scientists might want to consider. Sam Corbett-Davies: The problem with many of the standard metrics is that they fail to take into account how different groups might have different distributions of risk.
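To make the parity point concrete, here is a toy computation of group-wise positive-prediction rates — the quantity that classification parity constrains. The predictions and group labels are synthetic; as Corbett-Davies notes, equal rates across groups can still mask different underlying risk distributions, so parity alone is not proof of fairness.

```python
# Synthetic predictions (1 = flagged positive) and a group label per person.
preds = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]


def positive_rate(group):
    """Fraction of members of `group` who receive a positive prediction."""
    idx = [i for i, g in enumerate(groups) if g == group]
    return sum(preds[i] for i in idx) / len(idx)


# Classification parity asks whether these two rates are (near) equal.
rate_a = positive_rate("a")
rate_b = positive_rate("b")
print(rate_a, rate_b)  # 0.75 0.25
```

Here the rates differ sharply, which a parity metric would flag — but whether that gap is unfair depends on the groups' actual risk distributions, which is exactly the limitation the paper examines.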
Neural networks are among the most accurate supervised learning methods in use today. However, their opacity makes them difficult to trust in critical applications, especially when conditions in training may differ from those in practice. Recent efforts to develop explanations for neural networks and machine learning models more generally have produced tools to shed light on the implicit rules behind predictions. These tools can help us identify when models are right for the wrong reasons. However, they do not always scale to explaining predictions for entire datasets, are not always at the right level of abstraction, and most importantly cannot correct the problems they reveal. In this thesis, we explore the possibility of training machine learning models (with a particular focus on neural networks) using explanations themselves. We consider approaches where models are penalized not only for making incorrect predictions but also for providing explanations that are either inconsistent with domain knowledge or overly complex. These methods let us train models which can not only provide more interpretable rationales for their predictions but also generalize better when training data is confounded or meaningfully different from test data (even adversarially so).
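One concrete instance of this idea in the literature is penalizing a model's input gradients on features that domain knowledge marks as confounds, so the model is "right for the right reasons." The sketch below is a toy logistic regression illustrating that flavor of penalty — not the thesis's actual method: feature 1 is constructed as a near-perfect confound of the label, and an extra penalty on the model's sensitivity to it keeps its weight small, forcing reliance on the truly causal feature 0. All data and hyperparameters are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] > 0).astype(float)            # only feature 0 is truly causal
X[:, 1] = y + 0.01 * rng.normal(size=200)  # feature 1 is a near-perfect confound

w = np.zeros(2)
lam = 5.0  # strength of the explanation penalty on feature 1
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    grad = X.T @ (p - y) / len(y)  # usual logistic-loss gradient
    # For a linear model, d(score)/d(x_1) = w[1], so penalizing that
    # sensitivity simply adds lam * w[1] to the gradient for w[1].
    grad[1] += lam * w[1]
    w -= lr * grad

# Despite feature 1 predicting the label almost perfectly, the explanation
# penalty keeps its weight near zero and the causal feature dominates.
print(w)
```

Without the penalty term, the optimizer would happily put nearly all the weight on the confound — exactly the "right answer for the wrong reason" failure the thesis targets.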