The next Windows 10 update opens the way for the integration of artificial intelligence within Windows applications, directly impacting hundreds of millions of devices, from Windows PCs and tablets to IoT Edge devices. The new version of the Windows ML platform lets developers integrate pre-trained deep-learning models into their applications directly in Visual Studio. Models must be converted to the Open Neural Network Exchange (ONNX) format before being imported into the Visual Studio tools. ONNX is an open-source machine-learning model format launched by Microsoft and Facebook in September 2017 and later joined by AWS. ONNX enables portability between neural-network frameworks: models trained with tools such as PyTorch, Apache MXNet, Caffe2, or the Microsoft Cognitive Toolkit (CNTK) can be translated to ONNX and then used in Windows applications.
Google recently introduced ML Kit, a machine-learning module fully integrated into its Firebase mobile development platform and available for both iOS and Android. With this new Firebase module, Google simplifies the creation of machine-learning-powered mobile applications and addresses several of the challenges raised by the computationally intensive operations that artificial intelligence requires on mobile devices. ML Kit lets mobile developers build machine-learning features based on some of the models available in Google's deep-learning Vision API, such as image labeling, OCR, and face detection, and it sits within the Firebase platform alongside other Google Cloud based modules such as authentication and storage.
At WWDC Apple released Core ML 2, a new version of its machine-learning SDK for iOS devices. The new release of Core ML, whose first version was released in June 2017, is expected to speed up inference by 30% for apps built with Core ML 2. Apple achieves this using two techniques called "batch prediction" and "quantization". Batch prediction is the practice of predicting for multiple inputs at the same time, for example classifying several images in a single call rather than one call per image. Quantization is the practice of representing weights and activations in fewer bits during inference than during training. During training, floating-point numbers are typically used for weights and activations, but they slow down inference considerably on non-GPU devices.
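The idea behind quantization can be illustrated with a minimal sketch: map float weights onto 8-bit signed integers with a per-tensor scale factor, then recover approximate floats at inference time. This is a generic linear-quantization sketch, not Core ML's actual scheme.

```python
def quantize(weights, bits=8):
    """Map float weights to signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax or 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the quantized integers."""
    return [q * scale for q in quantized]

weights = [0.42, -1.3, 0.07, 0.9]
quantized, scale = quantize(weights)
approx = dequantize(quantized, scale)

# Rounding keeps the error within half a quantization step.
max_err = max(abs(w - a) for w, a in zip(weights, approx))
```

Each weight now fits in one byte instead of four (float32), shrinking the model and letting inference run on fast integer arithmetic, at the cost of the small rounding error measured above.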
Google has made its custom chips, Tensor Processing Units (TPUs), designed to run machine-learning workloads written for its TensorFlow framework, available in beta to machine-learning (ML) experts and developers. With Google's Cloud TPUs, ML models can run on demand at lower cost and higher performance.