TensorFlow 2.3 aims for program 'understanding', resource economy • DEVCLASS
A good two months after its last big release, the TensorFlow team has bestowed version 2.3 upon followers of the self-proclaimed machine learning framework for everyone. TensorFlow 2.3 puts a special focus on understanding and reducing resource usage, with new mechanisms in the data library and fresh profiler tools among the most highlighted additions. An experimental snapshot API in tf.data, for example, is meant to store the output of a preprocessing pipeline to disk, so that already-processed data can be reused, saving the CPU resources needed to compute it again in later steps. Moreover, the tf.data service aims to speed up the training process in cases where the attached host isn't able to "keep up with the data consumption of the model". If a model can process more images than the host can generate, for example, the service can take over, leveraging a cluster of workers to prepare the needed amount of training data.
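The snapshot mechanism can be sketched roughly as follows. The dataset and path here are toy stand-ins, and the API (living under tf.data.experimental in the 2.3 release) is explicitly experimental, so its exact shape may change:

```python
import tempfile

import tensorflow as tf  # assumes TensorFlow >= 2.3

# A stand-in for an expensive preprocessing pipeline.
dataset = tf.data.Dataset.range(5).map(lambda x: x * 2)

# Persist the preprocessed elements to disk; subsequent runs can
# re-read the snapshot instead of recomputing the map step.
snapshot_dir = tempfile.mkdtemp()
dataset = dataset.apply(tf.data.experimental.snapshot(snapshot_dir))

print(list(dataset.as_numpy_iterator()))  # → [0, 2, 4, 6, 8]
```

On the first run the snapshot is written; on later runs over the same path the stored elements are served back, trading disk space for CPU time.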
PyTorch lights up version 1.6, follows competition down the profiling route • DEVCLASS
Just one day after TensorFlow hit version 2.3, Facebook's challenger project PyTorch was updated to 1.6, sporting support for automatic mixed precision training and a changed classification scheme for new features. Each feature will now fall into one of three categories: stable, beta, or prototype. Beta corresponds to what had previously been known as experimental features, meaning there is proven added value, but the API could still change or there are performance or coverage issues yet to tackle. Examples of features in this category include custom C++ classes, named tensors, and PyTorch Mobile. Prototypes are meant for getting "high bandwidth" feedback on the utility of a proposed new feature, in order to either commit to getting it to beta or let it fall by the wayside. Prototype features aren't part of the binary distributions and are only available to those building from source, using nightlies, or setting the associated compiler flag, which is why a couple of neat additions, such as a profiler for distributed training or graph mode quantisation, are a bit trickier to access.
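Automatic mixed precision in 1.6 is exposed through the torch.cuda.amp module. A minimal training step might look like the following sketch; the model and data are toy stand-ins, and AMP is simply disabled when no CUDA device is present, since it only does real work on a GPU:

```python
import torch
import torch.nn as nn

use_amp = torch.cuda.is_available()  # AMP needs a CUDA device to do real work
device = "cuda" if use_amp else "cpu"

model = nn.Linear(16, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

inputs = torch.randn(8, 16, device=device)
targets = torch.randn(8, 1, device=device)

optimizer.zero_grad()
# Run the forward pass in mixed precision where supported.
with torch.cuda.amp.autocast(enabled=use_amp):
    loss = nn.functional.mse_loss(model(inputs), targets)

# Scale the loss to avoid underflowing float16 gradients, then step.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
print(float(loss))
```

The GradScaler/autocast pair is the stable entry point; with `enabled=False` both become no-ops, so the same training loop runs unchanged on CPU.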
TensorFlow looks to get more frugal with resources ahead of 2.3 release • DEVCLASS
Data scientists and machine learning types get a last chance for input on the upcoming 2.3 release of machine learning framework TensorFlow with the release candidate, which is now available, showcasing new features to tackle bottlenecks and preprocess data. The former is mostly realised through experimental snapshot and distribution mechanisms, which can now be found in TensorFlow's data module. They allow users to persist the outputs of their preprocessing for reuse in subsequent steps and to produce data for parallel iterations over a dataset, leading to lower resource consumption and speedups. If this doesn't help, developers will also find a memory profiler and a tracer for Python code in TensorFlow 2.3. With those, investigating performance bottlenecks should become a bit easier, providing teams with at least some clues as to where they could look to speed up their code.
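The profiling side is driven through tf.profiler.experimental; a minimal sketch of capturing a trace (the computation here is an arbitrary placeholder, and the API is experimental in 2.3) might look like this:

```python
import os
import tempfile

import tensorflow as tf  # assumes TensorFlow >= 2.3

logdir = tempfile.mkdtemp()

# Trace a small computation; the resulting profile (including the new
# memory profiler data) can be inspected in TensorBoard's Profile tab.
tf.profiler.experimental.start(logdir)
x = tf.random.uniform((256, 256))
tf.matmul(x, x)
tf.profiler.experimental.stop()

print(os.listdir(logdir))  # trace files land under this directory
```

Pointing TensorBoard at the log directory then surfaces the captured timeline and memory usage.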
What's in a data analyst? Survey suggests Python user base mainly devs • DEVCLASS
Last November, the Python Software Foundation and dev tool creator JetBrains took the pulse of the Python community for the third time. The results are out now, suggesting that Python users have gotten more into containers and mostly define themselves as developers, no matter how much data science they do. The Python Developers Survey 2019 was apparently answered by more than 24,000 Python users from over 150 countries and gives insight into what the language is used for and where. The "what" part has been relatively stable over the last three years, meaning that most respondents still say they use Python for data analysis purposes (58 per cent), followed by web development at 49 per cent. Other often-cited project areas include DevOps/sysadmin/automation and machine learning (39 per cent each).
Google Cloud introduces pipelines for those beyond ML prototyping • DEVCLASS
The Google Cloud team just celebrated the beta launch of its AI Platform Pipelines feature with a couple of additions and improvements to the machine learning workflow execution environment. The product was started to provide those at the beginning of their machine learning journey with a way to "deploy robust, repeatable machine learning pipelines along with monitoring, auditing, version tracking, and reproducibility" in an "easy to install, secure" environment. It is therefore mainly made up of the infrastructural components needed to run the workflows, as well as tools for creating and sharing pipelines. Since the service is part of Google Cloud, it can be quickly installed via the company's cloud console, which also takes care of access management. Options for building pipelines boil down to the Kubeflow Pipelines SDK, which isn't surprising given that AI Platform Pipelines run on a GKE cluster, and the development kit for TensorFlow Extended (TFX), TF's end-to-end machine learning platform.
Google tells AI to explain itself • DEVCLASS
Google has added Explainable AI services to its cloud platform, in an effort to make the decision-making processes of machine learning models more transparent to users, and thus build greater trust in the models themselves. Announced on the Google Cloud Blog, the new capability is intended to improve the interpretability of machine learning models. But this is no easy task, as Google admits. Google Cloud AI Explanations takes the approach of quantifying each data factor's contribution to the output of a particular machine learning model, to try and assist the human user in understanding why the model made the decisions it did. In other words, it is a far cry from an explanation in layman's terms, and will only really make sense to the data scientists or developers building the model in the first place.
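Google's service itself is proprietary, but the general idea of quantifying each input's contribution can be illustrated with a simple ablation scheme: reset one feature at a time to a baseline value and measure how much the model's output moves. This toy sketch uses a hand-written linear "model" purely for illustration; it is not Google's attribution method, just the intuition behind feature-contribution scores:

```python
def attribute(model, inputs, baseline):
    """Score each feature by the output change when it is reset to baseline."""
    scores = []
    full = model(inputs)
    for i in range(len(inputs)):
        ablated = list(inputs)
        ablated[i] = baseline[i]
        scores.append(full - model(ablated))
    return scores

# A toy "model": weighted sum of three features.
weights = [2.0, 0.0, -1.0]
model = lambda xs: sum(w * x for w, x in zip(weights, xs))

contributions = attribute(model, [1.0, 5.0, 2.0], baseline=[0.0, 0.0, 0.0])
print(contributions)  # → [2.0, 0.0, -2.0]
```

For the linear toy model, each score simply recovers weight times input, which is exactly the kind of per-feature breakdown an attribution report presents.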
Google AI Platform offers any model you like...as long as you like TensorFlow and Nvidia • DEVCLASS
Google has added the option of Nvidia GPUs to its AI Platform as part of an overhaul of the as-a-service platform. As Google explains, "ML models are so complex that they only run with acceptable latency on machines with many CPUs, or with accelerators like NVIDIA GPUs. This is especially true of models processing unstructured data like images, video, or text." Which would be why the service's one current generally available option of one vCPU, 2GB of RAM, and no GPU support seems a little bare bones – though it does support all types of model artifacts, with a maximum model size of 500MB. This basic tier is now joined by a beta option of 4 vCPUs.
TensorFlow ends 1.x series with default GPU support and compatibility helpers • DEVCLASS
Machine learning framework TensorFlow 1.15 is now available to download, offering those too shy to make the switch to TF 2.0 a way to emulate the new major version's behaviour, as well as additional features such as tensor equality and default GPU support. The release is the last of the 1.x branch, since the revamped TensorFlow 2.0 has been out since the end of September 2019. Moving forward, new features will likely be reserved for the more current series, but according to the release notes, patch releases will keep 1.x users safe from vulnerabilities for at least another year. Before updating, users should be aware of the breaking changes included in the release, especially in tf.keras. In the latter, the number of threads has to be configured using the tf.config.threading namespace.
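The thread settings live as setter/getter pairs under tf.config.threading; a minimal sketch (the values shown are arbitrary, and the calls must run before TensorFlow's runtime is initialised, otherwise they raise a RuntimeError):

```python
import tensorflow as tf

# Configure thread pools before the first op executes.
tf.config.threading.set_intra_op_parallelism_threads(2)
tf.config.threading.set_inter_op_parallelism_threads(2)

print(tf.config.threading.get_intra_op_parallelism_threads())  # → 2
```

This replaces the thread arguments that were previously scattered through session configuration options.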
PyTorch tries keeping up with research interest in 1.3 release • DEVCLASS
PyTorch has debuted a slew of experimental features in its just-released version 1.3, as support for the TensorFlow competitor broadens and new tools appear to tackle challenges like privacy. PyTorch 1.3 seems to be right on trend with its new capabilities, adding, for example, previews of implementations for model quantisation and on-device machine learning. The latter is heavily researched these days, as interest in privacy-focused approaches soars. Mobile support is one of the building blocks needed to realise, for example, federated learning, a technique which spreads training data between clients, meaning that data no longer has to leave a device to be included in the training of a centralised model. In its first iteration, mobile support comes down to prebuilt LibTorch libraries for Android and iOS, optimised implementations for certain operators, and modules ensuring that TorchScript inference is possible and forward operations can be executed on mobile CPUs.
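One of the quantisation previews, dynamic quantisation, is reachable through torch.quantization.quantize_dynamic. A minimal sketch with a toy model follows; the model itself is a stand-in, and the API carried an experimental label in the 1.3 release:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))

# Convert the Linear layers to int8 weights; activations are
# quantised dynamically at inference time.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = quantized(torch.randn(4, 16))
print(out.shape)  # → torch.Size([4, 2])
```

The quantised model keeps the same interface while shrinking its weight storage, which is exactly the kind of saving the mobile story depends on.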
Big Blue opens up hub for machine learning datasets • DEVCLASS
IBM has launched a repository of datasets for training which data scientists can pick and mix to train their deep learning and machine learning models. The IBM Data Asset eXchange (DAX) is designed to complement the Model Asset eXchange it launched earlier this year, which offers researchers and developers models to deploy or train with their own data. In a blog announcing the data exchange, a quartet of IBM luminaries wrote: "Developers adopting ML models need open data that they can use confidently under clearly defined open data licenses." The datasets in question will be covered by the Linux Foundation's Community Data License Agreement (CDLA) open data licensing framework to enable data sharing and collaboration – "where possible". DAX will also provide "unique access to various IBM and IBM Research datasets."