ONNX


How Do Model Export Formats Impact the Development of ML-Enabled Systems? A Case Study on Model Integration

Parida, Shreyas Kumar, Gerostathopoulos, Ilias, Bogner, Justus

arXiv.org Artificial Intelligence

Machine learning (ML) models are often integrated into ML-enabled systems to provide software functionality that would otherwise be impossible. This integration requires the selection of an appropriate ML model export format, for which many options are available. These formats are crucial for ensuring a seamless integration, and choosing a suboptimal one can negatively impact system development. However, little evidence is available to guide practitioners during the export format selection. We therefore evaluated various model export formats regarding their impact on the development of ML-enabled systems from an integration perspective. Based on the results of a preliminary questionnaire survey (n=17), we designed an extensive embedded case study with two ML-enabled systems in three versions with different technologies. We then analyzed the effect of five popular export formats, namely ONNX, Pickle, TensorFlow's SavedModel, PyTorch's TorchScript, and Joblib. In total, we studied 30 units of analysis (2 systems x 3 tech stacks x 5 formats) and collected data via structured field notes. The holistic qualitative analysis of the results indicated that ONNX offered the most efficient integration and portability across most cases. SavedModel and TorchScript were very convenient to use in Python-based systems, but otherwise required workarounds (TorchScript more than SavedModel). SavedModel also allowed the easy incorporation of preprocessing logic into a single file, which made it scalable for complex deep learning use cases. Pickle and Joblib were the most challenging to integrate, even in Python-based systems. Regarding technical support, all model export formats had strong technical documentation and strong community support across platforms such as Stack Overflow and Reddit. Practitioners can use our findings to inform the selection of ML export formats suited to their context.


Energy consumption of code small language models serving with runtime engines and execution providers

Durán, Francisco, Martinez, Matias, Lago, Patricia, Martínez-Fernández, Silverio

arXiv.org Artificial Intelligence

Background. The rapid growth of Language Models (LMs), particularly in code generation, requires substantial computational resources, raising concerns about energy consumption and environmental impact. Optimizing LM inference for energy efficiency is crucial, and Small Language Models (SLMs) offer a promising solution to reduce resource demands. Aim. Our goal is to analyze the impact of deep learning runtime engines and execution providers on energy consumption, execution time, and computing-resource utilization from the point of view of software engineers conducting inference in the context of code SLMs. Method. We conducted a technology-oriented, multi-stage experimental pipeline using twelve code generation SLMs to investigate energy consumption, execution time, and computing-resource utilization across the configurations. Results. Significant differences emerged across configurations. CUDA execution provider configurations outperformed CPU execution provider configurations in both energy consumption and execution time. Among the configurations, TORCH paired with CUDA demonstrated the greatest energy efficiency, achieving energy savings from 37.99% up to 89.16% compared to other serving configurations. Similarly, optimized runtime engines like ONNX with the CPU execution provider achieved from 8.98% up to 72.04% energy savings within CPU-based configurations. TORCH paired with CUDA also exhibited efficient computing-resource utilization. Conclusions. The choice of serving configuration significantly impacts energy efficiency. While further research is needed, we recommend the configurations above, matched to software engineers' requirements, for enhancing serving efficiency in energy and performance.


python - BackendIsNotSupposedToImplementIt Error: Converting ONNX to Tensorflow - Stack Overflow

#artificialintelligence

When I run this code to convert ONNX to TensorFlow, I get an error in Google Colab. I need to convert this ONNX file to TensorFlow Lite so I can use it in an Android app. The failing code begins with: from onnx_tf.backend import


7 Lessons I've Learnt From Deploying Machine Learning Models Using ONNX

#artificialintelligence

In this post, we will outline key learnings from a real-world example of running inference on a scikit-learn model using the ONNX Runtime API in an AWS Lambda function. This is not a tutorial but rather a guide focusing on useful tips, points to consider, and quirks that may save you some head-scratching! The Open Neural Network Exchange (ONNX) format is a bit like dipping your french fries into a milkshake; it shouldn't work, but it just does. ONNX allows us to build a model using all the training frameworks we know and love, like PyTorch and TensorFlow, and package it up in a format supported by many hardware architectures and operating systems. The ONNX Runtime is a simple, cross-platform API that provides optimal performance to run inference on an ONNX model exactly where you need it: the cloud, mobile, an IoT device, you name it!


Dilated Convolutional Neural Networks for Lightweight Diacritics Restoration

Csanády, Bálint, Lukács, András

arXiv.org Artificial Intelligence

Diacritics restoration has become a ubiquitous task in the Latin-alphabet-based English-dominated Internet language environment. In this paper, we describe a small footprint 1D dilated convolution-based approach which operates on a character-level. We find that solutions based on 1D dilated convolutional neural networks are competitive alternatives to models based on recursive neural networks or linguistic modeling for the task of diacritics restoration. Our solution surpasses the performance of similarly sized models and is also competitive with larger models. A special feature of our solution is that it even runs locally in a web browser. We also provide a working example of this browser-based implementation. Our model is evaluated on different corpora, with emphasis on the Hungarian language. We performed comparative measurements about the generalization power of the model in relation to three Hungarian corpora. We also analyzed the errors to understand the limitation of corpus-based self-supervised training.


Pytorch to Keras using ONNX

#artificialintelligence

Model deployment is the method by which you integrate a machine learning model into an existing production environment to make practical business decisions based on data. It is one of the last stages in the machine learning life cycle and can be one of the most cumbersome. Model deployment is probably the most important part of the machine learning model lifecycle, yet still the least studied. Most courses across the ML/DL universe teach how to explore data, engineer features, train the model, and generate predictions. But they miss the most important part: what to do after that? Apart from models developed for learning or for Kaggle competitions, all other models are built to generate revenue, and if you don't deploy a model into production, then no one is using it and thus there is no revenue.


8 Alternatives to TensorFlow Serving

#artificialintelligence

TensorFlow Serving is an easy-to-deploy, flexible, and high-performance serving system for machine learning models, built for production environments. It allows easy deployment of algorithms and experiments while letting developers keep the same server architecture and APIs. TensorFlow Serving provides seamless integration with TensorFlow models and can also be easily extended to other models and data. The open-source platform Cortex makes executing real-time inference at scale seamless. It is designed to deploy trained machine learning models directly as a web service in production.


Converting a model from Pytorch to Tensorflow: Guide to ONNX

#artificialintelligence

Open Neural Network Exchange (ONNX) is a powerful and open format built to represent machine learning models. The final outcome of training any machine learning or deep learning algorithm is a model file that efficiently represents the mapping from input data to output predictions. These models are stored in different file formats (such as .pkl) depending on the framework they were created in. Therein lies the problem: you can't take a model created and trained in one framework and use it or deploy it in a different framework. The intent behind ONNX is to be like the "USB standard" of the machine learning world.


PyTorch Vs TensorFlow - Facebook Vs Google - Understanding The Most Popular Deep Learning Frameworks

#artificialintelligence

In recent years, the field of data science has gained access to increasingly powerful analysis methods thanks to increasingly high-performance hardware. Google's TensorFlow has been the benchmark for building machine learning and deep learning models, and it still offers the most freedom today. But a wide range of options often creates a high barrier to entry. PyTorch vs TensorFlow: with the two-years-younger, also Python-based, open-source package PyTorch, Facebook now wants to knock TensorFlow off its throne. PyTorch has been steadily gaining popularity for years due to its simplicity and features.


Machine learning groups form Consortium for Python Data API Standards to reduce fragmentation

#artificialintelligence

Deep learning framework Apache MXNet and Open Neural Network Exchange (ONNX) today launched the Consortium for Python Data API Standards to improve interoperability for machine learning practitioners and data scientists using any framework, library, or tool from the Python ecosystem. ONNX itself was formed by Facebook and Microsoft in 2017 to encourage interoperability between frameworks and tools. Today, ONNX includes nearly 40 organizations with influence in AI and data science, including AWS, Baidu, and IBM, along with hardware makers like Arm, Intel, and Qualcomm. The new consortium, which will develop standards for dataframes and for arrays or tensors, hopes to address the fragmentation that has affected the data ecosystem in recent years. In the Python ecosystem, that fragmentation spans dataframe libraries such as Pandas, PySpark, and Apache Arrow.