Scikit-learn Pipeline Persistence and JSON Serialization

@machinelearnbot

First off, I would like to thank Sebastian Raschka and Chris Wagner for providing the text and code that proved essential for writing this blog post. For some time now, I have wanted to replace simply pickling my sklearn pipelines. Pickle is incredibly convenient, but pickle files are easy to corrupt, are not very transparent, and have compatibility issues. The compatibility issues have been quite a thorn in my side on several projects, and I stumbled upon them again while working on my own small text mining framework. Persistence is imperative when deploying a pipeline to a practical application such as a demo: each piece of new data has to be converted into a vector of exactly the same size as the vectors used during development.
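To make the vector-size requirement concrete, here is a minimal sketch using only the Python standard library. It is not the approach from the text and code credited above, and it does not use scikit-learn; the helper names `fit_vocabulary` and `vectorize` are hypothetical and exist only for illustration. It assumes a simple whitespace-tokenized bag-of-words model, stores the learned vocabulary as JSON, and shows that new text is mapped onto a vector of the same length after reloading.

```python
import json


def fit_vocabulary(documents):
    """Map each distinct lowercase token to a fixed column index (hypothetical helper)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: idx for idx, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Count tokens into a vector whose length is fixed by the vocabulary."""
    vector = [0] * len(vocabulary)
    for tok in text.lower().split():
        idx = vocabulary.get(tok)
        if idx is not None:  # tokens unseen during development are dropped
            vector[idx] += 1
    return vector


# "Development": learn the vocabulary from the training documents.
train_docs = ["the cat sat", "the dog barked"]
vocabulary = fit_vocabulary(train_docs)

# Persist the vocabulary as human-readable JSON instead of a pickle.
serialized = json.dumps(vocabulary, sort_keys=True)

# "Deployment": reload the vocabulary and vectorize unseen text.
restored = json.loads(serialized)
new_vector = vectorize("the bird sat quietly", restored)

print(len(new_vector) == len(vocabulary))
```

Because the column indices live in a plain JSON object, the stored state can be inspected in a text editor and does not depend on the Python or library version that produced it, which is the kind of transparency pickle lacks.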