 tfrecord format


Developing a Deep Learning Pipeline for Classifying Cassava Leaf Diseases

#artificialintelligence

After loading the data from the train and test folders and setting up our simple base model, we decided it would be worth the effort to figure out how to load the data in the TFRecords format. TFRecords is a binary storage format designed to speed up data loading and training for models built in TensorFlow. Despite these advantages, getting TFRecords data into a form that is ready to feed into a model is not straightforward. Doing so requires defining functions to read the files and decode the images contained in those files. It also makes sense to augment the data (flip, randomly change brightness, adjust saturation, etc.) in this step, since the images will eventually be reshaped into arrays.
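The read, decode, and augment steps described above might look roughly like the following. This is a minimal sketch, not the authors' pipeline: the feature keys `image` and `target`, the JPEG encoding, the 512x512 target size, and the augmentation ranges are all assumptions that would need to match the actual TFRecord files.

```python
import tensorflow as tf

IMAGE_SIZE = (512, 512)  # assumed target size, not taken from the article

# Assumed schema: one JPEG-encoded image and one integer label per record.
FEATURES = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "target": tf.io.FixedLenFeature([], tf.int64),
}


def parse_example(serialized):
    """Parse one serialized tf.train.Example and decode its image."""
    example = tf.io.parse_single_example(serialized, FEATURES)
    image = tf.io.decode_jpeg(example["image"], channels=3)
    # resize returns float32; divide to scale pixel values into [0, 1]
    image = tf.image.resize(image, IMAGE_SIZE) / 255.0
    return image, example["target"]


def augment(image, label):
    """Random flip, brightness, and saturation changes (ranges are illustrative)."""
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    image = tf.clip_by_value(image, 0.0, 1.0)  # keep values in the valid range
    return image, label


def load_dataset(filenames, batch_size=32, training=True):
    """Build a batched tf.data pipeline from a list of TFRecord files."""
    ds = tf.data.TFRecordDataset(filenames, num_parallel_reads=tf.data.AUTOTUNE)
    ds = ds.map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    if training:
        ds = ds.map(augment, num_parallel_calls=tf.data.AUTOTUNE)
        ds = ds.shuffle(2048)
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

Augmenting inside the `tf.data` pipeline means each epoch sees differently perturbed images without storing extra copies on disk.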


trekhleb/links-detector

#artificialintelligence

The cool part about this approach is that we have the freedom to generate training examples with different fonts, ligatures, text colors, and background colors. This is very useful if we want to keep the model from overfitting during training (so that it generalizes well to unseen real-world examples instead of failing as soon as the background shade changes a bit). It is also possible to generate a variety of link types such as http://, https://, ftp://, tcp://, etc. Otherwise, it might be hard to find enough real-world examples of these kinds of links for training. Another benefit of this approach is that we can generate as many training examples as we want. We're not limited by the number of pages in the printed book we've found for the dataset.
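As a rough illustration of the idea, the sketch below generates randomized specifications for synthetic link examples. It is not the project's actual generator: the scheme list, font names, TLDs, and the `make_examples` helper are hypothetical, and turning each specification into an image is left to a separate rendering step that is not shown.

```python
import random
import string

# Illustrative value pools; a real generator would use its own lists.
SCHEMES = ["http://", "https://", "ftp://", "tcp://"]
FONTS = ["serif", "sans-serif", "monospace"]  # hypothetical font names
TLDS = ["com", "org", "io"]


def random_color(rng):
    """Return a random color as a '#rrggbb' hex string."""
    return "#{:06x}".format(rng.randrange(0x1000000))


def random_link(rng):
    """Return a random link made of a scheme, a lowercase host, and a TLD."""
    host = "".join(rng.choices(string.ascii_lowercase, k=rng.randint(3, 10)))
    return f"{rng.choice(SCHEMES)}{host}.{rng.choice(TLDS)}"


def make_examples(n, seed=0):
    """Generate n example specs; the seed makes the output reproducible."""
    rng = random.Random(seed)
    return [
        {
            "text": random_link(rng),
            "font": rng.choice(FONTS),
            "text_color": random_color(rng),
            "background_color": random_color(rng),
        }
        for _ in range(n)
    ]


if __name__ == "__main__":
    for spec in make_examples(3):
        print(spec)
```

Because every attribute is sampled independently, the number of distinct examples is limited only by how many are requested.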


tensorflow/models

#artificialintelligence

This directory contains TensorFlow models and data processing code for identifying exoplanets in astrophysical light curves. For complete background, see our paper in The Astronomical Journal. A light curve is a plot of the brightness of a star over time. We will be focusing on light curves produced by the Kepler space telescope, which monitored the brightness of 200,000 stars in our Milky Way galaxy for 4 years. An example light curve produced by Kepler is shown below.


tensorflow/models

#artificialintelligence

The Skip-Thoughts model is a sentence encoder. It learns to encode input sentences into a fixed-dimensional vector representation that is useful for many tasks, for example, detecting paraphrases or classifying whether a product review is positive or negative. See the Skip-Thought Vectors paper for details of the model architecture and more example applications. A trained Skip-Thoughts model will encode similar sentences near each other in the embedding vector space. The following examples show the nearest neighbor by cosine similarity of some sentences from the movie review dataset.
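To make "nearest neighbor by cosine similarity" concrete, here is a minimal sketch using plain Python. The three-dimensional vectors and sentences are made-up toy values, not real Skip-Thoughts encodings (which are much higher-dimensional), and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_neighbor(query, candidates):
    """Return the candidate sentence whose vector is most similar to query."""
    return max(candidates, key=lambda s: cosine_similarity(query, candidates[s]))


# Toy embeddings standing in for encoder output (assumed values).
embeddings = {
    "a great film": [0.9, 0.1, 0.0],
    "a wonderful movie": [0.8, 0.2, 0.1],
    "a terrible film": [-0.8, 0.2, 0.1],
}

if __name__ == "__main__":
    query_vector = [1.0, 0.0, 0.0]  # pretend encoding of a query sentence
    print(nearest_neighbor(query_vector, embeddings))
```

Cosine similarity compares direction rather than magnitude, so two sentences count as close when their vectors point the same way, regardless of vector length.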