fiftyone
FiftyOne Tips and Tricks for Accelerating Computer Vision Workflows -- Mar 17, 2023
Welcome to our weekly FiftyOne tips and tricks blog where we give practical pointers for using FiftyOne on topics inspired by discussions in the open source community. This week we'll cover some tips and tricks that will help you accelerate your computer vision workflows using FiftyOne. FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster. Ok, let's dive into this week's tips and tricks! One of the great features of PyTorch is the DataLoader class, which makes it easy to efficiently load and process data.
Giving YOLOv8 a Second Look (Part 1)
Welcome to the first part of our three-part series on YOLOv8! In this series, we'll show you how to work with YOLOv8, from downloading the off-the-shelf models, to fine-tuning these models for specific use cases, and everything in between. Throughout the series, we will be using two libraries: FiftyOne, the open source computer vision toolkit, and Ultralytics, the library that will give us access to YOLOv8. In Part 1, you'll learn how to generate, load, and visualize YOLOv8 predictions. In Part 2, we'll show you how to evaluate the quality of YOLOv8 model predictions.
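Loading detector predictions into FiftyOne largely comes down to a coordinate conversion. The following is a minimal sketch, not code from the series. It assumes the predictions arrive as normalized center-format boxes (x_center, y_center, width, height), as is common for YOLO-style outputs, and that the target is FiftyOne's relative [top-left-x, top-left-y, width, height] bounding box format. The function name yolo_to_fiftyone_box is hypothetical and belongs to neither library.

```python
def yolo_to_fiftyone_box(x_center, y_center, width, height):
    """Convert a normalized center-format box to a top-left-format box.

    Assumption: all inputs are already normalized to [0, 1] relative to the
    image dimensions. The name of this helper is hypothetical.
    """
    # Shift from the box center to the top-left corner
    top_left_x = x_center - width / 2.0
    top_left_y = y_center - height / 2.0

    # Width and height are unchanged; only the anchor point moves
    return [top_left_x, top_left_y, width, height]


# Example: a box centered in the image, covering 20% of the width
# and 40% of the height
box = yolo_to_fiftyone_box(0.5, 0.5, 0.2, 0.4)
print(box)
```

The resulting list could then be passed as the bounding box of a FiftyOne detection label, alongside the class label and confidence reported by the model.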
Feb 10 2023 Computer Vision Tips and Tricks using open source FiftyOne
Welcome to our weekly FiftyOne tips and tricks blog where we recap interesting questions and answers that have recently popped up on Slack, GitHub, Stack Overflow, and Reddit. FiftyOne is an open source machine learning toolset that enables data science teams to improve the performance of their computer vision models by helping them curate high quality datasets, evaluate models, find mistakes, visualize embeddings, and get to production faster. Ok, let's dive into this week's tips and tricks! "Is there a way to just get bounding boxes around the possibly missing and possibly spurious objects in my dataset?" Here, George is asking about how to isolate potential mistakes in ground truth labels on a dataset.
Train a Custom Object Detector with Detectron2 and FiftyOne
Combine the dataset curation of FiftyOne with the model training of Detectron2 to easily train custom detection models
Image 71df582bfb39b541 from the Open Images V6 dataset (CC-BY 2.0) visualized in FiftyOne
In recent years, every aspect of the Machine Learning (ML) lifecycle has had tooling developed to make it easier to bring a custom model from an idea to a reality. The most exciting part is that the community has a propensity for open-source tools, like PyTorch and TensorFlow, allowing the model development process to be more transparent and replicable.
In this post, we take a look at how to integrate two open-source tools tackling different parts of an ML project: FiftyOne and Detectron2. Detectron2 is a library developed by Facebook AI Research designed to allow you to easily train state-of-the-art detection and segmentation algorithms on your own data. FiftyOne is a toolkit designed to let you easily visualize your data, curate high-quality datasets, and analyze your model results.
Used together, they let you curate your custom dataset in FiftyOne, train a model on that dataset with Detectron2, and then evaluate the Detectron2 model results back in FiftyOne to learn how to improve your dataset, continuing the cycle until you have a high-performing model.
This post closely follows the official Detectron2 tutorial, augmenting it to show how to work with FiftyOne datasets and evaluations.
Follow along in Colab!
Check out this notebook to follow along with this post right in your browser.
Screenshot of Colab notebook (image by author)
Setup
To start, we’ll need to install FiftyOne and Detectron2.
    # Install FiftyOne
    pip install fiftyone
    # Install Detectron2 from Source (Other options available)
    python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
    # (add --user if you don't have permission)
    # Or, to install it from a local clone:
    git clone https://github.com/facebookresearch/detectron2.git
    python -m pip install -e detectron2
    # On macOS, you may need to prepend the above commands with a few environment variables:
    CC=clang CXX=clang++ ARCHFLAGS="-arch x86_64" python -m pip install ...
Now let’s import FiftyOne and Detectron2 in Python.
[Embedded code: https://medium.com/media/aeed86d37435228fabf6d9c9ba9de189/href]
Prepare the Dataset
In this post, we show how to use a custom FiftyOne Dataset to train a Detectron2 model. We’ll train a license plate segmentation model from an existing model pre-trained on the COCO dataset, available in Detectron2’s model zoo.
Since the COCO dataset doesn’t have a “Vehicle registration plate” category, we will be using segmentations of license plates from the Open Images v6 dataset in the FiftyOne Dataset Zoo to train the model to recognize this new category.
Note: Images in the Open Images v6 dataset are under the CC-BY 2.0 license.
For this example, we will just use some of the samples from the official “validation” split of the dataset.
To improve model performance, we could always add in more data from the official “train” split as well, but that will take longer to train, so we’ll just stick to the “validation” split for this walkthrough.
[Embedded code: https://medium.com/media/199e938638b63c513645062845d0a30c/href]
Specifying classes when downloading a dataset from the zoo will ensure that only samples with one of the given classes will be present. However, these samples may still contain other labels, so we can use the powerful filtering capability of FiftyOne to easily keep only the “Vehicle registration plate” labels. We will also untag these samples as “validation” and create our own splits out of them.
[Embedded code: https://medium.com/media/752bb3531d42324afb97a185630c61a2/href]
[Embedded code: https://medium.com/media/637aec3dc2829cfc944ddeba3235408f/href]
Next, we need to parse the dataset from FiftyOne’s format to Detectron2’s format so that we can register it in the relevant Detectron2 catalogs for training. This is the most important code snippet to integrate FiftyOne and Detectron2.
Note: In this example, we are specifically parsing the segmentations into bounding boxes and polylines. This function may require tweaks depending on the model being trained and the data it expects.
[Embedded code: https://medium.com/media/dab5dc327d07f670d088852b01d8cd08/href]
Let’s visualize some of the samples to make sure everything is being loaded properly:
[Embedded code: https://medium.com/media/f482d61d21f5dfe480845e047745fb31/href]
Visualizing Open Images V6 training dataset in FiftyOne (Image by author)
Load the Model and Train!
Following the official Detectron2 tutorial, we now fine-tune a COCO-pretrained R50-FPN Mask R-CNN model on the FiftyOne dataset.
This will take a couple of minutes to run if using the linked Colab notebook.
[Embedded code: https://medium.com/media/a6294adcd080b451d88f5fc75646cda5/href]
    # Look at training curves in tensorboard:
    tensorboard --logdir output
Tensorboard training metrics visualization (Image by author)
Inference & evaluation using the trained model
Now that the model is trained, we can run it on the validation split of our dataset and see how it performs! To start,
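The parsing step described in this post, going from FiftyOne's format to Detectron2's, centers on a change of coordinate convention. The post's own function is in an embedded snippet that is not reproduced here, so what follows is only a rough sketch under stated assumptions: FiftyOne boxes are relative [top-left-x, top-left-y, width, height] values in [0, 1], and the Detectron2 annotation uses absolute pixel corners (the convention Detectron2 names BoxMode.XYXY_ABS). The helper names relative_box_to_xyxy_abs and to_annotation are hypothetical, and the polyline handling mentioned in the post is omitted.

```python
def relative_box_to_xyxy_abs(bounding_box, img_width, img_height):
    """Convert a relative [x, y, w, h] box to absolute [x_min, y_min, x_max, y_max].

    Assumption: `bounding_box` holds values in [0, 1] relative to the image
    size, with (x, y) the top-left corner. This helper name is hypothetical.
    """
    x, y, w, h = bounding_box
    x_min = x * img_width
    y_min = y * img_height
    x_max = (x + w) * img_width
    y_max = (y + h) * img_height
    return [x_min, y_min, x_max, y_max]


def to_annotation(bounding_box, img_width, img_height, category_id=0):
    """Build a minimal annotation dict for one object.

    In a real Detectron2 dataset dict, this would also carry
    "bbox_mode": BoxMode.XYXY_ABS and, for segmentation models, a
    "segmentation" entry; both are left out to keep the sketch
    free of the detectron2 import.
    """
    return {
        "bbox": relative_box_to_xyxy_abs(bounding_box, img_width, img_height),
        "category_id": category_id,
    }


# Example: a plate occupying the middle of a 640x480 image
annotation = to_annotation([0.25, 0.5, 0.5, 0.25], 640, 480)
print(annotation)
```

A full dataset function would loop over the samples of a split, read each image's width and height from its metadata, and collect one such annotation per label.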
Voxel51 lands funds for its platform to manage unstructured data
Voxel51, a startup developing a platform to analyze unstructured data, such as images and videos, has raised $12.5 million in a Series A round led by Drive Capital, with participation from Top Harvest, Shasta Ventures, eLab Ventures and ID Ventures. Founder and CEO Jason Corso tells TechCrunch that the new capital will be put toward further developing the company's platform and doubling the size of Voxel51's team from 13 to 26 employees by year-end. Corso says he, alongside machine learning PhD Brian Moore, created Voxel51 to harness the growing flood of unstructured data in AI and machine learning. A professor at the University of Michigan, Corso says he saw a "critical need" for better software infrastructure to support machine learning engineers and data scientists in visualizing, analyzing and understanding their data. "Leveraging unstructured and visual data is a significant challenge. Although we've seen recent wins in the transition of capabilities from the lab to production, such as those in ADAS, there remains a difficulty in bringing computer vision capabilities into production," Corso told TechCrunch in an email interview.
Nearest Neighbor Embeddings Search with Qdrant and FiftyOne
Neural network embeddings are a low-dimensional representation of input data that give rise to a variety of applications. Embeddings have some interesting capabilities, as they are able to capture the semantics of the data points. This is especially useful for unstructured data like images and videos, so you can not only encode pixel similarities but also some more complex relationships. Performing searches over these embeddings gives rise to a lot of use cases like classification, building recommendation systems, or even anomaly detection. One of the primary benefits of performing a nearest neighbor search on embeddings to accomplish these tasks is that there is no need to create a custom network for every new problem; you can often just use pre-trained models.
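To make the idea of classifying by nearest neighbor search concrete, here is a small sketch in plain Python. It is a brute-force stand-in and not how Qdrant works internally: a vector database uses an index to avoid comparing the query against every stored vector. The embeddings and labels below are made-up toy values, and the function names are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def nearest_neighbors(query, embeddings, k=3):
    """Return the indices of the k embeddings most similar to the query."""
    scored = [(cosine_similarity(query, e), i) for i, e in enumerate(embeddings)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [i for _, i in scored[:k]]


def knn_classify(query, embeddings, labels, k=3):
    """Predict a label by majority vote among the k nearest neighbors."""
    neighbor_ids = nearest_neighbors(query, embeddings, k)
    votes = Counter(labels[i] for i in neighbor_ids)
    return votes.most_common(1)[0][0]


# Toy 2-D "embeddings"; real ones would come from a pre-trained model
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["cat", "cat", "dog", "dog"]

print(knn_classify([1.0, 0.05], embeddings, labels, k=3))
```

No model is trained in this sketch: the label of a new point is decided entirely by the stored embeddings around it, which is why pre-trained embeddings are often enough.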
Increasing Data Diversity with Iterative Sampling to Improve Performance
Cavusoglu, Devrim, Eryuksel, Ogulcan, Altinuc, Sinan
As part of the Data-Centric AI Competition, we propose a data-centric approach to improve the diversity of the training samples by iterative sampling. The method itself relies strongly on the fidelity of augmented samples and the diversity of the augmentation methods. Moreover, we improve the performance further by introducing more samples for the difficult classes, in particular samples closer to edge cases, potentially those that the model at hand misclassifies.
FiftyOne -- FiftyOne 0.5.5 documentation
If you are looking to boost the performance of your machine learning models, chances are improving the quality of your dataset will provide the highest return on your investment. FiftyOne is a Python-based tool for machine learning/computer vision engineers and scientists that enables you to curate better datasets. Work efficiently with FiftyOne to achieve better models with dependable performance.
"Become one with your data"
FiftyOne does more than improve your dataset; it gets you closer to your data. Rapidly gain insight by visualizing samples overlaid with dynamic and queryable fields such as ground truth and predicted labels, dataset splits, and much more!