Segmenting Anything with Segment Anything and FiftyOne

Kyle Stratis

The recent explosion of generative AI has led to a new generation of promptable zero-shot (requiring no additional training) models in many domains, with perhaps the most exciting being Meta's Segment Anything Model (SAM). Trained on 11 million images with over 1.1 billion segmentation masks, Segment Anything can rapidly segment entire images as well as individual regions within an image with the use of a point or bounding box prompt.

When you use Segment Anything on its own, you have to write additional visualization code if you want to actually see your masks. Enter FiftyOne, an open source toolkit for building computer vision datasets visually. With FiftyOne, you can view, slice, filter, and do any number of operations on your image datasets. Powerful enough on its own, but when you introduce Segment Anything to the mix, you have an indispensable toolchain for building and inspecting segmentation datasets.

In this article, you will learn how to:

  • Set up an environment with FiftyOne, SAM, and an image dataset
  • Segment a single detected object in an image with Segment Anything and view it in FiftyOne
  • Segment all detected objects in an image with Segment Anything and view them in FiftyOne
  • Segment all detected objects in all images in a dataset with Segment Anything and view them in FiftyOne

Note: You can follow along in the code blocks below or in this Jupyter Notebook.

Setting Up

The Environment

To begin, create and activate a virtual environment using your tool of choice. In the activated environment, install FiftyOne, SAM, and a few additional tools to work with your data:

$ pip install fiftyone git+https://github.com/facebookresearch/segment-anything.git torch torchvision opencv-python numpy==1.24.4

After installing these dependencies, download the default (vit_h) Segment Anything model checkpoint to the same directory as your code. You can experiment with the other model sizes on offer as well; just be sure to update your code to point to the correct checkpoint file.
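At the time of writing, the default checkpoint can be downloaded directly from the link in the Segment Anything repository's README, for example (double-check the URL there if the download fails):

$ wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth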

Then, import the following packages:

# Standard library imports
from copy import deepcopy
# External imports
import cv2
import fiftyone as fo
import fiftyone.zoo as foz
import numpy as np
from segment_anything import SamPredictor, sam_model_registry
import torch

Now that you've imported your dependencies, you're ready to load the model.

The Model

To use the model, you will need to load it into memory from the checkpoint you downloaded and then instantiate a SamPredictor object. If you have a CUDA-enabled GPU, you can optionally move the loaded model to your GPU.

sam_checkpoint = "sam_vit_h_4b8939.pth"
model_type = "vit_h"
sam = sam_model_registry[model_type](checkpoint=sam_checkpoint)
# Only run the below line if you have a CUDA-enabled GPU
sam.to(device="cuda")
predictor = SamPredictor(sam)
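If you want the same script to run on machines with and without a GPU, one option is to pick the device at runtime. This is just a convenience sketch, not part of the original setup:

# Optional: choose the device automatically, falling back to CPU if CUDA is unavailable
device = "cuda" if torch.cuda.is_available() else "cpu"
sam.to(device=device)
predictor = SamPredictor(sam)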

The Dataset

Now it's time to set up your dataset. Even though assembling a dataset is normally the most tedious part of any computer vision project, FiftyOne's data zoo provides you with easy access to several major datasets.
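If you're curious what else is available, you can list the zoo's datasets programmatically using the foz alias imported earlier:

# Print the names of all datasets available in the FiftyOne zoo
print(foz.list_zoo_datasets())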

For this tutorial, we will use FiftyOne's quickstart dataset, and then take a slice of the first 10 images:

dataset = foz.load_zoo_dataset("quickstart")
sliced_view = dataset[:10]

That's it. load_zoo_dataset() downloads the dataset if you don't already have it and loads it into memory, and slice notation creates sliced_view, a DatasetView. A DatasetView is an object that lets you split, reorganize, tag, and otherwise work with the samples in your dataset without actually changing the underlying data.
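As a quick illustration, here are a couple of other ways to build views without modifying the underlying data (a sketch using standard FiftyOne view stages):

# Related views; none of these modify the underlying dataset
first_ten = dataset.limit(10)        # the same samples as dataset[:10]
shuffled = dataset.shuffle(seed=51)  # a reproducibly shuffled view of the full dataset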

This dataset is especially useful because it contains ground truth bounding box annotations, which you can use as prompts to direct SAM where in the image to segment.

Segmenting and Viewing

Segmenting and Viewing a Single Object in a Single Image

In the last section, you set up your environment, your Segment Anything model, and your dataset. Now you can explore how to segment images with SAM and view the segments in FiftyOne, starting with one single sample.

First, select the first sample from the sliced_view dataset view:

sample = sliced_view.first()

! Warning
You can't access individual Sample objects by their integer index. Instead, use methods like .first() and .last(), or look up samples by their IDs or filepaths.
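For instance, once you've retrieved a sample, the following lookups all work (a short sketch):

# Valid ways to retrieve samples from a view (integer indexing like sliced_view[0] is not)
first_sample = sliced_view.first()
last_sample = sliced_view.last()
same_sample = sliced_view[first_sample.id]  # lookup by sample ID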

To use Segment Anything, you have to load your image into memory as a NumPy array and then call .set_image() to generate embeddings for your image. Because this tutorial uses OpenCV, you will also have to convert the color format from OpenCV's default BGR to RGB:

image = cv2.imread(sample["filepath"])
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
predictor.set_image(image)

With the embeddings created, you can start generating predictions. As mentioned earlier, you will be using the ground truth detection bounding boxes available in the Sample as bounding box prompts, but for this part of the tutorial we will start with just a single detection.

Even though you have bounding boxes available to you, they aren't in a format that SAM understands. Luckily, FiftyOne provides a utility to convert from FiftyOne's relative xywh format to the absolute xyxy VOC format.

h, w, _ = image.shape
detection = sample["ground_truth"]["detections"][0]
input_bbox = fo.utils.voc.VOCBoundingBox.from_detection_format(detection["bounding_box"], (w, h))
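To see roughly what that utility does under the hood, here's a manual equivalent for illustration only (FiftyOne stores boxes as [x, y, width, height] in relative [0, 1] coordinates):

# Rough manual equivalent of the conversion above (illustration only)
rel_x, rel_y, rel_w, rel_h = detection["bounding_box"]
xmin, ymin = int(rel_x * w), int(rel_y * h)
xmax, ymax = int((rel_x + rel_w) * w), int((rel_y + rel_h) * h)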

Now that you have input_bbox, you can call SamPredictor's predict method to generate a single mask and score for the main object contained within the bounding box prompt:

mask, score, _ = predictor.predict(
    box=np.array([input_bbox.xmin, input_bbox.ymin, input_bbox.xmax, input_bbox.ymax]),
    multimask_output=False,
)
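For reference, passing multimask_output=True instead makes SAM return three candidate masks with scores, from which you can keep the best one. Here's a hedged sketch; the rest of the tutorial sticks with the single-mask call above:

# Alternative: request multiple candidate masks and keep the highest-scoring one
masks, scores, _ = predictor.predict(
    box=np.array([input_bbox.xmin, input_bbox.ymin, input_bbox.xmax, input_bbox.ymax]),
    multimask_output=True,
)
best = int(np.argmax(scores))
mask, score = masks[best:best+1], scores[best]  # keeps mask's shape consistent with the single-mask case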

After generating the mask and its score, you can load this information into the Sample object as a prediction in its predictions field. An easy way to do this is to make a deep copy of the detection, then add the mask to the mask field and the score to the confidence field.

Take a deeper look at the second line below before moving on, because a few things are going on here. FiftyOne expects segmentation masks to be 2D arrays that cover only the governing bounding box, but if you check the mask's shape, you'll see that it is a 3D array with the dimensions of the original image.

So we instead take the 2D array with mask[0] and keep only the values inside the bounding box with array slicing.
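You can confirm this yourself with a quick shape check (the exact dimensions depend on your image):

print(mask.shape)     # (1, H, W): one mask at the full image size
print(mask[0].shape)  # (H, W): the 2D array we'll crop to the bounding box below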

To save this as a detection, pass it in a list to the Detections constructor, add it to the Sample object in the predictions field, and save the sample:

prediction = deepcopy(detection)
prediction["mask"] = mask[0][input_bbox.ymin:input_bbox.ymax+1, input_bbox.xm
prediction["confidence"] = score
sample["predictions"] = fo.Detections(detections=[prediction])
sample.save()

And with that, you've successfully generated a segmentation mask with Segment Anything and saved it along with its score to a Sample object representing the sample image.

With that done, you can launch the FiftyOne application and look at the segmented object:

session = fo.launch_app(sliced_view)

A screenshot of the FiftyOne app showing a 10-image dataset and the first image with a Segment Anything-generated segmentation mask

Putting this code together, you should have:

# Get the sample and open the image
sample = sliced_view.first()
image = cv2.imread(sample["filepath"])
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Generate embeddings
predictor.set_image(image)

# Get ground truth bounding box and convert it to VOC
h, w, _ = image.shape
detection = sample["ground_truth"]["detections"][0]
input_bbox = fo.utils.voc.VOCBoundingBox.from_detection_format(detection["bounding_box"], (w, h))

# Generate and save the mask to the sample and sample to the DatasetView
mask, score, _ = predictor.predict(
    box=np.array([input_bbox.xmin, input_bbox.ymin, input_bbox.xmax, input_bbox.ymax]),
    multimask_output=False,
)
prediction = deepcopy(detection)
prediction["mask"] = mask[0][input_bbox.ymin:input_bbox.ymax+1, input_bbox.xmin:input_bbox.xmax+1]
prediction["confidence"] = score
sample["predictions"] = fo.Detections(detections=[prediction])
sample.save()

# Launch FiftyOne app
session = fo.launch_app(sliced_view)

Segmenting and Viewing All Detections in a Single Image

The hard part is done. Now you can apply the code you wrote above to all detected objects in an image by iterating through the sample's ground truth detections and storing all predictions in a list before saving them to the sample:

# Get the sample and open the image
sample = sliced_view.first()
image = cv2.imread(sample["filepath"])
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Generate embeddings
predictor.set_image(image)

# Iterate through all detections in the sample
predictions = []
h, w, _ = image.shape
for detection in sample["ground_truth"]["detections"]:

    # Get ground truth bounding box and convert it to VOC 
    input_bbox = fo.utils.voc.VOCBoundingBox.from_detection_format(detection["bounding_box"], (w, h))

    # Generate and save the mask to the sample and sample to the DatasetView
    mask, score, _ = predictor.predict(
        box=np.array([input_bbox.xmin, input_bbox.ymin, input_bbox.xmax, input_bbox.ymax]),
        multimask_output=False,
    )
    prediction = deepcopy(detection)
    prediction["mask"] = mask[0][input_bbox.ymin:input_bbox.ymax+1, input_bbox.xmin:input_bbox.xmax+1]
    prediction["confidence"] = score
    predictions.append(prediction)

# Create a Detections object from the predictions list and save it to the sample
sample["predictions"] = fo.Detections(detections=predictions)
sample.save()

While this code sample may look more complex, it is the same code you already wrote, just moved into a for loop that builds the predictions list.

To view the new segments, you can either fully relaunch the FiftyOne app or, more simply, refresh the session:

session.refresh()

After doing this, you will see 3 segment masks overlaid on the first picture in the dataset:

The FiftyOne interface now showing 3 segment masks in the first image of the dataset

Segmenting and Viewing All Detections in All Images in a Dataset

This part is even simpler than the last. The code you've already written in the previous section just needs to be wrapped in a for loop, iterating through each sample in the sliced_view DatasetView:

for sample in sliced_view:
    image = cv2.imread(sample["filepath"])
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    predictor.set_image(image)
    predictions = []
    h, w, _ = image.shape
    for detection in sample["ground_truth"]["detections"]:
        input_bbox = fo.utils.voc.VOCBoundingBox.from_detection_format(detection["bounding_box"], (w, h))
        mask, score, _ = predictor.predict(
            box=np.array([input_bbox.xmin, input_bbox.ymin, input_bbox.xmax, input_bbox.ymax]),
            multimask_output=False,
        )
        prediction = deepcopy(detection)
        prediction["mask"] = mask[0][input_bbox.ymin:input_bbox.ymax+1, input_bbox.xmin:input_bbox.xmax+1]
        prediction["confidence"] = score
        predictions.append(prediction)
    sample["predictions"] = fo.Detections(detections=predictions)
    sample.save()
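If you later scale this beyond 10 images, FiftyOne's iter_samples() can add a progress bar and batch the saves for you. A minimal sketch, with the per-sample body identical to the loop above:

# Optional: progress bar and batched saves for larger datasets
for sample in sliced_view.iter_samples(progress=True, autosave=True):
    # ...run the same per-sample segmentation code as above...
    pass  # with autosave=True, calling sample.save() is not necessary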

Again, just refresh the FiftyOne session and you'll be able to see segments displayed on all of the images in your dataset:

session.refresh()

The FiftyOne interface now showing segments for all objects in all images in the dataset

With all images segmented, you can now do all kinds of operations on your dataset via FiftyOne. Do you remember saving each mask's confidence score to its prediction earlier in the tutorial? Now you can filter based on that score: just click "predictions" under "LABELS" and adjust the confidence slider to any range you want, like this setting that excludes all detections with a confidence score below 0.9:

The FiftyOne interface showing only high-confidence segmentation masks and the open predictions panel, revealing the confidence slider
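You can also apply the same filter programmatically with FiftyOne's filter_labels() view stage; a minimal sketch using the 0.9 threshold shown above:

from fiftyone import ViewField as F

# Keep only predictions with confidence above 0.9 and show that view in the app
high_conf_view = sliced_view.filter_labels("predictions", F("confidence") > 0.9)
session.view = high_conf_view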

Wrapping Up

Segment Anything and FiftyOne are a powerful combination for anyone building segmentation datasets for computer vision tasks. Segment Anything automatically generates whole-image and region-specific segmentation masks with no additional training, while FiftyOne lets you view, filter, and organize your datasets.

In this tutorial, you joined these two powerful tools together, learning how to:

  • Set up Segment Anything to run locally from a checkpoint file
  • Download pre-built datasets from FiftyOne's dataset zoo
  • Segment a single detected object in an image with Segment Anything
  • Segment all detected objects in an image with Segment Anything
  • Segment all detected objects in all images in a dataset with Segment Anything
  • Store and view segmentation masks in FiftyOne

What kinds of datasets will you be exploring now with Segment Anything and FiftyOne?
