Supplementary Material: Segment Anything in High Quality

Neural Information Processing Systems

In this supplementary material, Section 1 presents additional experimental analysis of HQ-SAM, including further zero-shot transfer comparisons to SAM on both image and video benchmarks. SAM vs. HQ-SAM on various backbones: Table 1 provides a comprehensive comparison across backbones. On the YouTube-VIS 2019 validation set and the HQ-YTVIS test set (Table 2), using ViT-L-based SAM, HQ-SAM achieves consistent gains of 1.4 points in Tube Mask AP. Robustness to input box prompts: in Table 4, we compare HQ-SAM to SAM by adding noise of various scales to the ground-truth (GT) box prompts. The single-point prompt is taken as the "center" point of the GT mask, which is located at the maximal value of the distance transform in the mask's interior. Results not obtained in a zero-shot manner (i.e., where the training data includes the target benchmark) are marked accordingly. In challenging cases, HQ-SAM improves over SAM but still cannot achieve a fully correct mask prediction; overall, HQ-SAM produces significantly more accurate boundaries.
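The "center" point prompt described above is the argmax of the mask's interior distance transform. A minimal numpy-only sketch of that selection (brute-force distance computation, adequate for small toy masks; function name is illustrative, not from the paper's code):

```python
import numpy as np

def mask_center_point(mask):
    """Return (row, col) of the interior point farthest from the mask
    boundary, i.e. the argmax of the Euclidean distance transform.
    Assumes the mask contains at least one background pixel."""
    bg = np.argwhere(~mask)   # background pixel coordinates, shape (B, 2)
    fg = np.argwhere(mask)    # foreground pixel coordinates, shape (F, 2)
    # distance from every foreground pixel to its nearest background pixel
    d = np.sqrt(((fg[:, None, :] - bg[None, :, :]) ** 2).sum(-1)).min(1)
    r, c = fg[int(np.argmax(d))]
    return int(r), int(c)

# toy 7x7 square mask: the deepest interior point is the square's middle
mask = np.zeros((7, 7), dtype=bool)
mask[1:6, 1:6] = True
print(mask_center_point(mask))  # → (3, 3)
```

In practice a library routine such as `scipy.ndimage.distance_transform_edt` would replace the brute-force distance computation.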



Segment Anything in High Quality

Neural Information Processing Systems

The recent Segment Anything Model (SAM) represents a big leap in scaling up segmentation models, allowing for powerful zero-shot capabilities and flexible prompting. Despite being trained with 1.1 billion masks, SAM's mask prediction quality falls short in many cases, particularly when dealing with objects that have intricate structures. We propose HQ-SAM, equipping SAM with the ability to accurately segment any object, while maintaining SAM's original promptable design, efficiency, and zero-shot generalizability. Our careful design reuses and preserves the pre-trained model weights of SAM, while only introducing minimal additional parameters and computation. We design a learnable High-Quality Output Token, which is injected into SAM's mask decoder and is responsible for predicting the high-quality mask.
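The core mechanism described here, a single learnable output token appended to SAM's existing decoder tokens, can be sketched in a few lines. This is a shape-level illustration only: the token counts, feature-map size, and the final dot product against upsampled image features are assumptions based on the description above, not SAM's actual decoder code.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256                                    # token embedding dim (SAM uses 256)
num_sam_tokens = 4                         # IoU token + 3 mask tokens, illustrative
sam_output_tokens = rng.standard_normal((num_sam_tokens, d))

# HQ-SAM adds ONE learnable token; SAM's pre-trained weights stay frozen
hq_token = rng.standard_normal((1, d))     # learned during HQ-SAM training

# the mask decoder attends over the concatenated token set
decoder_tokens = np.concatenate([sam_output_tokens, hq_token], axis=0)
assert decoder_tokens.shape == (num_sam_tokens + 1, d)

# the HQ token's output embedding is combined with image features
# to produce the high-quality mask logits (stand-in values here)
hq_embedding = decoder_tokens[-1]                  # stands in for decoder output
image_feats = rng.standard_normal((d, 64, 64))     # hypothetical feature map
hq_mask_logits = np.einsum('c,chw->hw', hq_embedding, image_feats)
assert hq_mask_logits.shape == (64, 64)
```

The design point is that only the extra token (and whatever small layers consume it) is trained, which is why the parameter and compute overhead stays minimal.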








How Much You Ate? Food Portion Estimation on Spoons

Sharma, Aaryam, Czarnecki, Chris, Chen, Yuhao, Xi, Pengcheng, Xu, Linlin, Wong, Alexander

arXiv.org Artificial Intelligence

Monitoring dietary intake is a crucial aspect of promoting healthy living. In recent years, advances in computer vision technology have facilitated dietary intake monitoring through the use of images and depth cameras. However, current state-of-the-art image-based food portion estimation algorithms assume that users photograph their meals only once or twice, which can be inconvenient and fails to capture food items that are not visible from a top-down perspective, such as ingredients submerged in a stew. To address these limitations, we introduce a solution that uses stationary user-facing cameras to track food items on utensils, requiring no change of camera perspective after installation. The shallow depth of utensils provides a more favorable angle for capturing food items, and tracking them on the utensil's surface yields a significantly more accurate estimate of dietary intake without the need for post-meal image capture. The system reliably estimates the nutritional content of liquid-solid heterogeneous mixtures such as soups and stews. Through a series of experiments, we demonstrate the potential of our method as a non-invasive, user-friendly, and highly accurate dietary intake monitoring tool.
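The per-spoonful tracking described above implies a simple aggregation step: each tracked spoonful contributes an estimated volume, which is converted to mass and summed over the meal. The function below is a hypothetical sketch of that bookkeeping (names, units, and the density-lookup step are assumptions, not from the paper):

```python
import numpy as np

def cumulative_intake(spoonful_volumes_ml, densities_g_per_ml):
    """Hypothetical aggregation: given per-spoonful volume estimates from
    the utensil tracker and per-spoonful density estimates from a food
    classifier, return the total mass consumed in grams."""
    v = np.asarray(spoonful_volumes_ml, dtype=float)
    rho = np.asarray(densities_g_per_ml, dtype=float)
    return float((v * rho).sum())

# three tracked spoonfuls of a stew (volumes in ml, densities in g/ml)
print(cumulative_intake([12.0, 10.5, 11.0], [1.05, 1.05, 1.10]))  # → 35.725
```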


Promoting Segment Anything Model towards Highly Accurate Dichotomous Image Segmentation

Liu, Xianjie, Fu, Keren, Zhao, Qijun

arXiv.org Artificial Intelligence

Abstract--Segmenting any object represents a crucial step towards achieving artificial general intelligence, and the "Segment Anything Model" (SAM) has significantly advanced the development of foundation models in computer vision. We have high expectations regarding whether SAM can enhance highly accurate dichotomous image segmentation. In fact, the evidence presented in this article demonstrates that by feeding SAM simple box prompts and using the masks output by SAM as input for IS5Net, we can greatly improve the effectiveness of highly accurate dichotomous image segmentation. Over the last few months, such models have used prompts (points/boxes/masks) to provide information for the decoder, which embeds the extracted image features, prompt outputs, and cue labels together for the final mask prediction. The impressive understanding capabilities of these large models have left users with high expectations.
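The cascaded setup described above, box prompt → SAM coarse mask → refinement network, reduces at the data level to stacking SAM's mask with the image as an extra input channel. A minimal sketch under that assumption (the channel layout and function name are illustrative; the paper's actual interface to its refinement network may differ):

```python
import numpy as np

def make_refiner_input(image_rgb, sam_mask):
    """Stack SAM's coarse mask (from a simple box prompt) with the RGB
    image as a fourth channel, forming the input to a downstream
    refinement network. Expects image_rgb as C x H x W."""
    mask = sam_mask.astype(image_rgb.dtype)[None, ...]    # 1 x H x W
    return np.concatenate([image_rgb, mask], axis=0)      # 4 x H x W

img = np.zeros((3, 8, 8), dtype=np.float32)    # toy 8x8 RGB image
coarse = np.ones((8, 8), dtype=bool)           # toy SAM mask
assert make_refiner_input(img, coarse).shape == (4, 8, 8)
```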