A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
Sanderson, Edward, Matuszewski, Bogdan J.
Solutions to vision tasks in gastrointestinal endoscopy (GIE) conventionally use image encoders pretrained in a supervised manner with ImageNet-1k as backbones. However, the use of modern self-supervised pretraining algorithms and a recent dataset of 100k unlabelled GIE images (Hyperkvasir-unlabelled) may allow for improvements. In this work, we study the fine-tuned performance of models with ResNet50 and ViT-B backbones pretrained in self-supervised and supervised manners with ImageNet-1k and Hyperkvasir-unlabelled (self-supervised only) on a range of GIE vision tasks. In addition to identifying the most suitable pretraining pipeline and backbone architecture for each task, out of those considered, our results suggest: that self-supervised pretraining generally produces more suitable backbones for GIE vision tasks than supervised pretraining; that self-supervised pretraining with ImageNet-1k is typically more suitable than pretraining with Hyperkvasir-unlabelled, with the notable exception of monocular depth estimation in colonoscopy; and that ViT-Bs are more suitable for polyp segmentation and monocular depth estimation in colonoscopy, ResNet50s are more suitable for polyp detection, and both architectures perform similarly in anatomical landmark recognition and pathological finding characterisation. We hope this work draws attention to the complexity of pretraining for GIE vision tasks, informs the development of approaches better suited than the convention, and inspires further research on this topic. Code available: github.com/ESandML/SSL4GIE
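One simple way to compare pretrained backbones, as a toy illustration of the kind of evaluation described above, is linear probing: freeze the encoder and train only a task head on its features. The sketch below is purely illustrative and under stated assumptions; the random-projection "encoder", the synthetic data, and all dimensions are stand-ins, not the paper's ResNet50/ViT-B models or its GIE datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained encoder: a fixed random projection with
# a ReLU. In the paper's setting this would be a backbone pretrained on
# ImageNet-1k or Hyperkvasir-unlabelled; here it is purely illustrative.
W_enc = rng.normal(size=(64, 16))

def encode(x):
    return np.maximum(x @ W_enc, 0.0)  # frozen features

# Synthetic two-class data standing in for a GIE classification task.
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(float)

# Linear probing: train only a logistic-regression head on frozen features.
feats = encode(X)
w = np.zeros(16)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid
    grad = p - y                                # dLoss/dlogits
    w -= lr * feats.T @ grad / len(y)
    b -= lr * grad.mean()

acc = ((1.0 / (1.0 + np.exp(-(feats @ w + b))) > 0.5) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

Swapping in a different frozen encoder while keeping the head and data fixed gives a rough, controlled comparison of the representations themselves.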
Applying deep learning to real-world problems – merantix – Medium
It is easier than ever to train a neural network. However, it is rarely the case that you can just take code from a tutorial and directly make it work for your application. Interestingly, many of the most important tweaks are barely discussed in the academic literature, yet are critical to making your product work. I therefore thought it would be helpful for other people who plan to use deep learning in their business to understand some of these tweaks and tricks. This post is based on a talk I gave on May 10 at the Berlin.AI meetup (the slides are here).
The science of first sight: Researchers reveal how a baby's brain learns to see - and it could restore sight for people with vision problems
When a newborn first opens its eyes, it sees the world around it as blurry shapes. But a few months later, its vision starts to focus and it begins to recognize people and objects. Researchers at UNC's School of Medicine have found out more about how baby mammals' brains develop as they refine their sense of sight, and the research may also help restore sight for people with vision problems. The research, which was conducted on mice and published in the journal Nature Neuroscience, is part of a wider project that aims to map the areas of the brain that play key roles in vision processing.
Inverse Graphics with Probabilistic CAD Models
Kulkarni, Tejas D., Mansinghka, Vikash K., Kohli, Pushmeet, Tenenbaum, Joshua B.
Recently, multiple formulations of vision problems as probabilistic inversions of generative models based on computer graphics have been proposed. However, applications to 3D perception from natural images have focused on low-dimensional latent scenes, due to challenges in both modeling and inference. Accounting for the enormous variability in 3D object shape and 2D appearance via realistic generative models seems intractable, as does inverting even simple versions of the many-to-many computations that link 3D scenes to 2D images. This paper proposes and evaluates an approach that addresses key aspects of both these challenges. We show that it is possible to solve challenging, real-world 3D vision problems by approximate inference in generative models for images based on rendering the outputs of probabilistic CAD (PCAD) programs. Our PCAD object geometry priors generate deformable 3D meshes corresponding to plausible objects and apply affine transformations to place them in a scene. Image likelihoods are based on similarity in a feature space based on standard mid-level image representations from the vision literature. Our inference algorithm integrates single-site and locally blocked Metropolis-Hastings proposals, Hamiltonian Monte Carlo and discriminative data-driven proposals learned from training data generated from our models. We apply this approach to 3D human pose estimation and object shape reconstruction from single images, achieving quantitative and qualitative performance improvements over state-of-the-art baselines.
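The inference algorithm described above combines several proposal types; its simplest ingredient, a single-site Metropolis-Hastings update, can be sketched generically. The sketch below is a toy stand-in, not the paper's model: the two-parameter "scene" and its Gaussian target density are illustrative assumptions, chosen only so the sampler's behaviour is easy to check.

```python
import math
import random

random.seed(0)

# Unnormalised log-density of a toy target: independent Gaussians
# centred at (1, -2). In the paper this role is played by the image
# likelihood under a rendered PCAD scene.
def log_target(theta):
    return -0.5 * ((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)

theta = [0.0, 0.0]
samples = []
for step in range(5000):
    i = step % 2                   # single-site: perturb one coordinate
    proposal = theta[:]
    proposal[i] += random.gauss(0.0, 0.5)
    # Symmetric proposal, so accept with probability min(1, target ratio).
    if math.log(random.random()) < log_target(proposal) - log_target(theta):
        theta = proposal
    samples.append(theta[:])

burn = samples[1000:]              # discard burn-in
mean0 = sum(s[0] for s in burn) / len(burn)
mean1 = sum(s[1] for s in burn) / len(burn)
print(f"posterior means ~ ({mean0:.2f}, {mean1:.2f})")
```

The chain's marginal means drift toward the target's centre, which is the basic correctness check one would also apply before layering on blocked, Hamiltonian, or data-driven proposals.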
Spatial Latent Dirichlet Allocation
In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a ``bag-of-words''. It is also critical to properly design ``words'' and ``documents'' when using a language model to solve vision problems. In this paper, we propose a topic model, Spatial Latent Dirichlet Allocation (SLDA), which better encodes the spatial structure among visual words that is essential for solving many vision problems. The spatial information is not encoded in the value of visual words but in the design of documents. Instead of knowing the partition of words into documents a priori, the word-document assignment becomes a random hidden variable in SLDA. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use SLDA to discover objects from a collection of images, and show it achieves better performance than LDA.
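The key idea above, that each visual word's document assignment is a hidden variable with a prior favouring spatially nearby documents, can be sketched in a few lines. The sketch below is illustrative only: the two fixed "document" centres, the Gaussian kernel, and all numbers are assumptions for demonstration, not SLDA's full generative model or its inference procedure.

```python
import math

# Two "documents" placed at fixed spatial centres in the image
# (illustrative; in SLDA these arise within the generative procedure).
doc_centres = [(2.0, 2.0), (8.0, 8.0)]

def assignment_probs(word_xy, sigma=2.0):
    # Prior probability of each document, proportional to a Gaussian
    # kernel around its centre: words close in space tend to be grouped
    # into the same document.
    weights = [
        math.exp(-((word_xy[0] - cx) ** 2 + (word_xy[1] - cy) ** 2)
                 / (2 * sigma ** 2))
        for cx, cy in doc_centres
    ]
    total = sum(weights)
    return [w / total for w in weights]

# A visual word near the first centre is almost surely grouped with it.
probs = assignment_probs((2.5, 1.5))
print(probs)
```

Contrast this with standard LDA, where the word-document partition is fixed in advance and position plays no role in grouping.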
Learning in Computer Vision and Image Understanding
There is an increasing interest in the area of Learning in Computer Vision and Image Understanding, both from researchers in the learning community and from researchers involved with the computer vision world. The field is characterized by a shift away from the classical, purely model-based, computer vision techniques, towards data-driven learning paradigms for solving real-world vision problems. Using learning in segmentation or recognition tasks has several advantages over classical model-based techniques. These include adaptivity to noise and changing environments, as well as, in many cases, a simplified system generation procedure. Yet, learning from examples introduces a new challenge: getting a representative data set of examples from which to learn.