A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
Sanderson, Edward, Matuszewski, Bogdan J.
Solutions to vision tasks in gastrointestinal endoscopy (GIE) conventionally use image encoders pretrained in a supervised manner with ImageNet-1k as backbones. However, the use of modern self-supervised pretraining algorithms and a recent dataset of 100k unlabelled GIE images (Hyperkvasir-unlabelled) may allow for improvements. In this work, we study the fine-tuned performance of models with ResNet50 and ViT-B backbones pretrained in self-supervised and supervised manners with ImageNet-1k and Hyperkvasir-unlabelled (self-supervised only) on a range of GIE vision tasks. In addition to identifying the most suitable pretraining pipeline and backbone architecture for each task, out of those considered, our results suggest: that self-supervised pretraining generally produces more suitable backbones for GIE vision tasks than supervised pretraining; that self-supervised pretraining with ImageNet-1k is typically more suitable than pretraining with Hyperkvasir-unlabelled, with the notable exception of monocular depth estimation in colonoscopy; and that ViT-Bs are more suitable for polyp segmentation and monocular depth estimation in colonoscopy, ResNet50s are more suitable for polyp detection, and both architectures perform similarly in anatomical landmark recognition and pathological finding characterisation. We hope this work draws attention to the complexity of pretraining for GIE vision tasks, informs the development of approaches better suited than the convention, and inspires further research on this topic. Code available: github.com/ESandML/SSL4GIE
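One simple way to compare pretrained backbones, as a toy illustration of the kind of evaluation described above, is linear probing: freeze the encoder and train only a task head on its features. The sketch below is purely illustrative and under stated assumptions; the random-projection "encoder", the synthetic data, and all dimensions are stand-ins, not the paper's ResNet50/ViT-B models or its GIE datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a frozen pretrained encoder: a fixed random projection with
# a ReLU. In the paper's setting this would be a backbone pretrained on
# ImageNet-1k or Hyperkvasir-unlabelled; here it is purely illustrative.
W_enc = rng.normal(size=(64, 16))

def encode(x):
    return np.maximum(x @ W_enc, 0.0)  # frozen features

# Synthetic two-class data standing in for a GIE classification task.
X = rng.normal(size=(200, 64))
y = (X[:, 0] > 0).astype(float)

# Linear probing: train only a logistic-regression head on frozen features.
feats = encode(X)
w = np.zeros(16)
b = 0.0
lr = 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # sigmoid
    grad = p - y                                # dLoss/dlogits
    w -= lr * feats.T @ grad / len(y)
    b -= lr * grad.mean()

acc = ((1.0 / (1.0 + np.exp(-(feats @ w + b))) > 0.5) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

Swapping in a different frozen encoder while keeping the head and data fixed gives a rough, controlled comparison of the representations themselves.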
Applying deep learning to real-world problems – merantix – Medium
It is easier than ever to train a neural network. However, it is rarely the case that you can just take code from a tutorial and directly make it work for your application. Interestingly, many of the most important tweaks are barely discussed in the academic literature, yet are critical to making your product work. I therefore thought it would be helpful for other people who plan to use deep learning in their business to understand some of these tweaks and tricks. This post is based on a talk I gave on May 10 at the Berlin.AI meetup (the slides are here).
The science of first sight: Researchers reveal how a baby's brain learns to see - and it could restore sight for people with vision problems
When a newborn first opens its eyes, it sees the world around it as blurry shapes. But a few months later, its vision starts to focus and it begins to recognize people and objects. Researchers at UNC's School of Medicine have found out more about how baby mammals' brains develop as they refine their sense of sight, and the research may also help restore sight for people with vision problems. The research, which was conducted on mice and published in the journal Nature Neuroscience, is part of a wider project that aims to map the areas of the brain that play key roles in vision processing.
Inverse Graphics with Probabilistic CAD Models
Kulkarni, Tejas D., Mansinghka, Vikash K., Kohli, Pushmeet, Tenenbaum, Joshua B.
Recently, multiple formulations of vision problems as probabilistic inversions of generative models based on computer graphics have been proposed. However, applications to 3D perception from natural images have focused on low-dimensional latent scenes, due to challenges in both modeling and inference. Accounting for the enormous variability in 3D object shape and 2D appearance via realistic generative models seems intractable, as does inverting even simple versions of the many-to-many computations that link 3D scenes to 2D images. This paper proposes and evaluates an approach that addresses key aspects of both these challenges. We show that it is possible to solve challenging, real-world 3D vision problems by approximate inference in generative models for images based on rendering the outputs of probabilistic CAD (PCAD) programs. Our PCAD object geometry priors generate deformable 3D meshes corresponding to plausible objects and apply affine transformations to place them in a scene. Image likelihoods are based on similarity in a feature space based on standard mid-level image representations from the vision literature. Our inference algorithm integrates single-site and locally blocked Metropolis-Hastings proposals, Hamiltonian Monte Carlo and discriminative data-driven proposals learned from training data generated from our models. We apply this approach to 3D human pose estimation and object shape reconstruction from single images, achieving quantitative and qualitative performance improvements over state-of-the-art baselines.
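The inference algorithm described above combines several proposal types; its simplest ingredient, a single-site Metropolis-Hastings update, can be sketched generically. The sketch below is a toy stand-in, not the paper's model: the two-parameter "scene" and its Gaussian target density are illustrative assumptions, chosen only so the sampler's behaviour is easy to check.

```python
import math
import random

random.seed(0)

# Unnormalised log-density of a toy target: independent Gaussians
# centred at (1, -2). In the paper this role is played by the image
# likelihood under a rendered PCAD scene.
def log_target(theta):
    return -0.5 * ((theta[0] - 1.0) ** 2 + (theta[1] + 2.0) ** 2)

theta = [0.0, 0.0]
samples = []
for step in range(5000):
    i = step % 2                   # single-site: perturb one coordinate
    proposal = theta[:]
    proposal[i] += random.gauss(0.0, 0.5)
    # Symmetric proposal, so accept with probability min(1, target ratio).
    if math.log(random.random()) < log_target(proposal) - log_target(theta):
        theta = proposal
    samples.append(theta[:])

burn = samples[1000:]              # discard burn-in
mean0 = sum(s[0] for s in burn) / len(burn)
mean1 = sum(s[1] for s in burn) / len(burn)
print(f"posterior means ~ ({mean0:.2f}, {mean1:.2f})")
```

The chain's marginal means drift toward the target's centre, which is the basic correctness check one would also apply before layering on blocked, Hamiltonian, or data-driven proposals.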
Spatial Latent Dirichlet Allocation
In recent years, the language model Latent Dirichlet Allocation (LDA), which clusters co-occurring words into topics, has been widely applied in the computer vision field. However, many of these applications have difficulty with modeling the spatial and temporal structure among visual words, since LDA assumes that a document is a ``bag-of-words''. It is also critical to properly design ``words'' and ``documents'' when using a language model to solve vision problems. In this paper, we propose a topic model, Spatial Latent Dirichlet Allocation (SLDA), which better encodes the spatial structure among visual words that is essential for solving many vision problems. The spatial information is not encoded in the value of visual words but in the design of documents. Instead of knowing the partition of words into documents a priori, the word-document assignment becomes a random hidden variable in SLDA. There is a generative procedure, where knowledge of spatial structure can be flexibly added as a prior, grouping visual words which are close in space into the same document. We use SLDA to discover objects from a collection of images, and show it achieves better performance than LDA.
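The key idea above, that each visual word's document assignment is a hidden variable with a prior favouring spatially nearby documents, can be sketched in a few lines. The sketch below is illustrative only: the two fixed "document" centres, the Gaussian kernel, and all numbers are assumptions for demonstration, not SLDA's full generative model or its inference procedure.

```python
import math

# Two "documents" placed at fixed spatial centres in the image
# (illustrative; in SLDA these arise within the generative procedure).
doc_centres = [(2.0, 2.0), (8.0, 8.0)]

def assignment_probs(word_xy, sigma=2.0):
    # Prior probability of each document, proportional to a Gaussian
    # kernel around its centre: words close in space tend to be grouped
    # into the same document.
    weights = [
        math.exp(-((word_xy[0] - cx) ** 2 + (word_xy[1] - cy) ** 2)
                 / (2 * sigma ** 2))
        for cx, cy in doc_centres
    ]
    total = sum(weights)
    return [w / total for w in weights]

# A visual word near the first centre is almost surely grouped with it.
probs = assignment_probs((2.5, 1.5))
print(probs)
```

Contrast this with standard LDA, where the word-document partition is fixed in advance and position plays no role in grouping.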
Learning in Computer Vision and Image Understanding
There is an increasing interest in the area of Learning in Computer Vision and Image Understanding, both from researchers in the learning community and from researchers involved with the computer vision world. The field is characterized by a shift away from the classical, purely model-based, computer vision techniques, towards data-driven learning paradigms for solving real-world vision problems. Using learning in segmentation or recognition tasks has several advantages over classical model-based techniques. These include adaptivity to noise and changing environments, as well as, in many cases, a simplified system generation procedure. Yet, learning from examples introduces a new challenge: getting a representative data set of examples from which to learn.