LiT-4-RSVQA: Lightweight Transformer-based Visual Question Answering in Remote Sensing
Hackel, Leonard, Clasen, Kai Norman, Ravanbakhsh, Mahdyar, Demir, Begüm
Visual question answering (VQA) methods in remote sensing (RS) aim to answer natural language questions about an RS image. Most existing methods require a large amount of computational resources, which limits their application in operational RS scenarios. To address this issue, in this paper we present LiT-4-RSVQA, an effective lightweight transformer-based architecture for efficient and accurate VQA in RS. Our architecture consists of: i) a lightweight text encoder module; ii) a lightweight image encoder module; iii) a fusion module; and iv) a classification module. The experimental results obtained on a VQA benchmark dataset demonstrate that our proposed LiT-4-RSVQA architecture provides accurate VQA results while significantly reducing the computational requirements on the executing hardware. Our code is publicly available at https://git.tu-berlin.de/rsim/lit4rsvqa.
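The four-module pipeline above can be sketched in plain NumPy. Everything here is a hypothetical stand-in, not the paper's implementation: the embedding size, patch geometry, answer-class count, and the internals of each module (mean pooling, a linear projection, concatenate-and-project fusion) are illustrative assumptions chosen only to show how the text branch, image branch, fusion, and classifier compose.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, N_ANSWERS = 64, 9  # hypothetical embedding size and answer-class count

# i) lightweight text encoder (stand-in: random embedding table + mean pooling)
emb_table = rng.standard_normal((1000, DIM))
def encode_text(token_ids):
    return emb_table[token_ids].mean(axis=0)

# ii) lightweight image encoder (stand-in: linear projection of flattened patches)
W_img = rng.standard_normal((16 * 16 * 3, DIM)) * 0.01
def encode_image(patches):                 # patches: (n_patches, 16*16*3)
    return (patches @ W_img).mean(axis=0)

# iii) fusion module (stand-in: concatenate both modalities, then project)
W_fuse = rng.standard_normal((2 * DIM, DIM)) * 0.1
def fuse(text_vec, image_vec):
    return np.tanh(np.concatenate([text_vec, image_vec]) @ W_fuse)

# iv) classification module over a fixed answer vocabulary
W_cls = rng.standard_normal((DIM, N_ANSWERS)) * 0.1
def classify(h):
    logits = h @ W_cls
    p = np.exp(logits - logits.max())
    return p / p.sum()

question = rng.integers(0, 1000, size=7)          # toy question token ids
image = rng.standard_normal((196, 16 * 16 * 3))   # 14x14 grid of 16x16 RGB patches
probs = classify(fuse(encode_text(question), encode_image(image)))
answer_id = int(probs.argmax())                   # index into the answer vocabulary
```

Treating VQA as classification over a fixed answer vocabulary (module iv) is a common design in RS VQA benchmarks, which is why the sketch ends in a softmax rather than free-form text generation.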
Detecting Severity of Diabetic Retinopathy from Fundus Images using Ensembled Transformers
Adak, Chandranath, Karkera, Tejas, Chattopadhyay, Soumi, Saqib, Muhammad
Diabetic Retinopathy (DR) is a leading cause of vision loss among people with diabetes globally. The severity of DR is typically assessed manually by ophthalmologists from fundus photographs of the retina. This paper deals with an automated understanding of the severity stages of DR. In the literature, researchers have approached this automation using traditional machine learning algorithms and convolutional architectures. However, past works have rarely focused on the essential regions of the retinal image to improve model performance. In this paper, we adopt transformer-based learning models to capture the crucial features of retinal images and understand DR severity better. We ensemble four image transformers, namely ViT (Vision Transformer), BEiT (Bidirectional Encoder representation for image Transformer), CaiT (Class-Attention in Image Transformers), and DeiT (Data-efficient image Transformers), to infer the degree of DR severity from fundus photographs. For experiments, we used the publicly available APTOS-2019 blindness detection dataset, where the performances of the transformer-based models were quite encouraging.
- Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
- Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.95)
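The ensembling step above can be illustrated with a minimal NumPy sketch. The abstract does not specify the combination rule, so soft voting (averaging per-model class probabilities) is an assumption here, as is representing each fine-tuned model by a vector of random stand-in logits; APTOS-2019 grades DR severity on a 0-4 scale, which the class count reflects.

```python
import numpy as np

rng = np.random.default_rng(1)
N_CLASSES = 5  # APTOS-2019 severity grades: 0 (no DR) through 4 (proliferative DR)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Stand-ins for the four fine-tuned models' logits on one fundus image.
logits = {m: rng.standard_normal(N_CLASSES)
          for m in ["ViT", "BEiT", "CaiT", "DeiT"]}

# Soft-voting ensemble: average the per-model class probabilities,
# then take the most probable severity grade.
probs = np.mean([softmax(z) for z in logits.values()], axis=0)
severity = int(probs.argmax())
```

Averaging probabilities rather than raw logits keeps each model's contribution on a common scale even when the models' logit magnitudes differ.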
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Li, Minghao, Lv, Tengchao, Chen, Jingye, Cui, Lei, Lu, Yijuan, Florencio, Dinei, Zhang, Cha, Li, Zhoujun, Wei, Furu
Text recognition is a long-standing research problem in document digitization. Existing approaches are usually built on a CNN for image understanding and an RNN for character-level text generation. In addition, another language model is usually needed as a post-processing step to improve the overall accuracy. In this paper, we propose an end-to-end text recognition approach with pre-trained image Transformer and text Transformer models, namely TrOCR, which leverages the Transformer architecture for both image understanding and wordpiece-level text generation. The TrOCR model is simple but effective, and can be pre-trained with large-scale synthetic data and fine-tuned with human-labeled datasets. Experiments show that the TrOCR model outperforms the current state-of-the-art models on the printed, handwritten and scene text recognition tasks. The TrOCR models and code are publicly available at \url{https://aka.ms/trocr}.
- Information Technology > Artificial Intelligence > Vision > Image Understanding (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
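The encoder-decoder shape of the approach above (an image Transformer encoding patches, a text Transformer generating wordpieces autoregressively) can be sketched as a greedy decoding loop. This is a toy illustration, not TrOCR itself: the encoder and decoder step are random-weight stand-ins, and the vocabulary size, hidden size, and special-token ids are all assumed for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
VOCAB, DIM = 100, 32   # toy wordpiece vocabulary and hidden size (assumptions)
BOS, EOS = 0, 1        # assumed begin/end-of-sequence token ids

# Stand-in image encoder: one hidden vector per image patch ("memory").
W_enc = rng.standard_normal((48, DIM)) * 0.1
def encode_image(patches):                 # patches: (n_patches, 48)
    return patches @ W_enc

# Stand-in decoder step: pool the encoder memory (crude cross-attention),
# shift by the embedding of the last generated wordpiece, score the vocabulary.
emb_table = rng.standard_normal((VOCAB, DIM))
W_out = rng.standard_normal((DIM, VOCAB)) * 0.1
def decoder_step(memory, prefix):
    h = memory.mean(axis=0) + emb_table[prefix[-1]]
    return h @ W_out                       # logits over wordpieces

def greedy_decode(patches, max_len=10):
    memory = encode_image(patches)
    out = [BOS]
    for _ in range(max_len):
        nxt = int(decoder_step(memory, out).argmax())
        out.append(nxt)
        if nxt == EOS:                     # stop once end-of-sequence is emitted
            break
    return out[1:]                         # drop the BOS prefix

patches = rng.standard_normal((49, 48))    # e.g. a 7x7 grid of flattened patches
out_ids = greedy_decode(patches)
```

The point of the end-to-end design in the abstract is visible even in this toy: the same decoding loop serves both recognition and language modeling, removing the separate post-processing language model that CNN+RNN pipelines need.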
Vision Transformers: Natural Language Processing (NLP) Increases Efficiency and Model Generality
There has been no shortage of developments vying for a share of your attention over the last year or so. However, if you regularly follow the state of machine learning research you may recall a loud contender for a share of your mind in OpenAI's GPT-3 and the accompanying business strategy developments from the group. GPT-3 is the latest and by far the largest model in OpenAI's general-purpose transformer lineage for natural language processing. Of course, GPT-3 and the GPTs may grab headlines, but they belong to a much larger superfamily of transformer models, including a plethora of variants based on the Bidirectional Encoder Representations from Transformers (BERT) family originally created by Google, as well as other smaller families of models from Facebook and Microsoft. For an expansive but still not exhaustive overview of major NLP transformers, the leading resource is probably the Apache 2.0-licensed Hugging Face library.
Simple, Distributed, and Accelerated Probabilistic Programming
Tran, Dustin, Hoffman, Matthew W., Moore, Dave, Suter, Christopher, Vasudevan, Srinivas, Radul, Alexey
Our lightweight implementation in TensorFlow enables numerous applications: a model-parallel variational auto-encoder (VAE) with 2nd-generation tensor processing units (TPUv2s); a data-parallel autoregressive model (Image Transformer) with TPUv2s; and multi-GPU No-U-Turn Sampler (NUTS). For both a state-of-the-art VAE on 64x64 ImageNet and Image Transformer on 256x256 CelebA-HQ, our approach achieves an optimal linear speedup from 1 to 256 TPUv2 chips. With NUTS, we see a 100x speedup on GPUs over Stan and 37x over PyMC3. Papers published at the Neural Information Processing Systems Conference.
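NUTS, mentioned above, is an adaptive extension of Hamiltonian Monte Carlo (HMC). As a minimal plain-NumPy sketch of the core move NUTS builds on (not the multi-GPU TensorFlow implementation the abstract describes, and without the U-turn criterion that lets NUTS choose trajectory lengths automatically), here is HMC with a leapfrog integrator sampling a standard normal; the step size and trajectory length are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Target: standard normal, log p(x) = -x^2/2 up to a constant.
grad_logp = lambda x: -x

def leapfrog(x, p, step=0.1, n_steps=20):
    # Simulate one Hamiltonian trajectory with the leapfrog integrator:
    # half momentum step, alternating full position/momentum steps,
    # closing half momentum step.
    p = p + 0.5 * step * grad_logp(x)
    for _ in range(n_steps - 1):
        x = x + step * p
        p = p + step * grad_logp(x)
    x = x + step * p
    p = p + 0.5 * step * grad_logp(x)
    return x, p

def hmc_sample(n=2000):
    x, samples = 0.0, []
    for _ in range(n):
        p0 = rng.standard_normal()             # resample momentum
        x_new, p_new = leapfrog(x, p0)
        h0 = 0.5 * p0**2 + 0.5 * x**2          # initial Hamiltonian
        h1 = 0.5 * p_new**2 + 0.5 * x_new**2   # proposed Hamiltonian
        if rng.random() < np.exp(h0 - h1):     # Metropolis correction
            x = x_new
        samples.append(x)
    return np.array(samples)

s = hmc_sample()
```

Because every iteration is a fixed sequence of array operations and gradient evaluations, this kind of sampler vectorizes naturally across chains, which is what makes the GPU speedups over Stan and PyMC3 reported above possible.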