AITopics | tokenlearner

tokenlearner

6a30e32e56fce5cf381895dfe6ca7b6f-Paper.pdf

Neural Information Processing Systems - Feb-9-2026, 05:02:23 GMT

computer vision, dataset, tokenlearner, (11 more...)

Neural Information Processing Systems

Country: North America > United States > New York > Suffolk County > Stony Brook (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)

TokenLearner: Adaptive Space-Time Tokenization for Videos

Neural Information Processing Systems - Dec-24-2025, 06:13:03 GMT

In this paper, we introduce a novel visual representation learning approach that relies on a handful of adaptively learned tokens and is applicable to both image and video understanding tasks. Instead of relying on hand-designed splitting strategies to obtain visual tokens and processing a large number of densely sampled patches for attention, our approach learns to mine important tokens in visual data. This makes it possible to find a few important visual tokens efficiently and effectively, and to model pairwise attention between such tokens over a longer temporal horizon for videos or over the spatial content in image frames. Our experiments demonstrate strong performance on several challenging benchmarks for video recognition tasks. Importantly, because our tokens are adaptive, we achieve competitive results at significantly reduced computational cost. We establish new state-of-the-art results on multiple video datasets, including Kinetics-400, Kinetics-600, Charades, and AViD.
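The abstract does not spell out how the tokens are learned, so the following is only a minimal sketch of one plausible reading: each learned token is a spatially weighted average of the input features, with one attention map per token. The function name `learn_tokens`, the single linear projection `weights`, and the sigmoid gating are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def learn_tokens(features, weights):
    """Reduce N feature vectors to S tokens (illustrative sketch only).

    features: list of N vectors, each of length C (e.g. flattened patches).
    weights:  C x S matrix; column j scores every position for token j.
              (Hypothetical stand-in for a learned attention function.)
    Returns a list of S tokens, each of length C.
    """
    n = len(features)
    c = len(features[0])
    s = len(weights[0])
    tokens = []
    for j in range(s):
        # One attention weight in (0, 1) per spatial position.
        attn = [sigmoid(sum(f[k] * weights[k][j] for k in range(c)))
                for f in features]
        # Weight the features by attention, then average over positions.
        token = [sum(a * f[k] for a, f in zip(attn, features)) / n
                 for k in range(c)]
        tokens.append(token)
    return tokens


# Example: 196 positions (a 14x14 grid) with 16 channels reduced to 8 tokens.
random.seed(0)
feats = [[random.uniform(-1, 1) for _ in range(16)] for _ in range(196)]
proj = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(16)]
tokens = learn_tokens(feats, proj)
```

In this sketch, pairwise attention downstream would run over 8 tokens rather than 196 patches, which is the kind of cost reduction the abstract refers to.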

adaptive space-time tokenization, name change, tokenlearner, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.41)

TokenLearner: Adaptive Space-Time Tokenization for Videos - Supplementary Materials - Michael S. Ryoo

Neural Information Processing Systems - Aug-14-2025, 23:53:24 GMT

We train the Kinetics model for 30 epochs with a base learning rate of 0.05 with the Momentum ... Basically, all the settings in our Kinetics experiments follow the setting of ViViT. We provide the training details below. We use the cosine-decay learning rate, which has been widely used in training video CNN models. A base learning rate of 0.8 per TPU core (which is equivalent to a single GPU) is used for the Charades dataset (multi-label action classification), and a base rate of 0.025 per TPU was used for AViD. Label smoothing of 0.2 was used for the AViD training. In Charades, the training was done by temporally cropping long Charades videos (e.g., ... In all these experiments, the ViT L/16 model was used.
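As a rough illustration of the cosine-decay learning rate mentioned above, the sketch below evaluates a plain cosine schedule using the Kinetics values from the text (30 epochs, base rate 0.05). The function name `cosine_decay_lr`, the per-epoch granularity, the decay to zero, and the absence of warmup are all assumptions; the excerpt does not state these details.

```python
import math


def cosine_decay_lr(step, total_steps, base_lr):
    """Learning rate at `step` under plain cosine decay from base_lr to 0.

    Assumes no warmup and a final rate of zero (not stated in the source).
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay well defined.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Kinetics values from the text: 30 epochs, base learning rate 0.05.
schedule = [cosine_decay_lr(epoch, 30, 0.05) for epoch in range(31)]
```

Under these assumptions the rate starts at the base value, reaches half of it at the midpoint, and falls to zero at the end of training.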

experiment, tokenlearner, transformer, (13 more...)

Neural Information Processing Systems

Country: North America > United States > New York > Suffolk County > Stony Brook (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

TokenLearner: Adaptive Space-Time Tokenization for Videos - Michael S. Ryoo

Neural Information Processing Systems - Aug-14-2025, 23:53:20 GMT

This results in efficiently and effectively finding a few important visual tokens and enables modeling of pairwise attention between such tokens, over a longer temporal horizon for videos, or the spatial content in image frames.

computer vision, dataset, tokenlearner, (11 more...)

Neural Information Processing Systems

Country: North America > United States > New York > Suffolk County > Stony Brook (0.04)

Genre: Research Report (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.68)

TokenLearner: Adaptive Space-Time Tokenization for Videos

Neural Information Processing Systems - Oct-10-2024, 23:15:44 GMT

adaptive space-time tokenization, tokenlearner, video

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.45)

DualStreamFoveaNet: A Dual Stream Fusion Architecture with Anatomical Awareness for Robust Fovea Localization

Song, Sifan, Wang, Jinfeng, Wang, Zilong, Su, Jionglong, Ding, Xiaowei, Dang, Kang

arXiv.org Artificial Intelligence - Dec-26-2023

Accurate fovea localization is essential for analyzing retinal diseases to prevent irreversible vision loss. While current deep learning-based methods outperform traditional ones, they still face challenges such as the lack of local anatomical landmarks around the fovea, the inability to robustly handle diseased retinal images, and the variations in image conditions. In this paper, we propose a novel transformer-based architecture called DualStreamFoveaNet (DSFN) for multi-cue fusion. This architecture explicitly incorporates long-range connections and global features using retina and vessel distributions for robust fovea localization. We introduce a spatial attention mechanism in the dual-stream encoder to extract and fuse self-learned anatomical information, focusing more on features distributed along blood vessels and significantly reducing computational costs by decreasing token numbers. Our extensive experiments show that the proposed architecture achieves state-of-the-art performance on two public datasets and one large-scale private dataset. Furthermore, we demonstrate that the DSFN is more robust on both normal and diseased retina images and has better generalization capacity in cross-dataset experiments.

dataset, fovea localization, fundus image, (14 more...)

arXiv.org Artificial Intelligence

2302.06961

Country:

Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
