Differentiable Hierarchical Visual Tokenization

Jun-18-2026, 23:20:24 GMT–Neural Information Processing Systems

Vision Transformers rely on fixed patch tokens that ignore the spatial and semantic structure of images. In this work, we introduce an end-to-end differentiable tokenizer that adapts to image content with pixel-level granularity while remaining backward-compatible with existing architectures for retrofitting pretrained models. Our method uses hierarchical model selection with information criteria to provide competitive performance in both image-level classification and dense-prediction tasks, and even supports out-of-the-box raster-to-vector conversion.

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Jun-18-2026, 23:20:24 GMT

Conferences PDF

Add feedback

Country:
- Europe > Norway (0.28)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Information Technology (0.46)

Technology:
- Information Technology
  - Sensing and Signal Processing > Image Processing (0.93)
  - Artificial Intelligence
    - Vision (1.00)
    - Representation & Reasoning (1.00)
    - Machine Learning > Neural Networks (1.00)
    - Natural Language > Machine Translation (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found