AITopics | patch selection

Collaborating Authors

patch selection

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

EvoPS: Evolutionary Patch Selection for Whole Slide Image Analysis in Computational Pathology

Hashemian, Saya, Bidgoli, Azam Asilian

arXiv.org Artificial IntelligenceNov-12-2025

In computational pathology, the gigapixel scale of Whole-Slide Images (WSIs) necessitates their division into thousands of smaller patches. Analyzing these high-dimensional patch embeddings is computationally expensive and risks diluting key diagnostic signals with many uninformative patches. Existing patch selection methods often rely on random sampling or simple clustering heuristics and typically fail to explicitly manage the crucial trade-off between the number of selected patches and the accuracy of the resulting slide representation. To address this gap, we propose EvoPS (Evolutionary Patch Selection), a novel framework that formulates patch selection as a multi-objective optimization problem and leverages an evolutionary search to simultaneously minimize the number of selected patch embeddings and maximize the performance of a downstream similarity search task, generating a Pareto front of optimal trade-off solutions. We validated our framework across four major cancer cohorts from The Cancer Genome Atlas (TCGA) using five pretrained deep learning models to generate patch embeddings, including both supervised CNNs and large self-supervised foundation models. The results demonstrate that EvoPS can reduce the required number of training patch embeddings by over 90% while consistently maintaining or even improving the final classification F1-score compared to a baseline that uses all available patches' embeddings selected through a standard extraction pipeline. The EvoPS framework provides a robust and principled method for creating efficient, accurate, and interpretable WSI representations, empowering users to select an optimal balance between computational cost and diagnostic performance.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2511.0756

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Oncology > Carcinoma (0.47)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Add feedback

LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments

Huang, Jin, Jin, Yuchao, An, Le, Park, Josh

arXiv.org Artificial IntelligenceNov-4-2025

This paper introduces an efficient Vision-Language Model (VLM) pipeline specifically optimized for deployment on embedded devices, such as those used in robotics and autonomous driving. The pipeline significantly reduces the computational overhead by jointly leveraging patch selection to filter irrelevant camera views, a token selection module to reduce input sequence length for the LLM, and speculative decoding to accelerate token generation. Evaluation on the NVIDIA DRIVE Thor platform for automonous driving application, our pipeline achieves $2.5\times$ end-to-end latency reduction without compromising task accuracy. The speed-up further increases to $3.2\times$ when applying FP8 post-training quantization. These results demonstrate our pipeline as a viable solution for enabling real-time VLM deployment in resource-constrained environments.

large language model, latency, natural language, (15 more...)

arXiv.org Artificial Intelligence

2506.07416

Country: Europe (0.28)

Genre: Research Report (0.70)

Industry:

Transportation > Ground > Road (0.36)
Information Technology > Robotics & Automation (0.36)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.55)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.36)

Add feedback

When marine radar target detection meets pretrained large language models

Hu, Qiying, Zhang, Linping, Wang, Xueqian, Li, Gang, Liu, Yu, Zhang, Xiao-Ping

arXiv.org Artificial IntelligenceSep-16-2025

Deep learning (DL) methods are widely used to extract high-dimensional patterns from the sequence features of radar echo signals. However, conventional DL algorithms face challenges such as redundant feature segments, and constraints from restricted model sizes. To address these issues, we propose a framework that integrates feature preprocessing with large language models (LLMs). Our preprocessing module tokenizes radar sequence features, applies a patch selection algorithm to filter out uninformative segments, and projects the selected patches into embeddings compatible with the feature space of pre-trained LLMs. Leveraging these refined embeddings, we incorporate a pre-trained LLM, fine-tuning only the normalization layers to reduce training burdens while enhancing performance. Experiments on measured datasets demonstrate that the proposed method significantly outperforms the state-of-the-art baselines on supervised learning tests.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.1211

Country: Asia > China (0.30)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PATHS: A Hierarchical Transformer for Efficient Whole Slide Image Analysis

Buzzard, Zak, Hemker, Konstantin, Simidjievski, Nikola, Jamnik, Mateja

arXiv.org Artificial IntelligenceNov-27-2024

Computational analysis of whole slide images (WSIs) has seen significant research progress in recent years, with applications ranging across important diagnostic and prognostic tasks such as survival or cancer subtype prediction. Many state-of-the-art models process the entire slide - which may be as large as $150,000 \times 150,000$ pixels - as a bag of many patches, the size of which necessitates computationally cheap feature aggregation methods. However, a large proportion of these patches are uninformative, such as those containing only healthy or adipose tissue, adding significant noise and size to the bag. We propose Pathology Transformer with Hierarchical Selection (PATHS), a novel top-down method for hierarchical weakly supervised representation learning on slide-level tasks in computational pathology. PATHS is inspired by the cross-magnification manner in which a human pathologist examines a slide, recursively filtering patches at each magnification level to a small subset relevant to the diagnosis. Our method overcomes the complications of processing the entire slide, enabling quadratic self-attention and providing a simple interpretable measure of region importance. We apply PATHS to five datasets of The Cancer Genome Atlas (TCGA), and achieve superior performance on slide-level prediction tasks when compared to previous methods, despite processing only a small proportion of the slide.

dataset, magnification, magnification level, (14 more...)

arXiv.org Artificial Intelligence

2411.18225

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report > Promising Solution (0.48)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (0.87)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Data-Driven Pixel Control: Challenges and Prospects

Farkya, Saurabh, Daniels, Zachary Alan, Raghavan, Aswin, van der Wal, Gooitzen, Isnardi, Michael, Piacentino, Michael, Zhang, David

arXiv.org Artificial IntelligenceAug-8-2024

Recent advancements in sensors have led to high resolution and high data throughput at the pixel level. Simultaneously, the adoption of increasingly large (deep) neural networks (NNs) has lead to significant progress in computer vision. Currently, visual intelligence comes at increasingly high computational complexity, energy, and latency. We study a data-driven system that combines dynamic sensing at the pixel level with computer vision analytics at the video level and propose a feedback control loop to minimize data movement between the sensor front-end and computational back-end without compromising detection and tracking precision. Our contributions are threefold: (1) We introduce anticipatory attention and show that it leads to high precision prediction with sparse activation of pixels; (2) Leveraging the feedback control, we show that the dimensionality of learned feature vectors can be significantly reduced with increased sparsity; and (3) We emulate analog design choices (such as varying RGB or Bayer pixel format and analog noise) and study their impact on the key metrics of the data-driven system. Comparative analysis with traditional pixel and deep learning models shows significant performance enhancements. Our system achieves a 10X reduction in bandwidth and a 15-30X improvement in Energy-Delay Product (EDP) when activating only 30% of pixels, with a minor reduction in object detection and tracking precision. Based on analog emulation, our system can achieve a throughput of 205 megapixels/sec (MP/s) with a power consumption of only 110 mW per MP, i.e., a theoretical improvement of ~30X in EDP.

detection, patch selection, pixel, (14 more...)

arXiv.org Artificial Intelligence

2408.04767

Country: North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report (0.40)

Industry:

Government > Military (0.46)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Attention-aware Semantic Communications for Collaborative Inference

Im, Jiwoong, Kwon, Nayoung, Park, Taewoo, Woo, Jiheon, Lee, Jaeho, Kim, Yongjune

arXiv.org Artificial IntelligenceMay-31-2024

We propose a communication-efficient collaborative inference framework in the domain of edge inference, focusing on the efficient use of vision transformer (ViT) models. The partitioning strategy of conventional collaborative inference fails to reduce communication cost because of the inherent architecture of ViTs maintaining consistent layer dimensions across the entire transformer encoder. Therefore, instead of employing the partitioning strategy, our framework utilizes a lightweight ViT model on the edge device, with the server deploying a complicated ViT model. To enhance communication efficiency and achieve the classification accuracy of the server model, we propose two strategies: 1) attention-aware patch selection and 2) entropy-aware image transmission. Attention-aware patch selection leverages the attention scores generated by the edge device's transformer encoder to identify and select the image patches critical for classification. This strategy enables the edge device to transmit only the essential patches to the server, significantly improving communication efficiency. Entropy-aware image transmission uses min-entropy as a metric to accurately determine whether to depend on the lightweight model on the edge device or to request the inference from the server model. In our framework, the lightweight ViT model on the edge device acts as a semantic encoder, efficiently identifying and selecting the crucial image information required for the classification task. Our experiments demonstrate that the proposed collaborative inference framework can reduce communication overhead by 68% with only a minimal loss in accuracy compared to the server model on the ImageNet dataset.

attention score, edge device, inference, (14 more...)

arXiv.org Artificial Intelligence

2404.07217

Country:

Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
North America > United States > California > Marin County > San Rafael (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

SPLICE -- Streamlining Digital Pathology Image Processing

Alsaafin, Areej, Nejat, Peyman, Shafique, Abubakr, Khan, Jibran, Alfasly, Saghir, Alabtah, Ghazal, Tizhoosh, H. R.

arXiv.org Artificial IntelligenceApr-26-2024

Digital pathology and the integration of artificial intelligence (AI) models have revolutionized histopathology, opening new opportunities. With the increasing availability of Whole Slide Images (WSIs), there's a growing demand for efficient retrieval, processing, and analysis of relevant images from vast biomedical archives. However, processing WSIs presents challenges due to their large size and content complexity. Full computer digestion of WSIs is impractical, and processing all patches individually is prohibitively expensive. In this paper, we propose an unsupervised patching algorithm, Sequential Patching Lattice for Image Classification and Enquiry (SPLICE). This novel approach condenses a histopathology WSI into a compact set of representative patches, forming a "collage" of WSI while minimizing redundancy. SPLICE prioritizes patch quality and uniqueness by sequentially analyzing a WSI and selecting non-redundant representative features. We evaluated SPLICE for search and match applications, demonstrating improved accuracy, reduced computation time, and storage requirements compared to existing state-of-the-art methods. As an unsupervised method, SPLICE effectively reduces storage requirements for representing tissue images by 50%. This reduction enables numerous algorithms in computational pathology to operate much more efficiently, paving the way for accelerated adoption of digital pathology.

dataset, splice, yottixel, (16 more...)

arXiv.org Artificial Intelligence

2404.17704

Country: North America > United States > Minnesota > Olmsted County > Rochester (0.04)

Genre:

Research Report > Promising Solution (1.00)
Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Differentiable Patch Selection for Image Recognition

Cordonnier, Jean-Baptiste, Mahendran, Aravindh, Dosovitskiy, Alexey, Weissenborn, Dirk, Uszkoreit, Jakob, Unterthiner, Thomas

arXiv.org Artificial IntelligenceApr-7-2021

Neural Networks require large amounts of memory and compute to process high resolution images, even when only a small part of the image is actually informative for the task at hand. We propose a method based on a differentiable Top-K operator to select the most relevant parts of the input to efficiently process high resolution images. Our method may be interfaced with any downstream neural network, is able to aggregate information from different patches in a flexible way, and allows the whole model to be trained endto-end Figure 1: Examples of large images where patch extraction using backpropagation. We show results for traffic allows (top-left) to focus on details for fine-grained recognition, sign recognition, inter-patch relationship reasoning, and (bottom-left) to reason across patches, and (right) to fine-grained recognition without using object/part bounding efficiently capture very localized information.

dataset, recognition, top-k, (17 more...)

arXiv.org Artificial Intelligence

2104.03059

Country:

North America > United States > California (0.04)
Europe > Switzerland (0.04)

Genre: Research Report (0.64)

Industry:

Transportation (0.46)
Information Technology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition > Image Matching (0.41)

Add feedback

RPATTACK: Refined Patch Attack on General Object Detectors

Huang, Hao, Wang, Yongtao, Chen, Zhaoyu, Tang, Zhi, Zhang, Wenqiang, Ma, Kai-Kuang

arXiv.org Artificial IntelligenceMar-23-2021

Nowadays, general object detectors like YOLO and Faster R-CNN as well as their variants are widely exploited in many applications. Many works have revealed that these detectors are extremely vulnerable to adversarial patch attacks. The perturbed regions generated by previous patch-based attack works on object detectors are very large which are not necessary for attacking and perceptible for human eyes. To generate much less but more efficient perturbation, we propose a novel patch-based method for attacking general object detectors. Firstly, we propose a patch selection and refining scheme to find the pixels which have the greatest importance for attack and remove the inconsequential perturbations gradually. Then, for a stable ensemble attack, we balance the gradients of detectors to avoid over-optimizing one of them during the training phase. Our RPAttack can achieve an amazing missed detection rate of 100% for both Yolo v4 and Faster R-CNN while only modifies 0.32% pixels on VOC 2007 test set. Our code is available at https://github.com/VDIGPKU/RPAttack.

detector, perturbation, rpattack, (14 more...)

arXiv.org Artificial Intelligence

2103.12469

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(7 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Add feedback