counting
- Oceania > Australia > Western Australia > Perth (0.04)
- Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
GD: Multi-Modal Open-World Counting
GD is comparable to or outperforms all previous text-only works, and when using both text and visual exemplars, we outperform all previous models; third, we carry out a preliminary study into different interactions between the text and visual exemplar prompts, including the cases where they reinforce each other and where one restricts the other.
- Health & Medicine (0.46)
- Information Technology (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Artificial Intelligence > Vision > Image Understanding (0.46)
CountFormer: A Transformer Framework for Learning Visual Repetition and Structure in Class-Agnostic Object Counting
Hossain, Md Tanvir, Islam, Akif, Ameen, Mohd Ruhul
Humans can effortlessly count diverse objects by perceiving visual repetition and structural relationships rather than relying on class identity. However, most existing counting models fail to replicate this ability; they often miscount when objects exhibit complex shapes, internal symmetry, or overlapping components. In this work, we introduce CountFormer, a transformer-based framework that learns to recognize repetition and structural coherence for class-agnostic object counting. Built upon the CounTR architecture, our model replaces its visual encoder with the self-supervised foundation model DINOv2, which produces richer and spatially consistent feature representations. We further incorporate positional embedding fusion to preserve geometric relationships before decoding these features into density maps through a lightweight convolutional decoder. Evaluated on the FSC-147 dataset, our model achieves performance comparable to current state-of-the-art methods while demonstrating superior accuracy on structurally intricate or densely packed scenes. Our findings indicate that integrating foundation models such as DINOv2 enables counting systems to approach human-like structural perception, advancing toward a truly general and exemplar-free counting paradigm.
- Asia > Bangladesh (0.04)
- Oceania > Australia > Western Australia > Perth (0.04)
- North America > United States > West Virginia (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Research Report > Promising Solution (0.34)
- Research Report > New Finding (0.34)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
- Oceania > Australia > Western Australia > Perth (0.04)
- Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
GD: Multi-Modal Open-World Counting
GD is comparable to or outperforms all previous text-only works, and when using both text and visual exemplars, we outperform all previous models; third, we carry out a preliminary study into different interactions between the text and visual exemplar prompts, including the cases where they reinforce each other and where one restricts the other.
- Health & Medicine (0.46)
- Information Technology (0.46)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
- Information Technology > Artificial Intelligence > Vision > Image Understanding (0.46)
How bad is California's housing shortage? It depends on who's doing the counting
Things to Do in L.A. Tap to enable a layout that focuses on the article. How bad is California's housing shortage? It depends on who's doing the counting This is read by an automated voice. Please report any issues or inconsistencies here . Imagine you've finally taken your car to the mechanic to investigate that mysterious warning light that's been flashing on your dashboard for the past week and a half.
- North America > United States > California > San Francisco County > San Francisco (0.05)
- North America > United States > California > Los Angeles County > Los Angeles > Hollywood Hills (0.04)
- North America > United States > California > Los Angeles County > Beverly Hills (0.04)
- (3 more...)
- Law (1.00)
- Health & Medicine (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)
- Banking & Finance > Real Estate (1.00)
TasselNetV4: A vision foundation model for cross-scene, cross-scale, and cross-species plant counting
Hu, Xiaonan, Li, Xuebing, Xu, Jinyu, Adan, Abdulkadir Duran, Zhou, Letian, Zhu, Xuhui, Li, Yanan, Guo, Wei, Liu, Shouyang, Liu, Wenzhong, Lu, Hao
Accurate plant counting provides valuable information for agriculture such as crop yield prediction, plant density assessment, and phenotype quantification. Vision-based approaches are currently the mainstream solution. Prior art typically uses a detection or a regression model to count a specific plant. However, plants have biodiversity, and new cultivars are increasingly bred each year. It is almost impossible to exhaust and build all species-dependent counting models. Inspired by class-agnostic counting (CAC) in computer vision, we argue that it is time to rethink the problem formulation of plant counting, from what plants to count to how to count plants. In contrast to most daily objects with spatial and temporal invariance, plants are dynamic, changing with time and space. Their non-rigid structure often leads to worse performance than counting rigid instances like heads and cars such that current CAC and open-world detection models are suboptimal to count plants. In this work, we inherit the vein of the TasselNet plant counting model and introduce a new extension, TasselNetV4, shifting from species-specific counting to cross-species counting. TasselNetV4 marries the local counting idea of TasselNet with the extract-and-match paradigm in CAC. It builds upon a plain vision transformer and incorporates novel multi-branch box-aware local counters used to enhance cross-scale robustness. Two challenging datasets, PAC-105 and PAC-Somalia, are harvested. Extensive experiments against state-of-the-art CAC models show that TasselNetV4 achieves not only superior counting performance but also high efficiency.Our results indicate that TasselNetV4 emerges to be a vision foundation model for cross-scene, cross-scale, and cross-species plant counting.
- Africa > Middle East > Somalia (0.26)
- Asia > China > Hubei Province > Wuhan (0.04)
- Asia > Middle East > Jordan (0.04)
- (7 more...)
Count2Density: Crowd Density Estimation without Location-level Annotations
Litrico, Mattia, Chen, Feng, Pound, Michael, Tsaftaris, Sotirios A, Battiato, Sebastiano, Giuffrida, Mario Valerio
Crowd density estimation is a well-known computer vision task aimed at estimating the density distribution of people in an image. The main challenge in this domain is the reliance on fine-grained location-level annotations, (i.e. points placed on top of each individual) to train deep networks. Collecting such detailed annotations is both tedious, time-consuming, and poses a significant barrier to scalability for real-world applications. To alleviate this burden, we present Count2Density: a novel pipeline designed to predict meaningful density maps containing quantitative spatial information using only count-level annotations (i.e., total number of people) during training. To achieve this, Count2Density generates pseudo-density maps leveraging past predictions stored in a Historical Map Bank, thereby reducing confirmation bias. This bank is initialised using an unsupervised saliency estimator to provide an initial spatial prior and is iteratively updated with an EMA of predicted density maps. These pseudo-density maps are obtained by sampling locations from estimated crowd areas using a hypergeometric distribution, with the number of samplings determined by the count-level annotations. To further enhance the spatial awareness of the model, we add a self-supervised contrastive spatial regulariser to encourage similar feature representations within crowded regions while maximising dissimilarity with background regions. Experimental results demonstrate that our approach significantly outperforms cross-domain adaptation methods and achieves better results than recent state-of-the-art approaches in semi-supervised settings across several datasets. Additional analyses validate the effectiveness of each individual component of our pipeline, confirming the ability of Count2Density to effectively retrieve spatial information from count-level annotations and enabling accurate subregion counting.
- Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
- North America > United States (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- (3 more...)
- Research Report > New Finding (0.34)
- Research Report > Promising Solution (0.34)
Weighted First Order Model Counting for Two-variable Logic with Axioms on Two Relations
Kuang, Qipeng, Kůla, Václav, Kuželka, Ondřej, Wang, Yuanhong, Wang, Yuyi
The Weighted First-Order Model Counting Problem (WFOMC) asks to compute the weighted sum of models of a given first-order logic sentence over a given domain. The boundary between fragments for which WFOMC can be computed in polynomial time relative to the domain size lies between the two-variable fragment ($\text{FO}^2$) and the three-variable fragment ($\text{FO}^3$). It is known that WFOMC for \FOthree{} is $\mathsf{\#P_1}$-hard while polynomial-time algorithms exist for computing WFOMC for $\text{FO}^2$ and $\text{C}^2$, possibly extended by certain axioms such as the linear order axiom, the acyclicity axiom, and the connectedness axiom. All existing research has concentrated on extending the fragment with axioms on a single distinguished relation, leaving a gap in understanding the complexity boundary of axioms on multiple relations. In this study, we explore the extension of the two-variable fragment by axioms on two relations, presenting both negative and positive results. We show that WFOMC for $\text{FO}^2$ with two linear order relations and $\text{FO}^2$ with two acyclic relations are $\mathsf{\#P_1}$-hard. Conversely, we provide an algorithm in time polynomial in the domain size for WFOMC of $\text{C}^2$ with a linear order relation, its successor relation and another successor relation.
- Europe > Czechia > Prague (0.04)
- Asia > China > Hong Kong (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
CAPTURe: Evaluating Spatial Reasoning in Vision Language Models via Occluded Object Counting
Pothiraj, Atin, Stengel-Eskin, Elias, Cho, Jaemin, Bansal, Mohit
Recognizing and reasoning about occluded (partially or fully hidden) objects is vital to understanding visual scenes, as occlusions frequently occur in real-world environments and act as obstacles for spatial comprehension. To test models' ability to reason about multiple occluded objects, we introduce a novel task, Counting Amodally for Patterns Through Unseen REgions (CAPTURe), which requires a model to count objects arranged in a pattern by inferring how the pattern continues behind an occluder (an object which blocks parts of the scene). CAPTURe requires both recognizing visual patterns and reasoning, making it a useful testbed for evaluating vision-language models (VLMs) on whether they understand occluded patterns and possess spatial understanding skills. By requiring models to reason about occluded objects, CAPTURe also tests VLMs' ability to form world models that would allow them to fill in missing information. CAPTURe consists of two parts: (1) CAPTURe-real, with manually filtered images of real objects in patterns and (2) CAPTURe-synthetic, a controlled diagnostic with generated patterned images. We evaluate four strong VLMs (GPT-4o, Intern-VL2, Molmo, and Qwen2-VL) on CAPTURe, finding that models struggle to count on both occluded and unoccluded patterns. Crucially, we find that models perform worse with occlusion, suggesting that VLMs are also deficient in inferring unseen spatial relationships: even the strongest VLMs like GPT-4o fail to count with occlusion. In contrast, we find that humans achieve very little error on CAPTURe. We also find that providing auxiliary information of occluded object locations increases performance, underscoring that the model error comes both from an inability to handle occlusion as well as difficulty in counting in images. Code and data: https://github.com/atinpothiraj/CAPTURe
- North America > Canada (0.04)
- Asia > Middle East > Jordan (0.04)