Crowd Counting
Incorporating Side Information by Adaptive Convolution
Computer vision tasks often have side information available that is helpful to solve the task. For example, for crowd counting, the camera perspective (e.g., camera angle and height) gives a clue about the appearance and scale of people in the scene. While side information has been shown to be useful for counting systems using traditional hand-crafted features, it has not been fully utilized in counting systems based on deep learning. In order to incorporate the available side information, we propose an adaptive convolutional neural network (ACNN), where the convolution filter weights adapt to the current scene context via the side information.
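The core idea of adaptive convolution — generating filter weights from the side information — can be sketched as follows. This is a minimal illustration, not the paper's architecture: the tiny `side_to_weights` MLP, the single-channel setup, and all shapes are assumptions for the example.

```python
import numpy as np

def side_to_weights(side_info, W1, W2, k=3):
    """Hypothetical MLP: map side information (e.g. camera angle, height)
    to the weights of a single k x k convolution filter."""
    h = np.tanh(side_info @ W1)          # hidden layer
    return (h @ W2).reshape(k, k)        # filter adapted to the scene context

def adaptive_conv2d(image, side_info, W1, W2, k=3):
    """Convolve `image` with a filter generated from `side_info` (valid mode)."""
    f = side_to_weights(side_info, W1, W2, k)
    H, W = image.shape
    out = np.empty((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + k, j:j + k] * f)
    return out

rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 16))            # side-info dim (2) -> hidden (16)
W2 = rng.normal(size=(16, 9))            # hidden -> 3x3 filter weights
image = rng.normal(size=(8, 8))
side = np.array([0.3, 2.5])              # e.g. camera angle (rad), height (m)
out = adaptive_conv2d(image, side, W1, W2)
print(out.shape)                         # (6, 6)
```

The key point is that the same image convolved under different side information yields different responses, because the filter itself changes with the scene context.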
CrowdVLM-R1: Expanding R1 Ability to Vision Language Model for Crowd Counting using Fuzzy Group Relative Policy Reward
Wang, Zhiqiang, Feng, Pengbin, Lin, Yanbin, Cai, Shuzhang, Bian, Zongao, Yan, Jinghua, Zhu, Xingquan
1st Zhiqiang Wang, Florida Atlantic University, Boca Raton, USA (zwang2022@fau.edu); 2nd Pengbin Feng, University of Southern California, Los Angeles, USA (fengpengbin.apply@gmail.com)
Abstract -- We propose CrowdVLM-R1, which expands the R1 base model for accurate crowd counting, using a novel framework that integrates the fuzzy group relative policy optimization reward function (FGRPR) to enhance learning efficiency. Unlike the conventional binary (0/1) accuracy reward, our fuzzy reward model, FGRPR, which contains both format and precision rewards, provides nuanced incentives that encourage the R1 model to adjust its policy towards precise outputs. Supervised fine-tuning (SFT) is also integrated so that the CrowdVLM-R1 model can learn from a handful of inputs and perform both in-domain and out-of-domain counting. Experimental results demonstrate that GRPO with a standard binary accuracy reward underperforms compared to SFT. In contrast, FGRPR, applied to Qwen2.5-VL-(3B/7B), surpasses all baseline models, including GPT-4o, LLaMA2-70B, and SFT, on five in-domain datasets. For out-of-domain datasets, FGRPR achieves performance comparable to SFT but excels when target values are larger, as its fuzzy reward function assigns higher rewards to closer approximations. This approach is broadly applicable to tasks where the precision of the answer is critical.
I. INTRODUCTION
Recently, DeepSeek R1 [1] has drawn much attention among advances in large language models (LLMs), as it demonstrates how reinforcement learning (RL) can be the primary driver of reasoning.
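The contrast between a binary accuracy reward and a fuzzy one can be sketched as below. This is a hypothetical illustration of the idea, assuming the combination weights, the `<answer>` tag format, and the linear decay in relative error; the paper's exact FGRPR formulation may differ.

```python
import re

def fuzzy_count_reward(response: str, target: int,
                       w_format: float = 0.2, w_precision: float = 0.8) -> float:
    """Sketch of an FGRPR-style reward: a format reward for emitting the
    answer in the expected tags, plus a fuzzy precision reward that decays
    with relative error instead of being a binary 0/1 hit."""
    m = re.search(r"<answer>\s*(\d+)\s*</answer>", response)
    if not m:
        return 0.0                                  # no format reward, no answer
    pred = int(m.group(1))
    rel_err = abs(pred - target) / max(target, 1)
    precision_reward = max(0.0, 1.0 - rel_err)      # 1 when exact, fades to 0
    return w_format * 1.0 + w_precision * precision_reward

# A near miss still earns a substantial reward; a binary reward would give 0.
print(fuzzy_count_reward("<answer>95</answer>", 100))   # high (close guess)
print(fuzzy_count_reward("<answer>500</answer>", 100))  # format reward only
print(fuzzy_count_reward("about 95", 100))              # 0.0 (bad format)
```

Because nearby answers receive graded credit, the policy gradient receives a learning signal even before the model produces exact counts — the nuance the abstract attributes to FGRPR over binary rewards.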
Distribution Matching for Crowd Counting Supplementary Material
We analyze DM-Count and investigate the robustness of different methods to noisy annotations. Assume that for all x ∈ D and g ∈ G we have |g(x)| ≤ B. We propose the following five lemmas, which are essential for proving the proposed theorems. Lemmas A, B, C, and D give the Lipschitz constants of different loss functions. Consider the dual form of Eq. (15): W(µ, ν) = max_α … The first inequality in Eq. (20) is achieved because … The second equality in Eq. (20) is achieved because … We restate Theorem 1 in the main paper below.
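The dual of Eq. (15) is truncated in this excerpt. For reference, the standard Kantorovich dual of discrete optimal transport with cost matrix c — presumably the form the supplementary material invokes, though the exact constraint set there may differ — reads:

```latex
W(\mu, \nu) \;=\; \max_{\alpha, \beta}\;
  \langle \alpha, \mu \rangle + \langle \beta, \nu \rangle
\quad \text{s.t.} \quad
  \alpha_i + \beta_j \le c_{ij} \;\; \forall i, j .
```

The boundedness assumption |g(x)| ≤ B is what lets the dual potentials be restricted to a bounded set, which is how Lipschitz constants for the loss terms typically enter such robustness proofs.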
Count2Density: Crowd Density Estimation without Location-level Annotations
Litrico, Mattia, Chen, Feng, Pound, Michael, Tsaftaris, Sotirios A, Battiato, Sebastiano, Giuffrida, Mario Valerio
Crowd density estimation is a well-known computer vision task aimed at estimating the density distribution of people in an image. The main challenge in this domain is the reliance on fine-grained location-level annotations (i.e., points placed on top of each individual) to train deep networks. Collecting such detailed annotations is tedious and time-consuming, and poses a significant barrier to scalability for real-world applications. To alleviate this burden, we present Count2Density: a novel pipeline designed to predict meaningful density maps containing quantitative spatial information using only count-level annotations (i.e., the total number of people) during training. To achieve this, Count2Density generates pseudo-density maps leveraging past predictions stored in a Historical Map Bank, thereby reducing confirmation bias. This bank is initialised using an unsupervised saliency estimator to provide an initial spatial prior and is iteratively updated with an EMA of predicted density maps. These pseudo-density maps are obtained by sampling locations from estimated crowd areas using a hypergeometric distribution, with the number of samplings determined by the count-level annotations. To further enhance the spatial awareness of the model, we add a self-supervised contrastive spatial regulariser to encourage similar feature representations within crowded regions while maximising dissimilarity with background regions. Experimental results demonstrate that our approach significantly outperforms cross-domain adaptation methods and achieves better results than recent state-of-the-art approaches in semi-supervised settings across several datasets. Additional analyses validate the effectiveness of each individual component of our pipeline, confirming the ability of Count2Density to effectively retrieve spatial information from count-level annotations and enabling accurate subregion counting.
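The pseudo-density-map construction can be sketched as follows. This is a simplified illustration under stated assumptions: locations are drawn without replacement from an estimated crowd mask and marked with unit impulses, so the map integrates to the count-level annotation. The paper's actual hypergeometric sampling scheme, Historical Map Bank, and the Gaussian smoothing step are not reproduced here.

```python
import numpy as np

def pseudo_density_map(crowd_mask, count, rng):
    """Sketch: sample `count` pixel locations (without replacement) from the
    estimated crowd region and place a unit impulse at each, so the resulting
    map sums to the count-level annotation. A Gaussian blur would normally
    be applied afterwards to obtain a smooth density map."""
    H, W = crowd_mask.shape
    probs = crowd_mask.ravel().astype(float)
    probs /= probs.sum()                         # sample only inside crowd area
    idx = rng.choice(H * W, size=count, replace=False, p=probs)
    dmap = np.zeros(H * W)
    dmap[idx] = 1.0
    return dmap.reshape(H, W)

rng = np.random.default_rng(0)
mask = np.zeros((16, 16))
mask[4:12, 4:12] = 1.0                           # toy estimated "crowd area"
dmap = pseudo_density_map(mask, count=10, rng=rng)
print(dmap.sum())                                # 10.0 -- matches the annotation
```

Training against such a map gives the network a spatial target consistent with the known total count, which is the property Count2Density exploits in place of point annotations.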
Crowd Scene Analysis using Deep Learning Techniques
With recent advances in deep learning and computer vision, crowd scene analysis has gained significant attention. The UN predicts world population growth of 0.82% by 2035, with more people moving to cities for better lifestyles and attending social events such as concerts, shopping, political gatherings, and educational conferences. Crowd scene analysis is crucial for ensuring a safe environment in public spaces, but manual monitoring is laborious and risks missing important information, so an automatic solution is needed for efficient real-life applications. Our research focuses on two main applications of crowd scene analysis: crowd counting and anomaly detection.
A Transformer-based Multimodal Fusion Model for Efficient Crowd Counting Using Visual and Wireless Signals
Cui, Zhe, Li, Yuli, Tran, Le-Nam
Current crowd-counting models often rely on single-modal inputs, such as visual images or wireless signal data, which can result in significant information loss and suboptimal recognition performance. To address these shortcomings, we propose TransFusion, a novel multimodal fusion-based crowd-counting model that integrates Channel State Information (CSI) with image data. By leveraging the powerful capabilities of Transformer networks, TransFusion effectively combines these two distinct data modalities, enabling the capture of comprehensive global contextual information that is critical for accurate crowd estimation. However, while Transformers are well capable of capturing global features, they potentially fail to identify finer-grained, local details essential for precise crowd counting. To mitigate this, we incorporate Convolutional Neural Networks (CNNs) into the model architecture, enhancing its ability to extract detailed local features that complement the global context provided by the Transformer. Extensive experimental evaluations demonstrate that TransFusion achieves high accuracy with minimal counting errors while maintaining superior efficiency.
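The fusion idea can be sketched as single-head cross-attention in which image tokens query the CSI tokens. This toy numpy sketch is not the TransFusion architecture: the token counts, dimensions, random projections, and the mean-pool count head are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(img_tokens, csi_tokens, Wq, Wk, Wv):
    """Single-head cross-attention: each image token (query) attends over the
    CSI tokens (keys/values), enriching every visual location with
    wireless-signal context."""
    Q, K, V = img_tokens @ Wq, csi_tokens @ Wk, csi_tokens @ Wv
    A = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (num_img, num_csi) weights
    return A @ V                                  # fused visual features

rng = np.random.default_rng(0)
d = 32
img_tokens = rng.normal(size=(49, d))   # e.g. a CNN feature map flattened 7x7
csi_tokens = rng.normal(size=(16, d))   # e.g. embedded CSI subcarrier frames
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
fused = cross_attention(img_tokens, csi_tokens, Wq, Wk, Wv)
w_count = rng.normal(size=d)
count = float(fused.mean(axis=0) @ w_count)       # toy regression head
print(fused.shape)                                # (49, 32)
```

In the abstract's terms, the CNN supplies the local image tokens and the attention step supplies the global, cross-modal context; a real model would stack several such layers and train the count head on annotated data.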