AITopics | motion vector

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceNov-21-2025

Linear time small coresets for k-mean clustering of segments with applications

Denisov, David, Dolev, Shlomi, Felmdan, Dan, Segal, Michael

We study the $k$-means problem for a set $\mathcal{S} \subseteq \mathbb{R}^d$ of $n$ segments, aiming to find $k$ centers $X \subseteq \mathbb{R}^d$ that minimize $D(\mathcal{S},X) := \sum_{S \in \mathcal{S}} \min_{x \in X} D(S,x)$, where $D(S,x) := \int_{p \in S} |p - x| dp$ measures the total distance from each point along a segment to a center. Variants of this problem include handling outliers, employing alternative distance functions such as M-estimators, weighting distances to achieve balanced clustering, or enforcing unique cluster assignments. For any $\varepsilon > 0$, an $\varepsilon$-coreset is a weighted subset $C \subseteq \mathbb{R}^d$ that approximates $D(\mathcal{S},X)$ within a factor of $1 \pm \varepsilon$ for any set of $k$ centers, enabling efficient streaming, distributed, or parallel computation. We propose the first coreset construction that provably handles arbitrary input segments. For constant $k$ and $\varepsilon$, it produces a coreset of size $O(\log^2 n)$ computable in $O(nd)$ time. Experiments, including a real-time video tracking application, demonstrate substantial speedups with minimal loss in clustering accuracy, confirming both the practical efficiency and theoretical guarantees of our method.

artificial intelligence, coreset, machine learning, (18 more...)

2511.12564

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Singapore (0.04)
Asia > Malaysia (0.04)
(5 more...)

Genre: Research Report > New Finding (0.68)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Hardware (0.68)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Neural Information Processing SystemsNov-14-2025, 13:51:35 GMT

Compressed Video Contrastive Learning

Existing state-of-the-art methods [Han et al. , 2020b; Tao et al. , 2020; Huo et al. , 2021] mainly focus More details can be found in Table 1. This clearly hinders large-scale video self-supervised training.

artificial intelligence, machine learning, motion vector, (17 more...)

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsNov-13-2025, 23:57:55 GMT

180f6184a3458fa19c28c5483bc61877-Paper-Conference.pdf

artificial intelligence, machine learning, natural language, (17 more...)

Country:

Europe > United Kingdom > England > Staffordshire (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

arXiv.org Artificial IntelligenceNov-5-2025

Wireless Video Semantic Communication with Decoupled Diffusion Multi-frame Compensation

Xie, Bingyan, Wu, Yongpeng, Shi, Yuxuan, Feng, Biqian, Zhang, Wenjun, Park, Jihong, Quek, Tony

Existing wireless video transmission schemes directly conduct video coding in pixel level, while neglecting the inner semantics contained in videos. In this paper, we propose a wireless video semantic communication framework with decoupled diffusion multi-frame compensation (DDMFC), abbreviated as WVSC-D, which integrates the idea of semantic communication into wireless video transmission scenarios. WVSC-D first encodes original video frames as semantic frames and then conducts video coding based on such compact representations, enabling the video coding in semantic level rather than pixel level. Moreover, to further reduce the communication overhead, a reference semantic frame is introduced to substitute motion vectors of each frame in common video coding methods. At the receiver, DDMFC is proposed to generate compensated current semantic frame by a two-stage conditional diffusion process. With both the reference frame transmission and DDMFC frame compensation, the bandwidth efficiency improves with satisfying video transmission performance. Experimental results verify the performance gain of WVSC-D over other DL-based methods e.g. DVSC about 1.8 dB in terms of PSNR.

artificial intelligence, machine learning, natural language, (20 more...)

2511.02478

Country:

Europe > Austria > Vienna (0.14)
Asia > China > Shanghai > Shanghai (0.04)
North America > Canada > Quebec > Montreal (0.04)
(13 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.91)

Neural Information Processing SystemsOct-8-2025, 19:46:33 GMT

656678aa961a99a6a3d59bfbf88daf77-Paper-Conference.pdf

artificial intelligence, machine learning, natural language, (10 more...)

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Neural Information Processing SystemsOct-8-2025, 04:59:02 GMT

180f6184a3458fa19c28c5483bc61877-Paper-Conference.pdf

artificial intelligence, machine learning, natural language, (17 more...)

Country: Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.68)

arXiv.org Artificial IntelligenceOct-6-2025

Representation Learning for Compressed Video Action Recognition via Attentive Cross-modal Interaction with Motion Enhancement

Li, Bing, Chen, Jiaxin, Zhang, Dongming, Bao, Xiuguo, Huang, Di

Compressed video action recognition has recently drawn growing attention, since it remarkably reduces the storage and computational cost via replacing raw videos by sparsely sampled RGB frames and compressed motion cues ( e.g., motion vectors and residuals). However, this task severely suffers from the coarse and noisy dynamics and the insufficient fusion of the heterogeneous RGB and motion modalities. To address the two issues above, this paper proposes a novel framework, namely Attentive Cross-modal Interaction Network with Motion Enhancement (MEACI-Net). It follows the two-stream architecture, i.e. one for the RGB modality and the other for the motion modality. Particularly, the motion stream employs a multi-scale block embedded with a denoising module to enhance representation learning. The interaction between the two streams is then strengthened by introducing the Selective Motion Complement (SMC) and Cross-Modality Augment (CMA) modules, where SMC complements the RGB modality with spatio-temporally attentive local motion features and CMA further combines the two modalities with selective feature augmentation. Extensive experiments on the UCF-101, HMDB-51 and Kinetics-400 benchmarks demonstrate the effectiveness and efficiency of MEACI-Net.

artificial intelligence, deep learning, machine learning, (18 more...)

2205.03569

Country:

Asia > China > Beijing > Beijing (0.04)
Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
Asia > China > Guangxi Province > Nanning (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Neural Information Processing SystemsAug-15-2025, 06:27:56 GMT

Compressed Video Contrastive Learning

Existing state-of-the-art methods [Han et al. , 2020b; Tao et al. , 2020; Huo et al. , 2021] mainly focus More details can be found in Table 1. This clearly hinders large-scale video self-supervised training.

artificial intelligence, machine learning, motion vector, (17 more...)

Country:

Asia > China > Beijing > Beijing (0.05)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceJul-10-2025

LOVON: Legged Open-Vocabulary Object Navigator

Peng, Daojie, Cao, Jiahang, Zhang, Qiang, Ma, Jun

--Object navigation in open-world environments remains a formidable and pervasive challenge for robotic systems, particularly when it comes to executing long-horizon tasks that require both open-world object detection and high-level task planning. Traditional methods often struggle to integrate these components effectively, and this limits their capability to deal with complex, long-range navigation missions. In this paper, we propose LOVON, a novel framework that integrates large language models (LLMs) for hierarchical task planning with open-vocabulary visual detection models, tailored for effective long-range object navigation in dynamic, unstructured environments. T o tackle real-world challenges including visual jittering, blind zones, and temporary target loss, we design dedicated solutions such as Laplacian V ariance Filtering for visual stabilization. We also develop a functional execution logic for the robot that guarantees LOVON's capabilities in autonomous navigation, task adaptation, and robust task completion. Extensive evaluations demonstrate the successful completion of long-sequence tasks involving real-time detection, search, and navigation toward open-vocabulary dynamic targets. In recent years, large language models (LLMs) [1] and vision models [2]-[5] have achieved revolutionary breakthroughs in the field of artificial intelligence.

large language model, machine learning, natural language, (20 more...)