 East Irish Sea




MVA 2025 Small Multi-Object Tracking for Spotting Birds Challenge: Dataset, Methods, and Results

Kondo, Yuki, Ukita, Norimichi, Kanayama, Riku, Yoshida, Yuki, Yamaguchi, Takayuki, Yu, Xiang, Liang, Guang, Liu, Xinyao, Wang, Guan-Zhang, Chu, Wei-Ta, Chuang, Bing-Cheng, Lee, Jia-Hua, Kuo, Pin-Tseng, Chu, I-Hsuan, Hsiao, Yi-Shein, Wu, Cheng-Han, Wu, Po-Yi, Tsou, Jui-Chien, Liu, Hsuan-Chi, Lee, Chun-Yi, Yang, Yuan-Fu, Shigematsu, Kosuke, Shin, Asuka, Tran, Ba

arXiv.org Artificial Intelligence

Small Multi-Object Tracking (SMOT) is particularly challenging when targets occupy only a few dozen pixels, rendering detection and appearance-based association unreliable. Building on the success of the MVA2023 SOD4SB challenge, this paper introduces the SMOT4SB challenge, which leverages temporal information to address limitations of single-frame detection. Our three main contributions are: (1) the SMOT4SB dataset, consisting of 211 UAV video sequences with 108,192 annotated frames under diverse real-world conditions, designed to capture motion entanglement where both camera and targets move freely in 3D; (2) SO-HOTA, a novel metric combining Dot Distance with HOTA to mitigate the sensitivity of IoU-based metrics to small displacements; and (3) a competitive MVA2025 challenge with 78 participants and 308 submissions, where the winning method achieved a 5.1x improvement over the baseline. This work lays a foundation for advancing SMOT in UAV scenarios with applications in bird strike avoidance, agriculture, fisheries, and ecological monitoring.
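The sensitivity of IoU to small displacements, which motivates the Dot Distance component of SO-HOTA, can be illustrated with a minimal sketch. It assumes the Dot Distance form exp(-D/S) from the tiny-object-detection literature, where D is the Euclidean distance between box centers and S is the average object size in the dataset. The function names, the (x, y, w, h) box layout, and the value of S are hypothetical, and this is not the challenge's SO-HOTA implementation, which the abstract does not specify.

```python
import math


def iou(box_a, box_b):
    """Intersection over union for boxes given as (x, y, w, h), top-left origin."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Overlap extent along each axis, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def dot_distance(box_a, box_b, avg_size):
    """Assumed Dot Distance similarity: exp(-D / S).

    D is the distance between box centers; avg_size (S) is an assumed
    dataset-level average object size in pixels.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    d = math.hypot((ax + aw / 2) - (bx + bw / 2), (ay + ah / 2) - (by + bh / 2))
    return math.exp(-d / avg_size)


# A 4x4-pixel target whose prediction is shifted by 2 pixels horizontally.
truth = (0.0, 0.0, 4.0, 4.0)
pred = (2.0, 0.0, 4.0, 4.0)

iou_score = iou(truth, pred)                       # 8 / 24 = 0.333...
dotd_score = dot_distance(truth, pred, avg_size=4)  # exp(-0.5) ~= 0.607
```

In this toy case a two-pixel shift removes two thirds of the IoU, while the center-distance score degrades more gently, which is the behavior the abstract attributes to combining Dot Distance with HOTA.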


Small and Dim Target Detection in IR Imagery: A Review

Kumar, Nikhil, Singh, Pravendra

arXiv.org Artificial Intelligence

While there has been significant progress in object detection using conventional image processing and machine learning algorithms, small and dim target detection in the IR domain is a relatively new area of study. The majority of small and dim target detection methods are derived from conventional object detection algorithms, albeit with some alterations. Detecting small and dim targets in IR imagery is complex because these targets often lack distinct features, the background is cluttered with unclear details, and the IR signatures of the scene can change over time due to fluctuations in thermodynamics. The primary objective of this review is to highlight the progress made in this field. This is the first review in the field of small and dim target detection in infrared imagery, encompassing various methodologies ranging from conventional image processing to cutting-edge deep learning-based approaches. The authors have also introduced a taxonomy of such approaches. There are two main types of approaches: methodologies using several frames for detection, and single-frame-based detection techniques. Single-frame-based detection techniques encompass a diverse range of methods, spanning from traditional image processing-based approaches to more advanced deep learning methodologies. Our findings indicate that deep learning approaches perform better than traditional image processing-based approaches. In addition, a comprehensive compilation of various available datasets has also been provided. Furthermore, this review identifies the gaps and limitations in existing techniques, paving the way for future research and development in this area.


Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

Cui, Hejie, Fang, Xinyu, Zhang, Zihan, Xu, Ran, Kan, Xuan, Liu, Xin, Yu, Yue, Li, Manling, Song, Yangqiu, Yang, Carl

arXiv.org Artificial Intelligence

Images contain rich relational knowledge that can help machines understand the world. Existing methods for visual knowledge extraction often rely on a pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we take a first step toward a new paradigm of open visual knowledge extraction. To achieve this, we present OpenVik, which consists of an open relational region detector that detects regions potentially containing relational knowledge and a visual knowledge generator that generates format-free knowledge by prompting the large multimodality model with the detected region of interest. We also explore two data enhancement techniques for diversifying the generated format-free visual knowledge. Extensive knowledge quality evaluations highlight the correctness and uniqueness of the open visual knowledge extracted by OpenVik. Moreover, integrating our extracted knowledge across various visual reasoning applications shows consistent improvements, indicating the real-world applicability of OpenVik.


Adaptive and Collaborative Bathymetric Channel-Finding Approach for Multiple Autonomous Marine Vehicles

Gershfeld, Nikolai, Paine, Tyler M, Benjamin, Michael R.

arXiv.org Artificial Intelligence

This paper reports an investigation into the problem of rapid identification of a channel that crosses a body of water using one or more Unmanned Surface Vehicles (USV). A new algorithm called Proposal Based Adaptive Channel Search (PBACS) is presented as a potential solution that improves upon current methods. The empirical performance of PBACS is compared to lawnmower surveying and to Markov decision process (MDP) planning with two state-of-the-art reward functions: Upper Confidence Bound (UCB) and Maximum Value Information (MVI). The performance of each method is evaluated through comparison of the time it takes to identify a continuous channel through an area, using one, two, three, or four USVs. The performance of each method is compared across ten simulated bathymetry scenarios and one field area, each with different channel layouts. The results from simulations and field trials indicate that on average multi-vehicle PBACS outperforms lawnmower, UCB, and MVI based methods, especially when at least three vehicles are used.
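As a rough illustration of how an Upper Confidence Bound (UCB) reward trades off exploitation and exploration when choosing where to survey next, consider the following minimal sketch. It assumes the generic UCB form of predicted mean plus a weighted uncertainty term. The function names, the `kappa` weight, and the per-cell depth estimates are hypothetical and are not taken from the paper, whose reward functions and PBACS algorithm are not detailed in the abstract.

```python
def ucb_scores(means, stds, kappa=2.0):
    """Generic UCB score per candidate cell: mean + kappa * standard deviation.

    means: predicted depth per candidate cell (deeper is treated as better
           for a channel in this toy setup, which is an assumption).
    stds:  uncertainty of each prediction.
    kappa: hypothetical exploration weight; 0 means pure exploitation.
    """
    return [m + kappa * s for m, s in zip(means, stds)]


def pick_next_cell(means, stds, kappa=2.0):
    """Return the index of the candidate cell with the highest UCB score."""
    scores = ucb_scores(means, stds, kappa)
    return max(range(len(scores)), key=scores.__getitem__)


# Three candidate cells with made-up depth estimates (meters) and uncertainties.
depth_mean = [5.0, 4.0, 3.0]
depth_std = [0.1, 1.0, 0.2]

explore_choice = pick_next_cell(depth_mean, depth_std, kappa=2.0)  # scores 5.2, 6.0, 3.4
exploit_choice = pick_next_cell(depth_mean, depth_std, kappa=0.0)  # scores 5.0, 4.0, 3.0
```

With a nonzero `kappa`, the poorly surveyed second cell wins despite its lower predicted depth; with `kappa` set to zero the choice falls back to the best current estimate.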


14 relaxing video games to help you destress

Engadget

In recent years, we've seen an influx of self-proclaimed "cozy games," video games explicitly designed to invoke good vibes. To help those who could use some help winding down, we've rounded up a selection of games that purposefully deemphasize fail states, violence, overwhelming grinds, intense competition and other aggressive urges, but aren't overly cute for the sake of it or so stripped-down that they're boring. This open-ended sim has you fix up a dilapidated farm and interact with nearby townsfolk. Apart from being one of our favorite couch co-op games, the farming life sim Stardew Valley is also notable for its relaxing qualities. It's a game that's willing to meet you at your pace: If you want to putter around your farm, casually chat up townsfolk, brew beer or fish for a few hours, you can.


MLB's top prospects deal with good, bad of 'robot' umpires

FOX News

First baseman Ali Sanchez was standing in the on-deck circle so he had a great vantage point of the two-strike breaking ball to Jacob Heyward. It finished so low that by the time it reached the catcher it nearly bounced in the dirt. Sanchez -- like everybody else who was watching this game on a Tuesday night in the Arizona Fall League -- had an immediate mental reaction.


Contrastive Learning for Image Captioning

Dai, Bo, Lin, Dahua

Neural Information Processing Systems

Image captioning, a popular topic in computer vision, has achieved substantial progress in recent years. However, the distinctiveness of natural descriptions is often overlooked in previous work. It is closely related to the quality of captions, as distinctive captions are more likely to describe images with their unique aspects. In this work, we propose a new learning method, Contrastive Learning (CL), for image captioning. Specifically, via two constraints formulated on top of a reference model, the proposed method can encourage distinctiveness, while maintaining the overall quality of the generated captions. We tested our method on two challenging datasets, where it improves the baseline model by significant margins. We also showed in our studies that the proposed method is generic and can be used for models with various structures.