AITopics | object localization

Towards Effective Multi-Modal Interchanges in Zero-Resource Sounding Object Localization

Neural Information Processing Systems, Dec-25-2025, 18:26:26 GMT

Sounding object localization aims to locate the object that emits a specified sound in a complex scene. The task bridges two perception-oriented modalities, vision and acoustics, and is valuable for the comprehensive perceptual understanding of machine intelligence. Although massive training data have been collected in this field, few samples carry accurate bounding box annotations, which hinders the learning process and the further application of proposed models. To address this problem, we explore an effective multi-modal knowledge transfer strategy that obtains precise knowledge from similar tasks and transfers it through well-aligned multi-modal data, so that the task can be handled in a zero-resource manner. Concretely, we propose a novel Two-stream Universal Referring localization Network (TURN), which is composed of a localization stream and an alignment stream. The former extracts knowledge related to referring object localization from the image grounding task, while the latter learns a universal semantic space shared between texts and audios. Moreover, we develop an adaptive sampling strategy that automatically identifies the overlap between different data domains, boosting the performance and stability of our model. Extensive experiments on various publicly available benchmarks demonstrate that TURN achieves competitive performance compared with state-of-the-art approaches without using any data in this field, which verifies the feasibility of the proposed mechanisms and strategies.

effective multi-modal interchange, name change, object localization, (3 more...)

Industry:

Transportation > Infrastructure & Services (0.43)
Transportation > Ground > Road (0.43)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

Object Localization based on Structural SVM using Privileged Information

Neural Information Processing Systems, Sep-30-2025, 10:01:54 GMT

We propose a structured prediction algorithm for object localization based on Support Vector Machines (SVMs) using privileged information. Privileged information provides useful high-level knowledge for image understanding and facilitates learning a reliable model even with a small number of training examples. In our setting, we assume that such information is available only at training time, since it may be difficult to obtain accurately from visual data without human supervision. Our goal is to improve performance by incorporating privileged information into an ordinary learning framework and adjusting model parameters for better generalization. We tackle the object localization problem with a novel structural SVM using privileged information, where an alternating loss-augmented inference procedure is employed to handle the term in the objective function corresponding to privileged information. We apply the proposed algorithm to the Caltech-UCSD Birds 200-2011 dataset and obtain encouraging results that suggest further investigation into the benefit of privileged information in structured prediction.
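As a rough illustration of the loss-augmented inference step mentioned in the abstract, the sketch below shows the standard (non-alternating) form used when training a structural SVM for localization: among candidate boxes, pick the one that maximizes model score plus task loss. This is not the authors' alternating procedure and it ignores the privileged-information term. The candidate list, the scores, and the choice of `1 - IoU` as the loss are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def loss_augmented_argmax(candidates, scores, gt_box):
    """Return the candidate maximizing score + loss, with loss = 1 - IoU.

    `scores[i]` stands in for the model score of `candidates[i]`
    (hypothetical values; a real model would compute them from features).
    """
    best = max(
        range(len(candidates)),
        key=lambda i: scores[i] + (1.0 - iou(candidates[i], gt_box)),
    )
    return candidates[best]


# Hypothetical toy data: the ground truth scores highest, but a disjoint
# box becomes the "most violating" output once the loss is added.
gt = (0, 0, 10, 10)
boxes = [(0, 0, 10, 10), (20, 20, 30, 30), (0, 0, 10, 5)]
model_scores = [1.0, 0.5, 0.9]
most_violating = loss_augmented_argmax(boxes, model_scores, gt)
```

The box returned here is the one a cutting-plane or subgradient solver would use to form the next margin constraint; plain prediction would drop the loss term and take the highest-scoring box instead.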

object localization, privileged information, structural svm, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.61)

Object Localization based on Structural SVM using Privileged Information

Jan Feyereisl, Suha Kwak, Jeany Son, Bohyung Han

Neural Information Processing Systems, Feb-9-2025, 06:56:45 GMT

We propose a structured prediction algorithm for object localization based on Support Vector Machines (SVMs) using privileged information. Privileged information provides useful high-level knowledge for image understanding and facilitates learning a reliable model even with a small number of training examples. In our setting, we assume that such information is available only at training time, since it may be difficult to obtain accurately from visual data without human supervision. Our goal is to improve performance by incorporating privileged information into an ordinary learning framework and adjusting model parameters for better generalization. We tackle the object localization problem with a novel structural SVM using privileged information, where an alternating loss-augmented inference procedure is employed to handle the term in the objective function corresponding to privileged information. We apply the proposed algorithm to the Caltech-UCSD Birds 200-2011 dataset and obtain encouraging results that suggest further investigation into the benefit of privileged information in structured prediction.

artificial intelligence, information, machine learning, (16 more...)

Country:

North America > United States > California (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)
Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)

Review for NeurIPS paper: Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Neural Information Processing Systems, Jan-25-2025, 17:41:29 GMT

Additional Feedback: The paper presents a framework for localizing sounding objects in an audiovisual scene. Overall, I liked the paper. The proposed approach is neat and makes sense for the most part. I have a few points of concern and would like to see the authors' responses to them. I would be happy to raise my overall score if the responses are satisfactory.

neurips paper, object localization, self-supervised audiovisual matching, (8 more...)

Technology: Information Technology > Artificial Intelligence (0.39)

YCB-LUMA: YCB Object Dataset with Luminance Keying for Object Localization

Pöllabauer, Thomas

arXiv.org Artificial Intelligence, Nov-20-2024

Localizing target objects in images is an important task in computer vision. Often it is the first step towards solving a variety of applications in autonomous driving, maintenance, quality assurance, robotics, and augmented reality. Best-in-class solutions for this task rely on deep neural networks, which require a set of representative training data for best performance. Creating sets of sufficient quality, variety, and size is often difficult, error-prone, and expensive. This is where the method of luminance keying [10,8] can help: it provides a simple yet effective solution to record high-quality data for training object detection and segmentation. We extend previous work that presented luminance keying on the common YCB-V set of household objects [14] by recording the remaining objects of the YCB superset. The additional variety of objects (transparency, multiple color variations, non-rigid objects) further demonstrates the usefulness of luminance keying and might be used to test the applicability of the approach on new 2D object detection and segmentation algorithms.
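The abstract does not spell out how luminance keying works. The sketch below assumes the common reading: objects are recorded against a very dark backdrop, and a brightness threshold separates object pixels from background, which yields a mask and a bounding box without manual labeling. The threshold value, the Rec. 601 luma weights, and the function names are illustrative assumptions, not details taken from the paper.

```python
def luma(rgb):
    """Approximate perceived brightness of an (R, G, B) pixel (Rec. 601 weights)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def key_mask(image, threshold=30.0):
    """Mark pixels brighter than `threshold` as object (True), others as backdrop.

    `image` is a list of rows, each row a list of (R, G, B) tuples in 0..255.
    The default threshold is a hypothetical value and would need tuning.
    """
    return [[luma(px) > threshold for px in row] for row in image]


def mask_bbox(mask):
    """Tight bounding box (x_min, y_min, x_max, y_max) of True pixels, or None."""
    coords = [(x, y) for y, row in enumerate(mask) for x, on in enumerate(row) if on]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical 3x4 frame: black backdrop with two bright object pixels.
K, O = (0, 0, 0), (200, 180, 160)
frame = [
    [K, K, K, K],
    [K, O, O, K],
    [K, K, K, K],
]
mask = key_mask(frame)
bbox = mask_bbox(mask)
```

A real pipeline would likely add noise handling and special treatment for dark or transparent objects, which a single global threshold cannot separate reliably.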

dataset, luminance, pose estimation, (14 more...)

2411.13149

Country: Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)

Genre: Research Report (0.40)

Industry:

Leisure & Entertainment > Sports (0.47)
Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Discriminative Sounding Objects Localization via Self-supervised Audiovisual Matching

Neural Information Processing Systems, Oct-10-2024, 12:56:58 GMT

Discriminatively localizing sounding objects in cocktail-party scenes, i.e., scenes with mixed sounds, is commonplace for humans but still challenging for machines. In this paper, we propose a two-stage learning framework to perform self-supervised class-aware sounding object localization. First, we propose to learn robust object representations by aggregating the candidate sound localization results in single-source scenes. Then, class-aware object localization maps are generated in the cocktail-party scenarios by referring to the pre-learned object knowledge, and the sounding objects are selected accordingly by matching audio and visual object category distributions, where the audiovisual consistency is viewed as the self-supervised signal. Experimental results on both realistic and synthesized cocktail-party videos demonstrate that our model is superior in filtering out silent objects and pointing out the location of sounding objects of different classes.

object localization, self-supervised audiovisual matching

Technology: Information Technology > Artificial Intelligence (0.45)

Leveraging Transformers for Weakly Supervised Object Localization in Unconstrained Videos

Murtaza, Shakeeb, Pedersoli, Marco, Sarraf, Aydin, Granger, Eric

arXiv.org Artificial Intelligence, Jul-8-2024

Weakly-Supervised Video Object Localization (WSVOL) involves localizing an object in videos using only video-level labels, also referred to as tags. State-of-the-art WSVOL methods like Temporal CAM (TCAM) rely on class activation mapping (CAM) and typically require a pre-trained CNN classifier. However, their localization accuracy is affected by their tendency to minimize the mutual information between different instances of a class and exploit temporal information during training for downstream tasks, e.g., detection and tracking. In the absence of bounding box annotation, it is challenging to exploit precise information about objects from temporal cues because the model struggles to locate objects over time. To address these issues, a novel method called transformer-based CAM for videos (TrCAM-V) is proposed for WSVOL. It consists of a DeiT backbone with two heads for classification and localization. The classification head is trained using a standard classification loss (CL), while the localization head is trained using pseudo-labels that are extracted using a pre-trained CLIP model. From these pseudo-labels, the high and low activation values are considered to be foreground and background regions, respectively. Our TrCAM-V method allows training a localization network by sampling pseudo-pixels on the fly from these regions. Additionally, a conditional random field (CRF) loss is employed to align the object boundaries with the foreground map. During inference, the model can process individual frames for real-time localization applications. Extensive experiments on challenging YouTube-Objects unconstrained video datasets show that our TrCAM-V method achieves new state-of-the-art performance in terms of classification and localization accuracy.
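The pseudo-pixel sampling described in the abstract can be pictured with a small sketch: treat the highest activations of a pseudo-label map as the foreground pool and the lowest as the background pool, then draw pixel locations from each pool at random. The pool fractions, the sample counts, and the function name are hypothetical; the paper's actual selection rule may differ.

```python
import random


def sample_pseudo_pixels(cam, fg_frac=0.1, bg_frac=0.1, n_fg=1, n_bg=1, seed=None):
    """Sample foreground/background pixel locations from an activation map.

    `cam` is a 2D list of activation values. The top `fg_frac` of pixels
    form the foreground pool and the bottom `bg_frac` the background pool
    (both fractions are illustrative assumptions). Returns two lists of
    (row, col) locations.
    """
    rng = random.Random(seed)
    # Sort all pixels by activation, highest first.
    pixels = sorted(
        ((v, (r, c)) for r, row in enumerate(cam) for c, v in enumerate(row)),
        reverse=True,
    )
    k_fg = max(1, int(len(pixels) * fg_frac))
    k_bg = max(1, int(len(pixels) * bg_frac))
    fg_pool = [loc for _, loc in pixels[:k_fg]]
    bg_pool = [loc for _, loc in pixels[-k_bg:]]
    fg = rng.sample(fg_pool, min(n_fg, len(fg_pool)))
    bg = rng.sample(bg_pool, min(n_bg, len(bg_pool)))
    return fg, bg


# Hypothetical 3x3 activation map with a bright top-left region.
cam = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.1, 0.05, 0.0],
]
fg, bg = sample_pseudo_pixels(cam, fg_frac=0.34, bg_frac=0.34, n_fg=2, n_bg=2, seed=0)
```

Resampling at every training step, rather than fixing one hard mask, lets the localization head see different pixels from the same noisy pseudo-label, which is one plausible reason for sampling on the fly.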

localization, segmentation, video, (17 more...)

2407.06018

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Seeing Beyond Cancer: Multi-Institutional Validation of Object Localization and 3D Semantic Segmentation using Deep Learning for Breast MRI

Pekis, Arda, Kannan, Vignesh, Kaklamanos, Evandros, Antony, Anu, Patel, Snehal, Earnest, Tyler

arXiv.org Artificial Intelligence, Nov-27-2023

The clinical management of breast cancer depends on an accurate understanding of the tumor and its anatomical context relative to adjacent tissues and landmark structures. This context may be provided by semantic segmentation methods; however, previous works have largely focused on the tumor alone and rarely on other tissue types. In contrast, we present a method that exploits tissue-tissue interactions to accurately segment every major tissue type in the breast, including chest wall, skin, adipose tissue, fibroglandular tissue, vasculature, and tumor, via standard-of-care Dynamic Contrast Enhanced MRI. Comparing our method to the prior state of the art, we achieved a superior Dice score on tumor segmentation while maintaining competitive performance on other studied tissues across multiple institutions. Briefly, our method proceeds by localizing the tumor using 2D object detectors, then segmenting the tumor and surrounding tissues independently using two 3D U-nets, and finally integrating these results while mitigating false positives by checking for anatomically plausible tissue-tissue contacts. The object detection models were pre-trained on ImageNet and COCO, and operated on MIP (maximum intensity projection) images in the axial and sagittal planes, establishing a 3D tumor bounding box. By integrating multiple relevant peri-tumoral tissues, our work enables clinical applications in breast cancer staging, prognosis, and surgical planning.
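To make the localization step concrete, here is a minimal sketch of how two maximum intensity projections and two 2D boxes can yield a 3D bounding box. It assumes the volume is indexed `volume[z][y][x]`, that the axial view projects along z and the sagittal view along x, and that the shared y extent is taken as the intersection of the two boxes. These conventions and the combination rule are assumptions; the 2D boxes would come from the detectors, which are not modeled here.

```python
def mip(volume, axis):
    """Maximum intensity projection of volume[z][y][x].

    axis="z" gives the axial view indexed [y][x];
    axis="x" gives the sagittal view indexed [z][y].
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    if axis == "z":
        return [[max(volume[z][y][x] for z in range(nz)) for x in range(nx)]
                for y in range(ny)]
    if axis == "x":
        return [[max(volume[z][y][x] for x in range(nx)) for y in range(ny)]
                for z in range(nz)]
    raise ValueError("axis must be 'z' or 'x'")


def combine_boxes(axial_box, sagittal_box):
    """Merge an axial (x1, y1, x2, y2) box and a sagittal (y1, z1, y2, z2) box.

    Returns (x1, y1, z1, x2, y2, z2), or None if the y ranges do not overlap.
    Intersecting the y ranges is an illustrative choice, not the paper's rule.
    """
    ax1, ay1, ax2, ay2 = axial_box
    sy1, sz1, sy2, sz2 = sagittal_box
    y1, y2 = max(ay1, sy1), min(ay2, sy2)
    if y1 > y2:
        return None
    return (ax1, y1, sz1, ax2, y2, sz2)


# Hypothetical 3x3x3 volume with a single bright voxel at z=1, y=2, x=0.
vol = [[[0 for _ in range(3)] for _ in range(3)] for _ in range(3)]
vol[1][2][0] = 5
axial = mip(vol, "z")      # bright pixel at [y=2][x=0]
sagittal = mip(vol, "x")   # bright pixel at [z=1][y=2]
box3d = combine_boxes((0, 2, 0, 2), (2, 1, 2, 1))
```

Projecting first keeps the detection problem two-dimensional, which is what allows detectors pre-trained on natural images to be reused.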

segmentation, semantic segmentation, tumor, (13 more...)

2311.16213

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
