Gyeongsangbuk-do
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Bonn (0.04)
- Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
- Asia > Middle East > Republic of Türkiye > Karaman Province > Karaman (0.04)
- (2 more...)
- Workflow (0.51)
- Research Report (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- North America > United States > North Carolina (0.05)
- Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (0.68)
- Asia > South Korea > Gyeongsangbuk-do > Pohang (0.04)
- Asia > Middle East > Jordan (0.04)
VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose Estimation
Sarowar, Md Selim, Kim, Sungho
Abstract--Vision Foundation Models (VFMs) and Vision Language Models (VLMs) have revolutionized computer vision by providing rich semantic and geometric representations. This paper presents a comprehensive visual comparison between CLIP based and DINOv2 based approaches for 3D pose estimation in hand object grasping scenarios. We evaluate both models on the task of 6D object pose estimation and demonstrate their complementary strengths: CLIP excels in semantic understanding through language grounding, while DINOv2 provides superior dense geometric features. Through extensive experiments on benchmark datasets, we show that CLIP based methods achieve better semantic consistency, while DINOv2 based approaches demonstrate competitive performance with enhanced geometric precision. Our analysis provides insights for selecting appropriate vision models for robotic manipulation and grasping, picking applications.
A CNN-Based Technique to Assist Layout-to-Generator Conversion for Analog Circuits
Jeong, Sungyu, Kim, Minsu, Kim, Byungsub
We propose a technique to assist in converting a reference layout of an analog circuit into the procedural layout generator by efficiently reusing available generators for sub-cell creation. The proposed convolutional neural network (CNN) model automatically detects sub-cells that can be generated by available generator scripts in the library, and suggests using them in the hierarchically correct places of the generator software. In experiments, the CNN model examined sub-cells of a high-speed wireline receiver that has a total of 4,885 sub-cell instances including different 145 sub-cell designs. The CNN model classified the sub-cell instances into 51 generatable and one not-generatable classes. One not-generatable class indicates that no available generator can generate the classified sub-cell. The CNN model achieved 99.3% precision in examining the 145 different sub-cell designs. The CNN model greatly reduced the examination time to 18 seconds from 88 minutes required in manual examination. Also, the proposed CNN model could correctly classify unfamiliar sub-cells that are very different from the training dataset.
- Asia > South Korea > Gyeongsangbuk-do > Pohang (0.05)
- North America > United States > Oregon > Washington County > Hillsboro (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (2 more...)
- Workflow (0.94)
- Research Report (0.64)
MNM : Multi-level Neuroimaging Meta-analysis with Hyperbolic Brain-Text Representations
Baek, Seunghun, Lee, Jaejin, Sim, Jaeyoon, Jeong, Minjae, Kim, Won Hwa
Various neuroimaging studies suffer from small sample size problem which often limit their reliability. Meta-analysis addresses this challenge by aggregating findings from different studies to identify consistent patterns of brain activity. However, traditional approaches based on keyword retrieval or linear mappings often overlook the rich hierarchical structure in the brain. In this work, we propose a novel framework that leverages hyperbolic geometry to bridge the gap between neuroscience literature and brain activation maps. By embedding text from research articles and corresponding brain images into a shared hyperbolic space via the Lorentz model, our method captures both semantic similarity and hierarchical organization inherent in neuroimaging data. In the hyperbolic space, our method performs multi-level neuroimaging meta-analysis (MNM) by 1) aligning brain and text embeddings for semantic correspondence, 2) guiding hierarchy between text and brain activations, and 3) preserving the hierarchical relationships within brain activation patterns. Experimental results demonstrate that our model outperforms baselines, offering a robust and interpretable paradigm of multi-level neuroimaging meta-analysis via hyperbolic brain-text representation.
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Health & Medicine > Health Care Technology (1.00)
Do Reasoning Vision-Language Models Inversely Scale in Test-Time Compute? A Distractor-centric Empirical Analysis
Bae, Jiyun, Ok, Hyunjong, Mo, Sangwoo, Lee, Jaeho
How does irrelevant information (i.e., distractors) affect test-time scaling in vision-language models (VLMs)? Prior studies on language models have reported an inverse scaling effect, where textual distractors lead to longer but less effective reasoning. To investigate whether similar phenomena occur in multimodal settings, we introduce Idis (Images with distractors), a visual question-answering dataset that systematically varies distractors along semantic, numerical, and spatial dimensions. Our analyses reveal that visual distractors differ fundamentally from textual ones: although inverse scaling persists, adding visual distractors reduces accuracy without increasing reasoning length. We further show that tracking attribute counts within reasoning traces provides key insights into how distractors, reasoning length, and accuracy interact. Finally, we demonstrate that these trends extend to established visual bias benchmarks such as Waterbirds, and we propose a simple prompting strategy to mitigate bias-driven predictions in reasoning models.
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
BERT-APC: A Reference-free Framework for Automatic Pitch Correction via Musical Context Inference
Kim, Sungjae, Na, Kihyun, Choi, Jinyoung, Kim, Injung
Automatic Pitch Correction (APC) enhances vocal recordings by aligning pitch deviations with the intended musical notes. However, existing APC systems either rely on reference pitches, which limits their practical applicability, or employ simple pitch estimation algorithms that often fail to preserve expressiveness and naturalness. We propose BERT-APC, a novel reference-free APC framework that corrects pitch errors while maintaining the natural expressiveness of vocal performances. In BERT-APC, a novel stationary pitch predictor first estimates the perceived pitch of each note from the detuned singing voice. A context-aware note pitch predictor estimates the intended pitch sequence by leveraging a music language model repurposed to incorporate musical context. Finally, a note-level correction algorithm fixes pitch errors while preserving intentional pitch deviations for emotional expression. In addition, we introduce a learnable data augmentation strategy that improves the robustness of the music language model by simulating realistic detuning patterns. Compared to two recent singing voice transcription models, BERT-APC demonstrated superior performance in note pitch prediction, outperforming the second-best model, ROSVOT, by 10.49%p on highly detuned samples in terms of the raw pitch accuracy. In the MOS test, BERT-APC achieved the highest score of $4.32 \pm 0.15$, which is significantly higher than those of the widely-used commercial APC tools, AutoTune ($3.22 \pm 0.18$) and Melodyne ($3.08 \pm 0.18$), while maintaining a comparable ability to preserve expressive nuances. To the best of our knowledge, this is the first APC model that leverages a music language model to achieve reference-free pitch correction with symbolic musical context. The corrected audio samples of BERT-APC are available online.
- Europe > Austria > Vienna (0.14)
- South America > Colombia > Meta Department > Villavicencio (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (2 more...)
- Media > Music (1.00)
- Leisure & Entertainment (1.00)