AITopics | Ryu, Soohyun

Plotting

Ryu, Soohyun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Preserve or Modify? Context-Aware Evaluation for Balancing Preservation and Modification in Text-Guided Image Editing

Kim, Yoonjeon, Ryu, Soohyun, Jung, Yeonsung, Lee, Hyunkoo, Kim, Joowon, Yang, June Yong, Hwang, Jaeryong, Yang, Eunho

arXiv.org Artificial IntelligenceDec-4-2024

The development of vision-language and generative models has significantly advanced text-guided image editing, which seeks the \textit{preservation} of core elements in the source image while implementing \textit{modifications} based on the target text. However, existing metrics have a \textbf{context-blindness} problem, indiscriminately applying the same evaluation criteria on completely different pairs of source image and target text, biasing towards either modification or preservation. Directional CLIP similarity, the only metric that considers both source image and target text, is also biased towards modification aspects and attends to irrelevant editing regions of the image. We propose \texttt{AugCLIP}, a \textbf{context-aware} metric that adaptively coordinates preservation and modification aspects, depending on the specific context of a given source image and target text. This is done by deriving the CLIP representation of an ideally edited image, that preserves the source image with necessary modifications to align with target text. More specifically, using a multi-modal large language model, \texttt{AugCLIP} augments the textual descriptions of the source and target, then calculates a modification vector through a hyperplane that separates source and target attributes in CLIP space. Extensive experiments on five benchmark datasets, encompassing a diverse range of editing scenarios, show that \texttt{AugCLIP} aligns remarkably well with human evaluation standards, outperforming existing metrics. The code will be open-sourced for community use.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2410.11374

Genre: Research Report (0.82)

Industry: Media > Photography (0.62)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)

Add feedback

Mode-GS: Monocular Depth Guided Anchored 3D Gaussian Splatting for Robust Ground-View Scene Rendering

Lee, Yonghan, Choi, Jaehoon, Jung, Dongki, Yun, Jaeseong, Ryu, Soohyun, Manocha, Dinesh, Yeon, Suyong

arXiv.org Artificial IntelligenceOct-6-2024

We present a novel-view rendering algorithm, Mode-GS, for ground-robot trajectory datasets. Our approach is based on using anchored Gaussian splats, which are designed to overcome the limitations of existing 3D Gaussian splatting algorithms. Prior neural rendering methods suffer from severe splat drift due to scene complexity and insufficient multi-view observation, and can fail to fix splats on the true geometry in ground-robot datasets. Our method integrates pixel-aligned anchors from monocular depths and generates Gaussian splats around these anchors using residual-form Gaussian decoders. To address the inherent scale ambiguity of monocular depth, we parameterize anchors with per-view depth-scales and employ scale-consistent depth loss for online scale calibration. Our method results in improved rendering performance, based on PSNR, SSIM, and LPIPS metrics, in ground scenes with free trajectory patterns, and achieves state-of-the-art rendering performance on the R3LIVE odometry dataset and the Tanks and Temples dataset.

artificial intelligence, dataset, machine learning, (10 more...)

arXiv.org Artificial Intelligence

2410.04646

Country: North America > United States > Maryland (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback