Kim, Hyeongwoo
Contextual Gesture: Co-Speech Gesture Video Generation through Context-aware Gesture Representation
Liu, Pinxin, Zhang, Pengfei, Kim, Hyeongwoo, Garrido, Pablo, Shapiro, Ari, Olszewski, Kyle
Co-speech gesture generation is crucial for creating lifelike avatars and enhancing human-computer interaction by synchronizing gestures with speech. Despite recent advancements, existing methods struggle to identify the rhythmic or semantic triggers in audio that should drive contextualized gesture patterns, and to achieve pixel-level realism. To address these challenges, we introduce Contextual Gesture, a framework that improves co-speech gesture video generation through three components: (1) a chronological speech-gesture alignment that temporally connects the two modalities, (2) a contextualized gesture tokenization that incorporates speech context into the motion-pattern representation through distillation, and (3) a structure-aware refinement module that employs edge connections between gesture keypoints to improve video generation. Our extensive experiments demonstrate that Contextual Gesture not only produces realistic, speech-aligned gesture videos but also supports long-sequence generation and video gesture editing applications. Project page: https://andypinxinliu.github.io/Contextual-Gesture/.
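To make component (2) concrete, here is a minimal, hypothetical sketch of a VQ-style gesture tokenizer with a speech-distillation term; the class name, feature dimensions, and loss weights are my assumptions, not the paper's implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GestureTokenizer(nn.Module):
    # Hypothetical sketch: a VQ-style motion tokenizer whose quantized
    # gesture latents are pulled toward time-aligned speech features.
    def __init__(self, motion_dim=150, speech_dim=768, codebook_size=1024, dim=256):
        super().__init__()
        self.enc = nn.Linear(motion_dim, dim)          # per-frame motion encoder
        self.dec = nn.Linear(dim, motion_dim)          # per-frame motion decoder
        self.codebook = nn.Embedding(codebook_size, dim)
        self.speech_proj = nn.Linear(speech_dim, dim)  # speech context -> latent space

    def forward(self, motion, speech):
        # motion: (B, T, motion_dim); speech: (B, T, speech_dim), time-aligned
        z = self.enc(motion)
        book = self.codebook.weight.expand(z.size(0), -1, -1)  # (B, K, dim)
        idx = torch.cdist(z, book).argmin(dim=-1)              # nearest code per frame
        zq_raw = self.codebook(idx)
        zq = z + (zq_raw - z).detach()                         # straight-through estimator
        recon = self.dec(zq)
        s = self.speech_proj(speech)
        loss = (F.mse_loss(recon, motion)                            # reconstruction
                + F.mse_loss(zq_raw, z.detach())                     # codebook update
                + 0.25 * F.mse_loss(z, zq_raw.detach())              # commitment
                + (1.0 - F.cosine_similarity(zq, s, dim=-1)).mean()) # speech distillation
        return recon, idx, loss

The last loss term is one plausible reading of "distillation" here: it aligns each discrete gesture token's latent with the projected speech context, so the codebook itself becomes speech-aware.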
Discrete Diffusion Schrödinger Bridge Matching for Graph Transformation
Kim, Jun Hyeong, Kim, Seonghwan, Moon, Seokhyun, Kim, Hyeongwoo, Woo, Jeheon, Kim, Woo Youn
Transporting between arbitrary distributions is a fundamental goal in generative modeling. Recently proposed diffusion bridge models provide a potential solution, but they rely on a joint distribution that is difficult to obtain in practice. Furthermore, formulations based on continuous domains limit their applicability to discrete domains such as graphs. To overcome these limitations, we propose Discrete Diffusion Schrödinger Bridge Matching (DDSBM), a novel framework that utilizes continuous-time Markov chains to solve the Schrödinger bridge (SB) problem in a high-dimensional discrete state space. Our approach extends Iterative Markovian Fitting to discrete domains, and we prove its convergence to the SB. Furthermore, we adapt our framework to graph transformation and show that our chosen underlying dynamics, characterized by independent modifications of nodes and edges, can be interpreted as an entropy-regularized version of optimal transport with a cost function described by the graph edit distance. To demonstrate the effectiveness of our framework, we apply DDSBM to molecular optimization in chemistry. Experimental results demonstrate that DDSBM effectively optimizes the molecular properties of interest with minimal graph transformation, successfully retaining other features.
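For orientation, the SB objective referenced above, and its static form as entropy-regularized optimal transport, can be written as follows (a hedged restatement in generic notation, not the paper's exact symbols; Q is the reference Markov process and q_{T|0} its transition kernel):

\[
P^{\mathrm{SB}} = \operatorname*{arg\,min}_{P\,:\,P_0=\pi_0,\;P_T=\pi_T} D_{\mathrm{KL}}(P \,\|\, Q),
\qquad
\pi^{\mathrm{SB}} = \operatorname*{arg\,min}_{\pi \in \Pi(\pi_0,\pi_T)} \mathbb{E}_{(x,y)\sim\pi}\big[c(x,y)\big] - H(\pi),
\]

where \(c(x,y) = -\log q_{T|0}(y \mid x)\), up to an additive constant. The abstract's claim is that, for CTMC dynamics with independent node and edge modifications, this cost behaves like a graph edit distance.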
Deep Video Portraits
Kim, Hyeongwoo, Garrido, Pablo, Tewari, Ayush, Xu, Weipeng, Thies, Justus, Nießner, Matthias, Pérez, Patrick, Richardt, Christian, Zollhöfer, Michael, Theobalt, Christian
We present a novel approach that enables photo-realistic re-animation of portrait videos using only an input video. In contrast to existing approaches that are restricted to manipulating facial expressions only, we are the first to transfer the full 3D head position, head rotation, facial expression, eye gaze, and eye blinking from a source actor to a portrait video of a target actor. The core of our approach is a generative neural network with a novel space-time architecture. The network takes as input synthetic renderings of a parametric face model, from which it predicts photo-realistic video frames for a given target actor. The realism of this rendering-to-video transfer is achieved by careful adversarial training, and as a result we can create modified target videos that mimic the behavior of the synthetically created input. To enable source-to-target video re-animation, we render a synthetic target video with the head animation parameters reconstructed from a source video and feed it into the trained network, thus taking full control of the target. With the ability to freely recombine source and target parameters, we demonstrate a large variety of video rewrite applications without explicitly modeling hair, body, or background. For instance, we can reenact the full head using interactive user-controlled editing and realize high-fidelity visual dubbing. To demonstrate the high quality of our output, we conduct an extensive series of experiments and evaluations; for instance, a user study shows that our video edits are hard to detect.
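As a concrete illustration of the rendering-to-video transfer, here is a minimal, hypothetical PyTorch training step for a conditional adversarial translator: the generator maps a synthetic face-model rendering to a photo-real frame, and the discriminator judges (rendering, frame) pairs. The hinge loss, the L1 weight of 100.0, and the function signature are my assumptions, not the paper's exact setup.

import torch

def training_step(G, D, opt_g, opt_d, render, real):
    # render: synthetic conditioning image; real: ground-truth target frame
    fake = G(render)

    # Discriminator: score real vs. generated frames, conditioned on the rendering.
    d_real = D(torch.cat([render, real], dim=1))
    d_fake = D(torch.cat([render, fake.detach()], dim=1))
    loss_d = torch.relu(1.0 - d_real).mean() + torch.relu(1.0 + d_fake).mean()
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # Generator: fool the discriminator while staying close to the real frame.
    loss_g = -D(torch.cat([render, fake], dim=1)).mean() \
             + 100.0 * (fake - real).abs().mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

Conditioning the discriminator on the rendering (rather than the frame alone) is what ties the output's pose, expression, and gaze to the parametric model driving the synthesis.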
Partial Sum Minimization of Singular Values in Robust PCA: Algorithm and Applications
Oh, Tae-Hyun, Tai, Yu-Wing, Bazin, Jean-Charles, Kim, Hyeongwoo, Kweon, In So
Robust Principal Component Analysis (RPCA) via rank minimization is a powerful tool for recovering the underlying low-rank structure of clean data corrupted with sparse noise/outliers. In many low-level vision problems, not only is it known that the underlying structure of clean data is low-rank, but the exact rank of the clean data is also known. Yet, when applying conventional rank minimization to those problems, the objective function is formulated in a way that does not fully utilize this a priori target-rank information. This observation motivates us to investigate whether there is a better alternative when using rank minimization. In this paper, instead of minimizing the nuclear norm, we propose to minimize the partial sum of singular values, which implicitly encourages the target rank constraint. Our experimental analyses show that, when the number of samples is deficient, our approach leads to a higher success rate than conventional rank minimization, while the solutions obtained by the two approaches are almost identical when the number of samples is more than sufficient. We apply our approach to various low-level vision problems, e.g., high dynamic range imaging, motion edge detection, photometric stereo, and image alignment and recovery, and show that our results outperform those obtained by the conventional nuclear-norm rank minimization method.
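The key computational change from nuclear-norm minimization is the proximal step: instead of soft-thresholding all singular values, the top-r are kept intact and only the tail is shrunk. A minimal NumPy sketch of that partial singular value thresholding step (function name and parameters mine; a full RPCA solver would alternate this with sparse-term shrinkage, e.g., inside ADMM):

import numpy as np

def partial_svt(X, tau, r):
    # Proximal step for the partial sum of singular values beyond rank r:
    # keep the top-r singular values, soft-threshold the remainder by tau.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s[r:] = np.maximum(s[r:] - tau, 0.0)
    return (U * s) @ Vt

# Example: with a known target rank of 3, shrink everything past it.
X = np.random.randn(50, 40)
L = partial_svt(X, tau=0.5, r=3)

Because the leading r singular values are never penalized, the operator does not bias the energy of the target-rank component, which is what lets the method exploit a known exact rank.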