AITopics | Lee, Junhyeok

Collaborating Authors

Lee, Junhyeok

Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation

Lee, Junhyeok, Oh, Yujin, Lee, Dahyoun, Joh, Hyon Keun, Sohn, Chul-Ho, Baik, Sung Hyun, Jung, Cheol Kyu, Park, Jung Hyun, Choi, Kyu Sung, Kim, Byung-Hoon, Ye, Jong Chul

arXiv.org Artificial Intelligence, Nov-23-2024

Acute ischemic stroke (AIS) requires time-critical management, with hours of delayed intervention leading to an irreversible disability of the patient. Since diffusion weighted imaging (DWI) using the magnetic resonance image (MRI) plays a crucial role in the detection of AIS, automated prediction of AIS from DWI has been a research topic of clinical importance. While text radiology reports contain the most relevant clinical information from the image findings, the difficulty of mapping across different modalities has limited the factuality of conventional direct DWI-to-report generation methods. Here, we propose paired image-domain retrieval and text-domain augmentation (PIRTA), a cross-modal retrieval-augmented generation (RAG) framework for providing clinician-interpretative AIS radiology reports with improved factuality. PIRTA mitigates the need for learning cross-modal mapping, which poses difficulty in image-to-text generation, by casting the cross-modal mapping problem as an in-domain retrieval of similar DWI images that have paired ground-truth text radiology reports. By exploiting the retrieved radiology reports to augment the report generation process of the query image, we show by experiments with extensive in-house and public datasets that PIRTA can accurately retrieve relevant reports from 3D DWI images. This approach enables the generation of radiology reports with significantly higher accuracy compared to direct image-to-text generation using state-of-the-art multimodal language models.
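The in-domain retrieval step described in this abstract can be illustrated with a minimal sketch. It assumes each image has already been encoded into a fixed-length embedding and that similarity is measured by cosine similarity. The function names, the toy embeddings, and the placeholder report strings are hypothetical and are not taken from the PIRTA paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_reports(query_embedding, database, top_k=2):
    """Return the reports paired with the top_k most similar image embeddings.

    database is a list of (embedding, report) pairs; both the layout and the
    similarity measure are assumptions made for illustration.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), report)
        for embedding, report in database
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [report for _, report in scored[:top_k]]


def build_prompt(retrieved_reports):
    """Assemble a hypothetical text prompt that augments generation with retrieved reports."""
    context = "\n".join(f"- {report}" for report in retrieved_reports)
    return f"Reports of similar cases:\n{context}\nWrite a report for the query image."


# Toy database: placeholder embeddings and placeholder report text.
toy_database = [
    ([1.0, 0.1], "report A"),
    ([0.0, 1.0], "report B"),
    ([0.9, 0.5], "report C"),
]
retrieved = retrieve_reports([1.0, 0.0], toy_database, top_k=2)
prompt = build_prompt(retrieved)
```

Because the query and the database entries live in the same (image) domain, the only cross-modal link needed here is the stored image-report pairing, which is the idea the abstract describes.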

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2411.1549

Country: North America > United States > Massachusetts (0.14)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Nuclear Medicine (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

JenGAN: Stacked Shifted Filters in GAN-Based Speech Synthesis

Cho, Hyunjae, Lee, Junhyeok, Jung, Wonbin

arXiv.org Artificial Intelligence, Jun-10-2024

Non-autoregressive GAN-based neural vocoders are widely used due to their fast inference speed and high perceptual quality. However, they often suffer from audible artifacts such as tonal artifacts in their generated results. Therefore, we propose JenGAN, a new training strategy that involves stacking shifted low-pass filters to ensure the shift-equivariant property. This method helps prevent aliasing and reduce artifacts while preserving the model structure used during inference. In our experimental evaluation, JenGAN consistently enhances the performance of vocoder models, yielding significantly superior scores across the majority of evaluation metrics.

artificial intelligence, jengan, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2406.06111

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.42)

LLM-Based Cooperative Agents using Information Relevance and Plan Validation

Seo, SeungWon, Lee, Junhyeok, Noh, SeongRae, Kang, HyeongYeop

arXiv.org Artificial Intelligence, May-26-2024

We address the challenge of multi-agent cooperation, where agents achieve a common goal by interacting with a 3D scene and cooperating with decentralized agents under complex partial observations. This involves managing communication costs and optimizing interaction trajectories in dynamic environments. Our research focuses on three primary limitations of existing cooperative agent systems. Firstly, current systems demonstrate inefficiency in managing acquired information through observation, resulting in declining planning performance as the environment becomes more complex with additional objects or goals. Secondly, the neglect of false plans in partially observable settings leads to suboptimal cooperative performance, as agents struggle to adapt to environmental changes influenced by the unseen actions of other agents. Lastly, the failure to incorporate spatial data into decision-making processes restricts the agent's ability to construct optimized trajectories. To overcome these limitations, we propose the RElevance and Validation-Enhanced Cooperative Language Agent (REVECA), a novel cognitive architecture powered by GPT-3.5. REVECA leverages relevance assessment, plan validation, and spatial information to enhance the efficiency and robustness of agent cooperation in dynamic and partially observable environments while minimizing continuous communication costs and effectively managing irrelevant dummy objects. Our extensive experiments demonstrate the superiority of REVECA over previous approaches, including those driven by GPT-4.0. Additionally, a user study highlights REVECA's potential for achieving trustworthy human-AI cooperation. We expect that REVECA will have significant applications in gaming, XR applications, educational tools, and humanoid robots, contributing to substantial economic, commercial, and academic advancements.

information, large language model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2405.16751

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Direct Preference-based Policy Optimization without Reward Modeling

An, Gaon, Lee, Junhyeok, Zuo, Xingdong, Kosaka, Norio, Kim, Kyung-Min, Song, Hyun Oh

arXiv.org Artificial Intelligence, Oct-27-2023

Preference-based reinforcement learning (PbRL) is an approach that enables RL agents to learn from preference, which is particularly useful when formulating a reward function is challenging. Existing PbRL methods generally involve a two-step procedure: they first learn a reward model based on given preference data and then employ off-the-shelf reinforcement learning algorithms using the learned reward model. However, obtaining an accurate reward model solely from preference information, especially when the preference is from human teachers, can be difficult. Instead, we propose a PbRL algorithm that directly learns from preference without requiring any reward modeling. To achieve this, we adopt a contrastive learning framework to design a novel policy scoring metric that assigns a high score to policies that align with the given preferences. We apply our algorithm to offline RL tasks with actual human preference labels and show that our algorithm outperforms or is on par with the existing PbRL methods. Notably, on high-dimensional control tasks, our algorithm surpasses offline RL methods that learn with ground-truth reward information. Finally, we show that our algorithm can be successfully applied to fine-tune large language models.

large language model, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2301.12842

Genre: Research Report > New Finding (0.46)

Industry: Education (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

VIFS: An End-to-End Variational Inference for Foley Sound Synthesis

Lee, Junhyeok, Nam, Hyeonuk, Park, Yong-Hwa

arXiv.org Artificial Intelligence, Jun-8-2023

The goal of DCASE 2023 Challenge Task 7 is to generate various sound clips for Foley sound synthesis (FSS) using a "category-to-sound" approach. "Category" is expressed by a single index, while the corresponding "sound" covers diverse sound examples. To generate diverse sounds for a given category, we adopt VITS, a text-to-speech (TTS) model with variational inference. In addition, we apply various techniques from speech synthesis, including PhaseAug and Avocodo. Unlike TTS models, which generate short pronunciations from phonemes and speaker identity, the category-to-sound problem requires generating diverse sounds from a category index alone. To compensate for the difference while maintaining consistency within each audio clip, we heavily modified the prior encoder to enhance consistency with posterior latent variables. This introduced an additional Gaussian on the prior encoder, which promotes variance within the category. With these modifications, we propose VIFS, variational inference for end-to-end Foley sound synthesis, which generates diverse high-quality sounds.

artificial intelligence, category, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2306.05004

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.56)

PITS: Variational Pitch Inference without Fundamental Frequency for End-to-End Pitch-controllable TTS

Lee, Junhyeok, Jung, Wonbin, Cho, Hyunjae, Kim, Jaeyeon, Kim, Jaehwan

arXiv.org Artificial Intelligence, Jun-6-2023

Previous pitch-controllable text-to-speech (TTS) models rely on directly modeling fundamental frequency, leading to low variance in synthesized speech. To address this issue, we propose PITS, an end-to-end pitch-controllable TTS model that utilizes variational inference to model pitch. Based on VITS, PITS incorporates the Yingram encoder, the Yingram decoder, and adversarial training of pitch-shifted synthesis to achieve pitch-controllability. Experiments demonstrate that PITS generates high-quality speech that is indistinguishable from ground truth speech and has high pitch-controllability without quality degradation. Code, audio samples, and demo are available at https://github.com/anonymous-pits/pits.

artificial intelligence, machine learning, speech, (15 more...)

arXiv.org Artificial Intelligence

2302.12391

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Speech (0.69)

PhaseAug: A Differentiable Augmentation for Speech Synthesis to Simulate One-to-Many Mapping

Lee, Junhyeok, Han, Seungu, Cho, Hyunjae, Jung, Wonbin

arXiv.org Artificial Intelligence, Mar-13-2023

Previous generative adversarial network (GAN)-based neural vocoders are trained to reconstruct the exact ground truth waveform from the paired mel-spectrogram and do not consider the one-to-many relationship of speech synthesis. This conventional training causes overfitting for both the discriminators and the generator, leading to the periodicity artifacts in the generated audio signal. In this work, we present PhaseAug, the first differentiable augmentation for speech synthesis that rotates the phase of each frequency bin to simulate one-to-many mapping. With our proposed method, we outperform baselines without any architecture modification. Code and audio samples will be available at https://github.com/mindslab-ai/phaseaug.
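The idea of rotating the phase of each frequency bin can be sketched on a single short frame. This is a simplified illustration, not the released PhaseAug code: it uses a naive discrete Fourier transform on one frame rather than a short-time transform, and it draws each rotation angle uniformly from [-π, π], which is an assumption. The function names and the toy frame are hypothetical.

```python
import cmath
import math
import random


def dft(signal):
    """Naive discrete Fourier transform (adequate for a short illustrative frame)."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def rotate_phase(signal, rng):
    """Rotate the phase of each positive-frequency bin by a random angle.

    Mirrored bins receive the conjugate value so the output stays real.
    The DC bin (and the Nyquist bin for even lengths) is left untouched.
    """
    n = len(signal)
    spectrum = dft(signal)
    rotated = list(spectrum)
    for k in range(1, (n + 1) // 2):
        phi = rng.uniform(-math.pi, math.pi)  # assumed angle distribution
        rotated[k] = spectrum[k] * cmath.exp(1j * phi)
        rotated[n - k] = rotated[k].conjugate()
    return [value.real for value in idft(rotated)]


frame = [0.0, 1.0, 0.5, -0.3, -1.0, 0.2, 0.7, -0.4]  # toy waveform frame
augmented = rotate_phase(frame, random.Random(0))
```

The augmented frame has the same magnitude spectrum as the input but a different waveform, which is one way to picture several waveforms corresponding to the same spectral magnitudes.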

artificial intelligence, machine learning, phaseaug, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICASSP49357.2023.10096374

2211.0461

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.93)

NU-Wave: A Diffusion Probabilistic Model for Neural Audio Upsampling

Lee, Junhyeok, Han, Seungu

arXiv.org Artificial Intelligence, Jun-17-2021

In this work, we introduce NU-Wave, the first neural audio upsampling model to produce waveforms of sampling rate 48kHz from coarse 16kHz or 24kHz inputs, while prior works could generate only up to 16kHz. NU-Wave is the first diffusion probabilistic model for audio super-resolution which is engineered based on neural vocoders. NU-Wave generates high-quality audio that achieves high performance in terms of signal-to-noise ratio (SNR), log-spectral distance (LSD), and accuracy of the ABX test. In all cases, NU-Wave outperforms the baseline models despite the substantially smaller model capacity (3.0M parameters) than baselines (5.4-21%). The audio samples of our model are available at https://mindslab-ai.github.io/nuwave, and the code will be made available soon.
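As an illustration of the first metric named in this abstract, the sketch below computes signal-to-noise ratio in decibels using the common definition, 10·log10(‖x‖² / ‖x − x̂‖²), where x is the reference waveform and x̂ the estimate. Whether the paper uses exactly this form is an assumption, and the function name and toy waveforms are hypothetical. Log-spectral distance is omitted because it additionally needs a short-time spectral transform.

```python
import math


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between a reference and an estimated waveform.

    Uses the common definition 10 * log10(signal power / error power);
    returns infinity when the estimate matches the reference exactly.
    """
    signal_power = sum(r * r for r in reference)
    noise_power = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    if noise_power == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


# Toy waveforms: the estimate is the reference scaled by 0.9.
reference_wave = [1.0, -1.0, 1.0, -1.0]
estimated_wave = [0.9, -0.9, 0.9, -0.9]
score = snr_db(reference_wave, estimated_wave)
```

A higher value means the upsampled waveform is closer to the reference in the time domain.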

artificial intelligence, international conference, machine learning, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.21437/Interspeech.2021-36

2104.02321

Genre: Research Report (0.64)

Industry: Health & Medicine (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.62)

Low-Power Computer Vision: Status, Challenges, Opportunities

Alyamkin, Sergei, Ardi, Matthew, Berg, Alexander C., Brighton, Achille, Chen, Bo, Chen, Yiran, Cheng, Hsin-Pai, Fan, Zichen, Feng, Chen, Fu, Bo, Gauen, Kent, Goel, Abhinav, Goncharenko, Alexander, Guo, Xuyang, Ha, Soonhoi, Howard, Andrew, Hu, Xiao, Huang, Yuanjun, Kang, Donghyun, Kim, Jaeyoun, Ko, Jong Gook, Kondratyev, Alexander, Lee, Junhyeok, Lee, Seungjae, Lee, Suwoong, Li, Zichao, Liang, Zhiyu, Liu, Juzheng, Liu, Xin, Lu, Yang, Lu, Yung-Hsiang, Malik, Deeptanshu, Nguyen, Hong Hanh, Park, Eunbyung, Repin, Denis, Shen, Liang, Sheng, Tao, Sun, Fei, Svitov, David, Thiruvathukal, George K., Zhang, Baiwu, Zhang, Jingchi, Zhang, Xiaopeng, Zhuo, Shaojie

arXiv.org Artificial Intelligence, Apr-15-2019

Computer vision has achieved impressive progress in recent years. Meanwhile, mobile phones have become the primary computing platforms for millions of people. In addition to mobile phones, many autonomous systems rely on visual data for making decisions, and some of these systems have limited energy (such as unmanned aerial vehicles, also called drones, and mobile robots). These systems rely on batteries, so energy efficiency is critical. This article serves two main purposes: (1) Examine the state of the art in low-power solutions for detecting objects in images. Since 2015, the IEEE Annual International Low-Power Image Recognition Challenge (LPIRC) has been held to identify the most energy-efficient computer vision solutions. This article summarizes the 2018 winners' solutions. (2) Suggest directions for research as well as opportunities for low-power computer vision.

computer vision, deep learning, soccer, (24 more...)

arXiv.org Artificial Intelligence

1904.07714

Country:

North America > United States (0.46)
Asia (0.28)

Genre: Research Report (1.00)

Industry:

Information Technology > Services (0.68)
Leisure & Entertainment > Sports > Soccer (0.46)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
(2 more...)
