AITopics | Lee, Juheon

Plotting

Lee, Juheon

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System

Kim, Hyeongju, Yang, Jinhyeok, Yu, Yechan, Ji, Seunghun, Morton, Jacob, Bous, Frederik, Byun, Joon, Lee, Juheon

arXiv.org Artificial IntelligenceMar-29-2025

We present a novel text-to-speech (TTS) system, namely SupertonicTTS, for improved scalability and efficiency in speech synthesis. SupertonicTTS is comprised of three components: a speech autoencoder for continuous latent representation, a text-to-latent module leveraging flow-matching for text-to-latent mapping, and an utterance-level duration predictor. To enable a lightweight architecture, we employ a low-dimensional latent space, temporal compression of latents, and ConvNeXt blocks. We further simplify the TTS pipeline by operating directly on raw character-level text and employing cross-attention for text-speech alignment, thus eliminating the need for grapheme-to-phoneme (G2P) modules and external aligners. In addition, we introduce context-sharing batch expansion that accelerates loss convergence and stabilizes text-speech alignment. Experimental results demonstrate that SupertonicTTS achieves competitive performance while significantly reducing architectural complexity and computational overhead compared to contemporary TTS models. Audio samples demonstrating the capabilities of SupertonicTTS are available at: https://supertonictts.github.io/.

artificial intelligence, machine learning, representation, (14 more...)

arXiv.org Artificial Intelligence

2503.23108

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.93)

Add feedback

GraphCompNet: A Position-Aware Model for Predicting and Compensating Shape Deviations in 3D Printing

Lei, null, Chen, null, Lee, Juheon, Catana, Juan Carlos, Yhdego, Tsegai, Moroney, Nathan, Nabian, Mohammad Amin, Wang, Hui, Zeng, Jun

arXiv.org Artificial IntelligenceFeb-17-2025

This paper introduces a data-driven algorithm for modeling and compensating shape deviations in additive manufacturing (AM), addressing challenges in geometric accuracy and batch production. While traditional methods, such as analytical models and metrology, laid the groundwork for geometric precision, they are often impractical for large-scale production. Recent advancements in machine learning (ML) have improved compensation precision, but issues remain in generalizing across complex geometries and adapting to position-dependent variations. We present a novel approach for powder bed fusion (PBF) processes, using GraphCompNet, which is a computational framework combining graph-based neural networks with a generative adversarial network (GAN)-inspired training process. By leveraging point cloud data and dynamic graph convolutional neural networks (DGCNNs), GraphCompNet models complex shapes and incorporates position-specific thermal and mechanical factors. A two-stage adversarial training procedure iteratively refines compensated designs via a compensator-predictor architecture, offering real-time feedback and optimization. Experimental validation across diverse shapes and positions shows the framework significantly improves compensation accuracy (35 to 65 percent) across the entire print space, adapting to position-dependent variations. This work advances the development of Digital Twin technology for AM, enabling scalable, real-time monitoring and compensation, and addressing critical gaps in AM process control. The proposed method supports high-precision, automated industrial-scale design and manufacturing systems.

artificial intelligence, compensating shape deviation, machine learning, (3 more...)

arXiv.org Artificial Intelligence

2502.09652

Genre: Research Report (1.00)

Industry: Machinery > Industrial Machinery (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Neural varifolds: an aggregate representation for quantifying the geometry of point clouds

Lee, Juheon, Cai, Xiaohao, Schönlieb, Carola-Bibian, Masnou, Simon

arXiv.org Artificial IntelligenceJul-5-2024

Point clouds are popular 3D representations for real-life objects (such as in LiDAR and Kinect) due to their detailed and compact representation of surface-based geometry. Recent approaches characterise the geometry of point clouds by bringing deep learning based techniques together with geometric fidelity metrics such as optimal transportation costs (e.g., Chamfer and Wasserstein metrics). In this paper, we propose a new surface geometry characterisation within this realm, namely a neural varifold representation of point clouds. Here the surface is represented as a measure/distribution over both point positions and tangent spaces of point clouds. The varifold representation quantifies not only the surface geometry of point clouds through the manifold-based discrimination, but also subtle geometric consistencies on the surface due to the combined product space. This study proposes neural varifold algorithms to compute the varifold norm between two point clouds using neural networks on point clouds and their neural tangent kernel representations. The proposed neural varifold is evaluated on three different sought-after tasks -- shape matching, few-shot shape classification and shape reconstruction. Detailed evaluation and comparison to the state-of-the-art methods demonstrate that the proposed versatile neural varifold is superior in shape matching and few-shot shape classification, and is competitive for shape reconstruction.

artificial intelligence, machine learning, point cloud, (17 more...)

arXiv.org Artificial Intelligence

2407.04844

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Add feedback

Virtual Foundry Graphnet for Metal Sintering Deformation Prediction

Rachel, null, Chen, null, Lee, Juheon, Gan, Chuang, Yang, Zijiang, Nabian, Mohammad Amin, Zeng, Jun

arXiv.org Artificial IntelligenceApr-17-2024

Metal Sintering is a necessary step for Metal Injection Molded parts and binder jet such as HP's metal 3D printer. The metal sintering process introduces large deformation varying from 25 to 50% depending on the green part porosity. In this paper, we use a graph-based deep learning approach to predict the part deformation, which can speed up the deformation simulation substantially at the voxel level. Running a well-trained Metal Sintering inferencing engine only takes a range of seconds to obtain the final sintering deformation value. The tested accuracy on example complex geometry achieves 0.7um mean deviation for a 63mm testing part.

accuracy, artificial intelligence, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2404.11753

Country: North America > United States > California > Santa Clara County > Palo Alto (0.14)

Genre: Research Report (0.50)

Industry:

Education (0.66)
Machinery > Industrial Machinery (0.49)
Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations

Oh, Yoori, Lee, Juheon, Han, Yoseob, Lee, Kyogu

arXiv.org Artificial IntelligenceMay-29-2023

Recent text-to-speech models have reached the level of generating natural speech similar to what humans say. But there still have limitations in terms of expressiveness. The existing emotional speech synthesis models have shown controllability using interpolated features with scaling parameters in emotional latent space. However, the emotional latent space generated from the existing models is difficult to control the continuous emotional intensity because of the entanglement of features like emotions, speakers, etc. In this paper, we propose a novel method to control the continuous intensity of emotions using semi-supervised learning. The model learns emotions of intermediate intensity using pseudo-labels generated from phoneme-level sequences of speech information. An embedding space built from the proposed model satisfies the uniform grid geometry with an emotional basis. The experimental results showed that the proposed method was superior in controllability and naturalness.

artificial intelligence, emotion, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2211.0616

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.85)

Add feedback

Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations

Choi, Hyeong-Seok, Lee, Juheon, Kim, Wansoo, Lee, Jie Hwan, Heo, Hoon, Lee, Kyogu

arXiv.org Artificial IntelligenceOct-28-2021

We present a neural analysis and synthesis (NANSY) framework that can manipulate voice, pitch, and speed of an arbitrary speech signal. Most of the previous works have focused on using information bottleneck to disentangle analysis features for controllable synthesis, which usually results in poor reconstruction quality. We address this issue by proposing a novel training strategy based on information perturbation. The idea is to perturb information in the original input signal (e.g., formant, pitch, and frequency response), thereby letting synthesis networks selectively take essential attributes to reconstruct the input signal. Because NANSY does not need any bottleneck structures, it enjoys both high reconstruction quality and controllability. Furthermore, NANSY does not require any labels associated with speech data such as text and speaker information, but rather uses a new set of analysis features, i.e., wav2vec feature and newly proposed pitch feature, Yingram, which allows for fully self-supervised training. Taking advantage of fully self-supervised training, NANSY can be easily extended to a multilingual setting by simply training it with a multilingual dataset. The experiments show that NANSY can achieve significant improvement in performance in several applications such as zero-shot voice conversion, pitch shift, and time-scale modification.

artificial intelligence, machine learning, neural network, (17 more...)

arXiv.org Artificial Intelligence

2110.14513

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)

Add feedback

Audio Cover Song Identification using Convolutional Neural Network

Chang, Sungkyun, Lee, Juheon, Choe, Sang Keun, Lee, Kyogu

arXiv.org Artificial IntelligenceOct-26-2020

In this paper, we propose a new approach to cover song identification using a CNN (convolutional neural network). Most previous studies extract the feature vectors that characterize the cover song relation from a pair of songs and used it to compute the (dis)similarity between the two songs. Based on the observation that there is a meaningful pattern between cover songs and that this can be learned, we have reformulated the cover song identification problem in a machine learning framework. To do this, we first build the CNN using as an input a cross-similarity matrix generated from a pair of songs. We then construct the data set composed of cover song pairs and non-cover song pairs, which are used as positive and negative training samples, respectively. The trained CNN outputs the probability of being in the cover song relation given a cross-similarity matrix generated from any two pieces of music and identifies the cover song by ranking on the probability. Experimental results show that the proposed algorithm achieves performance better than or comparable to the state-of-the-art.

arXiv.org Artificial Intelligence

1712.00166

Country: Asia (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.63)

Add feedback