Hwang, Inchul
Improved Text Emotion Prediction Using Combined Valence and Arousal Ordinal Classification
Mitsios, Michael, Vamvoukakis, Georgios, Maniati, Georgia, Ellinas, Nikolaos, Dimitriou, Georgios, Markopoulos, Konstantinos, Kakoulidis, Panos, Vioni, Alexandra, Christidou, Myrsini, Oh, Junkwang, Jho, Gunu, Hwang, Inchul, Vardaxoglou, Georgios, Chalamandaris, Aimilios, Tsiakoulis, Pirros, Raptis, Spyros
Emotion detection in textual data has received growing interest in recent years, as it is pivotal for developing empathetic human-computer interaction systems. This paper introduces a method for categorizing emotions from text, which acknowledges and differentiates between the diversified similarities and distinctions of various emotions. Initially, we establish a baseline by training a transformer-based model for standard emotion classification, achieving state-of-the-art performance. We argue that not all misclassifications are of the same importance, as there are perceptual similarities among emotional classes. We thus redefine the emotion labeling problem by shifting it from a traditional classification model to an ordinal classification one, where discrete emotions are arranged in a sequential order according to their valence levels. Finally, we propose a method that performs ordinal classification in the two-dimensional emotion space, considering both valence and arousal scales. The results show that our approach not only preserves high accuracy in emotion prediction but also significantly reduces the magnitude of errors in cases of misclassification.
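A minimal illustrative sketch of ordinal emotion classification of the kind described above, using cumulative binary targets (Frank & Hall-style thresholds) over an assumed valence ordering; the emotion list, encoder dimension, and loss choice are assumptions, not the paper's exact formulation.

```python
# Sketch: ordinal emotion classification via cumulative binary targets.
# Emotion ordering, hidden size and loss are illustrative assumptions.
import torch
import torch.nn as nn

EMOTIONS = ["anger", "sadness", "neutral", "surprise", "joy"]  # assumed valence order
K = len(EMOTIONS)

def to_ordinal_targets(labels: torch.Tensor) -> torch.Tensor:
    """Class index -> K-1 binary targets, where target[k] = 1 iff label > k."""
    thresholds = torch.arange(K - 1).unsqueeze(0)        # (1, K-1)
    return (labels.unsqueeze(1) > thresholds).float()    # (batch, K-1)

class OrdinalHead(nn.Module):
    """Maps a transformer sentence embedding to K-1 cumulative logits."""
    def __init__(self, hidden: int = 768):
        super().__init__()
        self.fc = nn.Linear(hidden, K - 1)

    def forward(self, h):                                # h: (batch, hidden)
        return self.fc(h)                                # logits for P(label > k)

def predict(logits: torch.Tensor) -> torch.Tensor:
    """Predicted class index = number of thresholds passed."""
    return (torch.sigmoid(logits) > 0.5).sum(dim=1)

# Training would use BCEWithLogitsLoss against the cumulative targets, so that
# misclassifications into neighbouring valence classes are penalised less.
criterion = nn.BCEWithLogitsLoss()
```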
Low-Resource Cross-Domain Singing Voice Synthesis via Reduced Self-Supervised Speech Representations
Kakoulidis, Panos, Ellinas, Nikolaos, Vamvoukakis, Georgios, Christidou, Myrsini, Vioni, Alexandra, Maniati, Georgia, Oh, Junkwang, Jho, Gunu, Hwang, Inchul, Tsiakoulis, Pirros, Chalamandaris, Aimilios
In this paper, we propose a singing voice synthesis model, Karaoker-SSL, that is trained only on text and speech data as a typical multi-speaker acoustic model. It is a low-resource pipeline that does not utilize any singing data end-to-end, since its vocoder is also trained on speech data. Karaoker-SSL is conditioned on self-supervised speech representations in an unsupervised manner. We preprocess these representations by selecting only a subset of their task-correlated dimensions. The conditioning module is indirectly guided to capture style information during training by multi-tasking. This is achieved with a Conformer-based module, which predicts the pitch from the acoustic model's output. Thus, Karaoker-SSL allows singing voice synthesis without reliance on hand-crafted and domain-specific features. There are also no requirements for text alignments or lyrics timestamps. To refine the voice quality, we employ a U-Net discriminator that is conditioned on the target speaker and follows a Diffusion GAN training scheme.
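A rough sketch of how a subset of task-correlated dimensions of self-supervised representations could be selected, here by correlating each dimension with frame-level F0; the use of F0 as the task signal and the number of kept dimensions are assumptions rather than the paper's exact procedure.

```python
# Sketch: keep only SSL feature dimensions that correlate with a task signal
# (frame-level F0 here). Threshold-free top-k selection is an assumption.
import numpy as np

def select_correlated_dims(ssl_feats: np.ndarray, f0: np.ndarray, top_k: int = 64):
    """ssl_feats: (frames, dims), f0: (frames,). Returns indices of top_k dims."""
    voiced = f0 > 0                                   # ignore unvoiced frames
    feats, target = ssl_feats[voiced], f0[voiced]
    corrs = np.array([
        abs(np.corrcoef(feats[:, d], target)[0, 1]) for d in range(feats.shape[1])
    ])
    return np.argsort(-corrs)[:top_k]                 # most task-correlated dimensions

# Downstream, the acoustic model would be conditioned only on ssl_feats[:, selected].
```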
Generating Multilingual Gender-Ambiguous Text-to-Speech Voices
Markopoulos, Konstantinos, Maniati, Georgia, Vamvoukakis, Georgios, Ellinas, Nikolaos, Vardaxoglou, Georgios, Kakoulidis, Panos, Oh, Junkwang, Jho, Gunu, Hwang, Inchul, Chalamandaris, Aimilios, Tsiakoulis, Pirros, Raptis, Spyros
The gender of any voice user interface is a key element of its perceived identity. Recently, there has been increasing interest in interfaces where the gender is ambiguous rather than clearly identifying as female or male. This work addresses the task of generating novel gender-ambiguous TTS voices in a multi-speaker, multilingual setting. This is accomplished by efficiently sampling from a latent speaker embedding space using a proposed gender-aware method. Extensive objective and subjective evaluations clearly indicate that this method is able to efficiently generate a range of novel, diverse voices that are consistent and perceived as more gender-ambiguous than a baseline voice across all the languages examined. Interestingly, the gender perception is found to be robust across two demographic factors of the listeners: native language and gender. To our knowledge, this is the first systematic and validated approach that can reliably generate a variety of gender-ambiguous voices.
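One plausible reading of gender-aware sampling from a latent speaker embedding space is sketched below: new embeddings are drawn near the midpoint between the male and female speaker centroids. This is an illustrative assumption, not necessarily the proposed sampling method.

```python
# Sketch: sample candidate gender-ambiguous speaker embeddings around the
# midpoint of male/female embedding centroids. Noise scale is an assumption.
import numpy as np

def sample_ambiguous_embeddings(male_emb, female_emb, n=10, noise_scale=0.05, seed=0):
    """male_emb, female_emb: (num_speakers, dim) arrays of speaker embeddings."""
    rng = np.random.default_rng(seed)
    midpoint = 0.5 * (male_emb.mean(axis=0) + female_emb.mean(axis=0))
    noise = rng.normal(scale=noise_scale, size=(n, male_emb.shape[1]))
    return midpoint + noise            # (n, dim) candidate gender-ambiguous voices
```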
Investigating Content-Aware Neural Text-To-Speech MOS Prediction Using Prosodic and Linguistic Features
Vioni, Alexandra, Maniati, Georgia, Ellinas, Nikolaos, Sung, June Sig, Hwang, Inchul, Chalamandaris, Aimilios, Tsiakoulis, Pirros
Current state-of-the-art methods for automatic synthetic speech evaluation are based on MOS prediction neural models. Such MOS prediction models include MOSNet and LDNet that use spectral features as input, and SSL-MOS that relies on a pretrained self-supervised learning model that directly uses the speech signal as input. In modern high-quality neural TTS systems, prosodic appropriateness with regard to the spoken content is a decisive factor for speech naturalness. For this reason, we propose to include prosodic and linguistic features as additional inputs in MOS prediction systems, and evaluate their impact on the prediction outcome. We consider phoneme level F0 and duration features as prosodic inputs, as well as Tacotron encoder outputs, POS tags and BERT embeddings as higher-level linguistic inputs. All MOS prediction systems are trained on SOMOS, a neural TTS-only dataset with crowdsourced naturalness MOS evaluations. Results show that the proposed additional features are beneficial in the MOS prediction task, by improving the predicted MOS scores' correlation with the ground truths, both at utterance-level and system-level predictions.
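A minimal sketch of a content-aware MOS predictor in the spirit described above: an utterance-level speech embedding is concatenated with pooled prosodic and linguistic features before regression. The feature dimensions and the two-layer regressor are assumptions, not the evaluated architectures.

```python
# Sketch: MOS regression from a speech embedding plus prosodic/linguistic inputs.
# All dimensions and the regressor shape are illustrative assumptions.
import torch
import torch.nn as nn

class ContentAwareMOS(nn.Module):
    def __init__(self, speech_dim=768, prosody_dim=2, linguistic_dim=768, hidden=256):
        super().__init__()
        self.regressor = nn.Sequential(
            nn.Linear(speech_dim + prosody_dim + linguistic_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, speech_emb, prosody_feats, linguistic_emb):
        # speech_emb: (B, 768) e.g. pooled SSL features of the synthetic utterance;
        # prosody_feats: (B, 2) e.g. utterance-level F0 and duration statistics;
        # linguistic_emb: (B, 768) e.g. pooled BERT embeddings of the spoken text.
        x = torch.cat([speech_emb, prosody_feats, linguistic_emb], dim=-1)
        return self.regressor(x).squeeze(-1)          # predicted MOS per utterance
```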
Faster Re-translation Using Non-Autoregressive Model For Simultaneous Neural Machine Translation
Han, Hyojung, Indurthi, Sathish, Zaidi, Mohd Abbas, Lakumarapu, Nikhil Kumar, Lee, Beomseok, Kim, Sangha, Kim, Chanwoo, Hwang, Inchul
Recently, simultaneous translation has gathered a lot of attention since it enables compelling applications such as subtitle translation for a live event or real-time video-call translation. Some of these translation applications allow editing of partial translation, giving rise to re-translation approaches. The current re-translation approaches are based on autoregressive sequence generation models (ReTA), which generate target tokens in the (partial) translation sequentially. The multiple re-translations with sequential generation in ReTA models lead to an increased inference time gap between the incoming source input and the corresponding target output as the source input grows. Besides, due to the large number of inference operations involved, the ReTA models are not favorable for resource-constrained devices. In this work, we propose a faster re-translation system based on a non-autoregressive sequence generation model (FReTNA) to overcome the aforementioned limitations. We evaluate the proposed model on multiple translation tasks and our model reduces the inference times by several orders and achieves a competitive BLEU score compared to the ReTA and streaming (Wait-k) models. The proposed model reduces the average computation time by a factor of 20 when compared to the ReTA model by incurring a small drop in the translation quality. It also outperforms the streaming-based Wait-k model both in terms of computation time (1.5 times lower) and translation quality.
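The re-translation strategy itself, independent of whether the underlying model is autoregressive (ReTA) or non-autoregressive (FReTNA), can be sketched as re-decoding the full target every time the source prefix grows; the `translate` function below is a placeholder assumption for either model.

```python
# Sketch of the re-translation loop: each incoming source token triggers a full
# re-decode of the current prefix. `translate` is a placeholder for any MT model.
from typing import Callable, List

def retranslate_stream(source_stream: List[str],
                       translate: Callable[[List[str]], List[str]]) -> List[List[str]]:
    """Returns the re-translated target after each incoming source token."""
    outputs, prefix = [], []
    for token in source_stream:
        prefix.append(token)
        outputs.append(translate(prefix))   # full re-decode of the growing prefix
    return outputs
```

With a non-autoregressive model, each call to `translate` emits all target tokens in parallel, which is what keeps the repeated re-decoding affordable.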
Ensemble-Based Deep Reinforcement Learning for Chatbots
Cuayáhuitl, Heriberto, Lee, Donghyeon, Ryu, Seonghan, Cho, Yongjin, Choi, Sungja, Indurthi, Satish, Yu, Seunghak, Choi, Hyungtak, Hwang, Inchul, Kim, Jihie
Such an agent is typically characterised by: (i) a finite set of states $S = \{s_i\}$ that describe all possible situations in the environment; (ii) a finite set of actions $A = \{a_j\}$ to change in the environment from one situation to another; (iii) a state transition function $T(s, a, s')$ that specifies the next state $s'$ for having taken action $a$ in the current state $s$; (iv) a reward function $R(s, a, s')$ that specifies a numerical value given to the agent for taking action $a$ in state $s$ and transitioning to state $s'$; and (v) a policy $\pi: S \rightarrow A$ that defines a mapping from states to actions [2, 30]. The goal of a reinforcement learning agent is to find an optimal policy by maximising its cumulative discounted reward, defined as $Q^*(s,a) = \max_\pi \mathbb{E}[r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \dots \mid s_t = s, a_t = a, \pi]$, where function $Q^*$ represents the maximum sum of rewards $r_t$ discounted by factor $\gamma$ at each time step. While a reinforcement learning agent takes actions with probability $\Pr(a \mid s)$ during training, it selects the best action at test time according to $\pi^*(s) = \arg\max_{a \in A} Q^*(s,a)$. A deep reinforcement learning agent approximates $Q^*$ using a multi-layer neural network [31]. The $Q$ function is parameterised as $Q(s, a; \theta)$, where $\theta$ are the parameters or weights of the neural network (a recurrent neural network in our case). Estimating these weights requires a dataset of learning experiences $D = \{e_1, \dots, e_N\}$ (also referred to as 'experience replay memory'), where every experience is described as a tuple $e_t = (s_t, a_t, r_t, s_{t+1})$. Inducing a $Q$ function consists in applying Q-learning updates over minibatches of experience $MB = \{(s, a, r, s') \sim U(D)\}$ drawn uniformly at random from the full dataset $D$. This process is implemented in learning algorithms using Deep Q-Networks (DQN) such as those described in [31, 32, 33], and the following section describes a DQN-based algorithm for human-chatbot interaction.
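A minimal sketch of the Q-learning update just described, drawing a minibatch uniformly from the experience replay memory and regressing $Q(s,a;\theta)$ towards $r + \gamma \max_{a'} Q(s',a')$; the network architecture, hyperparameters, and the absence of a separate target network are illustrative assumptions, not the cited chatbot agent's exact setup.

```python
# Sketch: one DQN-style Q-learning step over a minibatch from replay memory.
# Each experience is assumed to be a tuple of tensors (s, a, r, s_next),
# with `a` an int64 action index and q_net(s) of shape (batch, num_actions).
import random
import torch
import torch.nn as nn

def dqn_update(q_net: nn.Module, optimizer, replay_memory, gamma=0.99, batch_size=32):
    """Q-learning update over a minibatch drawn uniformly at random from D."""
    batch = random.sample(replay_memory, batch_size)          # e_t = (s, a, r, s')
    s, a, r, s_next = map(torch.stack, zip(*batch))
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)      # Q(s, a; θ)
    with torch.no_grad():
        target = r + gamma * q_net(s_next).max(dim=1).values  # r + γ max_a' Q(s', a')
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```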
Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards
Cuayáhuitl, Heriberto, Lee, Donghyeon, Ryu, Seonghan, Choi, Sungja, Hwang, Inchul, Kim, Jihie
Training chatbots using the reinforcement learning paradigm is challenging due to high-dimensional states, infinite action spaces and the difficulty in specifying the reward function. We address such problems using clustered actions instead of infinite actions, and a simple but promising reward function based on human-likeness scores derived from human-human dialogue data. We train Deep Reinforcement Learning (DRL) agents using chitchat data in raw text---without any manual annotations. Experimental results using different splits of training data report the following. First, that our agents learn reasonable policies in the environments they get familiarised with, but their performance drops substantially when they are exposed to a test set of unseen dialogues. Second, that the choice of sentence embedding size between 100 and 300 dimensions is not significantly different on test data. Third, that our proposed human-likeness rewards are reasonable for training chatbots as long as they use lengthy dialogue histories of >=10 sentences.
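A sketch of how an infinite response space could be reduced to a finite action set by clustering sentence embeddings of candidate responses; K-means and the cluster count are assumptions rather than the paper's exact clustering setup.

```python
# Sketch: build a finite action set for the DRL chatbot by clustering the
# sentence embeddings of candidate responses. Cluster count is an assumption.
import numpy as np
from sklearn.cluster import KMeans

def build_clustered_actions(response_embeddings: np.ndarray, n_actions: int = 100):
    """response_embeddings: (num_responses, dim). Returns (cluster ids, model)."""
    kmeans = KMeans(n_clusters=n_actions, n_init=10, random_state=0)
    action_ids = kmeans.fit_predict(response_embeddings)
    return action_ids, kmeans

# At decision time, the agent selects one of n_actions clusters and then a
# concrete response sentence from within the chosen cluster.
```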
Chatti: A Conversational Chatbot Platform
Hwang, Inchul (Samsung Electronics) | Jeon, Heesik (Samsung Electronics) | Oh, Hyung Rai (Samsung Electronics) | Lee, Donghyeon (Samsung Electronics) | Kim, Munjo (Samsung Electronics) | Kim, Jihie (Samsung Electronics)
We demonstrate the conversational chatbot platform named Chatti, which provides developers with a tool to build their own chatbot easily without a full understanding of the technologies inside a conversational chatbot. To develop a chatbot with Chatti, a developer inputs customized domain data and deploys the chatbot with the tool. Users can then interact with the chatbot through natural language conversation via messengers and other channels. Chatti includes natural language understanding, dialog management, action planning, natural language generation and chitchat components, which run on models learned from the developer's input data, as is common in conversational assistants such as Bixby, Siri and Alexa. With Chatti, a developer can make a chatbot that supports two types of conversation simultaneously: basic chitchat and task-oriented dialog. In contrast to prior chatbot building tools, which mainly focus on natural language understanding, Chatti covers the full dialog system: dialog management, action planning, natural language generation and chitchat. We believe Chatti can open up a wide range of possibilities for conversational chatbots for services as well as IoT devices.
An Intelligent Dialogue Agent for the IoT Home
Jeon, Heesik (Samsung Electronics) | Oh, Hyung Rai (Samsung Electronics) | Hwang, Inchul (Samsung Electronics) | Kim, Jihie (Samsung Electronics)
In this paper, we propose an intelligent dialogue agent for the IoT home. The goal of the proposed system is to efficiently control IoT devices with natural spoken dialogue. This system is made up of the following components: Spoken Language Understanding for analyzing textual input and understanding user intention, Dialogue Management with a State Manager that consists of dialogue policies, Context Manager for understanding the environment, Action Planner responsible for generating a sequence of actions to achieve user intention, Things Manager for observing and controlling IoT devices, and Natural Language Generation that generates natural language from computer-based representation. This system is fully implemented in software and is evaluated in a real IoT home environment.
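A schematic sketch of how the listed components could be chained for a single user turn; the component interfaces and method names below are illustrative assumptions, not the deployed system's APIs.

```python
# Sketch: one user turn flowing through the listed components of the IoT dialogue
# agent. All interfaces (understand, update, plan, execute, generate) are assumed.
from dataclasses import dataclass
from typing import List

@dataclass
class Intent:
    name: str
    slots: dict

def handle_turn(utterance: str, slu, dialogue_manager, action_planner,
                things_manager, nlg) -> str:
    intent: Intent = slu.understand(utterance)               # Spoken Language Understanding
    state = dialogue_manager.update(intent)                  # Dialogue / State Management
    actions: List[str] = action_planner.plan(state)          # sequence of device actions
    results = [things_manager.execute(a) for a in actions]   # observe/control IoT devices
    return nlg.generate(state, results)                      # Natural Language Generation
```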