Konan, Joseph
aoip.ai: An Open-Source P2P SDK
Konan, Joseph, Agnihotri, Shikhar, Hsieh, Chia-Chun
This white paper introduces aoip.ai, a groundbreaking open-source SDK that combines peer-to-peer technology with advanced AI to transform VoIP and IoT applications. It addresses key market challenges by enhancing data security, elevating communication quality, and giving developers and users greater flexibility. Developed in collaboration with Carnegie Mellon University, aoip.ai sets a new standard for decentralized and democratized communication solutions.
Psychoacoustic Challenges of Speech Enhancement on VoIP Platforms
Konan, Joseph, Bhargave, Ojas, Agnihotri, Shikhar, Han, Shuo, Zeng, Yunyang, Shah, Ankit, Raj, Bhiksha
Within the ambit of VoIP (Voice over Internet Protocol) telecommunications, the complexities introduced by acoustic transformations merit rigorous analysis. This research, rooted in the exploration of proprietary sender-side denoising effects, meticulously evaluates platforms such as Google Meet and Zoom. The study draws upon the Deep Noise Suppression (DNS) 2020 dataset, ensuring a structured examination tailored to various denoising settings and receiver interfaces. A methodological novelty is introduced via the Oaxaca decomposition, traditionally an econometric tool, repurposed herein to analyze acoustic-phonetic perturbations within VoIP systems. To further ground the implications of these transformations, psychoacoustic metrics, specifically PESQ and STOI, were harnessed to furnish a comprehensive understanding of speech alterations. Cumulatively, the insights garnered underscore the intricate landscape of VoIP-influenced acoustic dynamics. In addition to the primary findings, a multitude of metrics are reported, extending the research purview. Moreover, out-of-domain benchmarking for both time and time-frequency domain speech enhancement models is included, thereby enhancing the depth and applicability of this inquiry. Repository: github.com/deepology/VoIP-DNS-Challenge
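For illustration, here is a minimal sketch of a two-fold Oaxaca decomposition as it might be repurposed for acoustic features; the variable names, the two-fold variant, and the use of plain OLS are expository assumptions, not the paper's exact procedure.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares with an intercept column."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def oaxaca_twofold(X_a, y_a, X_b, y_b):
    """Decompose the mean outcome gap between group A (e.g., original
    speech) and group B (e.g., VoIP-processed speech). X_* are acoustic
    feature matrices; y_* are outcome scores. All names are placeholders."""
    beta_a, beta_b = ols(X_a, y_a), ols(X_b, y_b)
    xbar_a = np.concatenate([[1.0], X_a.mean(axis=0)])
    xbar_b = np.concatenate([[1.0], X_b.mean(axis=0)])
    gap = y_a.mean() - y_b.mean()
    explained = (xbar_a - xbar_b) @ beta_a    # differences in feature levels
    unexplained = xbar_b @ (beta_a - beta_b)  # systematic coefficient shifts
    return gap, explained, unexplained
```

The explained term attributes the gap to differences in feature levels, while the unexplained term captures systematic shifts, here, ones introduced by the platform's processing.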
LoFT: Local Proxy Fine-tuning for Improving Transferability of Adversarial Attacks Against Large Language Models
Shah, Muhammad Ahmed, Sharma, Roshan, Dhamyal, Hira, Olivier, Raphael, Shah, Ankit, Konan, Joseph, Alharthi, Dareen, Bukhari, Hazim T, Baali, Massa, Deshmukh, Soham, Kuhlmann, Michael, Raj, Bhiksha, Singh, Rita
It has been shown that Large Language Model (LLM) alignments can be circumvented by appending specially crafted attack suffixes to harmful queries to elicit harmful responses. To conduct attacks against private target models whose characterization is unknown, public models can be used as proxies to fashion the attack, with successful attacks being transferred from public proxies to private target models. The success rate of the attack depends on how closely the proxy model approximates the private model. We hypothesize that for attacks to be transferable, it is sufficient if the proxy can approximate the target model in the neighborhood of the harmful query. Therefore, in this paper, we propose \emph{Local Fine-Tuning (LoFT)}, \textit{i.e.}, fine-tuning proxy models on similar queries that lie in the lexico-semantic neighborhood of harmful queries to decrease the divergence between the proxy and target models. First, we demonstrate three approaches to prompt private target models to obtain similar queries given harmful queries. Next, we obtain data for local fine-tuning by eliciting responses from target models for the generated similar queries. Then, we optimize attack suffixes to generate attack prompts and evaluate the impact of our local fine-tuning on the attack's success rate. Experiments show that local fine-tuning of proxy models improves attack transferability and increases attack success rate by $39\%$, $7\%$, and $0.5\%$ (absolute) on target models ChatGPT, GPT-4, and Claude, respectively.
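As a rough sketch of the local fine-tuning step, the snippet below trains a public proxy on (similar query, target response) pairs with a standard causal language-modeling loss; the proxy name, the data, and the hyperparameters are placeholders, not the paper's configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

proxy_name = "gpt2"  # placeholder proxy; the paper uses stronger open models
tok = AutoTokenizer.from_pretrained(proxy_name)
proxy = AutoModelForCausalLM.from_pretrained(proxy_name)
opt = torch.optim.AdamW(proxy.parameters(), lr=1e-5)

# (similar query, target-model response) pairs gathered in the
# lexico-semantic neighborhood of a harmful query (illustrative content).
pairs = [("How do I ...?", "I'm sorry, but I can't help with that.")]

proxy.train()
for query, response in pairs:
    batch = tok(query + tok.eos_token + response, return_tensors="pt")
    out = proxy(**batch, labels=batch["input_ids"])  # causal LM loss
    out.loss.backward()
    opt.step()
    opt.zero_grad()
```

Attack suffixes would then be optimized against this locally adapted proxy before being transferred to the private target.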
Improving Perceptual Quality, Intelligibility, and Acoustics on VoIP Platforms
Konan, Joseph, Bhargave, Ojas, Agnihotri, Shikhar, Lee, Hojeong, Shah, Ankit, Han, Shuo, Zeng, Yunyang, Shu, Amanda, Liu, Haohui, Chang, Xuankai, Khalid, Hamza, Gwak, Minseon, Lee, Kawon, Kim, Minjeong, Raj, Bhiksha
In this paper, we present a method for fine-tuning models trained on the Deep Noise Suppression (DNS) 2020 Challenge to improve their performance on Voice over Internet Protocol (VoIP) applications. Our approach involves adapting the DNS 2020 models to the specific acoustic characteristics of VoIP communications, which include distortion and artifacts caused by compression, transmission, and platform-specific processing. To this end, we propose a multi-task learning framework for VoIP-DNS that jointly optimizes noise suppression and VoIP-specific acoustics for speech enhancement. We evaluate our approach on diverse VoIP scenarios and show that it outperforms both industry performance and state-of-the-art speech enhancement methods on VoIP applications. Our results demonstrate the potential of models trained on DNS 2020 to be improved and tailored to different VoIP platforms using VoIP-DNS; these findings have important applications in areas such as speech recognition, voice assistants, and telecommunications.
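A minimal sketch of such a joint objective follows, assuming an L1 distance for both terms and a hypothetical VoIP-processed reference signal; the paper's actual multi-task formulation is not reproduced here.

```python
import torch
import torch.nn.functional as F

def multitask_loss(enhanced, clean, voip_clean, lam=0.5):
    """enhanced: model output waveform; clean: DNS 2020 reference;
    voip_clean: the same reference after a platform's processing chain.
    The lam weight and both distance choices are illustrative."""
    denoise = F.l1_loss(enhanced, clean)        # noise-suppression term
    platform = F.l1_loss(enhanced, voip_clean)  # VoIP-specific acoustics term
    return denoise + lam * platform
```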
Speech Enhancement for Virtual Meetings on Cellular Networks
Lee, Hojeong, Gwak, Minseon, Lee, Kawon, Kim, Minjeong, Konan, Joseph, Bhargave, Ojas
We study speech enhancement using deep learning (DL) for virtual meetings on cellular devices, where transmitted speech suffers from background noise and transmission loss that degrade speech quality. Since the Deep Noise Suppression (DNS) Challenge dataset of Interspeech 2020 does not contain such practical disturbances, we collect a transmitted DNS (t-DNS) dataset using Zoom Meetings over the T-Mobile network. We select two baseline models: Demucs and FullSubNet. Demucs is an end-to-end model that takes time-domain inputs and outputs time-domain denoised speech, while FullSubNet takes time-frequency-domain inputs and outputs the energy ratio of the target speech in the inputs. The goal of this project is to enhance speech transmitted over cellular networks using deep learning models.
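The two baseline interfaces can be sketched as follows; the shapes, STFT settings, and mask application are illustrative assumptions rather than the models' actual configurations.

```python
import torch

def enhance_time_domain(model, noisy):
    """Demucs-style: waveform in, denoised waveform out."""
    return model(noisy)  # (batch, samples) -> (batch, samples)

def enhance_tf_domain(model, noisy, n_fft=512, hop=256):
    """FullSubNet-style: predict a ratio mask over the noisy spectrogram."""
    win = torch.hann_window(n_fft)
    spec = torch.stft(noisy, n_fft, hop, window=win, return_complex=True)
    mask = model(spec.abs())  # predicted energy ratio of the target speech
    return torch.istft(spec * mask, n_fft, hop, window=win,
                       length=noisy.shape[-1])
```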
PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement
Yang, Muqiao, Konan, Joseph, Bick, David, Zeng, Yunyang, Han, Shuo, Kumar, Anurag, Watanabe, Shinji, Raj, Bhiksha
Despite rapid advancement in recent years, current speech enhancement models often produce speech that differs in perceptual quality from real clean speech. We propose a learning objective that formalizes differences in perceptual quality by using domain knowledge of acoustic-phonetics. We identify temporal acoustic parameters -- such as spectral tilt, spectral flux, and shimmer -- that are non-differentiable, and we develop a neural network estimator that can accurately predict their time-series values across an utterance. We also model phoneme-specific weights for each feature, as the acoustic parameters are known to behave differently across phonemes. We can add this criterion as an auxiliary loss to any model that produces speech, optimizing speech outputs to match the values of clean speech on these features. Experimentally, we show that it improves speech enhancement workflows in both the time domain and the time-frequency domain, as measured by standard evaluation metrics. We also provide an analysis of phoneme-dependent improvement on acoustic parameters, demonstrating the additional interpretability that our method provides. This analysis can suggest which features are currently the bottleneck for improvement.
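A hypothetical rendering of the phoneme-weighted auxiliary loss is sketched below; the estimator, frame-level alignment, and weight matrix are placeholders, not the released PAAPLoss implementation.

```python
import torch

def paap_loss(estimator, enhanced, clean, phoneme_ids, W):
    """estimator: frozen net mapping audio -> (batch, frames, n_params);
    phoneme_ids: (batch, frames) frame-level phoneme alignment;
    W: (n_phonemes, n_params) per-phoneme feature weights."""
    ap_enh = estimator(enhanced)   # predicted acoustic parameters
    ap_cln = estimator(clean)      # clean-speech targets
    w = W[phoneme_ids]             # (batch, frames, n_params) weights
    return (w * (ap_enh - ap_cln).abs()).mean()
```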
TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement
Zeng, Yunyang, Konan, Joseph, Han, Shuo, Bick, David, Yang, Muqiao, Kumar, Anurag, Watanabe, Shinji, Raj, Bhiksha
Speech enhancement models have progressed greatly in recent years, but still show limits in the perceptual quality of their speech outputs. We propose an objective for perceptual quality based on temporal acoustic parameters. These are fundamental speech features that play an essential role in various applications, including speaker recognition and paralinguistic analysis. We provide a differentiable estimator for four categories of low-level acoustic descriptors: frequency-related parameters, energy- or amplitude-related parameters, spectral balance parameters, and temporal features. Unlike prior work that looks at aggregated acoustic parameters or only a few categories of acoustic parameters, our temporal acoustic parameter (TAP) loss enables auxiliary optimization and improvement of many fine-grained speech characteristics in enhancement workflows. We show that adding TAPLoss as an auxiliary objective in speech enhancement produces speech with improved perceptual quality and intelligibility. We use data from the Deep Noise Suppression (DNS) 2020 Challenge to demonstrate that both time-domain models and time-frequency-domain models can benefit from our method.
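In a training loop, such an auxiliary objective might be combined with a base enhancement loss as sketched below; the L1 distances and the 0.1 weight are illustrative assumptions, not the paper's exact recipe.

```python
import torch
import torch.nn.functional as F

def total_loss(model, tap_estimator, noisy, clean, alpha=0.1):
    """tap_estimator: differentiable net mapping audio to temporal
    acoustic parameters; alpha balances the auxiliary TAP term."""
    enhanced = model(noisy)
    base = F.l1_loss(enhanced, clean)            # waveform reconstruction
    tap = F.l1_loss(tap_estimator(enhanced),     # match clean-speech TAPs
                    tap_estimator(clean))
    return base + alpha * tap
```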
Cellular Network Speech Enhancement: Removing Background and Transmission Noise
Shu, Amanda, Khalid, Hamza, Liu, Haohui, Agnihotri, Shikhar, Konan, Joseph, Bhargave, Ojas
The primary objective of speech enhancement is to reduce background noise while preserving the target speaker's speech. A common dilemma occurs when a speaker is confined to a noisy environment and receives a call with heavy background and transmission noise. To address this problem, the Deep Noise Suppression (DNS) Challenge focuses on removing background noise with next-generation deep learning models to enhance the target's speech; however, researchers often fail to consider Voice over IP (VoIP) applications and their transmission noise. Focusing on Google Meet and its cellular application, our work achieves state-of-the-art performance on the Google Meet To Phone track of the VoIP DNS Challenge. This paper demonstrates how to surpass industry performance and achieve a PESQ of 1.92 and a STOI of 0.88, as well as superior acoustic fidelity, perceptual quality, and intelligibility across various metrics.
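Figures like these are typically computed with standard metric implementations; a sketch of such an evaluation with the publicly available pesq and pystoi packages follows (file paths are placeholders, and wide-band PESQ assumes 16 kHz audio).

```python
import soundfile as sf
from pesq import pesq
from pystoi import stoi

clean, fs = sf.read("clean.wav")        # reference speech
enhanced, _ = sf.read("enhanced.wav")   # model output at the same rate

print("PESQ:", pesq(fs, clean, enhanced, "wb"))  # wide-band mode, fs = 16 kHz
print("STOI:", stoi(clean, enhanced, fs, extended=False))
```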