AITopics | Wang, Quan

Collaborating Authors

Wang, Quan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Interpretable and Efficient Heterogeneous Graph Convolutional Network

Yang, Yaming, Guan, Ziyu, Li, Jianxin, Zhao, Wei, Cui, Jiangtao, Wang, Quan

arXiv.org Machine Learning, Jun-22-2020

Graph Convolutional Network (GCN) has achieved extraordinary success in learning effective task-specific representations of nodes in graphs. However, regarding Heterogeneous Information Network (HIN), existing HIN-oriented GCN methods still suffer from two deficiencies: (1) they cannot flexibly explore all possible meta-paths and extract the most useful ones for a target object, which hinders both effectiveness and interpretability; (2) they often need to generate intermediate meta-path based dense graphs, which leads to high computational complexity. To address the above issues, we propose an interpretable and efficient Heterogeneous Graph Convolutional Network (ie-HGCN) to learn the representations of objects in HINs. It is designed as a hierarchical aggregation architecture, i.e., object-level aggregation first, followed by type-level aggregation. The novel architecture can automatically extract useful meta-paths for each object from all possible meta-paths (within a length limit), which brings good model interpretability. It can also reduce the computational cost by avoiding intermediate HIN transformation and neighborhood attention. We provide theoretical analysis about the proposed ie-HGCN in terms of evaluating the usefulness of all possible meta-paths, its connection to the spectral graph convolution on HINs, and its quasi-linear time complexity. Extensive experiments on three real network datasets demonstrate the superiority of ie-HGCN over the state-of-the-art methods.
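The hierarchical aggregation described above (object-level first, then type-level) can be illustrated in toy form. This is a minimal sketch under stated assumptions, not ie-HGCN itself: it uses plain mean aggregation over the neighbors of each type, omits the learned projection matrices, treats the object's own features as one extra "type", and substitutes hypothetical per-type scores for the learned type-level attention.

```python
import math


def mean_aggregate(neighbor_feats):
    """Object-level step: average the feature vectors of one object's neighbors of a single type."""
    if not neighbor_feats:
        return None
    dim = len(neighbor_feats[0])
    return [sum(f[i] for f in neighbor_feats) / len(neighbor_feats) for i in range(dim)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hierarchical_aggregate(self_feat, neighbors_by_type, type_scores):
    """Toy two-level aggregation for a single target object.

    neighbors_by_type: dict mapping a neighbor type name to a list of feature vectors.
    type_scores: hypothetical unnormalized score per type (learned in a real model).
    """
    # Object-level aggregation: one summary vector per neighbor type (plus the object itself).
    summaries = {"self": self_feat}
    for node_type, feats in neighbors_by_type.items():
        agg = mean_aggregate(feats)
        if agg is not None:
            summaries[node_type] = agg
    # Type-level aggregation: softmax-weighted sum of the per-type summaries.
    types = list(summaries)
    weights = softmax([type_scores.get(t, 0.0) for t in types])
    dim = len(self_feat)
    out = [sum(w * summaries[t][i] for w, t in zip(weights, types)) for i in range(dim)]
    return out, dict(zip(types, weights))
```

The returned per-type weights suggest how a model of this shape can expose which neighbor types it relies on; stacking such layers is what would extend that view to longer meta-paths.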

deep learning, neural network, representation

arXiv.org Machine Learning

2005.13183

Country: Asia > China > Shaanxi Province (0.14)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Communications (0.93)
Information Technology > Information Management (0.93)
Information Technology > Data Science > Data Mining (0.67)

Personal VAD: Speaker-Conditioned Voice Activity Detection

Ding, Shaojin, Wang, Quan, Chang, Shuo-yiin, Wan, Li, Moreno, Ignacio Lopez

arXiv.org Machine Learning, Aug-12-2019

In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level. This system is useful for gating the inputs to a streaming speech recognition system so that it only triggers for the target user, which helps reduce computational cost and battery consumption. We achieve this by training a VAD-like neural network that is conditioned on the target speaker embedding or the speaker verification score. With our optimal setup, we are able to train a 130 KB model that outperforms a baseline system in which an individually trained standard VAD and a speaker recognition network are combined to perform the same task.

Index Terms: personal VAD, voice activity detection, speaker recognition, speech recognition

1. INTRODUCTION

In modern speech processing systems, voice activity detection (VAD) usually sits upstream of other speech components such as speech recognition and speaker recognition. As a gating module, VAD not only improves the performance of downstream components by discarding non-speech signal, but also significantly reduces the overall computational cost due to its relatively small size.
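Conditioning a frame-level detector on the target speaker, and then using it as a gate, can be sketched as follows. This is a minimal sketch with several assumptions: the embedding is concatenated onto every acoustic frame (one way to condition; the abstract also mentions conditioning on a verification score instead), the three output classes are assumed, and a single hypothetical linear layer stands in for the real neural network.

```python
import math

CLASSES = ("non_speech", "target_speech", "non_target_speech")  # assumed label set


def condition_frames(frames, speaker_embedding):
    """Concatenate the same target-speaker embedding onto every acoustic frame."""
    return [list(frame) + list(speaker_embedding) for frame in frames]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def frame_posteriors(conditioned_frames, weights, biases):
    """Hypothetical single linear layer + softmax standing in for the real network."""
    posts = []
    for x in conditioned_frames:
        logits = [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b for row, b in zip(weights, biases)]
        posts.append(softmax(logits))
    return posts


def gate(posteriors, threshold=0.5):
    """Pass a frame downstream only when target-speaker speech is the likely class."""
    idx = CLASSES.index("target_speech")
    return [p[idx] > threshold for p in posteriors]


if __name__ == "__main__":
    frames = [[0.1, -0.2], [0.3, 0.4]]
    x = condition_frames(frames, speaker_embedding=[0.6, 0.8])
    W = [[0.0] * 4, [1.0, 1.0, 1.0, 1.0], [0.0] * 4]  # toy weights favouring "target_speech"
    print(gate(frame_posteriors(x, W, [0.0, 0.0, 0.0])))  # prints [True, True]
```

Gating at the frame level is what lets a downstream recognizer stay idle until the enrolled user speaks.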

deep learning, speech recognition, target speaker

arXiv.org Machine Learning

1908.04284

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Information Technology (0.46)
Energy (0.34)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis

Jia, Ye, Zhang, Yu, Weiss, Ron, Wang, Quan, Shen, Jonathan, Ren, Fei, Chen, Zhifeng, Nguyen, Patrick, Pang, Ruoming, Moreno, Ignacio Lopez, Wu, Yonghui

Neural Information Processing Systems, Dec-31-2018

We describe a neural network-based system for text-to-speech (TTS) synthesis that is able to generate speech audio in the voice of many different speakers, including those unseen during training. Our system consists of three independently trained components: (1) a speaker encoder network, trained on a speaker verification task using an independent dataset of noisy speech from thousands of speakers without transcripts, to generate a fixed-dimensional embedding vector from seconds of reference speech from a target speaker; (2) a sequence-to-sequence synthesis network based on Tacotron 2, which generates a mel spectrogram from text, conditioned on the speaker embedding; (3) an auto-regressive WaveNet-based vocoder that converts the mel spectrogram into a sequence of time domain waveform samples. We demonstrate that the proposed model is able to transfer the knowledge of speaker variability learned by the discriminatively-trained speaker encoder to the new task, and is able to synthesize natural speech from speakers that were not seen during training. We quantify the importance of training the speaker encoder on a large and diverse speaker set in order to obtain the best generalization performance. Finally, we show that randomly sampled speaker embeddings can be used to synthesize speech in the voice of novel speakers dissimilar from those used in training, indicating that the model has learned a high quality speaker representation.
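The fixed-dimensional speaker embedding and the "randomly sampled" embeddings mentioned above can be sketched as follows. This is a minimal sketch assuming L2-normalized embeddings (the abstract only says "fixed-dimensional"), an utterance embedding formed by averaging hypothetical per-window encoder outputs, and random speakers drawn uniformly on the unit hypersphere by normalizing an isotropic Gaussian sample.

```python
import math
import random


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def utterance_embedding(window_embeddings):
    """Average per-window encoder outputs, then project the mean onto the unit hypersphere."""
    dim = len(window_embeddings[0])
    mean = [sum(w[i] for w in window_embeddings) / len(window_embeddings) for i in range(dim)]
    return l2_normalize(mean)


def random_speaker_embedding(dim, rng=random):
    """Uniform sample on the unit hypersphere: a normalized isotropic Gaussian draw."""
    return l2_normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
```

In a three-stage pipeline like the one described, either vector would be passed to the synthesizer as its conditioning input; the synthesizer and vocoder themselves are outside the scope of this sketch.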

acoustic processing, speech, speech synthesis

Neural Information Processing Systems

Country: North America > Canada (0.14)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Speech > Acoustic Processing (0.86)

Tuplemax Loss for Language Identification

Wan, Li, Sridhar, Prashant, Yu, Yang, Wang, Quan, Moreno, Ignacio Lopez

arXiv.org Machine Learning, Nov-29-2018

In many language identification scenarios, the user specifies a small set of languages they can speak instead of choosing from the large set of all possible languages. We want to build such prior knowledge into the way we train our neural networks by replacing the commonly used softmax loss function with a novel loss function named tuplemax loss. In fact, about 95% of the users of a typical language identification system launched in North America speak no more than two languages. Using the tuplemax loss, our system achieved a 2 …

Index Terms: Language identification, tuplemax loss, LSTM

1. INTRODUCTION

Large vocabulary continuous speech recognition (LVCSR) systems are becoming increasingly relevant for industry, tracking the technological trend toward increased human interaction using voice-operated devices [1].
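The excerpt names the tuplemax loss but breaks off before defining it. A minimal sketch, assuming the pairwise form (tuples of size two): the loss averages two-way softmax losses between the true language and each other language, which matches the scenario of a user choosing from a small tuple of languages. The inference helper, which restricts the decision to the user's selected languages, is likewise an illustrative assumption.

```python
import math


def softmax_loss(logits, label):
    """Standard cross-entropy over all classes, for comparison."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]


def pairwise_tuplemax_loss(logits, label):
    """Average of 2-way softmax losses between the true class and every other class.

    Each term is -log(e^{z_y} / (e^{z_y} + e^{z_j})) = log(1 + e^{z_j - z_y}).
    """
    n = len(logits)
    if n < 2:
        raise ValueError("need at least two classes")
    zy = logits[label]
    total = 0.0
    for j, zj in enumerate(logits):
        if j != label:
            d = zj - zy
            # numerically stable softplus(d) = log(1 + e^d)
            total += max(d, 0.0) + math.log1p(math.exp(-abs(d)))
    return total / (n - 1)


def predict_within_user_set(logits, user_languages):
    """At inference, pick the best-scoring language among the few the user selected."""
    return max(user_languages, key=lambda idx: logits[idx])
```

With only two classes the pairwise loss coincides with the ordinary softmax loss; the two differ once more languages are in play.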

deep learning, neural network, tuplemax loss

arXiv.org Machine Learning

1811.12290

Country: North America (0.34)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fully Supervised Speaker Diarization

Zhang, Aonan, Wang, Quan, Zhu, Zhenyao, Paisley, John, Wang, Chong

arXiv.org Machine Learning, Oct-27-2018

In this paper, we propose a fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN). Given extracted speaker-discriminative embeddings (a.k.a. d-vectors) from input utterances, each individual speaker is modeled by a parameter-sharing RNN, while the RNN states for different speakers interleave in the time domain. This RNN is naturally integrated with a distance-dependent Chinese restaurant process (ddCRP) to accommodate an unknown number of speakers. Our system is fully supervised and is able to learn from examples where time-stamped speaker labels are annotated. We achieved a 7.6% diarization error rate on NIST SRE 2000 CALLHOME, which is better than the state-of-the-art method using spectral clustering. Moreover, our method decodes in an online fashion while most state-of-the-art systems rely on offline clustering.
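How a Chinese restaurant process lets the number of speakers stay open-ended can be sketched as a prior over "which speaker comes next". This is a simplified sketch, not the UIS-RNN prior: it assumes counts of earlier turns per speaker and a concentration parameter `alpha`, omits the speaker-change indicator and distance dependence, and uses a hypothetical likelihood dictionary in place of the RNN's scores.

```python
def crp_assignment_probs(turn_counts, alpha):
    """Prior over the speaker of the next segment.

    turn_counts: dict speaker_id -> number of earlier turns by that speaker.
    alpha: concentration parameter; larger values favour introducing a new speaker.
    Returns a dict with each existing speaker plus the key "new".
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    total = sum(turn_counts.values()) + alpha
    probs = {spk: n / total for spk, n in turn_counts.items()}
    probs["new"] = alpha / total
    return probs


def most_likely_speaker(prior, likelihood):
    """Pick the speaker maximizing prior * likelihood (hypothetical stand-in for RNN scores)."""
    return max(prior, key=lambda spk: prior[spk] * likelihood.get(spk, 0.0))
```

The "new" entry always keeps nonzero mass, which is what allows decoding to add speakers online instead of fixing their number in advance.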

deep learning, neural network, utterance

arXiv.org Machine Learning

1810.04719

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

Wang, Quan, Muckenhirn, Hannah, Wilson, Kevin, Sridhar, Prashant, Wu, Zelin, Hershey, John, Saurous, Rif A., Weiss, Ron J., Jia, Ye, Moreno, Ignacio Lopez

arXiv.org Machine Learning, Oct-27-2018

In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals by making use of a reference signal from the target speaker. We achieve this by training two separate neural networks: (1) a speaker recognition network that produces speaker-discriminative embeddings; (2) a spectrogram masking network that takes both the noisy spectrogram and the speaker embedding as input and produces a mask. Our system significantly reduces the speech recognition word error rate (WER) on multi-speaker signals, with minimal WER degradation on single-speaker signals.

Index Terms: Source separation, speaker recognition, spectrogram masking, speech recognition

1. INTRODUCTION

Recent advances in speech recognition have led to performance improvements in challenging scenarios such as noisy and far-field conditions. However, speech recognition systems still perform poorly when the speaker of interest is recorded in crowded environments, i.e., with interfering speakers in the foreground or background. One way to deal with this issue is to first apply a speech separation system to the noisy audio in order to separate the voices of different speakers.
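The final step of spectrogram masking, applying the predicted mask to the noisy input, can be sketched as an element-wise product. This is a minimal sketch assuming a magnitude spectrogram stored as a list of frames and a mask network (not shown) that emits unbounded logits, squashed here into [0, 1] with a sigmoid.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def apply_soft_mask(noisy_magnitude, mask_logits):
    """Element-wise product of the noisy magnitude spectrogram with a [0, 1] mask.

    Both arguments are lists of frames, each frame a list of frequency-bin values.
    """
    return [
        [mag * sigmoid(logit) for mag, logit in zip(frame, logit_frame)]
        for frame, logit_frame in zip(noisy_magnitude, mask_logits)
    ]
```

Bins the mask drives toward 0 (interfering speakers) are suppressed, while bins near 1 (the target speaker) pass through. Pairing the masked magnitude with the noisy phase before an inverse STFT is a common practice and is assumed here rather than taken from the excerpt.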

deep learning, speech recognition, voicefilter

arXiv.org Machine Learning

1810.04826

Genre: Research Report (0.64)

Industry:

Information Technology (0.46)
Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.83)

Sample Efficient Adaptive Text-to-Speech

Chen, Yutian, Assael, Yannis, Shillingford, Brendan, Budden, David, Reed, Scott, Zen, Heiga, Wang, Quan, Cobo, Luis C., Trask, Andrew, Laurie, Ben, Gulcehre, Caglar, Oord, Aäron van den, Vinyals, Oriol, de Freitas, Nando

arXiv.org Machine Learning, Sep-27-2018

We present a meta-learning approach for adaptive text-to-speech (TTS) with little data. During training, we learn a multi-speaker model using a shared conditional WaveNet core and independent learned embeddings for each speaker. The aim of training is not to produce a neural network with fixed weights, which is then deployed as a TTS system. Instead, the aim is to produce a network that requires little data at deployment time to rapidly adapt to new speakers. We introduce and benchmark three strategies: (i) learning the speaker embedding while keeping the WaveNet core fixed, (ii) fine-tuning the entire architecture with stochastic gradient descent, and (iii) predicting the speaker embedding with a trained neural network encoder. The experiments show that these approaches successfully adapt the multi-speaker neural network to new speakers, obtaining state-of-the-art results in both sample naturalness and voice similarity with merely a few minutes of audio data from new speakers.
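Strategy (i), learning only the speaker embedding while the shared core stays frozen, can be illustrated with a deliberately tiny model. This is a toy sketch, not WaveNet: a hypothetical frozen matrix `W` maps the embedding to per-feature weights, the output is a dot product with the input features, and plain gradient descent on squared error updates the embedding alone.

```python
def matvec(M, v):
    return [sum(m * vi for m, vi in zip(row, v)) for row in M]


def predict(W, e, x):
    """Frozen 'core' W turns the speaker embedding e into feature weights, then scores input x."""
    return sum(xi * wi for xi, wi in zip(x, matvec(W, e)))


def adapt_embedding(W, examples, dim, lr=0.05, steps=500):
    """Toy strategy (i): gradient descent on the embedding only; W is never modified."""
    e = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for x, y in examples:
            err = predict(W, e, x) - y
            # gradient of err**2 with respect to e is 2 * err * W^T x
            for k in range(dim):
                grad[k] += 2.0 * err * sum(W[i][k] * x[i] for i in range(len(x)))
        e = [ek - lr * gk / len(examples) for ek, gk in zip(e, grad)]
    return e
```

Because only `dim` numbers are fitted, a handful of examples can suffice; strategy (ii) would instead also update `W`, trading more adaptation capacity for a greater risk of overfitting the few samples.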

deep learning, neural network, utterance

arXiv.org Machine Learning

1809.10460

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Improving Knowledge Graph Embedding Using Simple Constraints

Ding, Boyang, Wang, Quan, Wang, Bin, Guo, Li

arXiv.org Artificial Intelligence, May-7-2018

Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Early works performed this task via simple models developed over KG triples. Recent attempts focused on either designing more complicated triple scoring models, or incorporating extra information beyond triples. This paper, by contrast, investigates the potential of using very simple constraints to improve KG embedding. We examine non-negativity constraints on entity representations and approximate entailment constraints on relation representations. The former help to learn compact and interpretable representations for entities. The latter further encode regularities of logical entailment between relations into their distributed representations. These constraints impose prior beliefs upon the structure of the embedding space, without negative impacts on efficiency or scalability. Evaluation on WordNet, Freebase, and DBpedia shows that our approach is simple yet surprisingly effective, significantly and consistently outperforming competitive baselines. The constraints imposed indeed improve model interpretability, leading to a substantially increased structuring of the embedding space. Code and data are available at https://github.com/iieir-km/ComplEx-NNE_AER.
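The two constraints can be sketched on top of a complex-valued triple scorer. This is a minimal sketch under assumptions: it takes ComplEx as the base model (the repository name ComplEx-NNE_AER points that way), implements non-negativity as clipping entity real and imaginary parts into [0, 1], and writes the approximate entailment r_p => r_q as a confidence-weighted hinge on real parts plus a squared gap on imaginary parts. These exact forms are our reading, not quoted from the abstract.

```python
def complex_score(e_s, r, e_o):
    """ComplEx-style triple score: Re(sum_k e_s[k] * r[k] * conj(e_o[k]))."""
    return sum(s * rel * o.conjugate() for s, rel, o in zip(e_s, r, e_o)).real


def project_nonnegative(entity, upper=1.0):
    """Clip real and imaginary parts of an entity embedding into [0, upper]."""
    def clip(v):
        return min(max(v, 0.0), upper)
    return [complex(clip(z.real), clip(z.imag)) for z in entity]


def entailment_penalty(r_p, r_q, confidence):
    """Soft penalty for the rule r_p => r_q, scaled by the rule's confidence."""
    hinge = sum(max(0.0, p.real - q.real) for p, q in zip(r_p, r_q))
    gap = sum((p.imag - q.imag) ** 2 for p, q in zip(r_p, r_q))
    return confidence * (hinge + gap)
```

The projection would run after each parameter update, and the penalty would be added to the training loss for every mined rule; neither step touches the per-triple scoring cost, consistent with the abstract's claim of no efficiency impact.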

neural network, representation, survey article

arXiv.org Artificial Intelligence

1805.02408

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Knowledge Graph Embedding With Iterative Guidance From Soft Rules

Guo, Shu (Institute of Information Engineering, Chinese Academy of Sciences) | Wang, Quan (Institute of Information Engineering, Chinese Academy of Sciences) | Wang, Lihong (National Computer Network Emergency Response Technical Team & Coordination Center of China) | Wang, Bin | Guo, Li (Institute of Information Engineering, Chinese Academy of Sciences)

AAAI Conferences, Feb-8-2018

Embedding knowledge graphs (KGs) into continuous vector spaces is a focus of current research. Combining such an embedding model with logic rules has recently attracted increasing attention. Most previous attempts made a one-time injection of logic rules, ignoring the interaction between embedding learning and logical inference. They also focused only on hard rules, which always hold without exception and usually require extensive manual effort to create or validate. In this paper, we propose Rule-Guided Embedding (RUGE), a novel paradigm of KG embedding with iterative guidance from soft rules. RUGE enables an embedding model to learn simultaneously from 1) labeled triples that have been directly observed in a given KG, 2) unlabeled triples whose labels are predicted iteratively, and 3) soft rules with various confidence levels extracted automatically from the KG. In the learning process, RUGE iteratively queries rules to obtain soft labels for unlabeled triples and integrates these newly labeled triples to update the embedding model. Through this iterative procedure, knowledge embodied in logic rules may be better transferred into the learned embeddings. We evaluate RUGE on link prediction on Freebase and YAGO. Experimental results show that: 1) with rule knowledge injected iteratively, RUGE achieves significant and consistent improvements over state-of-the-art baselines; and 2) despite their uncertainties, automatically extracted soft rules are highly beneficial to KG embedding, even those with moderate confidence levels. The code and data used for this paper can be obtained from https://github.com/iieir-km/RUGE.
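Querying soft rules for a soft label needs a way to score grounded rules with fuzzy truth values. This is a minimal sketch under assumptions: it uses product t-norm logic, where a conjunction multiplies truth values and an implication a => b evaluates to I(a)·I(b) − I(a) + 1, and a hypothetical `soft_label` update that nudges the model's current score upward in proportion to confident, satisfied rule bodies. The abstract does not give RUGE's exact update, so treat `soft_label` as illustrative only.

```python
def t_and(*truths):
    """Product t-norm conjunction of truth values in [0, 1]."""
    result = 1.0
    for t in truths:
        result *= t
    return result


def t_implies(body, head):
    """Truth of body => head under product t-norm: I(body) * I(head) - I(body) + 1."""
    return body * head - body + 1.0


def soft_label(model_score, rule_groundings, c=1.0):
    """Hypothetical soft label for an unlabeled triple that is the head of several groundings.

    model_score: current embedding-model truth value in [0, 1].
    rule_groundings: list of (rule_confidence, body_truth) pairs.
    c: hypothetical weight on rule evidence; the result is clipped to [0, 1].
    """
    boost = c * sum(conf * body for conf, body in rule_groundings)
    return min(1.0, max(0.0, model_score + boost))
```

An implication is fully satisfied (truth 1.0) when its body is false or its head is true, so raising a head triple's label when its body holds is what moves the grounded rule toward satisfaction.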

logic rule, neural network, survey article

AAAI Conferences

Thirty-Second AAAI Conference on Artificial Intelligence

Industry:

Media > Film (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
