Collaborating Authors

 Keshet, Joseph


Dr.VOT : Measuring Positive and Negative Voice Onset Time in the Wild

arXiv.org Machine Learning

Voice Onset Time (VOT), a key measurement of speech for basic research and applied medical studies, is the time between the onset of a stop burst and the onset of voicing. When voicing onset precedes the burst onset, the VOT is negative; when it follows the burst, it is positive. In this work, we present a deep-learning model for accurate and reliable measurement of VOT in naturalistic speech. The proposed system addresses two critical issues: it can measure positive and negative VOT equally well, and it is trained to be robust to variation across annotations. Our approach is based on the structured prediction framework, where the feature functions are defined to be RNNs that learn to capture segmental variation in the signal. Results suggest that our method substantially improves over the current state of the art. In contrast to previous work, our Deep and Robust VOT annotator, Dr.VOT, can successfully estimate negative VOTs while maintaining state-of-the-art performance on positive VOTs. This high level of performance generalizes to new corpora without further retraining. Index Terms: structured prediction, multi-task learning, adversarial training, recurrent neural networks, sequence segmentation.
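
The structured-prediction-with-RNN-features idea lends itself to a compact illustration. Below is a minimal PyTorch sketch, not the released Dr.VOT system: the layer sizes, feature dimension, and the per-frame event scoring with exhaustive argmax inference are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class VOTSegmenter(nn.Module):
    """Toy stand-in for RNN feature functions scoring VOT landmarks."""
    def __init__(self, n_feats=63, hidden=128):
        super().__init__()
        # BiLSTM "feature functions" over the acoustic frame sequence.
        self.rnn = nn.LSTM(n_feats, hidden, batch_first=True, bidirectional=True)
        # Per-frame scores for two events: burst onset and voicing onset.
        self.event_scores = nn.Linear(2 * hidden, 2)

    def forward(self, frames):                    # frames: (1, T, n_feats)
        h, _ = self.rnn(frames)
        return self.event_scores(h).squeeze(0)    # (T, 2)

def predict_vot(scores, frame_ms=1.0):
    # With a score that decomposes over the two events, the joint maximizer
    # is the per-event argmax; VOT is negative when voicing precedes the burst.
    burst = int(scores[:, 0].argmax())
    voicing = int(scores[:, 1].argmax())
    return (voicing - burst) * frame_ms

model = VOTSegmenter()
frames = torch.randn(1, 200, 63)                  # 200 frames of acoustic features
print(predict_vot(model(frames)))                 # signed VOT in milliseconds
```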


SpeechYOLO: Detection and Localization of Speech Objects

arXiv.org Machine Learning

In this paper, we propose to apply object detection methods from the vision domain to the speech recognition domain by treating audio fragments as objects. More specifically, we present SpeechYOLO, which is inspired by the YOLO algorithm for object detection in images. The goal of SpeechYOLO is to localize boundaries of utterances within the input signal and to correctly classify them. Our system is composed of a convolutional neural network with a simple least-mean-squares loss function. We evaluated the system on several keyword spotting tasks that include corpora of read speech and spontaneous speech. Our system compares favorably with other algorithms trained for both localization and classification.
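
To make the "audio fragments as objects" framing concrete, here is a minimal sketch of a YOLO-style head for speech, trained with a squared-error objective as the abstract describes; the 1-D convolutional front end, grid size, and channel counts are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class SpeechYOLOSketch(nn.Module):
    def __init__(self, n_mels=40, n_cells=6, n_classes=10):
        super().__init__()
        # 1-D CNN over a mel spectrogram, pooled into a fixed grid of cells.
        self.conv = nn.Sequential(
            nn.Conv1d(n_mels, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, 5, stride=2, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(n_cells),
        )
        # Each cell predicts (centre offset, width, objectness) + class scores.
        self.head = nn.Conv1d(128, 3 + n_classes, 1)

    def forward(self, mel):                  # mel: (B, n_mels, T)
        return self.head(self.conv(mel))     # (B, 3 + n_classes, n_cells)

model = SpeechYOLOSketch()
pred = model(torch.randn(2, 40, 100))        # batch of short mel spectrograms
target = torch.zeros_like(pred)              # stand-in ground-truth encoding
loss = ((pred - target) ** 2).mean()         # simple least-mean-squares loss
loss.backward()
```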


Hide and Speak: Deep Neural Networks for Speech Steganography

arXiv.org Machine Learning

Steganography is the science of hiding a secret message within an ordinary public message, which is referred to as the carrier. Traditionally, digital signal processing techniques, such as least-significant-bit encoding, were used for hiding messages. In this paper, we explore the use of deep neural networks as steganographic functions for speech data. To this end, we propose to jointly optimize two neural networks: the first network encodes the message inside a carrier, while the second network decodes the message from the modified carrier. We demonstrate the effectiveness of our method on several speech datasets and analyze the results quantitatively and qualitatively. Moreover, we show that our approach can be applied to conceal multiple messages in a single carrier using multiple decoders or a single conditional decoder. Qualitative experiments suggest that modifications to the carrier are unnoticeable by human listeners and that the decoded messages are highly intelligible.
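
The joint hide/reveal optimization can be sketched in a few lines. The small convolutional encoder/decoder pair below, the spectrogram-patch inputs, and the equal loss weighting are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

# Encoder embeds the message into the carrier; decoder recovers it.
encoder = nn.Sequential(conv_block(2, 32), nn.Conv2d(32, 1, 3, padding=1))
decoder = nn.Sequential(conv_block(1, 32), nn.Conv2d(32, 1, 3, padding=1))
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()))

carrier = torch.randn(8, 1, 64, 64)   # carrier spectrogram patches
message = torch.randn(8, 1, 64, 64)   # secret-message spectrogram patches

container = encoder(torch.cat([carrier, message], dim=1))  # modified carrier
recovered = decoder(container)
# Keep the container close to the carrier while making the message decodable.
loss = ((container - carrier) ** 2).mean() + ((recovered - message) ** 2).mean()
opt.zero_grad(); loss.backward(); opt.step()
```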


Out-of-Distribution Detection using Multiple Semantic Label Representations

Neural Information Processing Systems

Deep neural networks are powerful models that have attained remarkable results on a variety of tasks. These models are shown to be extremely efficient when training and test data are drawn from the same distribution, but it is not clear how a network will act when fed an out-of-distribution example. In this work, we consider the problem of out-of-distribution detection in neural networks. We propose to use multiple semantic dense representations, instead of a sparse representation, as the target label. Specifically, we propose to use several word representations obtained from different corpora or architectures as target labels. We evaluated the proposed model on computer vision and speech command detection tasks and compared it to previous methods. Results suggest that our method compares favorably with previous work. In addition, we demonstrate the efficiency of our approach for detecting wrongly classified and adversarial examples.
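
A minimal sketch of the idea: regress onto several dense word embeddings of the label rather than a one-hot target, and flag inputs whose predictions score low against every known label. The random stand-in embedding tables, network sizes, and the cosine-based score below are assumptions for illustration.

```python
import torch
import torch.nn as nn

n_labels, dims = 10, [50, 100]        # two embedding spaces (e.g., different corpora)
emb_tables = [torch.randn(n_labels, d) for d in dims]  # stand-ins for pretrained embeddings

backbone = nn.Sequential(nn.Linear(784, 256), nn.ReLU())
heads = nn.ModuleList([nn.Linear(256, d) for d in dims])  # one regression head per space

def predict_embeddings(x):
    h = backbone(x)
    return [head(h) for head in heads]

def ood_score(x):
    # Max cosine similarity to any label embedding, averaged over heads;
    # low values suggest the input is out-of-distribution.
    sims = []
    for pred, table in zip(predict_embeddings(x), emb_tables):
        c = torch.cosine_similarity(pred.unsqueeze(1), table.unsqueeze(0), dim=-1)
        sims.append(c.max(dim=1).values)
    return torch.stack(sims).mean(dim=0)

with torch.no_grad():
    print(ood_score(torch.randn(4, 784)))   # one score per input
```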


Out-of-Distribution Detection using Multiple Semantic Label Representations

arXiv.org Machine Learning

Deep neural networks are powerful models that have attained remarkable results on a variety of tasks. These models are shown to be extremely efficient when training and test data are drawn from the same distribution, but it is not clear how a network will act when fed an out-of-distribution example. In this work, we consider the problem of out-of-distribution detection in neural networks. We propose to use multiple semantic dense representations, instead of a sparse representation, as the target label. Specifically, we propose to use several word representations obtained from different corpora or architectures as target labels. We evaluated the proposed model on computer vision and speech command detection tasks and compared it to previous methods. Results suggest that our method compares favorably with previous work. In addition, we demonstrate the efficiency of our approach for detecting wrongly classified and adversarial examples.


Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples

Neural Information Processing Systems

Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines. So far, most existing methods only work for classification and are not designed to alter the true performance measure of the problem at hand. We introduce a novel flexible approach named Houdini for generating adversarial examples specifically tailored to the final performance measure of the task considered, be it combinatorial or non-decomposable. We successfully apply Houdini to a range of applications such as speech recognition, pose estimation, and semantic segmentation. In all cases, the attacks based on Houdini achieve a higher success rate than those based on the traditional surrogates used to train the models, while using a less perceptible adversarial perturbation.
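
The core of the approach is a surrogate loss coupling the model's score margin with the task's true performance measure. The sketch below loosely follows the Houdini surrogate on a toy linear classifier; the choice of adversarial target, the 0/1 stand-in task loss, and the FGSM-style step are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(20, 5)                      # toy scorer standing in for a structured model
x = torch.randn(1, 20, requires_grad=True)
y = torch.tensor([2])

scores = model(x)
masked = scores.detach().clone()
masked[0, y] = float('-inf')
y_tgt = masked.argmax(dim=1)                  # adversarial target: best wrong label

# Houdini surrogate: P_{gamma ~ N(0,1)}[ g(x,y) - g(x,y_tgt) < gamma ] * task_loss
margin = scores.gather(1, y.view(-1, 1)) - scores.gather(1, y_tgt.view(-1, 1))
task_loss = (y_tgt != y).float()              # stand-in for WER, pose error, IoU loss, ...
houdini = torch.distributions.Normal(0.0, 1.0).cdf(-margin).squeeze(1) * task_loss

houdini.sum().backward()
x_adv = (x + 0.05 * x.grad.sign()).detach()   # FGSM-style step on the Houdini loss
print(model(x_adv).argmax(dim=1), y)          # prediction may now differ from y
```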


Houdini: Fooling Deep Structured Prediction Models

arXiv.org Machine Learning

Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines. So far, most existing methods only work for classification and are not designed to alter the true performance measure of the problem at hand. We introduce a novel flexible approach named Houdini for generating adversarial examples specifically tailored to the final performance measure of the task considered, be it combinatorial or non-decomposable. We successfully apply Houdini to a range of applications such as speech recognition, pose estimation, and semantic segmentation. In all cases, the attacks based on Houdini achieve a higher success rate than those based on the traditional surrogates used to train the models, while using a less perceptible adversarial perturbation.


Automatic measurement of vowel duration via structured prediction

arXiv.org Machine Learning

A key barrier to making phonetic studies scalable and replicable is the need to rely on subjective, manual annotation. To help meet this challenge, a machine learning algorithm was developed for automatic measurement of a widely used phonetic measure: vowel duration. Manually annotated data were used to train a model that takes as input an arbitrary-length segment of the acoustic signal containing a single vowel, preceded and followed by consonants, and outputs the duration of the vowel. The model is based on the structured prediction framework. The input signal and a hypothesized pair of vowel onset and offset times are mapped to an abstract vector space by a set of acoustic feature functions. The learning algorithm is trained in this space to minimize the difference in expectations between predicted and manually measured vowel durations. The trained model can then automatically estimate vowel durations without phonetic or orthographic transcription. Results comparing the model to three sets of manually annotated data suggest that it outperformed the current gold standard for duration measurement, an HMM-based forced aligner (which requires orthographic or phonetic transcription as input).
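
In this structured view, inference is a search over candidate (onset, offset) pairs scored in the feature space. The sketch below uses toy mean-based feature functions and a random weight vector; it shows the shape of the inference, not the paper's learned features or training objective.

```python
import numpy as np

rng = np.random.default_rng(0)
frames = rng.random((200, 8))          # per-frame acoustic features, one frame = 10 ms
w = rng.standard_normal(16)            # stand-in for the learned weight vector

def phi(frames, onset, offset):
    """Toy feature map for a hypothesized vowel span: mean features inside
    the span and mean features outside it."""
    inside = frames[onset:offset].mean(axis=0)
    outside = np.concatenate([frames[:onset], frames[offset:]]).mean(axis=0)
    return np.concatenate([inside, outside])

def predict_duration(frames, min_len=3):
    # Exhaustive structured inference over all (onset, offset) hypotheses.
    best, best_span = -np.inf, None
    for onset in range(len(frames) - min_len):
        for offset in range(onset + min_len, len(frames)):
            score = w @ phi(frames, onset, offset)
            if score > best:
                best, best_span = score, (onset, offset)
    onset, offset = best_span
    return (offset - onset) * 10        # vowel duration in milliseconds

print(predict_duration(frames))
```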


Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions

Neural Information Processing Systems

In this work, we develop efficient methods for learning random MAP predictors for structured label problems. In particular, we construct posterior distributions over perturbations that can be adjusted via stochastic gradient methods. We show that every smooth posterior distribution suffices to define a smooth PAC-Bayesian risk bound suitable for gradient methods. In addition, we relate the posterior distributions to the computational properties of the MAP predictors. We suggest multiplicative posteriors to learn super-modular potential functions that accompany specialized MAP predictors such as graph-cuts. We also describe label-augmented posterior models that can use efficient MAP approximations, such as those arising from linear program relaxations.
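
A random MAP predictor can be illustrated with unary potentials only, where per-site argmax is the exact MAP. The multiplicative log-normal posterior and its fixed scale parameter below are illustrative stand-ins for the learned posteriors described above, not the paper's construction.

```python
import numpy as np

rng = np.random.default_rng(0)
potentials = rng.standard_normal((6, 4))   # unary potentials: 6 sites, 4 labels
log_sigma = -1.0                            # posterior scale parameter (would be learned)

def random_map(potentials, n_samples=100):
    """Sample multiplicative perturbations of the potentials and return the
    MAP labeling of each draw. With unary terms only, per-site argmax is the
    exact MAP; a pairwise model would call graph-cuts or an LP relaxation here."""
    noise = rng.lognormal(mean=0.0, sigma=np.exp(log_sigma),
                          size=(n_samples,) + potentials.shape)
    perturbed = potentials[None] * noise    # positive multipliers preserve sign/structure
    return perturbed.argmax(axis=-1)        # (n_samples, n_sites)

samples = random_map(potentials)
# Empirical marginal of the predicted label at each site:
for site in range(potentials.shape[0]):
    print(site, np.bincount(samples[:, site], minlength=4) / len(samples))
```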