
MelNet: A Real-Time Deep Learning Algorithm for Object Detection

Azadvatan, Yashar, Kurt, Murat

arXiv.org Artificial Intelligence

In this study, a novel deep learning algorithm for object detection, named MelNet, was introduced. MelNet was trained on the KITTI dataset for object detection. After 300 training epochs, MelNet attained an mAP (mean average precision) score of 0.732. Additionally, three alternative models - YOLOv5, EfficientDet, and Faster-RCNN-MobileNetv3 - were trained on the KITTI dataset and compared with MelNet for object detection. The results demonstrate the effectiveness of transfer learning in certain cases: pre-existing models trained on prominent datasets (e.g., ImageNet, COCO, and Pascal VOC) yield superior results. Another finding demonstrates the viability of creating a new model tailored to a specific scenario and training it on a specific dataset. This investigation shows that MelNet, trained exclusively on the KITTI dataset, surpasses EfficientDet after 150 epochs. Consequently, post-training, MelNet's performance closely aligns with that of the other pre-trained models.


Facebook's AI system can speak with Bill Gates's voice

#artificialintelligence

The slow progress on realistic text-to-speech systems is not for lack of trying. Numerous teams have attempted to train deep-learning algorithms to reproduce real speech patterns using large databases of audio. The problem with this approach, say Vasquez and Lewis, is with the type of data. Until now, most work has focused on audio waveform recordings. These show how the amplitude of sound changes over time, with each second of recorded audio consisting of tens of thousands of time steps.
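A rough back-of-the-envelope illustration of this point (the sample rate and hop size below are assumed for illustration, not taken from the article): a single second of raw audio contains tens of thousands of amplitude samples, while a spectrogram collapses the same second into on the order of a hundred frames.

```python
# Illustrative only: CD-quality audio is sampled at 44.1 kHz,
# so one second of waveform holds 44,100 amplitude values.
sample_rate = 44100   # assumed sample rate (Hz)
hop_length = 441      # assumed spectrogram hop size (samples per frame)

samples_per_second = sample_rate                # time steps in the raw waveform
frames_per_second = sample_rate // hop_length   # time steps in a spectrogram

print(samples_per_second)  # 44100 waveform time steps
print(frames_per_second)   # 100 spectrogram frames
```

This gap between tens of thousands of waveform steps and roughly a hundred spectrogram frames per second is what makes long-range structure easier to model in the frequency domain.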


Listen to this AI voice clone of Bill Gates created by Facebook's engineers

#artificialintelligence

We're headed for a revolution in computer-generated speech, and a voice clone of Microsoft co-founder Bill Gates demonstrates exactly why. In the clips embedded below, you can listen to what seems to be Gates reeling off a series of innocuous phrases. "A cramp is no small danger on a swim," he cautions. "Write a fond note to the friend you cherish," he advises. But each voice clip has been generated by a machine learning system named MelNet, designed and created by engineers at Facebook.


6 Ways Speech Synthesis Is Being Powered By Deep Learning

#artificialintelligence

This model was open sourced back in June 2019 as an implementation of the paper Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis. This service is being offered by Resemble.ai. With this product, one can clone any voice and create dynamic, iterable, and unique voice content. Users input a short voice sample and the model -- trained only during playback time -- can immediately deliver text-to-speech utterances in the style of the sampled voice. Bengaluru's Deepsync offers an Augmented Intelligence that learns the way you speak.


Bill Gates, Stephen Hawking get AI voice clones, thanks to Facebook engineers

#artificialintelligence

Using Artificial Intelligence, two Facebook engineers have now successfully cloned the voices of famous personalities including Microsoft co-founder Bill Gates, late theoretical physicist Stephen Hawking, and American actor George Takei, among others. Mike Lewis and Sean Vasquez, the two Facebook engineers, developed a computer-generated speech system called MelNet using Artificial Intelligence. Beyond the voices of famous personalities, they have also created voice and music samples using AI. In a recently published research paper, they describe relying on machine learning to produce the convincing AI-generated voice clips. Apart from Bill Gates, Stephen Hawking, and George Takei, others whose voices have been cloned include primatologist Jane Goodall, professors Daphne Koller and Fei-Fei Li, scientist Stephen Wolfram, and Khan Academy founder Sal Khan.


MelNet: A Generative Model for Audio in the Frequency Domain

Vasquez, Sean, Lewis, Mike

arXiv.org Machine Learning

Capturing high-level structure in audio waveforms is challenging because a single second of audio spans tens of thousands of timesteps. While long-range dependencies are difficult to model directly in the time domain, we show that they can be more tractably modelled in two-dimensional time-frequency representations such as spectrograms. By leveraging this representational advantage, in conjunction with a highly expressive probabilistic model and a multiscale generation procedure, we design a model capable of generating high-fidelity audio samples which capture structure at timescales that time-domain models have yet to achieve. We apply our model to a variety of audio generation tasks, including unconditional speech generation, music generation, and text-to-speech synthesis---showing improvements over previous approaches in both density estimates and human judgments.
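The two-dimensional time-frequency representation the abstract describes can be sketched with a plain short-time Fourier transform; the sample rate, window, and hop size below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

# Minimal sketch: a one-second waveform of tens of thousands of samples
# becomes a 2-D spectrogram with far fewer time steps.
sr = 44100                          # assumed sample rate (Hz)
t = np.arange(sr) / sr              # one second of sample times
wave = np.sin(2 * np.pi * 440 * t)  # a 440 Hz test tone

n_fft, hop = 1024, 441              # assumed window and hop sizes
window = np.hanning(n_fft)
frames = [wave[i:i + n_fft] * window
          for i in range(0, len(wave) - n_fft + 1, hop)]
spec = np.abs(np.fft.rfft(np.stack(frames), axis=1)) ** 2  # power spectrogram

print(wave.shape)  # (44100,) -- time-domain steps
print(spec.shape)  # (98, 513) -- far fewer time steps, one row per ~23 ms frame
```

Each row of `spec` summarizes roughly 23 ms of audio, so dependencies that span tens of thousands of waveform samples span only a handful of spectrogram rows, which is the representational advantage the paper exploits.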