Goto

Collaborating Authors

 South America


Many-to-Many Voice Transformer Network

arXiv.org Machine Learning

This paper proposes a voice conversion (VC) method based on a sequence-to-sequence (S2S) learning framework, which enables simultaneous conversion of the voice characteristics, pitch contour, and duration of input speech. We previously proposed an S2S-based VC method using a transformer network architecture called the voice transformer network (VTN). The original VTN was designed to learn only a mapping of speech feature sequences from one speaker to another. The main idea we propose is an extension of the original VTN that can simultaneously learn mappings among multiple speakers. This extension called the many-to-many VTN makes it able to fully use available training data collected from multiple speakers by capturing common latent features that can be shared across different speakers. It also allows us to introduce a training loss called the identity mapping loss to ensure that the input feature sequence will remain unchanged when the source and target speaker indices are the same. Using this particular loss for model training has been found to be extremely effective in improving the performance of the model at test time. We conducted speaker identity conversion experiments and found that our model obtained higher sound quality and speaker similarity than baseline methods. We also found that our model, with a slight modification to its architecture, could handle any-to-many conversion tasks reasonably well.


Curse of Small Sample Size in Forecasting of the Active Cases in COVID-19 Outbreak

arXiv.org Machine Learning

During the COVID-19 pandemic, a massive number of attempts on the predictions of the number of cases and the other future trends of this pandemic have been made. However, they fail to predict, in a reliable way, the medium and long term evolution of fundamental features of COVID-19 outbreak within acceptable accuracy. This paper gives an explanation for the failure of machine learning models in this particular forecasting problem. The paper shows that simple linear regression models provide high prediction accuracy values reliably but only for a 2-weeks period and that relatively complex machine learning models, which have the potential of learning long term predictions with low errors, cannot achieve to obtain good predictions with possessing a high generalization ability. It is suggested in the paper that the lack of a sufficient number of samples is the source of low prediction performance of the forecasting models. The reliability of the forecasting results about the active cases is measured in terms of the cross-validation prediction errors, which are used as expectations for the generalization errors of the forecasters. To exploit the information, which is of most relevant with the active cases, we perform feature selection over a variety of variables. We apply different feature selection methods, namely the Pairwise Correlation, Recursive Feature Selection, and feature selection by using the Lasso regression and compare them to each other and also with the models not employing any feature selection. Furthermore, we compare Linear Regression, Multi-Layer Perceptron, and Long-Short Term Memory models each of which is used for prediction active cases together with the mentioned feature selection methods. Our results show that the accurate forecasting of the active cases with high generalization ability is possible up to 3 days only because of the small sample size of COVID-19 data.


Ridge Regression with Frequent Directions: Statistical and Optimization Perspectives

arXiv.org Machine Learning

Despite its impressive theory \& practical performance, Frequent Directions (\acrshort{fd}) has not been widely adopted for large-scale regression tasks. Prior work has shown randomized sketches (i) perform worse in estimating the covariance matrix of the data than \acrshort{fd}; (ii) incur high error when estimating the bias and/or variance on sketched ridge regression. We give the first constant factor relative error bounds on the bias \& variance for sketched ridge regression using \acrshort{fd}. We complement these statistical results by showing that \acrshort{fd} can be used in the optimization setting through an iterative scheme which yields high-accuracy solutions. This improves on randomized approaches which need to compromise the need for a new sketch every iteration with speed of convergence. In both settings, we also show using \emph{Robust Frequent Directions} further enhances performance.


How machine learning is giving mobile networks a shot in the arm during covid-19

#artificialintelligence

Video consumption was skyrocketing even before the lockdowns. And over the past few months Netflix, Amazon, HBO Now and most recently Disney Plus have seen their subscriber base explode. For the operators running the networks that deliver all that content, it has been a test of business agility to reconfigure networks fast to keep up with the changing situation on the ground, ensuring their subscribers could continue making Zoom calls and binge a boxset. In Italy, the first country in Europe to enforce a lockdown, peak throughput on the mobile network increased by up to 90% compared to the weeks before lockdown. In Spain, the mobile network experienced a 35% increase in throughput, while over fixed networks it increased 50%.


Brazil sets out plans to boost innovation

#artificialintelligence

The Brazilian government has published a National Innovation Policy (NIP) setting out plans to encourage and develop innovative products, processes and services across the country. The areas include improving skills; widening the innovation talent pool; encouraging international engagement; and stimulating research, development and innovation within the Brazilian private sector. The government says the NIP will promote the coordination and distribution of public funds towards the advancement of innovation. An Innovation Committee, managed by the Ministry of Science, Technology and Innovations (MCTI) and chaired by the presidential office, will oversee the wide-ranging project. It is due to publish a detailed National Innovation Strategy in the near future, the technology website reported.


HoVer: A Dataset for Many-Hop Fact Extraction And Claim Verification

arXiv.org Artificial Intelligence

We introduce HoVer (HOppy VERification), a dataset for many-hop evidence extraction and fact verification. It challenges models to extract facts from several Wikipedia articles that are relevant to a claim and classify whether the claim is Supported or Not-Supported by the facts. In HoVer, the claims require evidence to be extracted from as many as four English Wikipedia articles and embody reasoning graphs of diverse shapes. Moreover, most of the 3/4-hop claims are written in multiple sentences, which adds to the complexity of understanding long-range dependency relations such as coreference. We show that the performance of an existing state-of-the-art semantic-matching model degrades significantly on our dataset as the number of reasoning hops increases, hence demonstrating the necessity of many-hop reasoning to achieve strong results. We hope that the introduction of this challenging dataset and the accompanying evaluation task will encourage research in many-hop fact retrieval and information verification. We make the HoVer dataset publicly available at https://hover-nlp.github.io


Alignment Restricted Streaming Recurrent Neural Network Transducer

arXiv.org Artificial Intelligence

There is a growing interest in the speech community in developing Recurrent Neural Network Transducer (RNN-T) models for automatic speech recognition (ASR) applications. RNN-T is trained with a loss function that does not enforce temporal alignment of the training transcripts and audio. As a result, RNN-T models built with uni-directional long short term memory (LSTM) encoders tend to wait for longer spans of input audio, before streaming already decoded ASR tokens. In this work, we propose a modification to the RNN-T loss function and develop Alignment Restricted RNN-T (Ar-RNN-T) models, which utilize audio-text alignment information to guide the loss computation. We compare the proposed method with existing works, such as monotonic RNN-T, on LibriSpeech and in-house datasets. We show that the Ar-RNN-T loss provides a refined control to navigate the trade-offs between the token emission delays and the Word Error Rate (WER). The Ar-RNN-T models also improve downstream applications such as the ASR End-pointing by guaranteeing token emissions within any given range of latency. Moreover, the Ar-RNN-T loss allows for bigger batch sizes and 4 times higher throughput for our LSTM model architecture, enabling faster training and convergence on GPUs.


From Note-Level to Chord-Level Neural Network Models for Voice Separation in Symbolic Music

arXiv.org Artificial Intelligence

Music is often experienced as a progression of concurrent streams of notes, or voices. The degree to which this happens depends on the position along a voice-leading continuum, ranging from monophonic, to homophonic, to polyphonic, which complicates the design of automatic voice separation models. We address this continuum by defining voice separation as the task of decomposing music into streams that exhibit both a high degree of external perceptual separation from the other streams and a high degree of internal perceptual consistency. The proposed voice separation task allows for a voice to diverge to multiple voices and also for multiple voices to converge to the same voice. Equipped with this flexible task definition, we manually annotated a corpus of popular music and used it to train neural networks that assign notes to voices either separately for each note in a chord (note-level), or jointly to all notes in a chord (chord-level). The trained neural models greedily assign notes to voices in a left to right traversal of the input chord sequence, using a diverse set of perceptually informed input features. When evaluated on the extraction of consecutive within voice note pairs, both models surpass a strong baseline based on an iterative application of an envelope extraction function, with the chord-level model consistently edging out the note-level model. The two models are also shown to outperform previous approaches on separating the voices in Bach music.


The State of AI Ethics Report (October 2020)

arXiv.org Artificial Intelligence

The 2nd edition of the Montreal AI Ethics Institute's The State of AI Ethics captures the most relevant developments in the field of AI Ethics since July 2020. This report aims to help anyone, from machine learning experts to human rights activists and policymakers, quickly digest and understand the ever-changing developments in the field. Through research and article summaries, as well as expert commentary, this report distills the research and reporting surrounding various domains related to the ethics of AI, including: AI and society, bias and algorithmic justice, disinformation, humans and AI, labor impacts, privacy, risk, and future of AI ethics. In addition, The State of AI Ethics includes exclusive content written by world-class AI Ethics experts from universities, research institutes, consulting firms, and governments. These experts include: Danit Gal (Tech Advisor, United Nations), Amba Kak (Director of Global Policy and Programs, NYU's AI Now Institute), Rumman Chowdhury (Global Lead for Responsible AI, Accenture), Brent Barron (Director of Strategic Projects and Knowledge Management, CIFAR), Adam Murray (U.S. Diplomat working on tech policy, Chair of the OECD Network on AI), Thomas Kochan (Professor, MIT Sloan School of Management), and Katya Klinova (AI and Economy Program Lead, Partnership on AI). This report should be used not only as a point of reference and insight on the latest thinking in the field of AI Ethics, but should also be used as a tool for introspection as we aim to foster a more nuanced conversation regarding the impacts of AI on the world.


Anomaly detection in average fuel consumption with XAI techniques for dynamic generation of explanations

arXiv.org Artificial Intelligence

In this paper we show a complete process for unsupervised anomaly detection for the average fuel consumption of fleet vehicles that is able to explain what variables are affecting the consumption in terms of feature relevance. For doing that, we combine the anomaly detection with a surrogate model that is able to provide that feature relevance. For this part, we evaluate both whitebox models from the literature, as well as novel variations over them, and blackbox models combined with local posthoc feature relevance techniques. The evaluation is done using real IoT data belonging to Telef\'onica, and is measured both in terms of model performance, as well as using Explainable AI metrics that compare the explanations generated in terms representativeness, fidelity, stability and contrastiveness. The explanations generate counterfactual recommendations that show what could have been done to reduce the average fuel consumption of a vehicle and turn it into an inlier. The procedure is combined with domain knowledge expressed in business rules, and is able to adequate the type of explanations depending on the target user profile.