AITopics

This paper summarizes the applied deep learning practices in the field of speaker recognition, both verification and identification. Speaker recognition has been a widely used field topic of speech technolog y. Many research works have been carried out and little progress has been achieved in the past 5 - 6 years. However, as deep learning techniques do advance in most machine learning fields, the former state - of - the - art methods are getting replaced by them in s peaker recognition too. It seems that DL becomes the now state - of - the - art solution for both speaker verification and identification. The standard x - vectors, additional to i - vectors, are used as baseline in most of the novel works. The increasing amount of gathered data opens up the territory to DL, where they are the most effective.

speaker recognition, speaker verification, vector, (11 more...)

1911.06615

Country:

North America > United States (0.14)
Europe > Hungary > Budapest > Budapest (0.04)
Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.68)

Industry: Information Technology > Security & Privacy (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sequential Recommendation with Relation-Aware Kernelized Self-Attention

Ji, Mingi, Joo, Weonyoung, Song, Kyungwoo, Kim, Yoon-Yeong, Moon, Il-Chul

Recent studies identified that sequential Recommendation is improved by the attention mechanism. By following this development, we propose Relation-Aware Kernelized Self-Attention (RKSA) adopting a self-attention mechanism of the Transformer with augmentation of a probabilistic model. The original self-attention of Transformer is a deterministic measure without relation-awareness. Therefore, we introduce a latent space to the self-attention, and the latent space models the recommendation context from relation as a multivariate skew-normal distribution with a kernelized covariance matrix from co-occurrences, item characteristics, and user information. This work merges the self-attention of the Transformer and the sequential recommendation by adding a probabilistic model of the recommendation task specifics. We experimented RKSA over the benchmark datasets, and RKSA shows significant improvements compared to the recent baseline models. Also, RKSA were able to produce a latent space model that answers the reasons for recommendation.

dataset, recommendation, rksa, (15 more...)

1911.06478

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Industry:

Media > Film (0.68)
Information Technology (0.48)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Javaheripi, Mojan, Samragh, Mohammad, Javidi, Tara, Koushanfar, Farinaz

ASCAI: Adaptive Sampling for acquiring Compact AI

This paper introduces ASCAI, a novel adaptive sampling methodology that can learn how to effectively compress Deep Neural Networks (DNNs) for accelerated inference on resource-constrained platforms. Modern DNN compression techniques comprise various hyperparameters that require per-layer customization to ensure high accuracy. Choosing such hyperparameters is cumbersome as the pertinent search space grows exponentially with the number of model layers. To effectively traverse this large space, we devise an intelligent sampling mechanism that adapts the sampling strategy using customized operations inspired by genetic algorithms. As a special case, we consider the space of model compression as a vector space. The adaptively selected samples enable ASCAI to automatically learn how to tune per-layer compression hyperparameters to optimize the accuracy/model-size trade-off. Our extensive evaluations show that ASCAI outperforms rule-based and reinforcement learning methods in terms of compression rate and/or accuracy

accuracy, hyperparameter, pruning, (15 more...)

1911.06471

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre:

Instructional Material > Course Syllabus & Notes (0.54)
Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Foster, Dylan J., Rakhlin, Alexander

$\ell_{\infty}$ Vector Contraction for Rademacher Complexity

Rademacher complexity plays a fundamental role in learning theory, where it tightly bounds the supremum of the empirical process ( Koltchinskii and Panchenko, 2000; Bartlett and Mendelson, 2003) and is used to prove generalization guarantees for empiric al risk minimization and other learning rules.

complexity, inequality, rademacher complexity, (11 more...)

1911.06468

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
North America > United States > New York (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Perrone, Michael P., Khan, Haidar, Kim, Changhoan, Kyrillidis, Anastasios, Quinn, Jerry, Salapura, Valentina

Optimal Mini-Batch Size Selection for Fast Gradient Descent

Jerry Quinn IBM T.J. Watson Research Center Y orktown Heights, NY 10598 V alentina Salapura IBM T.J. Watson Research Center Y orktown Heights, NY 10598 Abstract This paper presents a methodology for selecting the mini-batch size that minimizes Stochastic Gradient Descent (SGD) learning time for single and multiple learner problems. By de-coupling algorithmic analysis issues from hardware and software implementation details, we reveal a robust empirical inverse law between mini-batch size and the average number of SGD updates required to converge to a specified error threshold. Combining this empirical inverse law with measured system performance, we create an accurate, closed-form model of average training time and show how this model can be used to identify quantifiable implications for both algorithmic and hardware aspects of machine learning. We demonstrate the inverse law empirically, on both image recognition (MNIST, CIFAR10 and CIFAR100) and machine translation (Europarl) tasks, and provide a theoretic justification via proving a novel bound on mini-batch SGD training. Introduction In this paper, we present an empirical law, with theoretical justification, linking the number of learning iterations to the mini-batch size. From this result, we derive a principled methodology for selecting mini-batch size w.r.t. This methodology saves training time and provides both intuition and a principled approach for optimizing machine learning algorithms and machine learning hardware system design. Further, we use our methodology to show that focusing on weak scaling can lead to suboptimal training times because, by neglecting the dependence of convergence time on the size of the mini-batch used, weak scaling does not always minimize the training time.

algorithm, learning, mini-batch size, (14 more...)

1911.06459

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > Texas > Harris County > Houston (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Modelling EHR timeseries by restricting feature interaction

Zhang, Kun, Xue, Yuan, Flores, Gerardo, Rajkomar, Alvin, Cui, Claire, Dai, Andrew M.

Time series data are prevalent in electronic health records, mostly in the form of physiological parameters such as vital signs and lab tests. The patterns of these values may be significant indicators of patients' clinical states and there might be patterns that are unknown to clinicians but are highly predictive of some outcomes. Many of these values are also missing which makes it difficult to apply existing methods like decision trees. We propose a recurrent neural network model that reduces overfitting to noisy observations by limiting interactions between features. We analyze its performance on mortality, ICD-9 and AKI prediction from observational values on the Medical Information Mart for Intensive Care III (MIMIC-III) dataset. Our models result in an improvement of 1.1% [p<0.01] in AU-ROC for mortality prediction under the MetaVision subset and 1.0% and 2.2% [p<0.01] respectively for mortality and AKI under the full MIMIC-III dataset compared to existing state-of-the-art interpolation, embedding and decay-based recurrent models.

fg-lstm, percentile, prediction, (14 more...)

1911.0641

Country:

North America > United States (0.04)
North America > Canada (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.88)

Industry:

Health & Medicine > Therapeutic Area > Nephrology (0.68)
Health & Medicine > Health Care Technology > Medical Record (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mining News Events from Comparable News Corpora: A Multi-Attribute Proximity Network Modeling Approach

Kim, Hyungsul, El-Kishky, Ahmed, Ren, Xiang, Han, Jiawei

We present ProxiModel, a novel event mining framework for extracting high-quality structured event knowledge from large, redundant, and noisy news data sources. The proposed model differentiates itself from other approaches by modeling both the event correlation within each individual document as well as across the corpus. To facilitate this, we introduce the concept of a proximity-network, a novel space-efficient data structure to facilitate scalable event mining. This proximity network captures the corpus-level co-occurence statistics for candidate event descriptors, event attributes, as well as their connections. We probabilistically model the proximity network as a generative process with sparsity-inducing regularization. This allows us to efficiently and effectively extract high-quality and interpretable news events. Experiments on three different news corpora demonstrate that the proposed method is effective and robust at generating high-quality event descriptors and attributes. We briefly detail many interesting applications from our proposed framework such as news summarization, event tracking and multi-dimensional analysis on news. Finally, we explore a case study on visualizing the events for a Japan Tsunami news corpus and demonstrate ProxiModel's ability to automatically summarize emerging news events.

descriptor, information, proximity, (14 more...)

1911.06407

Country:

North America > United States > California (0.14)
Asia > Russia (0.14)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.06)
(14 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Government > Military (1.00)
Energy > Power Industry > Utilities > Nuclear (1.00)
Government > Regional Government > North America Government > United States Government (0.93)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Seq-U-Net: A One-Dimensional Causal U-Net for Efficient Sequence Modelling

Stoller, Daniel, Tian, Mi, Ewert, Sebastian, Dixon, Simon

Convolutional neural networks (CNNs) with dilated filters such as the Wavenet or the Temporal Convolutional Network (TCN) have shown good results in a variety of sequence modelling tasks. However, efficiently modelling long-term dependencies in these sequences is still challenging. Although the receptive field of these models grows exponentially with the number of layers, computing the convolutions over very long sequences of features in each layer is time and memory-intensive, prohibiting the use of longer receptive fields in practice. To increase efficiency, we make use of the "slow feature" hypothesis stating that many features of interest are slowly varying over time. For this, we use a U-Net architecture that computes features at multiple time-scales and adapt it to our auto-regressive scenario by making convolutions causal. We apply our model ("Seq-U-Net") to a variety of tasks including language and audio generation. In comparison to TCN and Wavenet, our network consistently saves memory and computation time, with speed-ups for training and inference of over 4x in the audio generation experiment in particular, while achieving a comparable performance in all tasks.

convolution, seq-u-net, sequence, (16 more...)

1911.06393

Country: North America > United States (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.68)
Media > Music (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)

González, Mario, Almansa, Andrés, Delbracio, Mauricio, Musé, Pablo, Tan, Pauline

Solving Inverse Problems by Joint Posterior Maximization with a VAE Prior

In this paper we address the problem of solving ill-posed inverse problems in imaging where the prior is a neural generative model. Specifically we consider the decoupled case where the prior is trained once and can be reused for many different log-concave degradation models without retraining. Whereas previous MAP-based approaches to this problem lead to highly non-convex optimization algorithms, our approach computes the joint (space-latent) MAP that naturally leads to alternate optimization algorithms and to the use of a stochastic encoder to accelerate computations. The resulting technique is called JPMAP because it performs Joint Posterior Maximization using an Autoencoding Prior. We show theoretical and experimental evidence that the proposed objective function is quite close to bi-convex. Indeed it satisfies a weak bi-convexity property which is sufficient to guarantee that our optimization scheme converges to a stationary point. Experimental results also show the higher quality of the solutions obtained by our JPMAP approach with respect to other non-convex MAP approaches which more often get stuck in spurious local optima.

accumulation point, approximation, sequence, (15 more...)

1911.06379

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
South America > Uruguay > Salto > Salto (0.04)
South America > Uruguay > Montevideo > Montevideo (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Dhami, Devendra Singh, Kunapuli, Gautam, Page, David, Natarajan, Sriraam

Predicting Drug-Drug Interactions from Molecular Structure Images

Adverse drug events (ADEs) are "injuries resulting from medical intervention related to a drug" (Nebeker, Barach, and Samore 2004), and are distinct from medication errors (inappropriate prescription, dispensing, usage etc.) as they are caused by drugs at normal dosages. According to the National Center for Health Statistics (NCHS 2014), 48.9% of Americans took at least one prescription drug in the last 30 days, 23.1% took at least three, and 11.9% took at least

drug-drug interaction, interaction, siamese architecture, (14 more...)

1911.06356

Country:

North America > United States > Texas > Dallas County > Dallas (0.05)
Europe (0.04)

Genre: Research Report > Experimental Study (0.89)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science > Data Mining (0.69)