Undirected Networks
An Exclusive Look at How AI and Machine Learning Work at Apple - Backchannel
Three years earlier, Apple had been the first major tech company to integrate a smart assistant into its operating system. Siri was the company's adaptation of a standalone app it had purchased, along with the team that created it, in 2010. Initial reviews were ecstatic, but over the next few months and years, users became impatient with its shortcomings. All too often, it misinterpreted commands. So Apple moved Siri's voice recognition to a neural-net-based system for US users on that late July day (it went worldwide on August 15, 2014).
The Speech Recognition Wiki
In acoustic modelling, artificial neural networks can be used as an alternative to hidden Markov models for phoneme recognition. A pre-processed feature vector is fed into the input layer of a neural network. The goal is to correctly map the observed phones to phonemes, which can then be further processed by the language model. The dynamic nature of speech hampers the use of artificial neural networks for phoneme recognition: traditional neural networks require the phones to be perfectly aligned in time before they can be classified correctly.
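To make the input-layer idea concrete, here is a minimal Python sketch of frame-level phoneme classification. It assumes 13-dimensional, MFCC-like feature frames, stacks each frame with two neighbours on either side so the network sees some temporal context, and passes the result through a one-hidden-layer network with a softmax over 40 hypothetical phoneme classes. The weights are random placeholders rather than trained values, and the names `stack_context`, `random_layer` and `mlp_posteriors` are illustrative. A fixed context window only softens the alignment problem described above; in hybrid systems such frame posteriors are typically handed to an HMM that takes care of timing.

```python
import math
import random

def stack_context(frames, k):
    """Concatenate each frame with its k left and k right neighbours.

    Frames beyond the utterance edges are replaced by the first/last frame.
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for d in range(-k, k + 1):
            window.extend(frames[min(max(t + d, 0), n - 1)])
        stacked.append(window)
    return stacked

def random_layer(n_out, n_in, rng):
    """Untrained placeholder weights (hypothetical; a real system learns these)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def mlp_posteriors(x, W1, b1, W2, b2):
    """One tanh hidden layer followed by a softmax over phoneme classes."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(W2, b2)]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
frames = [[rng.gauss(0.0, 1.0) for _ in range(13)] for _ in range(10)]  # 10 frames x 13 features
inputs = stack_context(frames, k=2)                                      # 10 vectors of length 65
W1, b1 = random_layer(32, 65, rng)
W2, b2 = random_layer(40, 32, rng)
posteriors = mlp_posteriors(inputs[5], W1, b1, W2, b2)
best_phoneme = max(range(len(posteriors)), key=posteriors.__getitem__)
```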
Master the Basics of Machine Learning With These 6 Resources
It seems like machine learning and artificial intelligence are topics at the top of everyone's mind in tech. Be it autonomous cars, robots, or machine intelligence in general, everyone's talking about machines getting smarter and being able to do more. At the same time, for many developers, machine learning and artificial intelligence are nebulous terms representing complex mathematical and data problems they just don't have the time to explore and learn. I've spoken with lots of developers and CTOs about Fuzzy.io and our mission to make it easy for developers to bring intelligent decision-making to their software without needing huge amounts of data or AI expertise, and some were curious to learn more about the broader landscape of machine learning. Here are some of the links to articles, podcasts and courses on the basics of machine learning that I've shared with them.
Outlier Detection on Mixed-Type Data: An Energy-based Approach
Do, Kien, Tran, Truyen, Phung, Dinh, Venkatesh, Svetha
Outlier detection amounts to finding data points that differ significantly from the norm. Classic outlier detection methods are largely designed for a single data type, such as continuous or discrete. However, real-world data is increasingly heterogeneous: a data point can have both discrete and continuous attributes. Handling mixed-type data in a disciplined way remains a great challenge. In this paper, we propose a new unsupervised outlier detection method for mixed-type data based on the Mixed-variate Restricted Boltzmann Machine (Mv.RBM). The Mv.RBM is a principled probabilistic method that models data density. We propose to use the free energy derived from the Mv.RBM as an outlier score, detecting outliers as the data points lying in low-density regions. The method is fast to learn and compute, and it scales to massive datasets. At the same time, the outlier score is identical to the negative log-density of the data up to an additive constant. We evaluate the proposed method on synthetic and real-world datasets and demonstrate that (a) proper handling of mixed types is necessary in outlier detection, and (b) the free energy of the Mv.RBM is a powerful and efficient outlier scoring method that is highly competitive with the state of the art.
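The free-energy score can be illustrated with a minimal Python sketch, assuming an RBM whose visible layer mixes binary units and unit-variance Gaussian units, with binary hidden units. With energy E(v, h) = sum over Gaussian i of (v_i - a_i)^2/2, minus sum over binary i of a_i v_i, minus sum_j c_j h_j, minus sum_ij v_i W_ij h_j, summing out h gives F(v) = (Gaussian quadratic terms) - (binary bias terms) - sum_j softplus(c_j + sum_i v_i W_ij). Because p(v) is proportional to exp(-F(v)), F equals -log p(v) up to the constant log Z, which is the property the abstract relies on. This covers only binary and Gaussian units, training is omitted, and the parameter values below are hypothetical.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def free_energy(v_bin, v_gauss, a_bin, a_gauss, W_bin, W_gauss, c):
    """Free energy F(v) of an RBM with binary and unit-variance Gaussian visibles.

    W_bin[i][j] / W_gauss[i][j] connect visible unit i to binary hidden unit j.
    Larger F means lower model density, i.e. a more outlying point.
    """
    visible = sum((x - a) ** 2 / 2 for x, a in zip(v_gauss, a_gauss))
    visible -= sum(a * x for x, a in zip(v_bin, a_bin))
    hidden = 0.0
    for j, cj in enumerate(c):
        act = cj
        act += sum(x * W_bin[i][j] for i, x in enumerate(v_bin))
        act += sum(x * W_gauss[i][j] for i, x in enumerate(v_gauss))
        hidden += softplus(act)  # hidden unit j summed out analytically
    return visible - hidden

# Hypothetical parameters: 2 binary visibles, 1 Gaussian visible, 2 hidden units.
a_bin, a_gauss, c = [0.2, 0.3], [0.0], [0.0, 0.0]
W_bin = [[0.5, -0.2], [0.1, 0.3]]
W_gauss = [[0.1, -0.1]]
typical = free_energy([1, 0], [0.1], a_bin, a_gauss, W_bin, W_gauss, c)
unusual = free_energy([1, 0], [8.0], a_bin, a_gauss, W_bin, W_gauss, c)
# Ranking points by free energy puts `unusual` ahead of `typical` as an outlier.
```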
Datasets VS Algorithms - A Breakthrough in AI 6x Faster
Recent years have witnessed a strong emergence of dataset and algorithm repositories, and this emergence has raised some questions. A growing body of market research has begun to investigate which of the two is more important for the development of artificial intelligence (AI), which segments are in highest demand, and which could capture greater market share in the future. By reviewing a timeline of AI breakthroughs over 30 years, Wissner-Gross found that the key limiting factor for AI advances was the availability of high-quality datasets, not algorithms. He also found that breakthroughs followed the availability of a key high-quality dataset about six times sooner than they followed the proposal of the corresponding algorithm.
Posterior Sampling for Reinforcement Learning Without Episodes
Osband, Ian, Van Roy, Benjamin
This is a brief technical note to clarify some of the issues in applying posterior sampling for reinforcement learning (PSRL) in environments without fixed episodes. In particular, this paper aims to:
- Review some of the results that have been proven for finite-horizon MDPs (Osband et al 2013, 2014a, 2014b, 2016) and for MDPs with finite ergodic structure (Gopalan et al 2014).
- Review similar results for optimistic algorithms in infinite-horizon problems (Jaksch et al 2010, Bartlett and Tewari 2009, Abbasi-Yadkori and Szepesvari 2011), with particular attention to dynamic episode growth.
- Highlight the delicate technical issue that has led to a fault in the proof of the lazy-PSRL algorithm (Abbasi-Yadkori and Szepesvari 2015). We present an explicit counterexample to this style of argument. Therefore, we suggest that Theorem 2 in (Abbasi-Yadkori and Szepesvari 2015) instead be considered a conjecture, as it has no rigorous proof.
- Present pragmatic approaches to applying PSRL in infinite-horizon problems. We conjecture that, under some additional assumptions, it will be possible to obtain $O(\sqrt{T})$ bounds even without episodic reset.
We hope that this note serves to clarify existing results in the field of reinforcement learning and provides interesting motivation for future work.
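The basic PSRL loop can be sketched in Python for a small tabular MDP. This is a generic illustration, not the note's proposed method: it assumes known rewards and a Dirichlet posterior over transitions, plans in each sampled MDP with discounted value iteration as a stand-in for average-reward planning, and, in place of fixed episodes, starts a new episode once the visits to the chosen state-action pair within the current episode reach its count before the episode, in the spirit of the dynamic episode growth of UCRL2 (Jaksch et al 2010). All function names and constants are hypothetical.

```python
import random

def sample_dirichlet(alpha, rng):
    """One draw from Dirichlet(alpha) via normalized Gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def greedy_policy(P, R, gamma=0.95, iters=200):
    """Discounted value iteration on a sampled MDP (stand-in for average-reward planning)."""
    n_s, n_a = len(R), len(R[0])
    V = [0.0] * n_s
    for _ in range(iters):
        Q = [[R[s][a] + gamma * sum(p * V[t] for t, p in enumerate(P[s][a]))
              for a in range(n_a)] for s in range(n_s)]
        V = [max(row) for row in Q]
    return [max(range(n_a), key=lambda a: Q[s][a]) for s in range(n_s)]

def psrl(true_P, R, steps, seed=0):
    """Tabular PSRL with known rewards R[s][a]; true_P[s][a] is the next-state distribution."""
    rng = random.Random(seed)
    n_s, n_a = len(R), len(R[0])
    alpha = [[[1.0] * n_s for _ in range(n_a)] for _ in range(n_s)]  # Dirichlet(1, ..., 1) prior
    counts = [[0] * n_a for _ in range(n_s)]
    s, t, total_reward, episodes = 0, 0, 0.0, 0
    while t < steps:
        # New episode: sample one MDP from the posterior and act greedily in it.
        episodes += 1
        P_hat = [[sample_dirichlet(alpha[x][a], rng) for a in range(n_a)] for x in range(n_s)]
        policy = greedy_policy(P_hat, R)
        start_counts = [row[:] for row in counts]
        while t < steps:
            a = policy[s]
            within = counts[s][a] - start_counts[s][a]
            if within >= max(1, start_counts[s][a]):
                break  # doubling rule: this pair's visit count has doubled
            nxt = rng.choices(range(n_s), weights=true_P[s][a])[0]
            total_reward += R[s][a]
            alpha[s][a][nxt] += 1.0  # conjugate posterior update
            counts[s][a] += 1
            s, t = nxt, t + 1
    return total_reward, episodes
```

For example, in a two-state MDP where state 1 pays reward 1 and action 1 usually leads there, `psrl(true_P, R, steps=2000)` should learn to pick action 1, and the doubling rule keeps the number of episodes logarithmic in the number of steps.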
Towards Representation Learning with Tractable Probabilistic Models
Vergari, Antonio, Di Mauro, Nicola, Esposito, Floriana
Probabilistic models learned as density estimators can be exploited for representation learning, rather than serving only as toolboxes for answering inference queries. However, how to extract useful representations depends heavily on the particular model involved. We argue that tractable inference, i.e. inference that can be computed in polynomial time, can enable general schemes to extract features from black-box models. We plan to investigate how Tractable Probabilistic Models (TPMs) can be exploited to generate embeddings by random query evaluations. We devise two experimental designs to assess and compare different TPMs as feature extractors in an unsupervised representation learning framework. We show some experimental results on standard image datasets obtained by applying such a method to Sum-Product Networks and Mixtures of Trees as tractable models generating embeddings.
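One possible reading of "generating embeddings by random query evaluations" is sketched below in Python, under stated assumptions: each query is the log marginal probability of a random subset of variables, and a sample's embedding is the vector of its query values. As the tractable model it uses a mixture of independent Bernoullis, which is a minimal sum-product network (one sum node over product nodes) whose marginals are exact and cheap. The mixture parameters, query sizes and function names are hypothetical, and this is not necessarily the query scheme or the models used in the paper.

```python
import math
import random

def log_marginal(x, query, weights, probs):
    """log p(x_Q) for a mixture of independent Bernoullis (a one-sum-node SPN).

    Variables outside `query` are marginalized out simply by dropping their
    factors, which is exact here because each Bernoulli factor sums to 1.
    """
    comp_logs = []
    for w, p in zip(weights, probs):
        lp = math.log(w)
        for i in query:
            lp += math.log(p[i] if x[i] == 1 else 1.0 - p[i])
        comp_logs.append(lp)
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))  # log-sum-exp

def random_queries(n_vars, n_queries, size, seed=0):
    """Hypothetical query generator: random variable subsets of a fixed size."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n_vars), size)) for _ in range(n_queries)]

def embed(x, queries, weights, probs):
    """Embedding of sample x: one log-marginal value per random query."""
    return [log_marginal(x, q, weights, probs) for q in queries]

# Two-component mixture over 4 binary variables (hand-set, untrained parameters).
weights = [0.5, 0.5]
probs = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
queries = random_queries(n_vars=4, n_queries=6, size=2)
features = embed([1, 1, 0, 0], queries, weights, probs)
```

Because every query is answered exactly in time linear in the model size, the same fixed query set can be evaluated on every sample to produce comparable feature vectors.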
A Distance for HMMs based on Aggregated Wasserstein Metric and State Registration
Chen, Yukun, Ye, Jianbo, Li, Jia
We propose a framework, named Aggregated Wasserstein, for computing a dissimilarity measure or distance between two hidden Markov models whose state-conditional distributions are Gaussian. For such HMMs, the marginal distribution at any time point follows a Gaussian mixture distribution, a fact exploited to softly match, or register, the states of the two HMMs. We refer to such HMMs as Gaussian mixture model-HMMs (GMM-HMMs). The registration of states is inspired by the intrinsic relationship between optimal transport and the Wasserstein metric between distributions. Specifically, the components of the marginal GMMs are matched by solving an optimal transport problem in which the cost between components is the Wasserstein metric for Gaussian distributions. The solution of this optimization problem is a fast approximation to the Wasserstein metric between two GMMs. The new Aggregated Wasserstein distance is a semi-metric and can be computed without generating Monte Carlo samples. It is invariant to relabeling or permutation of the states. This distance quantifies the dissimilarity of GMM-HMMs by measuring both the difference between the two marginal GMMs and the difference between the two transition matrices. We test the new distance on the retrieval and classification of time series. Experiments on both synthetic and real data demonstrate its advantages in accuracy as well as efficiency compared with existing distances based on the Kullback-Leibler divergence.
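The component-matching step can be sketched in Python for one-dimensional GMMs. In 1-D, the squared 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2) has the closed form (m1 - m2)^2 + (s1 - s2)^2; the multivariate case adds a covariance term. Here the transport plan between the mixture weights is computed with entropy-regularized (Sinkhorn) iterations in the log domain, a swapped-in approximation chosen for brevity, whereas the paper solves the optimal transport problem itself; the transition-matrix part of the distance is omitted. The regularization `eps` must be small relative to the costs for the plan to approach the exact one, but very small values slow Sinkhorn's convergence. Function names are illustrative.

```python
import math

def w2_sq_gauss_1d(m1, s1, m2, s2):
    """Squared 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return (m1 - m2) ** 2 + (s1 - s2) ** 2

def logsumexp(vals):
    m = max(vals)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(v - m) for v in vals))

def sinkhorn_plan(a, b, C, eps=1.0, iters=1000):
    """Entropy-regularized transport plan between weights a and b (log-domain Sinkhorn)."""
    f = [0.0] * len(a)
    g = [0.0] * len(b)
    for _ in range(iters):
        # Alternately rescale dual potentials so rows sum to a and columns to b.
        f = [eps * math.log(a[i]) - eps * logsumexp([(g[j] - C[i][j]) / eps for j in range(len(b))])
             for i in range(len(a))]
        g = [eps * math.log(b[j]) - eps * logsumexp([(f[i] - C[i][j]) / eps for i in range(len(a))])
             for j in range(len(b))]
    return [[math.exp((f[i] + g[j] - C[i][j]) / eps) for j in range(len(b))] for i in range(len(a))]

def aggregated_gmm_cost(gmm1, gmm2, eps=1.0):
    """Sum_ij plan_ij * W2^2(component i, component j); each gmm is (weights, means, sds)."""
    (w1, m1, s1), (w2, m2, s2) = gmm1, gmm2
    C = [[w2_sq_gauss_1d(m1[i], s1[i], m2[j], s2[j]) for j in range(len(w2))]
         for i in range(len(w1))]
    plan = sinkhorn_plan(w1, w2, C, eps)
    return sum(plan[i][j] * C[i][j] for i in range(len(w1)) for j in range(len(w2)))

# Hypothetical 1-D GMMs: (weights, means, standard deviations).
gmm_a = ([0.3, 0.7], [0.0, 10.0], [1.0, 1.0])
gmm_b = ([0.3, 0.7], [1.0, 9.0], [1.0, 2.0])
cost = aggregated_gmm_cost(gmm_a, gmm_b)  # close to 0.3 * 1 + 0.7 * 2 = 1.7
```

Relabeling the components of either mixture permutes the rows or columns of the cost matrix and the plan together, so the result does not depend on state order, matching the invariance claimed above.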
A Stochastic Temporal Model of Polyphonic MIDI Performance with Ornaments
Nakamura, Eita, Ono, Nobutaka, Sagayama, Shigeki, Watanabe, Kenji
We study indeterminacies in the realization of ornaments and how they can be incorporated into a stochastic performance model applicable to music information processing tasks such as score-performance matching. We point out the importance of temporal information and propose a hidden Markov model that describes it explicitly and represents ornaments with several state types. After reviewing these indeterminacies, we carefully incorporate them into the model through its topology and parameters, and we explain in detail the state construction for quite general polyphonic scores. By analyzing piano performance data, we find significant overlaps in the inter-onset-interval distributions of chordal notes, ornaments, and inter-chord events, and we use the data to determine details of the model. The model is applied to score following and offline score-performance matching, yielding highly accurate matching for performances with many ornaments and relatively frequent errors, repeats, and skips.
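A much-simplified score follower can illustrate how an HMM absorbs errors, repeats, and skips. In this Python sketch each hidden state is a score position; transitions allow advancing one note, staying, skipping one note, or stepping back one note with hand-picked probabilities, and the emission model only compares MIDI pitches. Viterbi decoding then aligns the performed notes to score positions. Unlike the paper's model, it ignores timing (inter-onset intervals), chords, and ornament-specific state types; all probabilities and names are assumptions.

```python
import math

# Hypothetical move probabilities between score positions (offset -> probability).
MOVES = {1: 0.85, 0: 0.05, 2: 0.05, -1: 0.05}  # advance, stay, skip one note, step back

def transition_logprobs(n):
    """Row-normalized log transition matrix over n score positions."""
    T = [[-math.inf] * n for _ in range(n)]
    for i in range(n):
        allowed = {i + d: p for d, p in MOVES.items() if 0 <= i + d < n}
        z = sum(allowed.values())
        for j, p in allowed.items():
            T[i][j] = math.log(p / z)
    return T

def viterbi_align(score, performance, p_match=0.9):
    """Most likely score position for each performed MIDI pitch."""
    n = len(score)
    T = transition_logprobs(n)

    def emit(i, pitch):
        # Correct pitch with prob p_match; otherwise any of the other 127 MIDI pitches.
        return math.log(p_match if score[i] == pitch else (1.0 - p_match) / 127)

    delta = [-math.log(n) + emit(i, performance[0]) for i in range(n)]  # uniform start
    backptrs = []
    for pitch in performance[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] + T[i][j])
            ptr.append(best_i)
            new_delta.append(delta[best_i] + T[best_i][j] + emit(j, pitch))
        backptrs.append(ptr)
        delta = new_delta
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# The performer skips the E (64): the alignment jumps from position 1 to 3.
alignment = viterbi_align([60, 62, 64, 65, 67], [60, 62, 65, 67])  # -> [0, 1, 3, 4]
```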
Multi Level Monte Carlo methods for a class of ergodic stochastic differential equations
Szpruch, Lukasz, Vollmer, Sebastian, Zygalakis, Konstantinos, Giles, Michael B.
We develop a framework that allows the use of the multi-level Monte Carlo (MLMC) methodology (Giles 2015) to calculate expectations with respect to the invariant measures of ergodic SDEs. In that context, we study the (over-damped) Langevin equations with a strongly convex potential. We show that, when appropriate contracting couplings for the numerical integrators are available, one can obtain time-uniform estimates of the MLMC variance, in stark contrast to the majority of results in the MLMC literature. As a consequence, one can approximate expectations with respect to the invariant measure in an unbiased way without the need for a Metropolis-Hastings step. In addition, a root mean square error of $\mathcal{O}(\epsilon)$ is achieved with $\mathcal{O}(\epsilon^{-2})$ complexity, on par with Markov chain Monte Carlo (MCMC) methods, which, however, can be computationally intensive when applied to large datasets. Finally, we present a multilevel version of the recently introduced Stochastic Gradient Langevin Dynamics (SGLD) method (Welling and Teh, 2011), built for large-dataset applications. We show that this is the first stochastic gradient MCMC method with complexity $\mathcal{O}(\epsilon^{-2}|\log\epsilon|^{3})$, which is asymptotically an order $\epsilon$ lower than the $\mathcal{O}(\epsilon^{-3})$ complexity of all currently available stochastic gradient MCMC methods. Numerical experiments confirm our theoretical findings.
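The MLMC telescoping idea can be sketched in Python on the simplest strongly convex case, U(x) = x^2/2, whose overdamped Langevin equation dX = -X dt + sqrt(2) dW has the standard normal as its invariant measure, so E[X^2] = 1. Each level pairs a fine Euler-Maruyama path with a coarse one of twice the step size, both driven by the same Brownian increments (a synchronous coupling), and the estimator adds the level-0 mean to the mean corrections of the higher levels. The fixed time horizon, step sizes and per-level sample counts are illustrative assumptions, not the paper's construction, and this simple version still carries the discretization bias of its finest step.

```python
import math
import random

def level_sample(level, h0, T, rng):
    """One sample of f(X_fine) - f(X_coarse) at `level` (just f(X) at level 0), f(x) = x^2.

    Euler-Maruyama for dX = -X dt + sqrt(2) dW, the overdamped Langevin equation
    with U(x) = x^2 / 2. The coarse path (step 2h) reuses the sum of two
    consecutive fine Brownian increments, so fine and coarse paths stay close.
    """
    h = h0 / 2 ** level
    n_steps = int(round(T / h))
    x_fine = x_coarse = 0.0
    dw_pair = 0.0
    for n in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(h))
        x_fine += -x_fine * h + math.sqrt(2.0) * dw
        if level > 0:
            dw_pair += dw
            if n % 2 == 1:  # every second fine step, advance the coarse path once
                x_coarse += -x_coarse * (2 * h) + math.sqrt(2.0) * dw_pair
                dw_pair = 0.0
    if level == 0:
        return x_fine ** 2
    return x_fine ** 2 - x_coarse ** 2

def mlmc_estimate(samples_per_level, h0=0.5, T=10.0, seed=0):
    """Telescoping sum E[f_0] + sum_l E[f_l - f_(l-1)], each term by plain Monte Carlo."""
    rng = random.Random(seed)
    total = 0.0
    for level, n in enumerate(samples_per_level):
        total += sum(level_sample(level, h0, T, rng) for _ in range(n)) / n
    return total

# Many cheap coarse samples, few expensive fine ones (illustrative counts).
estimate = mlmc_estimate([20000, 4000, 2000, 1000, 500])
```

With h0 = 0.5 and four correction levels the finest step is 1/32. For this linear drift the scheme's stationary second moment is 1/(1 - h/2), so the estimate should land near 64/63, about 1.016, rather than exactly 1; the corrections are cheap because the coupled paths differ only slightly.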