Perceptrons
2017's Deep Learning Papers on Investing – ITNEXT
Abstract: Three different classes of data mining methods (k-Nearest Neighbour, Ridge Regression and Multilayer Perceptron Feed-Forward Neural Networks) are applied for the purpose of quantitative trading on 10 simulated time series, as well as real world time series of 10 currency exchange rates ranging from 1.11.1999 to 12.6.2015. Each method is tested in multiple variants. The k-NN algorithm is applied alternatively with the Euclidian, Manhattan, Mahalanobis and Maximum distance function. The Ridge Regression is applied as Linear and Quadratic, and the Feed-Forward Neural Network is applied with either 1, 2 or 3 hidden layers. In addition to that Principal Component Analysis (PCA) is eventually applied for the dimensionality reduction of the predictor set and the meta-parameters of the methods are optimized on the validation sample.
Neural Architecture Search with Bayesian Optimisation and Optimal Transport
Kandasamy, Kirthevasan, Neiswanger, Willie, Schneider, Jeff, Poczos, Barnabas, Xing, Eric
Bayesian Optimisation (BO) refers to a class of methods for global optimisation of a function $f$ which is only accessible via point evaluations. It is typically used in settings where $f$ is expensive to evaluate. A common use case for BO in machine learning is model selection, where it is not possible to analytically model the generalisation performance of a statistical model, and we resort to noisy and expensive training and validation procedures to choose the best model. Conventional BO methods have focused on Euclidean and categorical domains, which, in the context of model selection, only permits tuning scalar hyper-parameters of machine learning algorithms. However, with the surge of interest in deep learning, there is an increasing demand to tune neural network \emph{architectures}. In this work, we develop NASBOT, a Gaussian process based BO framework for neural architecture search. To accomplish this, we develop a distance metric in the space of neural network architectures which can be computed efficiently via an optimal transport program. This distance might be of independent interest to the deep learning community as it may find applications outside of BO. We demonstrate that NASBOT outperforms other alternatives for architecture search in several cross validation based model selection tasks on multi-layer perceptrons and convolutional neural networks.
Perceptrons in Swift – BPXL Craft – Medium
With the new machine learning (ML) frameworks introduced during WWDC 2017, iOS/macOS/tvOS programmers now have the tools to simplify the implementation of powerful ML algorithms within their apps. A simple hand-waving explanation of how neural networks work will get you surprisingly far, but a little more knowledge and experimentation will help remove some of the mystery behind how they actually work. This first article will implement a simple perceptron network using a Swift Playground to show how a basic network works. Perceptrons were invented in the late 1950s and implemented directly in hardware to explore machine vision. Perceptrons represent a simplified model of a biological neuron.
Listening to the World Improves Speech Command Recognition
McMahan, Brian (R7 Speech Sciences, Inc.) | Rao, Delip (R7 Speech Sciences, Inc.)
We study transfer learning in convolutional network architectures applied to the task of recognizing audio, such as environmental sound events and speech commands. Our key finding is that not only is it possible to transfer representations from an unrelated task like environmental sound classification to a voice-focused task like speech command recognition, but also that doing so improves accuracies significantly. We also investigate the effect of increased model capacity for transfer learning audio, by first validating known results from the field of Computer Vision of achieving better accuracies with increasingly deeper networks on two audio datasets: UrbanSound8k and Google Speech Commands. Then we propose a simple multiscale input representation using dilated convolutions and show that it is able to aggregate larger contexts and increase classification performance. Further, the models trained using a combination of transfer learning and multiscale input representations need only 50% of the training data to achieve similar accuracies as a freshly trained model with 100% of the training data. Finally, we demonstrate a positive interaction effect for the multiscale input and transfer learning, making a case for the joint application of the two techniques.
Investigating the Role of Ensemble Learning in High-Value Wine Identification
Portinale, Luigi (Computer Science Institute, University of Piemonte Orientale) | Locatelli, Monica (University of Piemonte Orientale)
We tackle the problem of authenticating high value Italian wines through machine learning classification. The problem is a seriuos one, since protection of high quality wines from forgeries is worth several million of Euros each year. In a previous work we have identified some base models (in particular classifiers based on Bayesian network (BNC), multi-layer perceptron (MLP) and sequential minimal optimization (SMO)) that well behave using unexpensive chemical analyses of the interested wines. In the present paper, we investigate the role of esemble learning in the construction of more robust classifiers; results suggest that, while bagging and boosting may significantly improve both BNC and MLP, the SMO model is already very robust and efficient as a base learner. We report on results concerning both cross validation on two different datasets, as well as experiments with models trained with the above datasets and tested with a dataset of potentially fake wines; this has been synthesized from a generative probabilistic model learned from real samples and expert knowledge.
Improving Variational Encoder-Decoders in Dialogue Generation
Shen, Xiaoyu (Max Planck Institute Informatics) | Su, Hui (Software Institute, University of Chinese Academy of Science) | Niu, Shuzi (Software Institute, University of Chinese Academy of Science) | Demberg, Vera (Saarland University)
Variational encoder-decoders (VEDs) have shown promising results in dialogue generation. However, the latent variable distributions are usually approximated by a much simpler model than the powerful RNN structure used for encoding and decoding, yielding the KL-vanishing problem and inconsistent training objective. In this paper, we separate the training step into two phases: The first phase learns to autoencode discrete texts into continuous embeddings, from which the second phase learns to generalize latent representations by reconstructing the encoded embedding. In this case, latent variables are sampled by transforming Gaussian noise through multi-layer perceptrons and are trained with a separate VED model, which has the potential of realizing a much more flexible distribution. We compare our model with current popular models and the experiment demonstrates substantial improvement in both metric-based and human evaluations.
Bayesian Functional Optimization
Vien, Ngo Anh (Queen's University Belfast) | Zimmermann, Heiko (Univeristy of Stuttgart) | Toussaint, Marc (Univeristy of Stuttgart)
Bayesian optimization (BayesOpt) is a derivative-free approach for sequentially optimizing stochastic black-box functions. Standard BayesOpt, which has shown many successes in machine learning applications, assumes a finite dimensional domain which often is a parametric space. The parameter space is defined by the features used in the function approximations which are often selected manually. Therefore, the performance of BayesOpt inevitably depends on the quality of chosen features. This paper proposes a new Bayesian optimization framework that is able to optimize directly on the domain of function spaces. The resulting framework, Bayesian Functional Optimization (BFO), not only extends the application domains of BayesOpt to functional optimization problems but also relaxes the performance dependency on the chosen parameter space. We model the domain of functions as a reproducing kernel Hilbert space (RKHS), and use the notion of Gaussian processes on a real separable Hilbert space. As a result, we are able to define traditional improvement-based (PI and EI) and optimistic acquisition functions (UCB) as functionals. We propose to optimize the acquisition functionals using analytic functional gradients that are also proved to be functions in a RKHS. We evaluate BFO in three typical functional optimization tasks: i) a synthetic functional optimization problem, ii) optimizing activation functions for a multi-layer perceptron neural network, and iii) a reinforcement learning task whose policies are modeled in RKHS.
Quantized Memory-Augmented Neural Networks
Park, Seongsik (Seoul National University) | Kim, Seijoon (Seoul National University) | Lee, Seil (Seoul National University) | Bae, Ho (Seoul National University) | Yoon, Sungroh (Seoul National University)
Memory-augmented neural networks (MANNs) refer to a class of neural network models equipped with external memory (such as neural Turing machines and memory networks). These neural networks outperform conventional recurrent neural networks (RNNs) in terms of learning long-term dependency, allowing them to solve intriguing AI tasks that would otherwise be hard to address. This paper concerns the problem of quantizing MANNs. Quantization is known to be effective when we deploy deep models on embedded systems with limited resources. Furthermore, quantization can substantially reduce the energy consumption of the inference procedure. These benefits justify recent developments of quantized multi layer perceptrons, convolutional networks, and RNNs. However, no prior work has reported the successful quantization of MANNs. The in-depth analysis presented here reveals various challenges that do not appear in the quantization of the other networks. Without addressing them properly, quantized MANNs would normally suffer from excessive quantization error which leads to degraded performance. In this paper, we identify memory addressing (specifically, content-based addressing) as the main reason for the performance degradation and propose a robust quantization method for MANNs to address the challenge. In our experiments, we achieved a computation-energy gain of 22× with 8-bit fixed-point and binary quantization compared to the floating-point implementation. Measured on the bAbI dataset, the resulting model, named the quantized MANN (Q-MANN), improved the error rate by 46% and 30% with 8-bit fixed-point and binary quantization, respectively, compared to the MANN quantized using conventional techniques.
Constructive Preference Elicitation Over Hybrid Combinatorial Spaces
Dragone, Paolo (University of Trento) | Teso, Stefano (KU Leuven) | Passerini, Andrea (University of Trento)
Peference elicitation is the task of suggesting a highly preferred configuration to a decision maker. The preferences are typically learned by querying the user for choice feedback over pairs or sets of objects. In its constructive variant, new objects are synthesized "from scratch" by maximizing an estimate of the user utility over a combinatorial (possibly infinite) space of candidates. In the constructive setting, most existing elicitation techniques fail because they rely on exhaustive enumeration of the candidates. A previous solution explicitly designed for constructive tasks comes with no formal performance guarantees, and can be very expensive in (or unapplicable to) problems with non-Boolean attributes. We propose the Choice Perceptron, a Perceptron-like algorithm for learning user preferences from set-wise choice feedback over constructive domains and hybrid Boolean-numeric feature spaces. We provide a theoretical analysis on the attained regret that holds for a large class of query selection strategies, and devise a heuristic strategy that aims at optimizing the regret in practice. Finally, we demonstrate its effectiveness by empirical evaluation against existing competitors on constructive scenarios of increasing complexity.
Towards Shockingly Easy Structured Classification: A Search-based Probabilistic Online Learning Framework
There are two major approaches for structured classification. One is the probabilistic gradient-based methods such as conditional random fields (CRF), which has high accuracy but with drawbacks: slow training, and no support of search-based optimization (which is important in many cases). The other one is the search-based learning methods such as perceptrons and margin infused relaxed algorithm (MIRA), which have fast training but also with drawbacks: low accuracy, no probabilistic information, and non-convergence in real-world tasks. We propose a novel and "shockingly easy" solution, a search-based probabilistic online learning method, to address most of those issues. This method searches the output candidates, derives probabilities, and conduct efficient online learning. We show that this method is with fast training, support search-based optimization, very easy to implement, with top accuracy, with probabilities, and with theoretical guarantees of convergence. Experiments on well-known tasks show that our method has better accuracy than CRF and almost as fast training speed as perceptron and MIRA. Results also show that SAPO can easily beat the state-of-the-art systems on those highly-competitive tasks, achieving record-breaking accuracies. The codes can be found at https://github.com/lancopku