 Perceptrons


Regularising Non-linear Models Using Feature Side-information

arXiv.org Machine Learning

Very often features come with their own vectorial descriptions, which provide detailed information about their properties. We refer to these vectorial descriptions as feature side-information. In the standard learning scenario, the input is represented as a vector of features, and the feature side-information is most often ignored or used only for feature selection prior to model fitting. We believe that feature side-information, which carries information about the features' intrinsic properties, will help improve model prediction if used properly during the learning process. In this paper, we propose a framework that allows for the incorporation of feature side-information during the learning of very general model families to improve prediction performance. We control the structures of the learned models so that they reflect feature similarities as these are defined on the basis of the side-information. We perform experiments on a number of benchmark datasets which show significant predictive performance gains, over a number of baselines, as a result of the exploitation of the side-information.


Neural Decision Trees

arXiv.org Machine Learning

In this paper we propose a synergistic melding of neural networks and decision trees (DT), which we call neural decision trees (NDT). NDT is a decision-tree-like architecture in which each splitting node is an independent multilayer perceptron, allowing oblique decision functions, or arbitrary nonlinear decision functions if more than one layer is used. This way, each MLP can be seen as a node of the tree. We then show that, with a weight-sharing assumption among those units, we end up with a Hashing Neural Network (HNN), which is a multilayer perceptron with a sigmoid activation function for the last layer as opposed to the standard softmax. The output units then jointly represent the probability of being in a particular region. The proposed framework allows for global optimization, as opposed to the greedy optimization used in DTs, and for differentiability w.r.t. all parameters and the input, allowing easy integration in any learnable pipeline, for example after CNNs for computer vision tasks. We also demonstrate the modeling power of HNN, which can learn unions of disjoint regions for final clustering or classification. This makes it more general and powerful than a standard softmax MLP, which requires linear separability, and thus reduces the need for the inner layers to perform complex data transformations. We finally show experiments for supervised, semi-supervised and unsupervised tasks and compare results with standard DTs and MLPs.
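To make the idea of output units that "jointly represent the probability of being in a particular region" concrete, here is a minimal sketch. It assumes, purely for illustration, that each of the K sigmoid outputs is treated as an independent bit, so that every K-bit code names one region; the function name `region_probabilities` and the example logits are hypothetical and not taken from the paper.

```python
from itertools import product
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def region_probabilities(logits):
    """Map K sigmoid outputs to a distribution over 2**K bit-coded regions.

    Assumption: the units are treated as independent Bernoulli bits, so the
    probability of a region is the product of the per-bit probabilities.
    """
    p = [sigmoid(z) for z in logits]
    probs = {}
    for bits in product((0, 1), repeat=len(p)):
        prob = 1.0
        for bit, p_i in zip(bits, p):
            prob *= p_i if bit else (1.0 - p_i)
        probs[bits] = prob
    return probs


# Hypothetical pre-activations of a two-unit output layer.
probs = region_probabilities([2.0, -1.0])
best_region = max(probs, key=probs.get)
```

With a softmax layer the same two units could only name two classes, whereas here they index four regions, several of which may be assigned to the same class.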


Optimization of distributions differences for classification

arXiv.org Machine Learning

In this paper we introduce a new classification algorithm called Optimization of Distributions Differences (ODD). The algorithm aims to find a transformation from the feature space to a new space where the instances in the same class are as close as possible to one another while the gravity centers of these classes are as far as possible from one another. This aim is formulated as a multiobjective optimization problem that is solved by a hybrid of an evolutionary strategy and the Quasi-Newton method. The choice of the transformation function is flexible and could be any continuous space function. We experiment with a linear and a non-linear transformation in this paper. We show that the algorithm can outperform 6 other state-of-the-art classification methods, namely naive Bayes, support vector machines, linear discriminant analysis, multi-layer perceptrons, decision trees, and k-nearest neighbors, on 12 standard classification datasets. Our results show that the method is less sensitive to class imbalance than these methods. We also show that ODD maintains its performance better than the other classification methods on these datasets and hence offers better generalization ability.
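The two competing goals described above can be sketched as a pair of numbers computed for a candidate transformation. The following is an illustrative formulation only: the abstract does not give the exact objective, so the squared Euclidean distances, the averaging, and the names `odd_objectives`, `W_identity` and `W_project` are assumptions. A linear map is used as the transformation, and no optimizer is included.

```python
def transform(W, x):
    """Apply the linear map W (a list of rows) to the feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def centroid(points):
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def odd_objectives(W, X, y):
    """Return (within, between) for the transformed data.

    within:  mean squared distance of each instance to its class centroid
             (to be minimized).
    between: mean squared distance between centroids of different classes
             (to be maximized). Requires at least two classes.
    """
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(transform(W, x))
    centers = {c: centroid(pts) for c, pts in by_class.items()}
    within = sum(
        sq_dist(z, centers[c]) for c, pts in by_class.items() for z in pts
    ) / len(X)
    labels = sorted(centers)
    pairs = [(a, b) for i, a in enumerate(labels) for b in labels[i + 1:]]
    between = sum(sq_dist(centers[a], centers[b]) for a, b in pairs) / len(pairs)
    return within, between


# Toy data: the second feature is pure within-class noise.
X = [[0, 0], [0, 2], [4, 0], [4, 2]]
y = [0, 0, 1, 1]
W_identity = [[1, 0], [0, 1]]
W_project = [[1, 0]]  # keep only the first feature

identity_scores = odd_objectives(W_identity, X, y)
project_scores = odd_objectives(W_project, X, y)
```

On this toy data, projecting onto the first feature removes the within-class spread without moving the class centers, which is the kind of transformation a multiobjective search over `W` would favor.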


Machine learning, emphasize certain observations?

#artificialintelligence

I have a multi-class machine learning problem on which I will try different methods, such as logistic regression, decision trees, multilayer perceptron, etc. The observations in the data set have an attribute, an index from 1 to 5, which defines how important it is that a certain observation gets correctly classified (index 1 is very important, 5 is not important at all). Question 1: How should I emphasize to the models that the lower-index observations have greater importance? I am thinking of duplicating these observations so the models fit the lower-index observations better; what other approaches are possible? Question 2: What performance evaluation criteria can I use to find the models that predict these low-index observations well?
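One common alternative to duplicating rows is to turn the importance index into a per-observation weight and use it both during training and during evaluation. The sketch below shows only the evaluation side; the linear index-to-weight mapping and the names `importance_weight` and `weighted_accuracy` are assumptions made for illustration, and the labels are made-up toy values.

```python
def importance_weight(index, max_index=5):
    """Map index 1 (most important) to weight 5 and index 5 to weight 1.

    The linear mapping is an assumption; any decreasing function would do.
    """
    return float(max_index + 1 - index)


def weighted_accuracy(y_true, y_pred, weights):
    """Fraction of the total weight that was classified correctly."""
    correct = sum(w for t, p, w in zip(y_true, y_pred, weights) if t == p)
    return correct / sum(weights)


# Toy labels and predictions, for illustration only.
y_true = ["a", "b", "c", "a"]
y_pred = ["a", "b", "a", "b"]
importance = [1, 5, 3, 1]

weights = [importance_weight(i) for i in importance]
score = weighted_accuracy(y_true, y_pred, weights)
plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

The same weights can usually be passed to the learner itself: in scikit-learn, for example, `LogisticRegression.fit` and `DecisionTreeClassifier.fit` accept a `sample_weight` argument, which has a similar effect to duplication without enlarging the data set.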


Perceptron Neural Designer

#artificialintelligence

Neural networks are one of the hottest topics in artificial intelligence. They are computational models based on the structure of the brain: information processing structures whose most significant property is their ability to learn from data. These techniques have achieved great success in domains ranging from marketing to engineering. There are many different types of neural networks, of which the multilayer perceptron is the most important one.


Causal Regularization

arXiv.org Machine Learning

In application domains such as healthcare, we want accurate predictive models that are also causally interpretable. In pursuit of such models, we propose a causal regularizer to steer predictive models towards causally-interpretable solutions and theoretically study its properties. In a large-scale analysis of Electronic Health Records (EHR), our causally-regularized model outperforms its L1-regularized counterpart in causal accuracy and is competitive in predictive performance. We perform non-linear causality analysis by causally regularizing a special neural network architecture. We also show that the proposed causal regularizer can be used together with neural representation learning algorithms to yield up to 20% improvement over multilayer perceptron in detecting multivariate causation, a situation common in healthcare, where many causal factors should occur simultaneously to have an effect on the target variable.


Healthy Cognitive Aging: A Hybrid Random Vector Functional-Link Model for the Analysis of Alzheimer’s Disease

AAAI Conferences

Alzheimer's disease (AD) is a genetically complex neurodegenerative disease, which leads to irreversible brain damage, severe cognitive problems and ultimately death. A number of clinical trials and study initiatives have been set up to investigate AD pathology, leading to large amounts of high dimensional heterogeneous data (biomarkers) for analysis. This paper focuses on combining clinical features from different modalities, including medical imaging, cerebrospinal fluid (CSF), etc., to diagnose AD and predict potential progression. Due to privacy and legal issues involved with clinical research, the study cohort (number of patients) is relatively small, compared to thousands of available biomarkers (predictors). We propose a hybrid pathological analysis model, which integrates manifold learning and Random Vector functional-link network (RVFL) so as to achieve better ability to extract discriminant information with limited training materials. Furthermore, we model (current and future) cognitive healthiness as a regression problem about age. By comparing the difference between predicted age and actual age, we manage to show statistical differences between different pathological stages. Verification tests are conducted based on the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database. Extensive comparison is made against different machine learning algorithms, i.e. Support Vector Machine (SVM), Random Forest (RF), Decision Tree and Multilayer Perceptron (MLP). Experimental results show that our proposed algorithm achieves better results than the comparison targets, which indicates promising robustness for practical clinical implementation.


Multi-Layer Generalized Linear Estimation

arXiv.org Machine Learning

We consider the problem of reconstructing a signal from multi-layered (possibly) non-linear measurements. Using non-rigorous but standard methods from statistical physics we present the Multi-Layer Approximate Message Passing (ML-AMP) algorithm for computing marginal probabilities of the corresponding estimation problem and derive the associated state evolution equations to analyze its performance. We also give the expression of the asymptotic free energy and the minimal information-theoretically achievable reconstruction error. Finally, we present some applications of this measurement model for compressed sensing and perceptron learning with structured matrices/patterns, and for a simple model of estimation of latent variables in an auto-encoder.


Neural Networks with R – A Simple Example

#artificialintelligence

In this tutorial a neural network (or multilayer perceptron, depending on naming convention) will be built that is able to take a number and calculate its square root (or as close to it as possible). Later tutorials will build upon this to make forecasting / trading models. The R library 'neuralnet' will be used to train and build the neural network. There is lots of good literature on neural networks freely available on the internet; a good starting point is the neural network handout by Dr Mark Gayles at the Engineering Department Cambridge University http://mi.eng.cam.ac.uk/~mjfg/local/I10/i10_hand4.pdf, which covers just enough to give an understanding of what a neural network is and what it can do without being so mathematically advanced that it overwhelms the reader. The tutorial will produce the neural network shown in the image below. It is going to take a single input (the number whose square root you want) and produce a single output (the square root of the input).
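The tutorial itself uses R's 'neuralnet' package. As a rough, language-neutral illustration of the same task, the following is a minimal sketch in plain Python of a one-input, one-output network trained to approximate the square root. The layer size, learning rate, number of epochs, scaling of inputs and targets, and the function name `train_sqrt_net` are all assumptions chosen for the sketch and are not taken from the tutorial.

```python
import math
import random


def train_sqrt_net(hidden=4, epochs=300, lr=0.05, seed=0):
    """Train a tiny 1-hidden-layer network on sqrt(x) for x = 1..100."""
    rng = random.Random(seed)
    # Scale inputs to (0, 1] and targets to (0, 1] to keep training stable.
    data = [(x / 100.0, math.sqrt(x) / 10.0) for x in range(1, 101)]
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]  # input -> hidden
    b1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden)]  # hidden -> output
    b2 = 0.0

    def forward(x):
        h = [math.tanh(w1[j] * x + b1[j]) for j in range(hidden)]
        return h, sum(w2[j] * h[j] for j in range(hidden)) + b2

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    initial_mse = mse()
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            h, out = forward(x)
            err = out - t  # gradient of 0.5 * (out - t)**2 w.r.t. out
            for j in range(hidden):
                # Backpropagate through tanh before w2[j] is updated.
                grad_h = err * w2[j] * (1.0 - h[j] ** 2)
                w2[j] -= lr * err * h[j]
                w1[j] -= lr * grad_h * x
                b1[j] -= lr * grad_h
            b2 -= lr * err

    def predict(value):
        """Approximate sqrt(value) for value in the training range 1..100."""
        return forward(value / 100.0)[1] * 10.0

    return predict, initial_mse, mse()


predict, initial_mse, final_mse = train_sqrt_net()
```

Calling `predict(25.0)` then returns the network's approximation of the square root of 25; how close it gets depends on the assumed hyperparameters above.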


AI Horizon: Perceptrons - Basic Neural Networking

AITopics Original Links

The main feature of perceptrons is that they can be trained (or learn) to behave a certain way. One popular beginner's assignment is to have a perceptron model (that is, learn to be) a basic boolean function such as AND or OR. Perceptron learning is guided, that is, you have to have something that the perceptron can imitate. So, the perceptron learns like this: it produces an output, compares the output to what the output should be, and then adjusts itself a little bit. After repeating this cycle enough times, the perceptron will have converged (a technical name for learned) to the correct behavior much like a child learns new words like glriber.
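The learning cycle described above (produce an output, compare it with the desired output, adjust a little) can be sketched for the boolean AND function as follows. This is a minimal illustration of the classic perceptron update rule; the function names, the learning rate of 1.0, the zero initial weights and the epoch limit are choices made for this sketch.

```python
def predict(w, b, x1, x2):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0


def train_perceptron(samples, lr=1.0, max_epochs=100):
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), target in samples:
            error = target - predict(w, b, x1, x2)  # compare with the target
            if error != 0:
                # Adjust the weights a little in the direction of the target.
                w[0] += lr * error * x1
                w[1] += lr * error * x2
                b += lr * error
                mistakes += 1
        if mistakes == 0:  # converged: every example is classified correctly
            break
    return w, b


# Truth table of AND: the "something to imitate".
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

w, b = train_perceptron(AND_SAMPLES)
outputs = [predict(w, b, x1, x2) for (x1, x2), _ in AND_SAMPLES]
```

Because AND is linearly separable, the loop stops after a handful of passes; swapping in the truth table of OR works the same way.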