AITopics | Perceptrons

Collaborating Authors

Perceptrons

News Overviews Instructional Materials AI-Alerts Classics

Predtron: A Family of Online Algorithms for General Prediction Problems

Jain, Prateek, Natarajan, Nagarajan, Tewari, Ambuj

Neural Information Processing SystemsDec-31-2015

Modern prediction problems arising in multilabel learning and learning to rank pose unique challenges to the classical theory of supervised learning. These problems have large prediction and label spaces of a combinatorial nature and involve sophisticated loss functions. We offer a general framework to derive mistake driven online algorithms and associated loss bounds. The key ingredients in our framework are a general loss function, a general vector space representation of predictions, and a notion of margin with respect to a general norm. Our general algorithm, Predtron, yields the perceptron algorithm and its variants when instantiated on classic problems such as binary classification, multiclass classification, ordinal regression, and multilabel classification. For multilabel ranking and subset ranking, we derive novel algorithms, notions of margins, and loss bounds. A simulation study confirms the behavior predicted by our bounds and demonstrates the flexibility of the design choices in our framework.

artificial intelligence, inductive learning, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Add feedback

A Hybrid Neural Model for Type Classification of Entity Mentions

Dong, Li (Beihang University) | Wei, Furu (Microsoft Research) | Sun, Hong (Microsoft Corporation) | Zhou, Ming (Microsoft Research) | Xu, Ke (Beihang University)

AAAI ConferencesJul-15-2015

The semantic class (i.e., type) of an entity plays a vital role in many natural language processing tasks, such as question answering. However, most of existing type classification systems extensively rely on hand-crafted features. This paper introduces a hybrid neural model which classifies entity mentions to a wide-coverage set of 22 types derived from DBpedia. It consists of two parts. The mention model uses recurrent neural networks to recursively obtain the vector representation of an entity mention from the words it contains. The context model, on the other hand, employs multilayer perceptrons to obtain the hidden representation for contextual information of a mention. Representations obtained by the two parts are used together to predict the type distribution. Using automatically generated data, these two parts are jointly learned. Experimental studies illustrate that the proposed approach outperforms baseline methods. Moreover, when type information provided by our method is used in a question answering system, we observe a 14.7% relative improvement for the top-1 accuracy of answers.

entity mention, representation, vector, (17 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

Asia > China > Beijing > Beijing (0.04)
Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.04)
North America > United States > West Virginia (0.04)
North America > United States > Iowa (0.04)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.69)
Health & Medicine > Therapeutic Area > Immunology (0.69)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.57)

Add feedback

Multitask Coactive Learning

Goetschalckx, Robby (Oregon State University) | Fern, Alan (Oregon State University) | Tadepalli, Prasad (Oregon State University)

AAAI ConferencesJul-15-2015

In this paper we investigate the use of coactive learning in a multitask setting. In coactive learning, an expert presents the learner with a problem and the learner returns a candidate solution. The expert then improves on the solution if necessary and presents the improved solution to the learner. The goal for the learner is to learn to produce solutions which cannot be further improved by the expert while minimizing the average expert effort. In this paper, we consider the setting where there are multiple experts (tasks), and in each iteration one expert presents a problem to the learner. While the experts are expected to have different solution preferences, they are also assumed to share similarities, which should enable generalization across experts. We analyze several algorithms for this setting and derive bounds on the average expert effort during learning. Our main contribution is the balanced Perceptron algorithm, which is the first coactive learning algorithm that is both able to generalize across experts when possible, while also guaranteeing convergence to optimal solutions for individual experts. Our experiments in three domains confirm that this algorithm is effective in the multitask setting, compared to natural baselines.

algorithm, generalization, vector, (15 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country:

North America > Canada > Quebec > Capitale-Nationale Region > Québec (0.04)
North America > Canada > Quebec > Capitale-Nationale Region > Quebec City (0.04)
North America > United States > Oregon > Benton County > Corvallis (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.50)

Add feedback

Surrogate Functions for Maximizing Precision at the Top

Kar, Purushottam, Narasimhan, Harikrishna, Jain, Prateek

arXiv.org Machine LearningMay-26-2015

The problem of maximizing precision at the top of a ranked list, often dubbed Precision@k (prec@k), finds relevance in myriad learning applications such as ranking, multi-label classification, and learning with severe label imbalance. However, despite its popularity, there exist significant gaps in our understanding of this problem and its associated performance measure. The most notable of these is the lack of a convex upper bounding surrogate for prec@k. We also lack scalable perceptron and stochastic gradient descent algorithms for optimizing this performance measure. In this paper we make key contributions in these directions. At the heart of our results is a family of truly upper bounding surrogates for prec@k. These surrogates are motivated in a principled manner and enjoy attractive properties such as consistency to prec@k under various natural margin/noise conditions. These surrogates are then used to design a class of novel perceptron algorithms for optimizing prec@k with provable mistake bounds. We also devise scalable stochastic gradient descent style methods for this problem with provable convergence bounds. Our proofs rely on novel uniform convergence bounds which require an in-depth analysis of the structural properties of prec@k and its surrogates. We conclude with experimental results comparing our algorithms with state-of-the-art cutting plane and stochastic gradient algorithms for maximizing prec@k.

artificial intelligence, machine learning, prec, (19 more...)

arXiv.org Machine Learning

1505.06813

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.77)

Add feedback

Techniques for Learning Binary Stochastic Feedforward Neural Networks

Raiko, Tapani, Berglund, Mathias, Alain, Guillaume, Dinh, Laurent

arXiv.org Machine LearningApr-9-2015

Stochastic binary hidden units in a multi-layer perceptron (MLP) network give at least three potential benefits when compared to deterministic MLP networks. (1) They allow to learn one-to-many type of mappings. (2) They can be used in structured prediction problems, where modeling the internal structure of the output is important. (3) Stochasticity has been shown to be an excellent regularizer, which makes generalization performance potentially better in general. However, training stochastic networks is considerably more difficult. We study training using M samples of hidden activations per input. We show that the case M=1 leads to a fundamentally different behavior where the network tries to avoid stochasticity. We propose two new estimators for the training gradient and propose benchmark tests for comparing training algorithms. Our experiments confirm that training stochastic networks is difficult and show that the proposed two estimators perform favorably among all the five known estimators.

artificial intelligence, estimator, machine learning, (15 more...)

arXiv.org Machine Learning

1406.2989

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.69)

Add feedback

Joint Anaphoricity Detection and Coreference Resolution with Constrained Latent Structures

Lassalle, Emmanuel (Université Paris-Diderot) | Denis, Pascal (INRIA)

AAAI ConferencesMar-6-2015

This paper introduces a new structured model for learning anaphoricity detection and coreference resolution in a joint fashion. Specifically,we use a latent tree to represent the full coreference and anaphoric structure of a document at a global level, and we jointly learn the parameters of the two models using a version of the structured perceptron algorithm. Our joint structured model is further refined by the use of pairwise constraints which help the model to capture accurately certain patterns of coreference. Our experiments on the CoNLL-2012 English datasets show large improvements in both coreference resolution and anaphoricity detection, compared to various competing architectures. Our best coreference system obtains a CoNLL score of 81.97 on gold mentions, which is to date the best score reported on this setting.

artificial intelligence, machine learning, natural language, (16 more...)

AAAI Conferences

Twenty-Ninth AAAI Conference on Artificial Intelligence

Country: Europe (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

Distributed Decision Trees

İrsoy, Ozan, Alpaydın, Ethem

arXiv.org Machine LearningDec-19-2014

Recently proposed budding tree is a decision tree algorithm in which every node is part internal node and part leaf. This allows representing every decision tree in a continuous parameter space, and therefore a budding tree can be jointly trained with backpropagation, like a neural network. Even though this continuity allows it to be used in hierarchical representation learning, the learned representations are local: Activation makes a soft selection among all root-to-leaf paths in a tree. In this work we extend the budding tree and propose the distributed tree where the children use different and independent splits and hence multiple paths in a tree can be traversed at the same time. This ability to combine multiple paths gives the power of a distributed representation, as in a traditional perceptron layer. We show that distributed trees perform comparably or better than budding and traditional hard trees on classification and regression tasks.

artificial intelligence, decision tree, machine learning, (20 more...)

arXiv.org Machine Learning

1412.6388

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.35)

Add feedback

Reducing the Effects of Detrimental Instances

Smith, Michael R., Martinez, Tony

arXiv.org Machine LearningOct-14-2014

Not all instances in a data set are equally beneficial for inducing a model of the data. Some instances (such as outliers or noise) can be detrimental. However, at least initially, the instances in a data set are generally considered equally in machine learning algorithms. Many current approaches for handling noisy and detrimental instances make a binary decision about whether an instance is detrimental or not. In this paper, we 1) extend this paradigm by weighting the instances on a continuous scale and 2) present a methodology for measuring how detrimental an instance may be for inducing a model of the data. We call our method of identifying and weighting detrimental instances reduced detrimental instance learning (RDIL). We examine RIDL on a set of 54 data sets and 5 learning algorithms and compare RIDL with other weighting and filtering approaches. RDIL is especially useful for learning algorithms where every instance can affect the classification boundary and the training instances are considered individually, such as multilayer perceptrons trained with backpropagation (MLPs). Our results also suggest that a more accurate estimate of which instances are detrimental can have a significant positive impact for handling them.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1406.2237

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.46)

Add feedback

Transformed Representations for Convolutional Neural Networks in Diabetic Retinopathy Screening

Lim, Gilbert (National University of Singapore) | Lee, Mong Li (National University of Singapore) | Hsu, Wynne (National University of Singapore) | Wong, Tien Yin (Singapore National Eye Centre)

AAAI ConferencesJul-22-2014

Convolutional neural networks (CNNs) are flexible, biologically-inspired variants of multi-layer perceptrons that have proven themselves to be exceptionally suited to discriminative vision tasks. However, relatively little is known on whether they can be made both more efficient and more accurate, by introducing suitable transformations that exploit general knowledge of the target classes. We demonstrate this functionality through pre-segmentation of input images with a fast and robust but loose segmentation step, to obtain a set of candidate objects. These objects then undergo a spatial transformation into a reduced space, retaining but a compact high-level representation of their appearance. Additional attributes may be abstracted as raw features that are incorporated after the convolutional phase of the network. Finally, we compare its performance against existing approaches on the challenging problem of detecting lesions in retinal images.

convolutional neural network, diabetic retinopathy screening, transformed representation

AAAI Conferences

Workshops at the Twenty-Eighth AAAI Conference on Artificial Intelligence

Industry:

Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (0.69)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.53)

Add feedback

Data Assimilation by Artificial Neural Networks for an Atmospheric General Circulation Model: Conventional Observation

Cintra, Rosangela S., Velho, Haroldo F. de Campos

arXiv.org Artificial IntelligenceJul-16-2014

This paper presents an approach for employing artificial neural networks (NN) to emulate an ensemble Kalman filter (EnKF) as a method of data assimilation. The assimilation methods are tested in the Simplified Parameterizations PrimitivE-Equation Dynamics (SPEEDY) model, an atmospheric general circulation model (AGCM), using synthetic observational data simulating localization of balloon soundings. For the data assimilation scheme, the supervised NN, the multilayer perceptrons (MLP-NN), is applied. The MLP-NN are able to emulate the analysis from the local ensemble transform Kalman filter (LETKF). After the training process, the method using the MLP-NN is seen as a function of data assimilation. The NN were trained with data from first three months of 1982, 1983, and 1984. A hind-casting experiment for the 1985 data assimilation cycle using MLP-NN were performed with synthetic observations for January 1985. The numerical results demonstrate the effectiveness of the NN technique for atmospheric data assimilation. The results of the NN analyses are very close to the results from the LETKF analyses, the differences of the monthly average of absolute temperature analyses is of order 0.02. The simulations show that the major advantage of using the MLP-NN is better computational performance, since the analyses have similar quality. The CPU-time cycle assimilation with MLP-NN is 90 times faster than cycle assimilation with LETKF for the numerical experiment.

artificial intelligence, data assimilation, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1407.436

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(5 more...)

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.69)

Add feedback