Goto

Collaborating Authors

 Perceptrons


Large-Scale Hierarchical Classification via Stochastic Perceptron

AAAI Conferences

Hierarchical classification (HC) plays an significant role in machine learning and data mining. However, most of the state-of-the-art HC algorithms suffer from high computational costs. To improve the performance of solving, we propose a stochastic perceptron (SP) algorithm in the large margin framework. In particular, a stochastic choice procedure is devised to decide the direction of next iteration. We prove that after finite iterations the SP algorithm yields a sub-optimal solution with high probability if the input instances are separable. For large-scale and high-dimensional data sets, we reform SP to the kernel version (KSP), which dramatically reduces the memory space needed. The KSP algorithm has the merit of low space complexity as well as low time complexity. The experimental results show that our KSP approach achieves almost the same accuracy as the contemporary algorithms on the real-world data sets, but with much less CPU running time.


A Data Mining Approach to Solve the Goal Scoring Problem

arXiv.org Artificial Intelligence

In soccer, scoring goals is a fundamental objective which depends on many conditions and constraints. Considering the RoboCup soccer 2D-simulator, this paper presents a data mining-based decision system to identify the best time and direction to kick the ball towards the goal to maximize the overall chances of scoring during a simulated soccer match. Following the CRISP-DM methodology, data for modeling were extracted from matches of major international tournaments (10691 kicks), knowledge about soccer was embedded via transformation of variables and a Multilayer Perceptron was used to estimate the scoring chance. Experimental performance assessment to compare this approach against previous LDA-based approach was conducted from 100 matches. Several statistical metrics were used to analyze the performance of the system and the results showed an increase of 7.7% in the number of kicks, producing an overall increase of 78% in the number of goals scored.


Spherical perceptron as a storage memory with limited errors

arXiv.org Machine Learning

It has been known for a long time that the classical spherical perceptrons can be used as storage memories. Seminal work of Gardner, \cite{Gar88}, started an analytical study of perceptrons storage abilities. Many of the Gardner's predictions obtained through statistical mechanics tools have been rigorously justified. Among the most important ones are of course the storage capacities. The first rigorous confirmations were obtained in \cite{SchTir02,SchTir03} for the storage capacity of the so-called positive spherical perceptron. These were later reestablished in \cite{TalBook} and a bit more recently in \cite{StojnicGardGen13}. In this paper we consider a variant of the spherical perceptron that operates as a storage memory but allows for a certain fraction of errors. In Gardner's original work the statistical mechanics predictions in this directions were presented sa well. Here, through a mathematically rigorous analysis, we confirm that the Gardner's predictions in this direction are in fact provable upper bounds on the true values of the storage capacity. Moreover, we then present a mechanism that can be used to lower these bounds. Numerical results that we present indicate that the Garnder's storage capacity predictions may, in a fairly wide range of parameters, be not that far away from the true values.


Discrete perceptrons

arXiv.org Machine Learning

Perceptrons have been known for a long time as a promising tool within the neural networks theory. The analytical treatment for a special class of perceptrons started in seminal work of Gardner \cite{Gar88}. Techniques initially employed to characterize perceptrons relied on a statistical mechanics approach. Many of such predictions obtained in \cite{Gar88} (and in a follow-up \cite{GarDer88}) were later on established rigorously as mathematical facts (see, e.g. \cite{SchTir02,SchTir03,TalBook,StojnicGardGen13,StojnicGardSphNeg13,StojnicGardSphErr13}). These typically related to spherical perceptrons. A lot of work has been done related to various other types of perceptrons. Among the most challenging ones are what we will refer to as the discrete perceptrons. An introductory statistical mechanics treatment of such perceptrons was given in \cite{GutSte90}. Relying on results of \cite{Gar88}, \cite{GutSte90} characterized many of the features of several types of discrete perceptrons. We in this paper, consider a similar subclass of discrete perceptrons and provide a mathematically rigorous set of results related to their performance. As it will turn out, many of the statistical mechanics predictions obtained for discrete predictions will in fact appear as mathematically provable bounds. This will in a way emulate a similar type of behavior we observed in \cite{StojnicGardGen13,StojnicGardSphNeg13,StojnicGardSphErr13} when studying spherical perceptrons.


Pushing Stochastic Gradient towards Second-Order Methods -- Backpropagation Learning with Transformations in Nonlinearities

arXiv.org Machine Learning

Recently, we proposed to transform the outputs of each hidden neuron in a multi-layer perceptron network to have zero output and zero slope on average, and use separate shortcut connections to model the linear dependencies instead. We continue the work by firstly introducing a third transformation to normalize the scale of the outputs of each hidden neuron, and secondly by analyzing the connections to second order optimization methods. We show that the transformations make a simple stochastic gradient behave closer to second-order optimization methods and thus speed up learning. This is shown both in theory and with experiments. The experiments on the third transformation show that while it further increases the speed of learning, it can also hurt performance by converging to a worse local optimum, where both the inputs and outputs of many hidden neurons are close to zero.


Online Learning with Pairwise Loss Functions

arXiv.org Machine Learning

Efficient online learning with pairwise loss functions is a crucial component in building large-scale learning system that maximizes the area under the Receiver Operator Characteristic (ROC) curve. In this paper we investigate the generalization performance of online learning algorithms with pairwise loss functions. We show that the existing proof techniques for generalization bounds of online algorithms with a univariate loss can not be directly applied to pairwise losses. In this paper, we derive the first result providing data-dependent bounds for the average risk of the sequence of hypotheses generated by an arbitrary online learner in terms of an easily computable statistic, and show how to extract a low risk hypothesis from the sequence. We demonstrate the generality of our results by applying it to two important problems in machine learning. First, we analyze two online algorithms for bipartite ranking; one being a natural extension of the perceptron algorithm and the other using online convex optimization. Secondly, we provide an analysis for the risk bound for an online algorithm for supervised metric learning.


Piecewise Linear Multilayer Perceptrons and Dropout

arXiv.org Machine Learning

We propose a new type of hidden layer for a multilayer perceptron, and demonstrate that it obtains the best reported performance for an MLP on the MNIST dataset.


Perceptron Learning of SAT

Neural Information Processing Systems

Boolean satisfiability (SAT) as a canonical NP-complete decision problem is one of the most important problems in computer science. In practice, real-world SAT sentences are drawn from a distribution that may result in efficient algorithms for their solution. Such SAT instances are likely to have shared characteristics and substructures. This work approaches the exploration of a family of SAT solvers as a learning problem. In particular, we relate polynomial time solvability of a SAT subset to a notion of margin between sentences mapped by a feature function into a Hilbert space. Provided this mapping is based on polynomial time computable statistics of a sentence, we show that the existance of a margin between these data points implies the existance of a polynomial time solver for that SAT subset based on the Davis-Putnam-Logemann-Loveland algorithm. Furthermore, we show that a simple perceptron-style learning rule will find an optimal SAT solver with a bounded number of training updates. We derive a linear time computable set of features and show analytically that margins exist for important polynomial special cases of SAT. Empirical results show an order of magnitude improvement over a state-of-the-art SAT solver on a hardware verification task.


Local Supervised Learning through Space Partitioning

Neural Information Processing Systems

We develop a novel approach for supervised learning based on adaptively partitioning the feature space into different regions and learning local region-specific classifiers. We formulate an empirical risk minimization problem that incorporates both partitioning and classification in to a single global objective. We show that space partitioning can be equivalently reformulated as a supervised learning problem and consequently any discriminative learning method can be utilized in conjunction with our approach. Nevertheless, we consider locally linear schemes by learning linear partitions and linear region classifiers. Locally linear schemes can not only approximate complex decision boundaries and ensure low training error but also provide tight control on over-fitting and generalization error. We train locally linear classifiers by using LDA, logistic regression and perceptrons, and so our scheme is scalable to large data sizes and high-dimensions. We present experimental results demonstrating improved performance over state of the art classification techniques on benchmark datasets. We also show improved robustness to label noise.


Random Spanning Trees and the Prediction of Weighted Graphs

arXiv.org Machine Learning

We investigate the problem of sequentially predicting the binary labels on the nodes of an arbitrary weighted graph. We show that, under a suitable parametrization of the problem, the optimal number of prediction mistakes can be characterized (up to logarithmic factors) by the cutsize of a random spanning tree of the graph. The cutsize is induced by the unknown adversarial labeling of the graph nodes. In deriving our characterization, we obtain a simple randomized algorithm achieving in expectation the optimal mistake bound on any polynomially connected weighted graph. Our algorithm draws a random spanning tree of the original graph and then predicts the nodes of this tree in constant expected amortized time and linear space. Experiments on real-world datasets show that our method compares well to both global (Perceptron) and local (label propagation) methods, while being generally faster in practice.