Review for NeurIPS paper: Identifying Learning Rules From Neural Network Observables

Neural Information Processing Systems

Weaknesses: This paper has stuck with me, and I do want to emphasize just how interesting I find it. I am very much in favor of it, but the following list of weaknesses is holding me back from backing its acceptance. Broadly, I need more convincing that (1) discrimination is not trivially due to differences in learning-algorithm performance, (2) learning algorithm vs. architecture can ever be dissociated in model organisms, and (3) *why* differences at the level of weights (fig 3) would be indicative of different learning algorithms in a way that cannot be deduced from first principles (related to point 2). I am suspicious that the ability to discriminate between learning algorithms is driven by differences in their performance on ImageNet. While it's not obvious to me *how* this would work, it seems plausible that learning-algorithm differences could surface through performance alone, and I think the authors need to control for this.


Analysis and Comparison of Different Learning Algorithms for Pattern Association Problems

Neural Information Processing Systems

As test cases we use simple pattern association problems, such as the XOR-problem and symmetry detection problems. The algorithms considered are either versions of the Boltzmann machine learning rule or based on the backpropagation of errors. We also propose and analyze a generalized delta rule for linear threshold units. We find that the performance of a given learning algorithm depends strongly on the type of units used. In particular, we observe that networks with ±1 units quite generally exhibit a significantly better learning behavior than the corresponding 0,1 versions.


Model Understanding With Azure Machine Learning - AI Summary

#artificialintelligence

Data scientists and model evaluators use it at training time to understand their model's predictions and assess the fairness of their AI systems, enhancing their ability to debug and improve models. Model performance tab: With the predefined female and male cohorts, we can observe the different prediction distributions between the male and female cohorts, with females experiencing a higher probability of being rejected for a loan. We sort the top feature importances by the female cohort, which indicates that while "Sex" is the second most important feature contributing to the model's predictions for individuals in the female cohort, it does not influence how the model makes predictions for individuals in the male cohort. The dependence plot for the feature "Sex" also shows that only the female group has positive feature importance towards the prediction of being rejected for a loan, whereas the model does not look at the feature "Sex" for males when making predictions. The original fairness dashboard also enables the comparison of multiple models, such as models produced by different learning algorithms and different mitigation approaches.
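A minimal sketch of the kind of per-cohort comparison described above: computing selection rates for two cohorts and their gap (demographic parity difference). The data and helper are hypothetical illustrations, not Azure Machine Learning APIs:

```python
# Hypothetical predictions: (cohort, model decision), 1 = loan approved.
preds = [
    ("female", 0), ("female", 0), ("female", 1),
    ("male", 1), ("male", 1), ("male", 0),
]

def selection_rate(cohort):
    # Fraction of the cohort that the model approves.
    decisions = [y for c, y in preds if c == cohort]
    return sum(decisions) / len(decisions)

# Demographic parity difference between the two cohorts.
gap = selection_rate("male") - selection_rate("female")
```

A nonzero `gap` is exactly the kind of disparity the dashboard's cohort comparison surfaces before drilling into per-feature importances.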


A block-random algorithm for learning on distributed, heterogeneous data

Mohan, Prakash, de Frahan, Marc T. Henry, King, Ryan, Grout, Ray W.

arXiv.org Machine Learning

Most deep learning models are based on deep neural networks with multiple layers between input and output. The parameters defining these layers are initialized with random values and are "learned" from data, typically using algorithms based on stochastic gradient descent (SGD). These algorithms rely on the data being randomly shuffled before optimization. This pre-shuffling, formally required for SGD to derive a useful deep learning model, is expected to be prohibitively expensive for in situ model training because of the resulting data communication across processor nodes. We show that SGD can still make useful progress if the batches are defined on a per-processor basis and processed in random order, even though (i) the batches are constructed from data samples from a single class or specific flow region, and (ii) the overall data samples are heterogeneous. We present block-random gradient descent, a new algorithm that works on distributed, heterogeneous data without pre-shuffling. This algorithm enables in situ learning for exascale simulations. Its performance is demonstrated on a set of benchmark classification models and on the construction of a subgrid-scale large eddy simulation (LES) model for turbulent channel flow, using a data model similar to that which will be encountered in exascale simulations.
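A minimal sketch of the block-random idea under an assumed toy setup (not the paper's implementation): each simulated "processor" holds samples from a single class only, and each epoch shuffles only the order of the per-processor blocks, never the samples across processors:

```python
import math
import random

random.seed(0)

# Hypothetical layout: each simulated "processor" holds samples from one
# class only (the heterogeneous, unshuffled in situ setting).
def make_block(label, n=50):
    mu = 1.0 if label == 1 else -1.0
    return [([random.gauss(mu, 0.5), random.gauss(mu, 0.5)], label)
            for _ in range(n)]

blocks = [make_block(lbl) for lbl in (0, 1, 0, 1)]  # four "processors"

w, b, lr = [0.0, 0.0], 0.0, 0.1

def sgd_block(batch):
    # One SGD pass of a logistic-regression model over one processor's block.
    global b
    for x, y in batch:
        z = w[0] * x[0] + w[1] * x[1] + b
        p = 1.0 / (1.0 + math.exp(-z))
        g = p - y  # gradient of the log loss w.r.t. z
        w[0] -= lr * g * x[0]
        w[1] -= lr * g * x[1]
        b -= lr * g

# Block-random schedule: shuffle only the ORDER of blocks each epoch;
# samples are never exchanged or reshuffled across processors.
for epoch in range(30):
    order = list(range(len(blocks)))
    random.shuffle(order)
    for i in order:
        sgd_block(blocks[i])

n_total = sum(len(blk) for blk in blocks)
accuracy = sum(
    (1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0) == y
    for blk in blocks for x, y in blk
) / n_total
```

Even though every individual block is single-class, randomizing the block order across epochs keeps the optimizer from collapsing onto one class, mirroring the paper's claim that useful progress survives without a global shuffle.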


PAC-Bayes Analysis of Multi-view Learning

Sun, Shiliang, Shawe-Taylor, John, Mao, Liang

arXiv.org Artificial Intelligence

This paper presents eight PAC-Bayes bounds to analyze the generalization performance of multi-view classifiers. These bounds adopt data-dependent Gaussian priors which emphasize classifiers with high view agreement. The center of the prior for the first two bounds is the origin, while the center of the prior for the third and fourth bounds is given by a data-dependent vector. An important technique in obtaining these bounds is two derived log-determinant inequalities, which differ in whether the dimensionality of the data is involved. The centers of the fifth and sixth bounds are calculated on a separate subset of the training set. The last two bounds use unlabeled data to represent view agreement and are thus applicable to semi-supervised multi-view learning. We evaluate all the presented multi-view PAC-Bayes bounds on benchmark data and compare them with previous single-view PAC-Bayes bounds. The usefulness and performance of the multi-view bounds are discussed.
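For context, a minimal statement of the classical single-view PAC-Bayes bound (the Seeger/Langford kl form) that such multi-view bounds generalize; the notation here is the standard one, not taken from the paper: with prior $\pi$, posterior $\rho$, $m$ training examples, and confidence $\delta$, with probability at least $1-\delta$,

```latex
\mathrm{kl}\!\left(\hat{E}_\rho \,\middle\|\, E_\rho\right)
\;\le\;
\frac{\mathrm{KL}(\rho \,\|\, \pi) + \ln\frac{2\sqrt{m}}{\delta}}{m}
```

where $\hat{E}_\rho$ and $E_\rho$ are the empirical and true Gibbs risks and $\mathrm{kl}(\cdot\|\cdot)$ is the binary KL divergence. The data-dependent Gaussian priors described above tighten the $\mathrm{KL}(\rho\|\pi)$ term for classifiers on which the views agree.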


An Empirical Study of Bagging Predictors for Different Learning Algorithms

Liang, Guohua (University of Technology, Sydney) | Zhu, Xingquan (University of Technology, Sydney) | Zhang, Chengqi (University of Technology, Sydney)

AAAI Conferences

Bagging is a simple yet effective design which combines multiple single learners to form an ensemble for prediction. Despite its popular usage in many real-world applications, existing research is mainly concerned with studying unstable learners as the key to ensure the performance gain of a bagging predictor, with many key factors remaining unclear. For example, it is not clear when a bagging predictor can outperform a single learner, and what the expected performance gain is when different learning algorithms are used to form a bagging predictor. In this paper, we carry out comprehensive empirical studies to evaluate bagging predictors by using 12 different learning algorithms and 48 benchmark data-sets. Our analysis uses robustness and stability decompositions to characterize different learning algorithms, through which we rank all learning algorithms and comparatively study their bagging predictors to draw conclusions. Our studies assert that both stability and robustness are key requirements to ensure the high performance of a bagging predictor. In addition, our studies demonstrate that bagging is statistically superior to most single base learners, except for KNN and Naïve Bayes (NB). Multi-layer perceptron (MLP), Naïve Bayes Trees (NBTree), and PART are the learning algorithms with the best bagging performance.
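The bagging design the paper studies — bootstrap-resample the training set, fit one base learner per resample, and predict by majority vote — can be sketched as follows. This is a toy illustration with a decision-stump base learner on synthetic data, not the paper's 12-algorithm, 48-dataset study:

```python
import random

# Toy base learner: a decision stump that thresholds a single feature.
def fit_stump(data):
    best = None  # (n_correct, feature, threshold, sign)
    n_feat = len(data[0][0])
    for f in range(n_feat):
        for thr in sorted({x[f] for x, _ in data}):
            for sign in (1, -1):
                correct = sum(
                    (1 if sign * (x[f] - thr) > 0 else 0) == y for x, y in data
                )
                if best is None or correct > best[0]:
                    best = (correct, f, thr, sign)
    _, f, thr, sign = best
    return lambda x, f=f, thr=thr, sign=sign: 1 if sign * (x[f] - thr) > 0 else 0

# Bagging: fit each learner on a bootstrap resample, then majority-vote.
def bagging(data, n_learners=11):
    models = []
    for _ in range(n_learners):
        boot = [random.choice(data) for _ in data]  # sample with replacement
        models.append(fit_stump(boot))
    return lambda x: 1 if 2 * sum(m(x) for m in models) > len(models) else 0

random.seed(1)
data = ([([random.random(), random.random()], 0) for _ in range(40)]
        + [([random.random() + 0.7, random.random() + 0.7], 1) for _ in range(40)])
predict = bagging(data)
accuracy = sum(predict(x) == y for x, y in data) / len(data)
```

Because each stump sees a different bootstrap sample, their errors are partially decorrelated, which is the instability the paper identifies as a precondition for the ensemble vote to pay off.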


Analysis and Comparison of Different Learning Algorithms for Pattern Association Problems

Bernasconi, J.

Neural Information Processing Systems

ANALYSIS AND COMPARISON OF DIFFERENT LEARNING ALGORITHMS FOR PATTERN ASSOCIATION PROBLEMS J. Bernasconi Brown Boveri Research Center CH-5405 Baden, Switzerland ABSTRACT We investigate the behavior of different learning algorithms for networks of neuron-like units. As test cases we use simple pattern association problems, such as the XOR-problem and symmetry detection problems. The algorithms considered are either versions of the Boltzmann machine learning rule or based on the backpropagation of errors. We also propose and analyze a generalized delta rule for linear threshold units. We find that the performance of a given learning algorithm depends strongly on the type of units used.
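A toy illustration of why the unit encoding matters for delta-rule learning (a sketch under our own toy setup, not the paper's experiments): with 0,1 units, a zero input component contributes no weight update, since Δw_i = η·(t−y)·x_i = 0, whereas ±1 units update every weight on every error:

```python
# Single linear threshold unit trained with the delta rule.
# `lo` is the unit's "off" level: 0 for 0,1 units, -1 for ±1 units.
def train(samples, lo, epochs=50, lr=0.25):
    w = [0.0, 0.0]
    b = 0.0
    for epoch in range(epochs):
        errors = 0
        for x, t in samples:
            y = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else lo
            if y != t:
                errors += 1
                # Delta rule: a zero input component (possible only in the
                # 0,1 encoding) leaves its weight unchanged.
                w[0] += lr * (t - y) * x[0]
                w[1] += lr * (t - y) * x[1]
                b += lr * (t - y)
        if errors == 0:
            return epoch  # epochs needed before an error-free pass
    return epochs

# The AND function in both encodings (linearly separable, unlike XOR).
and_01 = [([a, c], 1 if a and c else 0) for a in (0, 1) for c in (0, 1)]
and_pm = [([2 * a - 1, 2 * c - 1], 1 if a and c else -1)
          for a in (0, 1) for c in (0, 1)]

epochs_01 = train(and_01, lo=0)
epochs_pm = train(and_pm, lo=-1)
```

Both encodings converge on this separable task; the abstract's observation is that the ±1 version generally learns faster, consistent with its weights being updated on every error.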

