goodness score
Mono-Forward: Backpropagation-Free Algorithm for Efficient Neural Network Training Harnessing Local Errors
Gong, James, Li, Bruce, Abdulla, Waleed
Backpropagation [1], while foundational in training neural networks [2], faces critical limitations in both deep learning and neuroscience, highlighting the importance of exploring alternative methodologies. The concept of Backward Locking exemplifies a significant bottleneck inherent to BP, where weight updates across the network must await the completion of both forward and backward passes for each data batch, hampering the efficient distribution of computation and parallelization across the network [3-5]. In BP, error gradients can exhibit significant variations in magnitude as they are propagated backwards through the network's layers, leading to two prominent issues: vanishing and exploding gradients. Vanishing gradients occur when the gradients diminish to such small values that they fail to effectively update the weights of earlier layers, thus severely hindering the training of deep neural networks. On the other hand, exploding gradients present a challenge by causing disproportionately large updates to the weights, potentially destabilizing the network.
On the Improvement of Generalization and Stability of Forward-Only Learning via Neural Polarization
Terres-Escudero, Erik B., Del Ser, Javier, Garcia-Bringas, Pablo
Forward-only learning algorithms have recently gained attention as alternatives to gradient backpropagation, replacing the backward step of this latter solver with an additional contrastive forward pass. Among these approaches, the so-called Forward-Forward Algorithm (FFA) has been shown to achieve competitive levels of performance in terms of generalization and complexity. Networks trained using FFA learn to contrastively maximize a layer-wise defined goodness score when presented with real data (denoted as positive samples) and to minimize it when processing synthetic data (corr. negative samples). However, this algorithm still faces weaknesses that negatively affect the model accuracy and training stability, primarily due to a gradient imbalance between positive and negative samples. To overcome this issue, in this work we propose a novel implementation of the FFA algorithm, denoted as Polar-FFA, which extends the original formulation by introducing a neural division (\emph{polarization}) between positive and negative instances. Neurons in each of these groups aim to maximize their goodness when presented with their respective data type, thereby creating a symmetric gradient behavior. To empirically gauge the improved learning capabilities of our proposed Polar-FFA, we perform several systematic experiments using different activation and goodness functions over image classification datasets. Our results demonstrate that Polar-FFA outperforms FFA in terms of accuracy and convergence speed. Furthermore, its lower reliance on hyperparameters reduces the need for hyperparameter tuning to guarantee optimal generalization capabilities, thereby allowing for a broader range of neural network configurations.
Promoting Research Collaboration with Open Data Driven Team Recommendation in Response to Call for Proposals
Valluru, Siva Likitha, Srivastava, Biplav, Paladi, Sai Teja, Yan, Siwen, Natarajan, Sriraam
Building teams and promoting collaboration are two very common business activities. An example of these are seen in the TeamingForFunding problem, where research institutions and researchers are interested to identify collaborative opportunities when applying to funding agencies in response to latter's calls for proposals. We describe a novel system to recommend teams using a variety of AI methods, such that (1) each team achieves the highest possible skill coverage that is demanded by the opportunity, and (2) the workload of distributing the opportunities is balanced amongst the candidate members. We address these questions by extracting skills latent in open data of proposal calls (demand) and researcher profiles (supply), normalizing them using taxonomies, and creating efficient algorithms that match demand to supply. We create teams to maximize goodness along a novel metric balancing short- and long-term objectives. We validate the success of our algorithms (1) quantitatively, by evaluating the recommended teams using a goodness score and find that more informed methods lead to recommendations of smaller number of teams but higher goodness, and (2) qualitatively, by conducting a large-scale user study at a college-wide level, and demonstrate that users overall found the tool very useful and relevant. Lastly, we evaluate our system in two diverse settings in US and India (of researchers and proposal calls) to establish generality of our approach, and deploy it at a major US university for routine use.
Assessing the value of a candidate. Comparing belief function and possibility theories
Dubois, Didier, Grabisch, Michel, Prade, Henri, Smets, Philippe
Thomson-CSF, Corporate Research Laboratory Domaine de Corbeville 91404 Orsay, France The problem of assessing the value of a candidate is viewed here as a multiple combination problem. On the one hand a candidate can be evaluated according to different criteria, and on the other hand several experts are supposed to assess the value of candidates according to each criterion. Criteria are not equally important, experts are not equally competent or reliable. Moreover levels of satisfaction of criteria, or levels of confidence are only assumed to take their values in linearly ordered scales, whose nature is often qualitative. The problem is discussed within two frameworks, the transferable belief model and the qualitative possibility theory. They respectively offer a quantitative and a qualitative setting for handling the problem, providing thus a way to compare the nature of the underlying assumptions.
Recognizing Hand-written Digits Using Hierarchical Products of Experts
Mayraz, Guy, Hinton, Geoffrey E.
The product of experts learning procedure [1] can discover a set of stochastic binary features that constitute a nonlinear generative model of handwritten images of digits. The quality of generative models learned in this way can be assessed by learning a separate model for each class of digit and then comparing the unnormalized probabilities of test images under the 10 different class-specific models. To improve discriminative performance, it is helpful to learn a hierarchy of separate models for each digit class. Each model in the hierarchy has one layer of hidden units and the nth level model is trained on data that consists of the activities of the hidden units in the already trained (n - l)th level model. After training, each level produces a separate, unnormalized log probabilty score. With a three-level hierarchy for each of the 10 digit classes, a test image produces 30 scores which can be used as inputs to a supervised, logistic classification network that is trained on separate data. On the MNIST database, our system is comparable with current state-of-the-art discriminative methods, demonstrating that the product of experts learning procedure can produce effective generative models of high-dimensional data. 1 Learning products of stochastic binary experts Hinton [1] describes a learning algorithm for probabilistic generative models that are composed of a number of experts. Each expert specifies a probability distribution over the visible variables and the experts are combined by multiplying these distributions together and renormalizing.
Recognizing Hand-written Digits Using Hierarchical Products of Experts
Mayraz, Guy, Hinton, Geoffrey E.
The product of experts learning procedure [1] can discover a set of stochastic binary features that constitute a nonlinear generative model of handwritten images of digits. The quality of generative models learned in this way can be assessed by learning a separate model for each class of digit and then comparing the unnormalized probabilities of test images under the 10 different class-specific models. To improve discriminative performance, it is helpful to learn a hierarchy of separate models for each digit class. Each model in the hierarchy has one layer of hidden units and the nth level model is trained on data that consists of the activities of the hidden units in the already trained (n - l)th level model. After training, each level produces a separate, unnormalized log probabilty score. With a three-level hierarchy for each of the 10 digit classes, a test image produces 30 scores which can be used as inputs to a supervised, logistic classification network that is trained on separate data. On the MNIST database, our system is comparable with current state-of-the-art discriminative methods, demonstrating that the product of experts learning procedure can produce effective generative models of high-dimensional data. 1 Learning products of stochastic binary experts Hinton [1] describes a learning algorithm for probabilistic generative models that are composed of a number of experts. Each expert specifies a probability distribution over the visible variables and the experts are combined by multiplying these distributions together and renormalizing.
Recognizing Hand-written Digits Using Hierarchical Products of Experts
Mayraz, Guy, Hinton, Geoffrey E.
The product of experts learning procedure [1] can discover a set of stochastic binary features that constitute a nonlinear generative model of handwritten images of digits. The quality of generative models learned in this way can be assessed by learning a separate model for each class of digit and then comparing the unnormalized probabilities of test images under the 10 different class-specific models. To improve discriminative performance, it is helpful to learn a hierarchy of separate models for each digit class. Each model in the hierarchy has one layer of hidden units and the nth level model is trained on data that consists of the activities of the hidden units in the already trained (n - l)th level model. After training, eachlevel produces a separate, unnormalized log probabilty score. With a three-level hierarchy for each of the 10 digit classes, a test image produces 30 scores which can be used as inputs to a supervised, logistic classificationnetwork that is trained on separate data. On the MNIST database, our system is comparable with current state-of-the-art discriminative methods,demonstrating that the product of experts learning procedure can produce effective generative models of high-dimensional data. 1 Learning products of stochastic binary experts Hinton [1] describes a learning algorithm for probabilistic generative models that are composed ofa number of experts. Each expert specifies a probability distribution over the visible variables and the experts are combined by multiplying these distributions together and renormalizing.