Laviolette, François
PAC-Bayesian Learning of Aggregated Binary Activated Neural Networks with Probabilities over Representations
Fortier-Dubois, Louis, Letarte, Gaël, Leblanc, Benjamin, Laviolette, François, Germain, Pascal
Considering a probability distribution over parameters is known as an efficient strategy to learn a neural network with non-differentiable activation functions. We study the expectation of a probabilistic neural network as a predictor by itself, focusing on the aggregation of binary activated neural networks with normal distributions over real-valued weights. Our work leverages a recent analysis derived from the PAC-Bayesian framework that derives tight generalization bounds and learning procedures for the expected output value of such an aggregation, which is given by an analytical expression. While the combinatorial nature of the latter has been circumvented by approximations in previous works, we show that the exact computation remains tractable for deep but narrow neural networks, thanks to a dynamic programming approach. This leads us to a peculiar bound minimization learning algorithm for binary activated neural networks, where the forward pass propagates probabilities over representations instead of activation values. A stochastic counterpart that scales to wide architectures is proposed.
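The idea of a forward pass that propagates probabilities over representations can be illustrated with a minimal Python sketch. It assumes sign activations, isotropic Gaussian weights with identity covariance, no bias terms, a nonzero input, and a single output neuron. The function names (`neuron_prob_plus`, `layer_distribution`, `forward`, `expected_output`) are hypothetical and are not taken from the paper. The sketch enumerates all 2^d hidden representations of each layer, which is why it is only practical for narrow networks.

```python
import itertools
import math

import numpy as np


def neuron_prob_plus(mu, x):
    """P(sign(w . x) = +1) when w ~ N(mu, I) and x is nonzero (assumed setting)."""
    norm = np.linalg.norm(x)
    return 0.5 * (1.0 + math.erf(float(mu @ x) / (math.sqrt(2.0) * norm)))


def layer_distribution(M, x):
    """Distribution over the 2^d binary outputs of one layer for a fixed input x.

    M has shape (d, len(x)); row i holds the weight mean of neuron i.
    Neurons are independent given x, so the joint probability is a product.
    """
    p_plus = [neuron_prob_plus(m, x) for m in M]
    dist = {}
    for s in itertools.product((-1.0, 1.0), repeat=len(M)):
        dist[s] = math.prod(p if si > 0 else 1.0 - p for si, p in zip(s, p_plus))
    return dist


def forward(means, x):
    """Propagate a probability distribution over representations layer by layer."""
    dist = layer_distribution(means[0], np.asarray(x, dtype=float))
    for M in means[1:]:
        nxt = {}
        for s, ps in dist.items():
            # Each possible representation s is fed to the next layer,
            # weighted by its probability ps.
            for t, pt in layer_distribution(M, np.asarray(s)).items():
                nxt[t] = nxt.get(t, 0.0) + ps * pt
        dist = nxt
    return dist


def expected_output(means, x):
    """Expected value of the aggregation, assuming a single output neuron."""
    return sum(p * s[0] for s, p in forward(means, x).items())
```

For a single neuron with mean (1, 0) and input (1, 0), `expected_output` reduces to erf(1/√2). With more layers, the cost grows with the product of the numbers of representations of consecutive layers, so depth is cheap and width is expensive.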
Implicit Variational Inference: the Parameter and the Predictor Space
Pequignot, Yann, Alain, Mathieu, Dallaire, Patrick, Yeganehparast, Alireza, Germain, Pascal, Desharnais, Josée, Laviolette, François
Having access to accurate confidence levels along with the predictions allows one to determine whether making a decision is worth the risk. Under the Bayesian paradigm, the posterior distribution over parameters is used to capture model uncertainty, valuable information that can be translated into predictive uncertainty. However, computing the posterior distribution for high capacity predictors, such as neural networks, is generally intractable, making approximate methods such as variational inference a promising alternative. While most methods perform inference in the space of parameters, we explore the benefits of carrying out inference directly in the space of predictors. Relying on a family of distributions given by a deep generative neural network, we present two ways of carrying out variational inference: one in parameter space, one in predictor space. Importantly, the latter requires us to choose a distribution of inputs, therefore allowing us at the same time to explicitly address the question of out-of-distribution uncertainty. We explore from various perspectives the implications of working in the predictor space induced by neural networks as opposed to the parameter space, focusing mainly on the quality of uncertainty estimation for data lying outside of the training distribution. We compare posterior approximations obtained with these two methods to several standard methods and present results showing that variational approximations learned in the predictor space distinguish themselves positively from those trained in the parameter space.
Virtual Reality to Study the Gap Between Offline and Real-Time EMG-based Gesture Recognition
Côté-Allard, Ulysse, Gagnon-Turcotte, Gabriel, Phinyomark, Angkoon, Glette, Kyrre, Scheme, Erik, Laviolette, François, Gosselin, Benoit
Within sEMG-based gesture recognition, a chasm exists in the literature between offline accuracy and real-time usability of a classifier. This gap mainly stems from the four main dynamic factors in sEMG-based gesture recognition: gesture intensity, limb position, electrode shift and transient changes in the signal. These factors are hard to include within an offline dataset as each of them exponentially increases the number of segments to be recorded. On the other hand, online datasets are biased towards the sEMG-based algorithms providing feedback to the participants, limiting the usability of such datasets as benchmarks. This paper proposes a virtual reality (VR) environment and a real-time experimental protocol from which the four main dynamic factors can more easily be studied. During the online experiment, the gesture recognition feedback is provided through the Leap Motion camera, enabling the proposed dataset to be re-used to compare future sEMG-based algorithms. Twenty able-bodied participants took part in this study, completing three to four sessions over a period spanning between 14 and 21 days. Finally, TADANN, a new transfer learning-based algorithm, is proposed for long-term gesture classification and significantly (p<0.05) outperforms fine-tuning a network.
Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks
Letarte, Gaël, Germain, Pascal, Guedj, Benjamin, Laviolette, François
We present a comprehensive study of multilayer neural networks with binary activation, relying on the PAC-Bayesian theory. Our contributions are twofold: (i) we develop an end-to-end framework to train a binary activated deep neural network, overcoming the fact that the binary activation function is non-differentiable; (ii) we provide nonvacuous PAC-Bayesian generalization bounds for binary activated deep neural networks. Notably, our results are obtained by minimizing the expected loss of an architecture-dependent aggregation of binary activated deep neural networks. The performance of our approach is assessed on a thorough numerical experiment protocol on real-life datasets.
Deep Learning for Electromyographic Hand Gesture Signal Classification by Leveraging Transfer Learning
Côté-Allard, Ulysse, Fall, Cheikh Latyr, Drouin, Alexandre, Campeau-Lecours, Alexandre, Gosselin, Clément, Glette, Kyrre, Laviolette, François, Gosselin, Benoit
In recent years, the use of deep learning algorithms has become increasingly prominent. Within the field of electromyography-based gesture recognition, however, deep learning algorithms are seldom employed. This is due in part to the large quantity of data required for the network to train on. The data scarcity arises from the fact that it would take an unreasonable amount of time for a single person to generate tens of thousands of examples for training such algorithms. In this paper, two datasets are recorded with the Myo Armband (Thalmic Labs), a low-cost, low-sampling-rate (200 Hz), 8-channel, consumer-grade, dry-electrode sEMG armband. These datasets, referred to as the pre-training and evaluation datasets, comprise 19 and 17 able-bodied participants respectively. A convolutional network (ConvNet) is augmented with transfer learning techniques to leverage inter-user data from the first dataset, alleviating the burden imposed on a single individual to generate a vast quantity of training data for sEMG-based gesture recognition. This transfer learning scheme is shown to outperform the current state of the art in gesture recognition, achieving an average accuracy of 98.31% for 7 hand/wrist gestures over 17 able-bodied participants. Finally, a use-case study of eight able-bodied participants is presented to evaluate the impact of feedback on the accuracy degradation normally experienced from a classifier over time.
Maximum Margin Interval Trees
Drouin, Alexandre, Hocking, Toby Dylan, Laviolette, François
Learning a regression function using censored or interval-valued output data is an important problem in fields such as genomics and medicine. The goal is to learn a real-valued prediction function, and the training output labels indicate an interval of possible values. Whereas most existing algorithms for this task are linear models, in this paper we investigate learning nonlinear tree models. We propose to learn a tree by minimizing a margin-based discriminative objective function, and we provide a dynamic programming algorithm for computing the optimal solution in log-linear time. We show empirically that this algorithm achieves state-of-the-art speed and prediction accuracy in a benchmark of several data sets.
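A margin-based objective for interval-valued outputs can be sketched as follows, assuming a linear hinge that penalizes a prediction only when it falls outside the target interval shrunk by a margin. The names `interval_hinge`, `total_loss` and `best_constant` are hypothetical. The search for the best constant prediction is a naive quadratic-time scan over breakpoints, used here only for illustration; it is not the log-linear dynamic programming algorithm described in the abstract.

```python
import math


def interval_hinge(pred, lower, upper, margin=0.0):
    """Linear hinge loss for a target interval [lower, upper].

    An infinite bound (censored output) contributes no loss on that side.
    """
    left = max(0.0, (lower + margin) - pred)   # prediction too low
    right = max(0.0, pred - (upper - margin))  # prediction too high
    return left + right


def total_loss(pred, intervals, margin=0.0):
    """Sum of interval hinge losses of a constant prediction."""
    return sum(interval_hinge(pred, lo, hi, margin) for lo, hi in intervals)


def best_constant(intervals, margin=0.0):
    """Constant prediction minimizing the total loss (naive scan).

    The total loss is convex and piecewise linear, so a minimum is attained
    at one of the finite breakpoints lower + margin or upper - margin.
    """
    candidates = sorted(
        b
        for lo, hi in intervals
        for b in (lo + margin, hi - margin)
        if math.isfinite(b)
    )
    if not candidates:
        return 0.0  # every interval is unbounded on both sides
    return min(candidates, key=lambda c: total_loss(c, intervals, margin))
```

In a regression tree, a routine of this kind would supply the value predicted by a leaf, with the tree-growing procedure choosing splits that reduce the summed loss of the resulting leaves.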
PAC-Bayes and Domain Adaptation
Germain, Pascal, Habrard, Amaury, Laviolette, François, Morvant, Emilie
We provide two main contributions in PAC-Bayesian theory for domain adaptation where the objective is to learn, from a source distribution, a well-performing majority vote on a different, but related, target distribution. Firstly, we propose an improvement of the previous approach we proposed in Germain et al. (2013), which relies on a novel distribution pseudodistance based on a disagreement averaging, allowing us to derive a new tighter domain adaptation bound for the target risk. While this bound stands in the spirit of common domain adaptation works, we derive a second bound (recently introduced in Germain et al., 2016) that brings a new perspective on domain adaptation by deriving an upper bound on the target risk where the distributions' divergence, expressed as a ratio, controls the trade-off between a source error measure and the target voters' disagreement. We discuss and compare both results, from which we obtain PAC-Bayesian generalization bounds. Furthermore, from the PAC-Bayesian specialization to linear classifiers, we infer two learning algorithms, and we evaluate them on real data.
Large scale modeling of antimicrobial resistance with interpretable classifiers
Drouin, Alexandre, Raymond, Frédéric, St-Pierre, Gaël Letarte, Marchand, Mario, Corbeil, Jacques, Laviolette, François
Antimicrobial resistance is an important public health concern that has implications in the practice of medicine worldwide. Accurately predicting resistance phenotypes from genome sequences shows great promise in promoting better use of antimicrobial agents, by determining which antibiotics are likely to be effective in specific clinical cases. In healthcare, this would allow for the design of treatment plans tailored for specific individuals, likely resulting in better clinical outcomes for patients with bacterial infections. In this work, we present the recent work of Drouin et al. (2016) on using Set Covering Machines to learn highly interpretable models of antibiotic resistance and complement it by providing a large scale application of their method to the entire PATRIC database. We report prediction results for 36 new datasets and present the Kover AMR platform, a new web-based tool allowing the visualization and interpretation of the generated models.
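The kind of interpretable model a Set Covering Machine produces can be illustrated with a minimal greedy sketch that learns a conjunction of presence/absence rules over a boolean feature matrix (for instance, k-mer presence in genomes). The function names, the trade-off parameter `p`, and the stopping criterion `max_rules` are assumptions made for illustration. This is not the Kover implementation.

```python
import numpy as np


def fit_scm_conjunction(X, y, p=1.0, max_rules=3):
    """Greedily build a conjunction of rules; y is True for the positive class.

    X: (n_samples, n_features) boolean matrix. Each feature yields two
    candidate rules: "presence" (feature is True) and "absence" (it is False).
    """
    X = np.asarray(X, dtype=bool)
    y = np.asarray(y, dtype=bool)
    n_features = X.shape[1]
    rule_outputs = np.concatenate([X, ~X], axis=1)  # presence rules, then absence rules
    active = np.ones(len(y), dtype=bool)            # examples still to be handled
    model = []
    while len(model) < max_rules and np.any(active & ~y):
        rejected = ~rule_outputs & active[:, None]
        neg_covered = (rejected & ~y[:, None]).sum(axis=0)  # negatives correctly rejected
        pos_lost = (rejected & y[:, None]).sum(axis=0)      # positives wrongly rejected
        best = int(np.argmax(neg_covered - p * pos_lost))
        if neg_covered[best] == 0:
            break  # no remaining rule rejects any negative example
        kind = "presence" if best < n_features else "absence"
        model.append((kind, best % n_features))
        active &= rule_outputs[:, best]  # drop every example the rule rejects
    return model


def predict(model, X):
    """An example is predicted positive only if it satisfies every rule."""
    X = np.asarray(X, dtype=bool)
    out = np.ones(X.shape[0], dtype=bool)
    for kind, j in model:
        out &= X[:, j] if kind == "presence" else ~X[:, j]
    return out
```

The learned model is a short list such as `[("presence", 0)]`, which reads directly as "resistant if feature 0 is present". This readability is what makes such classifiers attractive for interpretation.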
PAC-Bayesian Theorems for Domain Adaptation with Specialization to Linear Classifiers
Germain, Pascal, Habrard, Amaury, Laviolette, François, Morvant, Emilie
In this paper, we provide two main contributions in PAC-Bayesian theory for domain adaptation where the objective is to learn, from a source distribution, a well-performing majority vote on a different target distribution. On the one hand, we propose an improvement of the previous approach proposed by Germain et al. (2013), which relies on a novel distribution pseudodistance based on a disagreement averaging, allowing us to derive a new tighter PAC-Bayesian domain adaptation bound for the stochastic Gibbs classifier. We specialize it to linear classifiers, and design a learning algorithm which shows interesting results on a synthetic problem and on a popular sentiment annotation task. On the other hand, we generalize these results to multisource domain adaptation, allowing us to take into account different source domains. This study opens the door to tackling domain adaptation tasks by making use of all the PAC-Bayesian tools.
A New PAC-Bayesian Perspective on Domain Adaptation
Germain, Pascal, Habrard, Amaury, Laviolette, François, Morvant, Emilie
We study the issue of PAC-Bayesian domain adaptation: we want to learn, from a source domain, a majority vote model dedicated to a target one. Our theoretical contribution brings a new perspective by deriving an upper bound on the target risk where the distributions' divergence, expressed as a ratio, controls the trade-off between a source error measure and the target voters' disagreement. Our bound suggests that one has to focus on regions where the source data is informative. From this result, we derive a PAC-Bayesian generalization bound, and specialize it to linear classifiers. Then, we infer a learning algorithm and perform experiments on real data.