An End-to-End Approach for Recognition of Modern and Historical Handwritten Numeral Strings
Hochuli, Andre G., Britto, Alceu S. Jr., Barddal, Jean P., Oliveira, Luiz E. S., Sabourin, Robert
An end-to-end solution for handwritten numeral string recognition is proposed, in which the numeral string is considered as composed of objects automatically detected and recognized by a YOLO-based model. The main contribution of this paper is to avoid heuristic-based methods for string preprocessing and segmentation, the need for task-oriented classifiers, and also the use of specific constraints related to the string length. A robust experimental protocol based on several numeral string datasets, including one composed of historical documents, has shown that the proposed method is a feasible end-to-end solution for numeral string recognition. Besides, it reduces the complexity of the string recognition task considerably since it drops classical steps, in particular preprocessing, segmentation, and a set of classifiers devoted to strings with a specific length.
Seeing The Whole Patient: Using Multi-Label Medical Text Classification Techniques to Enhance Predictions of Medical Codes
Yogarajan, Vithya, Montiel, Jacob, Smith, Tony, Pfahringer, Bernhard
Machine learning-based multi-label medical text classification can be used to enhance the understanding of the human body and to support patient care. We present a broad study of clinical natural language processing techniques to maximise a feature representing text when predicting medical codes for patients with multi-morbidity. We present results of multi-label medical text classification problems with 18, 50 and 155 labels. We compare several variations of embeddings, text tagging, and pre-processing. For imbalanced data we show that labels which occur infrequently benefit the most from additional features incorporated in embeddings. We also show that high dimensional embeddings pre-trained using health-related data present a significant improvement in a multi-label setting, similarly to the way they improve performance for binary classification. High dimensional embeddings from this research are made available for public use.
Convex Recovery of Marked Spatio-Temporal Point Processes
Juditsky, Anatoli, Nemirovski, Arkadi, Xie, Liyan, Xie, Yao
We present a multi-dimensional Bernoulli process model for spatial-temporal discrete event data with categorical marks, where the probability of an event of a specific category in a location may be influenced by past events at this and other locations. The focus is to introduce general forms of influence function which can capture an arbitrary shape of influence from historical events, between locations, and between different categories of events. The general form of influence function differs from the commonly adopted exponentially decaying function over time, and more importantly, in our model, we can learn the delayed influence of prior events, which is an aspect seemingly largely ignored in prior literature. Prior knowledge or assumptions on the influence function are incorporated into our framework by allowing general convex constraints on the parameters specifying the influence function. We develop two approaches for recovering these parameters, using constrained least-squares (LS) and maximum likelihood (ML) estimation. We demonstrate the performance of our approach on synthetic examples and illustrate its promise using real data (crime data and novel coronavirus data), in extracting knowledge about the general influences and making predictions.
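As a rough illustration of the kind of model described in this abstract, the following Python sketch computes the event probability at one location as a baseline rate plus a sum of lagged contributions from past events at all locations. The linear form, the single event category, and the names `event_probability`, `baseline`, `influence`, and `history` are assumptions made for illustration; the abstract does not state the exact parameterization, and the parameter values are made up.

```python
def event_probability(baseline, influence, history, k):
    """Probability of an event at location k at the next time step.

    Assumed (hypothetical) linear influence model:
      baseline[k]        background rate at location k
      influence[k][l][s] effect on k of an event at location l, s + 1 steps ago
      history[-1 - s][l] 1 if an event occurred at location l, s + 1 steps ago
    """
    p = baseline[k]
    for l in range(len(baseline)):
        for s, weight in enumerate(influence[k][l]):
            if s < len(history):  # ignore lags that reach before the record starts
                p += weight * history[-1 - s][l]
    return p


# Two locations, memory depth of two time steps (illustrative values only).
baseline = [0.1, 0.2]
influence = [
    [[0.3, 0.1], [0.0, 0.2]],   # effects on location 0 from locations 0 and 1
    [[0.05, 0.0], [0.1, 0.1]],  # effects on location 1 from locations 0 and 1
]
history = [[0, 1], [1, 0]]  # oldest first: event at location 1, then at location 0

p0 = event_probability(baseline, influence, history, k=0)
p1 = event_probability(baseline, influence, history, k=1)
print(p0, p1)
```

The weights `[0.0, 0.2]` for the effect of location 1 on location 0 are zero at lag one and positive at lag two, which is one way a delayed influence could appear. A constrained estimator would additionally have to keep every such probability inside [0, 1]; that step is not shown here.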
Coping With Simulators That Don't Always Return
Warrington, Andrew, Naderiparizi, Saeid, Wood, Frank
Deterministic models are approximations of reality that are easy to interpret and often easier to build than stochastic alternatives. Unfortunately, as nature is capricious, observational data can never be fully explained by deterministic models in practice. Observation and process noise need to be added to adapt deterministic models to behave stochastically, such that they are capable of explaining and extrapolating from noisy data. We investigate and address computational inefficiencies that arise from adding process noise to deterministic simulators that fail to return for certain inputs; a property we describe as "brittle." We show how to train a conditional normalizing flow to propose perturbations such that the simulator succeeds with high probability, increasing computational efficiency.
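The inefficiency described in this abstract can be seen in a toy setting. The Python sketch below shows only the naive baseline, in which a perturbation is resampled until the simulator returns; it is not the conditional normalizing flow the authors propose. The simulator, its failure rule, and all names (`brittle_simulator`, `step_with_rejection`, `noise_scale`) are hypothetical.

```python
import random


def brittle_simulator(state):
    """Toy deterministic simulator (hypothetical) that fails for negative inputs."""
    if state < 0:
        return None  # stands in for a simulator call that does not return
    return 0.9 * state


def step_with_rejection(state, noise_scale, rng, max_tries=1000):
    """Add Gaussian process noise, retrying until the simulator succeeds."""
    for tries in range(1, max_tries + 1):
        perturbed = state + rng.gauss(0.0, noise_scale)
        result = brittle_simulator(perturbed)
        if result is not None:
            return result, tries
    raise RuntimeError("simulator failed for every proposed perturbation")


rng = random.Random(0)
state, total_calls, n_steps = 0.5, 0, 100
for _ in range(n_steps):
    state, tries = step_with_rejection(state, noise_scale=1.0, rng=rng)
    total_calls += tries

print(f"{total_calls} simulator calls for {n_steps} successful steps")
```

Every call beyond the first in a step is wasted computation. A learned proposal that places most of its mass on perturbations for which the simulator succeeds would bring `total_calls` closer to `n_steps`.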
Variational Inference with Vine Copulas: An Efficient Approach for Bayesian Computer Model Calibration
Kejzlar, Vojtech, Maiti, Tapabrata
The ever-growing access to high performance computing in scientific communities has enabled development of complex computer models in fields such as nuclear physics, climatology, and engineering that produce massive amounts of data. These models need real-time calibration with quantified uncertainties. Bayesian methodology combined with Gaussian process modeling has been heavily utilized for calibration of computer models due to its natural way to account for various sources of uncertainty; see Higdon et al. (2015), and King et al. (2019) for examples in nuclear physics, Sexton et al. (2012) and Pollard et al. (2016) for examples in climatology, and Lawrence et al. (2010), Plumlee et al. (2016) and Zhang et al. (2019) for applications in engineering, astrophysics, and medicine. The original framework for Bayesian calibration of computer models was developed by Kennedy and O'Hagan (2001) with extensions provided by Higdon et al. (2005, 2008); Bayarri et al. (2007); Plumlee (2017, 2019), and Gu and Wang (2018), to name a few. Despite its popularity, however, Bayesian calibration becomes infeasible in big-data scenarios with complex and many-parameter models because it relies on Markov chain Monte Carlo (MCMC) algorithms to approximate posterior densities. This text presents a scalable and statistically principled approach to Bayesian calibration of computer models. We offer an alternative approximation to posterior densities using variational Bayesian inference (VBI), which originated as a machine learning algorithm that approximates a target density through optimization. Statisticians and computer scientists (starting with Peterson and Anderson (1987); Jordan et al. (1999)) have been widely using variational techniques because they tend to be faster and easier to scale to massive datasets. Moreover, the recently published frequentist consistency of variational Bayes by Wang and Blei (2018) established VBI as a theoretically valid procedure.
NPENAS: Neural Predictor Guided Evolution for Neural Architecture Search
Wei, Chen, Niu, Chuang, Tang, Yiping, Liang, Jimin
Neural architecture search (NAS) is a promising method for automatically finding excellent architectures. Commonly used search strategies, such as evolutionary algorithms, Bayesian optimization, and predictor methods, employ a predictor to rank sampled architectures. In this paper, we propose two predictor-based algorithms, NPUBO and NPENAS, for neural architecture search. First, we propose NPUBO, which takes a neural predictor with uncertainty estimation as the surrogate model for Bayesian optimization. Second, we propose a simple and effective predictor-guided evolution algorithm (NPENAS), which uses a neural predictor to guide the evolutionary algorithm in performing selection and mutation. Finally, we analyse the architecture sampling pipeline and find that the most commonly used random sampling pipeline tends to generate architectures in a subspace of the real underlying search space. Our proposed methods can find architectures that achieve high test accuracy, comparable with recently proposed methods on the NAS-Bench-101 and NAS-Bench-201 datasets, using fewer trained and evaluated samples. Code will be made publicly available after all the experiments are finished.
Semi-Federated Learning
Chen, Zhikun, Li, Daofeng, Zhao, Ming, Zhang, Sihai, Zhu, Jinkang
Federated learning (FL) enables massive distributed Information and Communication Technology (ICT) devices to learn a global consensus model without any participants revealing their own data to the central server. However, the practicality, communication expense, and non-independent and identically distributed (Non-IID) data challenges in FL still need to be addressed. In this work, we propose Semi-Federated Learning (Semi-FL), which differs from FL in two aspects: local client clustering and in-cluster training. A sequential training scheme is designed for our in-cluster training, which enables neighboring clients to share their learning models. The proposed Semi-FL can be easily applied to future mobile communication networks and requires less up-link transmission bandwidth. Numerical experiments validate the feasibility, learning performance, and robustness to Non-IID data of the proposed Semi-FL. Semi-FL extends the existing potential of FL.
Harmonic Decompositions of Convolutional Networks
Scetbon, Meyer, Harchaoui, Zaid
The renewed interest in convolutional neural networks [12, 15] in computer vision and signal processing has led to a major leap in generalization performance on common task benchmarks, supported by the recent advances in graphical processing hardware and the collection of huge labelled datasets for training and evaluation. Convolutional neural networks pose a major challenge to statistical learning theory. First and foremost, a convolutional network learns from data, jointly, both a feature representation through its hidden layers and a prediction function through its ultimate layer. A convolutional neural network implements a function unfolding as a composition of basic functions (respectively nonlinearity, convolution, and pooling), which appear to model visual information in images well. Yet the relevant function spaces to analyze their statistical performance remain unclear. The analysis of convolutional neural networks (CNNs) has been an active research topic. Different viewpoints have been developed. A straightforward viewpoint is to dismiss completely the grid or lattice structure of images and analyze instead a multi-layer perceptron (MLP) acting on vectorized images, which has the downside of setting aside the most interesting property of CNNs, which is to model images well, that is, data with a 2D lattice structure.
Differentially Private Federated Learning for Resource-Constrained Internet of Things
Hu, Rui, Guo, Yuanxiong, Ratazzi, E. Paul, Gong, Yanmin
With the proliferation of smart devices having built-in sensors, Internet connectivity, and programmable computation capability in the era of the Internet of Things (IoT), a tremendous amount of data is being generated at the network edge. Federated learning is capable of analyzing the large amount of data from a distributed set of smart devices without requiring them to upload their data to a central place. However, the commonly used federated learning algorithm is based on stochastic gradient descent (SGD) and is not suitable for resource-constrained IoT environments due to its high communication resource requirement. Moreover, the privacy of sensitive data on smart devices has become a key concern and needs to be protected rigorously. This paper proposes a novel federated learning framework called DP-PASGD for training a machine learning model efficiently from the data stored across resource-constrained smart devices in IoT while guaranteeing differential privacy. The optimal schematic design of DP-PASGD that maximizes the learning performance while satisfying the limits on resource cost and privacy loss is formulated as an optimization problem, and an approximate solution method based on the convergence analysis of DP-PASGD is developed to solve the optimization problem efficiently. Numerical results based on real-world datasets verify the effectiveness of the proposed DP-PASGD scheme.
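To make the ingredients named in this abstract concrete, the Python sketch below combines local SGD steps, gradient clipping with Gaussian noise (a standard way to obtain differential privacy), and periodic averaging across devices. It is a minimal sketch under stated assumptions, not the DP-PASGD algorithm itself: the abstract does not give the update rule, the toy quadratic loss is invented for the example, the noise level is not calibrated to any privacy budget, and all names (`clip`, `local_steps`, `train`, `max_norm`, `noise_std`) are hypothetical.

```python
import random


def clip(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = sum(g * g for g in grad) ** 0.5
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def local_steps(weights, data, steps, lr, max_norm, noise_std, rng):
    """Run clipped, noise-perturbed SGD steps on one device.

    Toy loss (assumption): 0.5 * ||w - x||^2 for a sampled data point x.
    """
    w = list(weights)
    for _ in range(steps):
        x = rng.choice(data)
        grad = clip([wi - xi for wi, xi in zip(w, x)], max_norm)
        noisy = [g + rng.gauss(0.0, noise_std) for g in grad]
        w = [wi - lr * g for wi, g in zip(w, noisy)]
    return w


def train(device_data, rounds, steps, lr, max_norm, noise_std, seed=0):
    """Alternate local training on every device with averaging of the models."""
    rng = random.Random(seed)
    w = [0.0] * len(device_data[0][0])
    for _ in range(rounds):
        local = [local_steps(w, d, steps, lr, max_norm, noise_std, rng)
                 for d in device_data]
        # One communication round per `steps` local updates.
        w = [sum(column) / len(local) for column in zip(*local)]
    return w


device_data = [[[1.0, 2.0]], [[3.0, 4.0]]]  # one data point per device
w_exact = train(device_data, rounds=200, steps=5, lr=0.1,
                max_norm=10.0, noise_std=0.0)
w_private = train(device_data, rounds=200, steps=5, lr=0.1,
                  max_norm=10.0, noise_std=0.5)
print(w_exact, w_private)
```

Raising `steps` reduces how often devices communicate, and raising `noise_std` strengthens the perturbation at the cost of accuracy. Trade-offs of this kind are what the abstract describes formulating as an optimization problem.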
Assessing Robustness to Noise: Low-Cost Head CT Triage
Hooper, Sarah M., Dunnmon, Jared A., Lungren, Matthew P., Gambhir, Sanjiv Sam, Ré, Christopher, Wang, Adam S., Patel, Bhavik N.
Automated medical image classification with convolutional neural networks (CNNs) has great potential to impact healthcare, particularly in resource-constrained healthcare systems where fewer trained radiologists are available. However, little is known about how well a trained CNN can perform on images with the increased noise levels, different acquisition protocols, or additional artifacts that may arise when using low-cost scanners, which can be underrepresented in datasets collected from well-funded hospitals. In this work, we investigate how a model trained to triage head computed tomography (CT) scans performs on images acquired with reduced x-ray tube current, fewer projections per gantry rotation, and limited angle scans. These changes can reduce the cost of the scanner and demands on electrical power but come at the expense of increased image noise and artifacts. We first develop a model to triage head CTs and report an area under the receiver operating characteristic curve (AUROC) of 0.77. We then show that the trained model is robust to reduced tube current and fewer projections, with the AUROC dropping only 0.65% for images acquired with a 16x reduction in tube current and 0.22% for images acquired with 8x fewer projections. Finally, for significantly degraded images acquired by a limited angle scan, we show that a model trained specifically to classify such images can overcome the technological limitations to reconstruction and maintain an AUROC within 0.09% of the original model.
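For readers unfamiliar with the metric reported in this abstract, the Python sketch below computes AUROC directly from its pairwise definition and compares a clean and a degraded set of scores. The labels and scores are made-up illustrative values, not data from the paper, and the function name `auroc` is hypothetical. The abstract does not say whether its percentage drops are relative or absolute; the sketch computes a relative drop as one possible reading.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative values only: the second positive case gets a lower score
# when the image is degraded.
labels = [1, 1, 0, 0, 1, 0]
clean_scores = [0.9, 0.8, 0.3, 0.4, 0.7, 0.2]
noisy_scores = [0.9, 0.35, 0.3, 0.4, 0.7, 0.2]

clean = auroc(labels, clean_scores)
noisy = auroc(labels, noisy_scores)
relative_drop_percent = 100.0 * (clean - noisy) / clean
print(clean, noisy, relative_drop_percent)
```

The pairwise loop is quadratic in the number of cases, which is fine for a small example; a rank-based computation would be preferable for large evaluation sets.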