Country
Predictive Multi-level Patient Representations from Electronic Health Records
Wang, Zichang, Li, Haoran, Liu, Luchen, Wu, Haoxian, Zhang, Ming
The advent of the Internet era has led to an explosive growth in the Electronic Health Records (EHR) in the past decades. The EHR data can be regarded as a collection of clinical events, including laboratory results, medication records, physiological indicators, etc, which can be used for clinical outcome prediction tasks to support constructions of intelligent health systems. Learning patient representation from these clinical events for the clinical outcome prediction is an important but challenging step. Most related studies transform EHR data of a patient into a sequence of clinical events in temporal order and then use sequential models to learn patient representations for outcome prediction. However, clinical event sequence contains thousands of event types and temporal dependencies. We further make an observation that clinical events occurring in a short period are not constrained by any temporal order but events in a long term are influenced by temporal dependencies. The multi-scale temporal property makes it difficult for traditional sequential models to capture the short-term co-occurrence and the long-term temporal dependencies in clinical event sequences. In response to the above challenges, this paper proposes a Multi-level Representation Model (MRM). MRM first uses a sparse attention mechanism to model the short-term co-occurrence, then uses interval-based event pooling to remove redundant information and reduce sequence length and finally predicts clinical outcomes through Long Short-Term Memory (LSTM). Experiments on real-world datasets indicate that our proposed model largely improves the performance of clinical outcome prediction tasks using EHR data.
Schedule Earth Observation satellites with Deep Reinforcement Learning
Hadj-Salah, Adrien, Verdier, Rémi, Caron, Clément, Picard, Mathieu, Capelle, Mikaël
Requests come in a variety of size and constraints, from the urgent monitoring of small areas to large area coverage. In this work we are particularly interested in the latter case, with requests covering whole countries or even continents. Depending on weather conditions, such requests may take several months to complete, even with multiple satellites. In order to shorten the time required to fulfill requests, the mission orchestrator shall schedule acquisitions with both a short and a long-term strategy. Determining a strategy robust to an uncertain environment is a complex task, this is why current solutions mainly consist of heuristics configured by human-experts. This paper demonstrates that Reinforcement Learning (RL) might be well-suited for such a challenge. RL has proven to be of great value since these algorithms have mastered several games such as Pong on Atari 2600 (Mnih et al. 2013), Go with AlphaGo (Silver et al. 2017) and more recently Starcraft (Arulkumaran, Cully, and Togelius 2019).c
Learning from a Teacher using Unlabeled Data
Menghani, Gaurav, Ravi, Sujith
Knowledge distillation is a widely used technique for model compression. We posit that the teacher model used in a distillation setup, captures relationships between classes, that extend beyond the original dataset. We empirically show that a teacher model can transfer this knowledge to a student model even on an {\it out-of-distribution} dataset. Using this approach, we show promising results on MNIST, CIFAR-10, and Caltech-256 datasets using unlabeled image data from different sources. Our results are encouraging and help shed further light from the perspective of understanding knowledge distillation and utilizing unlabeled data to improve model quality.
Selective Brain Damage: Measuring the Disparate Impact of Model Pruning
Hooker, Sara, Courville, Aaron, Dauphin, Yann, Frome, Andrea
Neural network pruning techniques have demonstrated it is possible to remove the majority of weights in a network with surprisingly little degradation to test set accuracy. However, this measure of performance conceals significant differences in how different classes and images are impacted by pruning. We find that certain examples, which we term pruning identified exemplars (PIEs), and classes are systematically more impacted by the introduction of sparsity. Removing PIE images from the test-set greatly improves top-1 accuracy for both pruned and non-pruned models. These hard-to-generalize-to images tend to be mislabelled, of lower image quality, depict multiple objects or require fine-grained classification. These findings shed light on previously unknown trade-offs, and suggest that a high degree of caution should be exercised before pruning is used in sensitive domains.
Robustness to Capitalization Errors in Named Entity Recognition
Bodapati, Sravan, Yun, Hyokun, Al-Onaizan, Yaser
Robustness to capitalization errors is a highly desirable characteristic of named entity recognizers, yet we find standard models for the task are surprisingly brittle to such noise. Existing methods to improve robustness to the noise completely discard given orthographic information, mwhich significantly degrades their performance on well-formed text. We propose a simple alternative approach based on data augmentation, which allows the model to \emph{learn} to utilize or ignore orthographic information depending on its usefulness in the context. It achieves competitive robustness to capitalization errors while making negligible compromise to its performance on well-formed text and significantly improving generalization power on noisy user-generated text. Our experiments clearly and consistently validate our claim across different types of machine learning models, languages, and dataset sizes.
Creating Auxiliary Representations from Charge Definitions for Criminal Charge Prediction
Kang, Liangyi, Liu, Jie, Liu, Lingqiao, Shi, Qinfeng, Ye, Dan
Charge prediction, determining charges for criminal cases by analyzing the textual fact descriptions, is a promising technology in legal assistant systems. In practice, the fact descriptions could exhibit a significant intra-class variation due to factors like nonnormative use of language, which makes the prediction task very challenging, especially for charge classes with too few samples to cover the expression variation. In this work, we explore to use the charge definitions from criminal law to alleviate this issue. The key idea is that the expressions in a fact description should have corresponding formal terms in charge definitions, and those terms are shared across classes and could account for the diversity in the fact descriptions. Thus, we propose to create auxiliary fact representations from charge definitions to augment fact descriptions representation. The generated auxiliary representations are created through the interaction of fact description with the relevant charge definitions and terms in those definitions by integrated sentence-and word-level attention scheme. Experimental results on two datasets show that our model achieves significant improvement than baselines, especially for classes with few samples. Introduction The task of charge prediction is to determine appropriate charges, such as theft, seizing or robbery, for criminal cases by analyzing the textual fact descriptions.
Improving Robustness of Task Oriented Dialog Systems
Einolghozati, Arash, Gupta, Sonal, Mohit, Mrinal, Shah, Rushin
Task oriented language understanding in dialog systems is often modeled using intents (task of a query) and slots (parameters for that task). Intent detection and slot tagging are, in turn, modeled using sentence classification and word tagging techniques respectively. Similar to adversarial attack problems with computer vision models discussed in existing literature, these intent-slot tagging models are often over-sensitive to small variations in input -- predicting different and often incorrect labels when small changes are made to a query, thus reducing their accuracy and reliability. However, evaluating a model's robustness to these changes is harder for language since words are discrete and an automated change (e.g. adding `noise') to a query sometimes changes the meaning and thus labels of a query. In this paper, we first describe how to create an adversarial test set to measure the robustness of these models. Furthermore, we introduce and adapt adversarial training methods as well as data augmentation using back-translation to mitigate these issues. Our experiments show that both techniques improve the robustness of the system substantially and can be combined to yield the best results.
Unsupervised Medical Image Segmentation with Adversarial Networks: From Edge Diagrams to Segmentation Maps
Sivanesan, Umaseh, Braga, Luis H., Sonnadara, Ranil R., Dhindsa, Kiret
We use existing edge detection methods to construct simple edge diagrams, train a generative model to convert them into synthetic medical images, and construct a dataset of synthetic images with known segmentations using variations on extracted edge diagrams. This synthetic dataset is then used to train a supervised image segmentation model. We test our approach on a clinical dataset of kidney ultrasound images and the benchmark ISIC 2018 skin lesion dataset. We show that our unsupervised approach is more accurate than previous unsupervised methods, and performs reasonably compared to supervised image segmentation models. All code and trained models are available at https://github.com/kiretd/Unsupervised-MIseg . 1 Introduction In vivo medical imaging is one of the primary technologies available for clinical evaluation, diagnosis, and treatment planning.
Constant Curvature Graph Convolutional Networks
Bachmann, Gregor, Bécigneul, Gary, Ganea, Octavian-Eugen
Interest has been rising lately towards methods representing data in non-Euclidean spaces, e.g. hyperbolic or spherical, that provide specific inductive biases useful for certain real-world data properties, e.g. scale-free, hierarchical or cyclical. However, the popular graph neural networks are currently limited in modeling data only via Euclidean geometry and associated vector space operations. Here, we bridge this gap by proposing mathematically grounded generalizations of graph convolutional networks (GCN) to (products of) constant curvature spaces. We do this by i) introducing a unified formalism that can interpolate smoothly between all geometries of constant curvature, ii) leveraging gyro-barycentric coordinates that generalize the classic Euclidean concept of the center of mass. Our class of models smoothly recover their Euclidean counterparts when the curvature goes to zero from either side. Empirically, we outperform Euclidean GCNs in the tasks of node classification and distortion minimization for symbolic data exhibiting non-Euclidean behavior, according to their discrete curvature.
Some Considerations and a Benchmark Related to the CNF Property of the Koczy-Hirota Fuzzy Rule Interpolation
Alzubi, Maen, Kovacs, Szilveszter
The goal of this paper is twofold. Once to highlight some basic problematic properties of the KH Fuzzy Rule Interpolation through examples, secondly to set up a brief Benchmark set of Examples, which is suitable for testing other Fuzzy Rule Interpolation (FRI) methods against these ill conditions. Fuzzy Rule Interpolation methods were originally proposed to handle the situation of missing fuzzy rules (sparse rule-bases) and to reduce the decision complexity. Fuzzy Rule Interpolation is an important technique for implementing inference with sparse fuzzy rule-bases. Even if a given observation has no overlap with the antecedent of any rule from the rule-base, FRI may still conclude a conclusion. The first FRI method was the Koczy and Hirota proposed "Linear Interpolation", which was later renamed to "KH Fuzzy Interpolation" by the followers. There are several conditions and criteria have been suggested for unifying the common requirements an FRI methods have to satisfy. One of the most common one is the demand for a convex and normal fuzzy (CNF) conclusion, if all the rule antecedents and consequents are CNF sets. The KH FRI is the one, which cannot fulfill this condition. This paper is focusing on the conditions, where the KH FRI fails the demand for the CNF conclusion. By setting up some CNF rule examples, the paper also defines a Benchmark, in which other FRI methods can be tested if they can produce CNF conclusion where the KH FRI fails.