Robustness to Capitalization Errors in Named Entity Recognition
Bodapati, Sravan, Yun, Hyokun, Al-Onaizan, Yaser
Robustness to capitalization errors is a highly desirable characteristic of named entity recognizers, yet we find standard models for the task are surprisingly brittle to such noise. Existing methods to improve robustness to the noise completely discard given orthographic information, which significantly degrades their performance on well-formed text. We propose a simple alternative approach based on data augmentation, which allows the model to learn to utilize or ignore orthographic information depending on its usefulness in the context. It achieves competitive robustness to capitalization errors while making negligible compromise to its performance on well-formed text and significantly improving generalization power on noisy user-generated text. Our experiments clearly and consistently validate our claim across different types of machine learning models, languages, and dataset sizes.
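The data augmentation idea can be sketched as follows. This is a minimal illustration, not the authors' recipe: the abstract does not say which case variants are generated, so adding one lowercased and one uppercased copy of each sentence is an assumption, and the function name `augment_capitalization` is hypothetical.

```python
from typing import List, Tuple

# A sentence is a pair (tokens, tags); tags use a BIO scheme in this example.
Sentence = Tuple[List[str], List[str]]


def augment_capitalization(data: List[Sentence]) -> List[Sentence]:
    """Return the original sentences plus case-perturbed copies.

    Assumption: the perturbations are "all lowercase" and "all uppercase".
    Labels are copied unchanged, so the model sees the same entity with and
    without reliable capitalization cues.
    """
    augmented: List[Sentence] = []
    for tokens, tags in data:
        augmented.append((tokens, tags))
        variants = [
            [token.lower() for token in tokens],
            [token.upper() for token in tokens],
        ]
        for variant in variants:
            # Skip variants identical to the original (e.g. already lowercase).
            if variant != tokens:
                augmented.append((variant, tags))
    return augmented


train = [(["Alice", "visited", "Paris"], ["B-PER", "O", "B-LOC"])]
augmented_train = augment_capitalization(train)
```

Because the original sentences are kept alongside the perturbed copies, a model trained on `augmented_train` can still use capitalization when it is informative.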
Creating Auxiliary Representations from Charge Definitions for Criminal Charge Prediction
Kang, Liangyi, Liu, Jie, Liu, Lingqiao, Shi, Qinfeng, Ye, Dan
Charge prediction, which determines charges for criminal cases by analyzing textual fact descriptions, is a promising technology for legal assistant systems. In practice, fact descriptions can exhibit significant intra-class variation due to factors such as nonnormative use of language, which makes the prediction task very challenging, especially for charge classes with too few samples to cover the variation in expression. In this work, we explore using the charge definitions from criminal law to alleviate this issue. The key idea is that the expressions in a fact description should have corresponding formal terms in the charge definitions, and those terms are shared across classes and can account for the diversity in the fact descriptions. Thus, we propose to create auxiliary fact representations from charge definitions to augment the fact description representations. The auxiliary representations are generated through the interaction of the fact description with the relevant charge definitions and the terms in those definitions, using an integrated sentence- and word-level attention scheme. Experimental results on two datasets show that our model achieves significant improvement over baselines, especially for classes with few samples. The task of charge prediction is to determine appropriate charges, such as theft, seizing or robbery, for criminal cases by analyzing the textual fact descriptions.
Improving Robustness of Task Oriented Dialog Systems
Einolghozati, Arash, Gupta, Sonal, Mohit, Mrinal, Shah, Rushin
Task oriented language understanding in dialog systems is often modeled using intents (task of a query) and slots (parameters for that task). Intent detection and slot tagging are, in turn, modeled using sentence classification and word tagging techniques respectively. Similar to adversarial attack problems with computer vision models discussed in existing literature, these intent-slot tagging models are often over-sensitive to small variations in input -- predicting different and often incorrect labels when small changes are made to a query, thus reducing their accuracy and reliability. However, evaluating a model's robustness to these changes is harder for language since words are discrete and an automated change (e.g. adding 'noise') to a query sometimes changes the meaning and thus labels of a query. In this paper, we first describe how to create an adversarial test set to measure the robustness of these models. Furthermore, we introduce and adapt adversarial training methods as well as data augmentation using back-translation to mitigate these issues. Our experiments show that both techniques improve the robustness of the system substantially and can be combined to yield the best results.
Unsupervised Medical Image Segmentation with Adversarial Networks: From Edge Diagrams to Segmentation Maps
Sivanesan, Umaseh, Braga, Luis H., Sonnadara, Ranil R., Dhindsa, Kiret
We use existing edge detection methods to construct simple edge diagrams, train a generative model to convert them into synthetic medical images, and construct a dataset of synthetic images with known segmentations using variations on extracted edge diagrams. This synthetic dataset is then used to train a supervised image segmentation model. We test our approach on a clinical dataset of kidney ultrasound images and the benchmark ISIC 2018 skin lesion dataset. We show that our unsupervised approach is more accurate than previous unsupervised methods, and performs reasonably compared to supervised image segmentation models. All code and trained models are available at https://github.com/kiretd/Unsupervised-MIseg. In vivo medical imaging is one of the primary technologies available for clinical evaluation, diagnosis, and treatment planning.
Constant Curvature Graph Convolutional Networks
Bachmann, Gregor, Bécigneul, Gary, Ganea, Octavian-Eugen
Interest has been rising lately towards methods representing data in non-Euclidean spaces, e.g. hyperbolic or spherical, that provide specific inductive biases useful for certain real-world data properties, e.g. scale-free, hierarchical or cyclical. However, the popular graph neural networks are currently limited to modeling data only via Euclidean geometry and associated vector space operations. Here, we bridge this gap by proposing mathematically grounded generalizations of graph convolutional networks (GCN) to (products of) constant curvature spaces. We do this by i) introducing a unified formalism that can interpolate smoothly between all geometries of constant curvature, and ii) leveraging gyro-barycentric coordinates that generalize the classic Euclidean concept of the center of mass. Our class of models smoothly recovers its Euclidean counterparts when the curvature goes to zero from either side. Empirically, we outperform Euclidean GCNs in the tasks of node classification and distortion minimization for symbolic data exhibiting non-Euclidean behavior, according to their discrete curvature.
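One way to picture a formalism that interpolates between geometries is a curvature-dependent tangent function. It reduces to the identity at zero curvature and is approached continuously from both the spherical side (positive curvature) and the hyperbolic side (negative curvature). The sketch below is an assumption about the kind of construction meant; the abstract does not spell it out, and the name `tan_kappa` is illustrative.

```python
import math


def tan_kappa(x: float, kappa: float) -> float:
    """Curvature-dependent tangent (illustrative, assumed definition).

    kappa > 0: spherical case, uses tan.
    kappa < 0: hyperbolic case, uses tanh.
    kappa == 0: Euclidean case, the identity.
    """
    if kappa > 0:
        scale = math.sqrt(kappa)
        return math.tan(scale * x) / scale
    if kappa < 0:
        scale = math.sqrt(-kappa)
        return math.tanh(scale * x) / scale
    return x


# As the curvature shrinks from either side, the value approaches x itself.
x = 0.5
from_positive_side = [tan_kappa(x, k) for k in (1.0, 1e-2, 1e-4, 1e-8)]
from_negative_side = [tan_kappa(x, -k) for k in (1.0, 1e-2, 1e-4, 1e-8)]
```

For positive curvature the values decrease towards `x`, and for negative curvature they increase towards `x`, which is the "from either side" behavior the abstract describes for the models as a whole.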
Some Considerations and a Benchmark Related to the CNF Property of the Koczy-Hirota Fuzzy Rule Interpolation
Alzubi, Maen, Kovacs, Szilveszter
The goal of this paper is twofold: first, to highlight some basic problematic properties of the KH Fuzzy Rule Interpolation through examples; second, to set up a brief benchmark set of examples suitable for testing other Fuzzy Rule Interpolation (FRI) methods against these ill conditions. FRI methods were originally proposed to handle missing fuzzy rules (sparse rule-bases) and to reduce decision complexity. FRI is an important technique for implementing inference with sparse fuzzy rule-bases: even if a given observation has no overlap with the antecedent of any rule in the rule-base, FRI may still derive a conclusion. The first FRI method was the "Linear Interpolation" proposed by Koczy and Hirota, which was later renamed "KH Fuzzy Interpolation" by its followers. Several conditions and criteria have been suggested for unifying the common requirements that FRI methods have to satisfy. One of the most common is the demand for a convex and normal fuzzy (CNF) conclusion if all the rule antecedents and consequents are CNF sets. The KH FRI is one method that cannot fulfill this condition. This paper focuses on the conditions under which the KH FRI fails the demand for a CNF conclusion. By setting up some CNF rule examples, the paper also defines a benchmark on which other FRI methods can be tested to see whether they produce a CNF conclusion where the KH FRI fails.
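The way a CNF conclusion can fail is easiest to see on a single α-cut. The sketch below assumes the commonly stated form of the KH method for two flanking rules, in which the lower and upper endpoints of the conclusion are interpolated separately. The function name `kh_interpolate_cut` and all numbers are made up for illustration and are not taken from the paper's benchmark.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower endpoint, upper endpoint) of one alpha-cut


def interpolate_endpoint(a1: float, a2: float, obs: float, b1: float, b2: float) -> float:
    """Linear interpolation of one endpoint, weighted by distance to each antecedent."""
    fraction = (obs - a1) / (a2 - a1)
    return b1 + (b2 - b1) * fraction


def kh_interpolate_cut(a1: Interval, b1: Interval,
                       a2: Interval, b2: Interval,
                       observation: Interval) -> Interval:
    """Interpolate lower and upper endpoints independently (assumed KH form).

    Nothing ties the two results together, so the returned "interval" may
    have its lower endpoint above its upper endpoint.
    """
    lower = interpolate_endpoint(a1[0], a2[0], observation[0], b1[0], b2[0])
    upper = interpolate_endpoint(a1[1], a2[1], observation[1], b1[1], b2[1])
    return lower, upper


# Rule 1: A1 = [0, 5] -> B1 = [0, 1];  Rule 2: A2 = [10, 11] -> B2 = [10, 11].
# The observation [8, 8.2] lies between the two antecedents.
lower, upper = kh_interpolate_cut((0.0, 5.0), (0.0, 1.0),
                                  (10.0, 11.0), (10.0, 11.0),
                                  (8.0, 8.2))
is_valid_interval = lower <= upper
```

In this example the lower endpoint comes out larger than the upper endpoint, so the cut is not a valid interval and the resulting conclusion cannot be read as a CNF set.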
EDUQA: Educational Domain Question Answering System using Conceptual Network Mapping
Agarwal, Abhishek, Sachdeva, Nikhil, Yadav, Raj Kamal, Udandarao, Vishaal, Mittal, Vrinda, Gupta, Anubha, Mathur, Abhinav
Most of the existing question answering models can be broadly grouped into two categories: i) open domain question answering models that answer generic questions and use a large-scale knowledge base along with targeted web-corpus retrieval and ii) closed domain question answering models that address a focused question area and use complex deep learning models. Both kinds of models derive answers through textual comprehension methods. Due to their inability to capture the pedagogical meaning of textual content, these models are not well suited to the educational field. In this paper, we propose an on-the-fly conceptual network model that incorporates educational semantics. The proposed model preserves correlations between conceptual entities by applying intelligent indexing algorithms on the concept network so as to improve answer generation. This model can be utilized for building interactive conversational agents for aiding classroom learning.
Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models
Lengerich, Benjamin, Tan, Sarah, Chang, Chun-Hao, Hooker, Giles, Caruana, Rich
Recent methods for training generalized additive models (GAMs) with pairwise interactions achieve state-of-the-art accuracy on a variety of datasets. Adding interactions to GAMs, however, introduces an identifiability problem: effects can be freely moved between main effects and interaction effects without changing the model predictions. In some cases, this can lead to contradictory interpretations of the same underlying function. This is a critical problem because a central motivation of GAMs is model interpretability. In this paper, we use the Functional ANOVA decomposition to uniquely define interaction effects and thus produce identifiable additive models with purified interactions. To compute this decomposition, we present a fast, exact, mass-moving algorithm that transforms any piecewise-constant function (such as a tree-based model) into a purified, canonical representation. We apply this algorithm to several datasets and show large disparity, including contradictions, between the apparent and the purified effects. An important question in data analysis is whether two variables act in concert to affect an outcome. But this unconstrained additive model has fundamental flaws.
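The mass-moving idea can be illustrated on a single pairwise interaction stored as a table over two binned features. The sketch below is a simplified reading of the abstract, not the authors' implementation. It repeatedly moves the weighted mean of each row and each column out of the interaction table and into the corresponding main effect, so predictions are unchanged while the remaining interaction has zero weighted mean along every row and column. The function name `purify`, the weight table, and the example values are assumptions, and the intercept is not separated out.

```python
from typing import List, Tuple

Table = List[List[float]]


def purify(table: Table, weights: Table, tol: float = 1e-12,
           max_iter: int = 1000) -> Tuple[Table, List[float], List[float]]:
    """Move row and column means of an interaction table into main effects.

    `weights[i][j]` is the (assumed positive) data mass in cell (i, j).
    Returns (purified_interaction, row_main_effect, col_main_effect) such that
    purified[i][j] + row_effect[i] + col_effect[j] equals the original table.
    """
    f = [[float(value) for value in row] for row in table]
    n_rows, n_cols = len(f), len(f[0])
    row_effect = [0.0] * n_rows
    col_effect = [0.0] * n_cols
    for _ in range(max_iter):
        largest_move = 0.0
        for i in range(n_rows):
            mass = sum(weights[i])
            mean = sum(f[i][j] * weights[i][j] for j in range(n_cols)) / mass
            for j in range(n_cols):
                f[i][j] -= mean
            row_effect[i] += mean
            largest_move = max(largest_move, abs(mean))
        for j in range(n_cols):
            mass = sum(weights[i][j] for i in range(n_rows))
            mean = sum(f[i][j] * weights[i][j] for i in range(n_rows)) / mass
            for i in range(n_rows):
                f[i][j] -= mean
            col_effect[j] += mean
            largest_move = max(largest_move, abs(mean))
        if largest_move < tol:  # nothing left to move: the table is purified
            break
    return f, row_effect, col_effect


interaction = [[1.0, 2.0], [3.0, 8.0]]
uniform = [[1.0, 1.0], [1.0, 1.0]]
pure, row_effect, col_effect = purify(interaction, uniform)
```

With uniform weights the example table reduces to a pure interaction of `[[1, -1], [-1, 1]]`, with row effects `[1.5, 5.5]` and column effects `[-1.5, 1.5]`. Most of what looked like an interaction was in fact main-effect mass.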
Fairness-Aware Neural Réyni Minimization for Continuous Features
Grari, Vincent, Ruf, Boris, Lamprier, Sylvain, Detyniecki, Marcin
The past few years have seen a dramatic rise of academic and societal interest in fair machine learning. While plenty of fair algorithms have been proposed recently to tackle this challenge for discrete variables, only a few ideas exist for continuous ones. The objective in this paper is to ensure some independence level between the outputs of regression models and any given continuous sensitive variables. For this purpose, we use the Hirschfeld-Gebelein-Rényi (HGR) maximal correlation coefficient as a fairness metric. We propose two approaches to minimize the HGR coefficient. First, by reducing an upper bound of the HGR with a neural network estimation of the χ² divergence. The idea is to predict the output Y while minimizing the ability of an adversarial neural network to find the estimated transformations which are required to predict the HGR coefficient. We empirically assess and compare our approaches and demonstrate significant improvements over previously presented work in the field. The use of machine learning algorithms in our day-to-day life has become ubiquitous. However, when trained on biased data, those algorithms are prone to learn, perpetuate or even reinforce these biases [6]. Because many applications have far-reaching consequences (credit rating, insurance pricing, recidivism score, etc.), there is an increasing concern in society that the use of machine learning models could reproduce discrimination based on sensitive attributes such as gender, race, age, weight, or others.
Machine Intelligence at the Edge with Learning Centric Power Allocation
Wang, Shuai, Wu, Yik-Chung, Xia, Minghua, Wang, Rui, Poor, H. Vincent
While machine-type communication (MTC) devices generate considerable amounts of data, they often cannot process the data due to limited energy and computation power. To empower MTC with intelligence, edge machine learning has been proposed. However, power allocation in this paradigm requires maximizing the learning performance instead of the communication throughput, for which the celebrated water-filling and max-min fairness algorithms become inefficient. To this end, this paper proposes learning centric power allocation (LCPA), which provides a new perspective on radio resource allocation in learning driven scenarios. By employing an empirical classification error model that is supported by learning theory, the LCPA is formulated as a nonconvex nonsmooth optimization problem, and is solved by the majorization minimization (MM) framework. To get deeper insights into LCPA, asymptotic analysis shows that the transmit powers are inversely proportional to the channel gain, and scale exponentially with the learning parameters. This is in contrast to traditional power allocations where quality of wireless channels is the only consideration. Last but not least, to enable LCPA in large-scale settings, two optimization algorithms, termed mirror-prox LCPA and accelerated LCPA, are further proposed. Extensive numerical results demonstrate that the proposed LCPA algorithms outperform traditional power allocation algorithms, and the large-scale algorithms reduce the computation time by orders of magnitude compared with MM-based LCPA but still achieve competitive learning performance.