Law
Cognitive Assistants for Document-Related Tasks in Law and Government
Branting, Luther Karl (The MITRE Corporation)
The legal relationship between government and citizens is mediated by documents. This paper identifies four classes of cognitive assistants that could improve the experience of citizens and government officials in using and understanding government documents: self-filling forms; error-detecting forms; proactive information search; and deductive document synthesis. Each of these classes of cognitive assistants has the potential to significantly improve access to justice and delivery of information, services, and other benefits to citizens by improving the ability of citizens to understand and correctly fill out forms and to comprehend informational documents.
Neural-based machine translation for medical text domain. Based on European Medicines Agency leaflet texts
Wołk, Krzysztof, Marasek, Krzysztof
The quality of machine translation is rapidly evolving. Today one can find several machine translation systems on the web that provide reasonable, though imperfect, translations. In some specific domains, the quality may decrease. A recently proposed approach to this problem is neural machine translation. It aims at building a single, jointly tuned neural network that maximizes translation performance, a very different approach from traditional statistical machine translation. Recently proposed neural machine translation models often belong to the encoder-decoder family, in which a source sentence is encoded into a fixed-length vector that is, in turn, decoded to generate a translation. The present research examines the effects of different training methods on a Polish-English machine translation system used for medical data. The European Medicines Agency parallel text corpus was used as the basis for training neural network-based and statistical translation systems. The main machine translation evaluation metrics were also used in the analysis of the systems. The comparison of these systems and the implementation of a real-time medical translator are the main focus of our experiments.
Knowledge-Based Textual Inference via Parse-Tree Transformations
Bar-Haim, Roy, Dagan, Ido, Berant, Jonathan
Textual inference is an important component in many applications for understanding natural language. Classical approaches to textual inference rely on logical representations for meaning, which may be regarded as "external" to the natural language itself. However, practical applications usually adopt shallower lexical or lexical-syntactic representations, which correspond closely to language structure. In many cases, such approaches lack a principled meaning representation and inference framework. We describe an inference formalism that operates directly on language-based structures, particularly syntactic parse trees. New trees are generated by applying inference rules, which provide a unified representation for varying types of inferences. We use manual and automatic methods to generate these rules, which cover generic linguistic structures as well as specific lexical-based inferences. We also present a novel packed data-structure and a corresponding inference algorithm that allows efficient implementation of this formalism. We proved the correctness of the new algorithm and established its efficiency analytically and empirically. The utility of our approach was illustrated on two tasks: unsupervised relation extraction from a large corpus, and the Recognizing Textual Entailment (RTE) benchmarks.
Community Detection in Networks with Node Features
Zhang, Yuan, Levina, Elizaveta, Zhu, Ji
Many methods have been proposed for community detection in networks, but most of them do not take into account additional information on the nodes that is often available in practice. In this paper, we propose a new joint community detection criterion that uses both the network edge information and the node features to detect community structures. One advantage our method has over existing joint detection approaches is the flexibility of learning the impact of different features which may differ across communities. Another advantage is the flexibility of choosing the amount of influence the feature information has on communities. The method is asymptotically consistent under the block model with additional assumptions on the feature distributions, and performs well on simulated and real networks. Community detection is a fundamental problem in network analysis, extensively studied in a number of domains - see (1) and (2) for some examples of applications. A number of approaches to community detection are based on probabilistic models for networks with communities, such as the stochastic block model (3), the degree-corrected stochastic block model (4), and the latent factor model (5). Other approaches work by optimizing a criterion measuring the strength of community structure in some sense, often through spectral approximations. Examples include normalized cuts (6), modularity (7; 8), and many variants of spectral clustering, e.g., (9).
Online Censoring for Large-Scale Regressions with Application to Streaming Big Data
Berberidis, Dimitris, Kekatos, Vassilis, Giannakis, Georgios B.
Linear regression is arguably the most prominent among statistical inference methods, popular for both its simplicity and its broad applicability. As data-intensive applications grow, the sheer size of linear regression problems creates an ever-growing demand for quick and cost-efficient solvers. Fortunately, a significant percentage of the data accrued can be omitted while maintaining a certain quality of statistical inference with an affordable computational budget. The present paper introduces means of identifying and omitting "less informative" observations in an online and data-adaptive fashion, built on principles of stochastic approximation and data censoring. First- and second-order stochastic approximation maximum likelihood-based algorithms for censored observations are developed for estimating the regression coefficients. Online algorithms are also put forth to reduce the overall complexity by adaptively performing censoring along with estimation. The novel algorithms entail simple closed-form updates, and have provable (non)asymptotic convergence guarantees. Furthermore, specific rules are investigated for tuning to desired censoring patterns and levels of dimensionality reduction. Simulated tests on real and synthetic datasets corroborate the efficacy of the proposed data-adaptive methods compared to data-agnostic random projection-based alternatives.
Judgment Aggregation in Multi-Agent Argumentation
Awad, Edmond, Booth, Richard, Tohme, Fernando, Rahwan, Iyad
Given a set of conflicting arguments, there can exist multiple plausible opinions about which arguments should be accepted, rejected, or deemed undecided. We study the problem of how multiple such judgments can be aggregated. We define the problem by adapting various classical social-choice-theoretic properties for the argumentation domain. We show that while argument-wise plurality voting satisfies many properties, it fails to guarantee the collective rationality of the outcome, and struggles with ties. We then present more general results, proving multiple impossibility results on the existence of any good aggregation operator. After characterising the sufficient and necessary conditions for satisfying collective rationality, we study whether restricting the domain of argument-wise plurality voting to classical semantics allows us to escape the impossibility result. We close by listing graph-theoretic restrictions under which argument-wise plurality rule does produce collectively rational outcomes. In addition to identifying fundamental barriers to collective argument evaluation, our results open up the door for a new research agenda for the argumentation and computational social choice communities.
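The failure of collective rationality under argument-wise plurality voting can be illustrated with a small sketch. It is not taken from the paper: the attack graph, the three votes, and the function names are hypothetical, "collectively rational" is assumed to mean that the outcome is a complete labelling (an argument is "in" iff all its attackers are "out", "out" iff some attacker is "in", and "undec" otherwise), and falling back to "undec" on ties is an assumption made only so that the function is total.

```python
from collections import Counter

# Hypothetical attack graph: a <-> b, c <-> d, and both b and d attack e.
ATTACKERS = {
    "a": {"b"},
    "b": {"a"},
    "c": {"d"},
    "d": {"c"},
    "e": {"b", "d"},
}


def is_complete(labelling, attackers=ATTACKERS):
    """Check the assumed rationality condition (complete labelling)."""
    for arg, label in labelling.items():
        attacker_labels = [labelling[x] for x in attackers[arg]]
        if all(l == "out" for l in attacker_labels):
            expected = "in"
        elif any(l == "in" for l in attacker_labels):
            expected = "out"
        else:
            expected = "undec"
        if label != expected:
            return False
    return True


def argument_wise_plurality(labellings):
    """Pick, for each argument separately, the label with the most votes."""
    outcome = {}
    for arg in labellings[0]:
        counts = Counter(l[arg] for l in labellings).most_common()
        if len(counts) > 1 and counts[0][1] == counts[1][1]:
            outcome[arg] = "undec"  # assumption: tie fallback, not from the paper
        else:
            outcome[arg] = counts[0][0]
    return outcome


# Three individually rational (complete) votes, invented for illustration.
votes = [
    {"a": "in", "b": "out", "c": "out", "d": "in", "e": "out"},
    {"a": "out", "b": "in", "c": "in", "d": "out", "e": "out"},
    {"a": "in", "b": "out", "c": "in", "d": "out", "e": "in"},
]

collective = argument_wise_plurality(votes)
print(collective)
print("every vote complete:", all(is_complete(v) for v in votes))
print("outcome complete:", is_complete(collective))
```

In this example each argument is decided by a 2-to-1 majority, so no tie arises, yet the outcome rejects e while also rejecting both of e's attackers, which no complete labelling allows.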
Joint Tensor Factorization and Outlying Slab Suppression with Applications
Fu, Xiao, Huang, Kejun, Ma, Wing-Kin, Sidiropoulos, Nicholas D., Bro, Rasmus
We consider factoring low-rank tensors in the presence of outlying slabs. This problem is important in practice, because data collected in many real-world applications, such as speech, fluorescence, and some social network data, fit this paradigm. Prior work tackles this problem by iteratively selecting a fixed number of slabs and fitting, a procedure which may not converge. We formulate this problem from a group-sparsity promoting point of view, and propose an alternating optimization framework to handle the corresponding $\ell_p$ ($0
Certifying and removing disparate impact
Feldman, Michael, Friedler, Sorelle, Moeller, John, Scheidegger, Carlos, Venkatasubramanian, Suresh
What does it mean for an algorithm to be biased? In U.S. law, unintentional bias is encoded via disparate impact, which occurs when a selection process has widely different outcomes for different groups, even as it appears to be neutral. This legal determination hinges on a definition of a protected class (ethnicity, gender, religious practice) and an explicit description of the process. When the process is implemented using computers, determining disparate impact (and hence bias) is harder. It might not be possible to disclose the process. In addition, even if the process is open, it might be hard to elucidate in a legal setting how the algorithm makes its decisions. Instead of requiring access to the algorithm, we propose making inferences based on the data the algorithm uses. We make four contributions to this problem. First, we link the legal notion of disparate impact to a measure of classification accuracy that, while known, has received relatively little attention. Second, we propose a test for disparate impact based on analyzing the information leakage of the protected class from the other data attributes. Third, we describe methods by which data might be made unbiased. Finally, we present empirical evidence supporting the effectiveness of our test for disparate impact and our approach for both masking bias and preserving relevant information in the data. Interestingly, our approach resembles some actual selection practices that have recently received legal scrutiny.
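One common way to quantify "widely different outcomes for different groups" is the ratio of selection rates between the protected group and a reference group. The sketch below is only an illustration of that ratio, not the test proposed in the paper: the function names and the outcome data are hypothetical, and the 0.8 threshold is the "four-fifths" guideline from U.S. employment practice, which the abstract itself does not mention.

```python
def selection_rate(outcomes):
    """Fraction of positive (1) decisions in a list of 0/1 outcomes."""
    return sum(outcomes) / len(outcomes)


def disparate_impact_ratio(protected, reference):
    """Selection rate of the protected group divided by the reference group's."""
    reference_rate = selection_rate(reference)
    if reference_rate == 0:
        raise ValueError("reference group has no positive outcomes")
    return selection_rate(protected) / reference_rate


# Invented outcomes: 1 = selected, 0 = not selected.
protected_outcomes = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]   # 2 of 10 selected
reference_outcomes = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]   # 5 of 10 selected

FOUR_FIFTHS = 0.8  # assumed threshold, not taken from the abstract

ratio = disparate_impact_ratio(protected_outcomes, reference_outcomes)
flagged = ratio < FOUR_FIFTHS
print(f"ratio = {ratio:.2f}, flagged = {flagged}")
```

Note that this ratio needs only the decisions and the group membership, not access to the selection process itself, which matches the abstract's emphasis on reasoning from data rather than from the algorithm.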
Multi-Robot Exploration with Communication Restrictions
Jensen, Elizabeth A. (University of Minnesota)
After a disaster, instability in the environment may delay search and rescue efforts until it is safe enough for human rescuers to enter the environment. Such delays can be significant, but it is still possible to gather information about the environment in the interim, by sending in a team of robots to scout the area and locate points of interest. We present several algorithms to accomplish this exploration, and provide both theoretical proofs and simulation results that show the algorithms will achieve full exploration of an unknown environment even under communication restrictions.
Norms as a Basis for Governing Sociotechnical Systems: Extended Abstract
Singh, Munindar P. (North Carolina State University)
We understand a sociotechnical system as a microsociety in which autonomous parties interact with and about technical objects. We define governance as the administration of such a system by its participants. We develop an approach for governance based on a computational representation of norms. Our approach has the benefit of capturing stakeholder needs precisely while yielding adaptive resource allocation in the face of changes both in stakeholder needs and the environment. We are currently extending this approach to address the problem of secure collaboration and to contribute to the emerging science of cybersecurity.