Identifying Relationships Among Sentences in Court Case Transcripts Using Discourse Relations

arXiv.org Machine Learning

Case law has a significant impact on the proceedings of legal cases. Therefore, the information that can be obtained from previous court cases is valuable to lawyers and other legal officials when performing their duties. This paper describes a methodology for applying discourse relations between sentences when processing text documents related to the legal domain. In this study, we developed a mechanism to classify the relationships that can be observed among sentences in transcripts of United States court cases. First, we defined relationship types that can be observed between sentences in court case transcripts. Then we classified pairs of sentences according to the relationship type by combining a machine learning model and a rule-based approach. The results obtained through our system were evaluated by human judges. To the best of our knowledge, this is the first study where discourse relationships between sentences have been used to determine relationships among sentences in legal court case transcripts.


HyperGCN: Hypergraph Convolutional Networks for Semi-Supervised Classification

arXiv.org Machine Learning

Graph-based semi-supervised learning (SSL) is an important learning problem where the goal is to assign labels to initially unlabeled nodes in a graph. Graph Convolutional Networks (GCNs) have recently been shown to be effective for graph-based SSL problems. GCNs inherently assume the existence of pairwise relationships in graph-structured data. However, in many real-world problems, relationships go beyond pairwise connections and hence are more complex. Hypergraphs provide a natural modeling tool to capture such complex relationships. In this work, we explore the use of GCNs for hypergraph-based SSL. In particular, we propose HyperGCN, an SSL method which uses a layer-wise propagation rule for convolutional neural networks operating directly on hypergraphs. To the best of our knowledge, this is the first principled adaptation of GCNs to hypergraphs. HyperGCN is able to encode both the hypergraph structure and hypernode features in an effective manner. Through detailed experimentation, we demonstrate HyperGCN's effectiveness at hypergraph-based SSL.
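
For readers unfamiliar with layer-wise propagation, the sketch below shows the widely used pairwise GCN rule, ReLU(D^-1/2 (A + I) D^-1/2 H W), applied to a hypergraph that has first been flattened into an ordinary graph. The flattening used here (replacing every hyperedge by a clique) is a common simplification chosen only for illustration and is not claimed to be HyperGCN's construction. The function names, the toy hyperedges, and the all-ones weight matrix are assumptions.

```python
import numpy as np

def clique_expansion(num_nodes, hyperedges):
    """Assumed simplification: connect every pair of nodes sharing a hyperedge."""
    A = np.zeros((num_nodes, num_nodes))
    for edge in hyperedges:
        for u in edge:
            for v in edge:
                if u != v:
                    A[u, v] = 1.0
    return A

def gcn_layer(A, H, W):
    """One pairwise GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    d = A_hat.sum(axis=1)                    # degrees (>= 1 thanks to self-loops)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
    return np.maximum(A_norm @ H @ W, 0.0)   # ReLU non-linearity

# Toy hypergraph: 4 nodes, one 3-node hyperedge and one 2-node hyperedge.
A = clique_expansion(4, [(0, 1, 2), (2, 3)])
H = np.eye(4)            # one-hot node features (illustrative)
W = np.ones((4, 2))      # untrained weights, for shape checking only
out = gcn_layer(A, H, W)
print(out.shape)
```

In a real model W would be learned, and several such layers would be stacked before a softmax over node labels.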


A Primer on Causality in Data Science

arXiv.org Machine Learning

Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure (randomized or not) on an outcome of interest. Even studies that are seemingly non-causal (e.g. prediction or prevalence estimation) have causal elements, such as differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in the data. In this work, we review the "Causal Roadmap", a formal framework to augment our traditional statistical analyses in an effort to answer the causal questions driving our research. Specific steps of the Roadmap include clearly stating the scientific question, defining the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to translate the causal parameter into a statistical estimand, implementing statistical estimators, including parametric and semi-parametric methods, and interpreting our findings. Throughout, we focus on the effect of an exposure occurring at a single time point and provide extensions to more advanced settings.
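
To make the step from causal parameter to statistical estimand concrete, the following sketch simulates a single binary confounder W that affects both a binary exposure A and an outcome Y, then compares the naive difference in means with a standardized (G-computation style) estimate that averages stratum-specific contrasts over the distribution of W. The data-generating process, the true effect of 1.0, and all function names are hypothetical, and the adjustment is valid here only because the simulation has no unmeasured confounding and every stratum contains both exposure levels.

```python
import random

def simulate(n, seed=0):
    """Hypothetical data: W confounds the effect of A on Y; true effect is 1.0."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < (0.8 if w else 0.2) else 0  # exposure depends on W
        y = 1.0 * a + 2.0 * w + rng.gauss(0.0, 1.0)
        rows.append((w, a, y))
    return rows

def mean(values):
    return sum(values) / len(values)

def naive_difference(rows):
    """Unadjusted contrast E[Y | A=1] - E[Y | A=0]; biased under confounding."""
    exposed = [y for _, a, y in rows if a == 1]
    unexposed = [y for _, a, y in rows if a == 0]
    return mean(exposed) - mean(unexposed)

def standardized_difference(rows):
    """Average the within-stratum contrasts over the marginal distribution of W."""
    total = 0.0
    for w_value in (0, 1):
        stratum = [(a, y) for w, a, y in rows if w == w_value]
        exposed = [y for a, y in stratum if a == 1]
        unexposed = [y for a, y in stratum if a == 0]
        total += (mean(exposed) - mean(unexposed)) * len(stratum) / len(rows)
    return total

rows = simulate(20000)
naive = naive_difference(rows)
adjusted = standardized_difference(rows)
print(round(naive, 2), round(adjusted, 2))
```

The naive contrast absorbs part of the effect of W, while the standardized estimate should land near the simulated effect of 1.0.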


RetGK: Graph Kernels based on Return Probabilities of Random Walks

arXiv.org Machine Learning

Graph-structured data arise in a wide range of applications, such as computer vision, bioinformatics, and social networks. Quantifying similarities among graphs is a fundamental problem. In this paper, we develop a framework for computing graph kernels based on return probabilities of random walks. The advantages of our proposed kernels are that they can effectively exploit various node attributes while remaining scalable to large datasets. We conduct extensive graph classification experiments to evaluate our graph kernels. The experimental results show that our graph kernels significantly outperform existing state-of-the-art approaches in both accuracy and computational efficiency.
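
As a rough illustration of the quantity the kernels are built on, the sketch below computes, for every node, the probability that a simple random walk started at that node is back at it after 1, 2, ..., S steps (the diagonal of successive powers of the transition matrix P = D^-1 A). How the paper turns these per-node vectors into a graph kernel is not reproduced here; the function name and the toy path graph are assumptions.

```python
def return_probabilities(adj, steps):
    """Per-node return probabilities of a simple random walk for 1..steps steps."""
    n = len(adj)
    degrees = [sum(row) for row in adj]
    if any(d == 0 for d in degrees):
        raise ValueError("every node needs at least one edge")
    # Row-normalise the adjacency matrix to obtain the transition matrix P.
    P = [[adj[i][j] / degrees[i] for j in range(n)] for i in range(n)]

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    features = [[] for _ in range(n)]
    power = P                       # holds P^s at step s
    for _ in range(steps):
        for i in range(n):
            features[i].append(power[i][i])
        power = matmul(power, P)
    return features

# Path graph 0 - 1 - 2
path = [[0, 1, 0],
        [1, 0, 1],
        [0, 1, 0]]
feats = return_probabilities(path, 3)
print(feats)  # [[0.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 0.0]]
```

On the path graph the middle node always returns after two steps, while an end node returns with probability 0.5, so the vectors already separate structurally different nodes.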


Tensor Ring Decomposition with Rank Minimization on Latent Space: An Efficient Approach for Tensor Completion

arXiv.org Machine Learning

In tensor completion tasks, traditional low-rank tensor decomposition models suffer from a laborious model selection problem due to their high model sensitivity. For tensor ring (TR) decomposition in particular, the number of possible models grows exponentially with the tensor order, which makes it rather challenging to find the optimal TR decomposition. In this paper, by exploiting the low-rank structure of the TR latent space, we propose a novel tensor completion method that is robust to model selection. In contrast to imposing a low-rank constraint on the data space, we introduce nuclear norm regularization on the latent TR factors, so that the optimization step using singular value decomposition (SVD) can be performed at a much smaller scale. By leveraging the alternating direction method of multipliers (ADMM) scheme, the latent TR factors with optimal rank and the recovered tensor can be obtained simultaneously. Our proposed algorithm effectively alleviates the burden of TR-rank selection, and therefore the computational cost is greatly reduced. Extensive experimental results on synthetic and real-world data demonstrate the superior performance and efficiency of the proposed approach against state-of-the-art algorithms.

Introduction. Tensor decomposition aims to find the latent factors of tensor-valued data (i.e. the generalization of multidimensional arrays), thereby casting large-scale tensors into a multilinear tensor latent space of low dimensionality (very few degrees of freedom designated by the rank).
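
Nuclear norm regularization is usually handled through singular value thresholding, the proximal operator of the nuclear norm, which is the SVD-based step an ADMM scheme would typically call on each regularized matrix. The sketch below applies it to a small matrix standing in for an unfolded latent factor; it is not the paper's full algorithm, and the function name, the threshold tau, and the example matrix are assumptions.

```python
import numpy as np

def singular_value_threshold(M, tau):
    """Proximal operator of tau * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # singular values below tau become zero
    return (U * s_shrunk) @ Vt            # rebuild with the shrunken spectrum

M = np.diag([3.0, 1.0])                   # toy stand-in for an unfolded factor
X = singular_value_threshold(M, tau=1.5)
print(np.round(X, 3), np.linalg.matrix_rank(X))
```

Because small singular values are set exactly to zero, the rank is reduced automatically, which is why a cost proportional to the size of the latent factors rather than the full data tensor matters.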


Biologically Motivated Algorithms for Propagating Local Target Representations

arXiv.org Machine Learning

Finding biologically plausible alternatives to back-propagation of errors is a fundamentally important challenge in artificial neural network research. In this paper, we propose a simple learning algorithm called error-driven Local Representation Alignment (LRA-E), which has strong connections to predictive coding, a theory that offers a mechanistic way of describing neurocomputational machinery. In addition, we propose an improved variant of Difference Target Propagation, another procedure that comes from the same family of algorithms as Local Representation Alignment. We compare our learning procedures to several other biologically motivated algorithms, including two feedback alignment algorithms and Equilibrium Propagation. On two benchmark datasets, we find that both of our proposed learning algorithms yield stable performance and strong generalization abilities in comparison to other competing back-propagation alternatives when training deeper, highly nonlinear networks, with LRA-E performing the best overall.


Variational Bayesian Inference for Robust Streaming Tensor Factorization and Completion

arXiv.org Machine Learning

Streaming tensor factorization is a powerful tool for processing high-volume and multi-way temporal data in Internet networks, recommender systems and image/video data analysis. Existing streaming tensor factorization algorithms rely on least-squares data fitting and they do not possess a mechanism for tensor rank determination. This leaves them susceptible to outliers and vulnerable to over-fitting. This paper presents a Bayesian robust streaming tensor factorization model to identify sparse outliers, automatically determine the underlying tensor rank and accurately fit low-rank structure. We implement our model in Matlab and compare it with existing algorithms on tensor datasets generated from dynamic MRI and Internet traffic.


ProdSumNet: reducing model parameters in deep neural networks via product-of-sums matrix decompositions

arXiv.org Machine Learning

We consider a general framework for reducing the number of trainable model parameters in deep learning networks by decomposing linear operators as a product of sums of simpler linear operators. Recently proposed deep learning architectures such as CNN, KFC, Dilated CNN, etc. are all subsumed in this framework and we illustrate other types of neural network architectures within this framework. We show that good accuracy on MNIST and Fashion MNIST can be obtained using a relatively small number of trainable parameters. In addition, since implementation of the convolutional layer is resource-heavy, we consider an approach in the transform domain that obviates the need for convolutional layers. One of the advantages of this general framework over prior approaches is that the number of trainable parameters is not fixed and can be varied arbitrarily. In particular, we illustrate the tradeoff of varying the number of trainable variables and the corresponding error rate. As an example, by using this decomposition on a reference CNN architecture for MNIST with over 3x10^6 trainable parameters, we are able to obtain an accuracy of 98.44% using only 3554 trainable parameters.


Exploring Graph-structured Passage Representation for Multi-hop Reading Comprehension with Graph Neural Networks

arXiv.org Artificial Intelligence

Multi-hop reading comprehension focuses on one type of factoid question, where a system needs to properly integrate multiple pieces of evidence to correctly answer a question. Previous work approximates global evidence with local coreference information, encoding coreference chains with DAG-styled GRU layers within a gated-attention reader. However, coreference is limited in providing information for rich inference. We introduce a new method for better connecting global evidence, which forms more complex graphs compared to DAGs. To perform evidence integration on our graphs, we investigate two recent graph neural networks, namely graph convolutional network (GCN) and graph recurrent network (GRN). Experiments on two standard datasets show that richer global information leads to better answers. Our method performs better than all published results on these datasets.


A tutorial on Particle Swarm Optimization Clustering

arXiv.org Artificial Intelligence

This paper presents a tutorial on data clustering using the Particle Swarm Optimization approach. Following the work proposed by Merwe et al. [1], we present an in-depth analysis of the algorithm together with a Matlab implementation and a short tutorial that explains how to modify the proposed implementation and how the parameters of the original algorithm affect the results. Moreover, we provide a comparison against the results obtained using the well-known K-Means approach. All the source code presented in this paper is publicly available under the GPL-v2 license.
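
The general idea of PSO-based clustering can be sketched as follows: each particle encodes a full set of k candidate centroids, and the usual velocity update pulls it toward its own best position and the swarm's best position. This is an independent Python sketch, not the paper's Matlab code; the fitness function (mean distance to the nearest centroid), the parameter values w, c1, c2, and the toy data are all assumptions.

```python
import math
import random

def fitness(centroids, points):
    """Mean distance from each point to its nearest centroid (lower is better)."""
    return sum(min(math.dist(p, c) for c in centroids) for p in points) / len(points)

def pso_cluster(points, k, n_particles=10, iters=50, w=0.72, c1=1.49, c2=1.49, seed=0):
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle is a candidate set of k centroids, initialised on data points.
    swarm = [[list(rng.choice(points)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in particle] for particle in swarm]
    pbest_fit = [fitness(particle, points) for particle in swarm]
    g = min(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    history = [gbest_fit]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia + pull toward personal best + pull toward global best.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - swarm[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - swarm[i][j][d]))
                    swarm[i][j][d] += vel[i][j][d]
            f = fitness(swarm[i], points)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in swarm[i]], f
                if f < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in swarm[i]], f
        history.append(gbest_fit)
    return gbest, gbest_fit, history

# Two well-separated toy blobs (illustrative data).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, best_fit, history = pso_cluster(points, k=2)
print(centroids, round(best_fit, 3))
```

The inertia weight w and the acceleration coefficients c1 and c2 control how strongly particles keep their momentum versus chase the best known solutions, which is the kind of parameter effect the tutorial discusses.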