 Bayesian Learning


SentiCite: An Approach for Publication Sentiment Analysis

arXiv.org Machine Learning

Abstract: With the number of scientific publications growing rapidly year after year, it is becoming increasingly difficult to identify high-quality, authoritative work on a single topic. Although scientometric measures promise a solution to this problem, they are mostly quantitative and rely, for instance, only on the number of times an article is cited. Under this approach, it is irrelevant whether an article is cited 10 times in a positive, negative, or neutral way. It is therefore important to study the qualitative aspect of a citation to understand its significance. This paper presents SentiCite, a novel system for sentiment analysis of citations in scientific documents that can also detect the nature of a citation by targeting the motivation behind it, e.g., reference to a dataset or reading reference. Furthermore, the paper presents two datasets (SentiCiteDB and IntentCiteDB) containing about 2,600 citations with their ground truth for sentiment and nature of citation. SentiCite and other state-of-the-art methods for sentiment analysis are evaluated on the presented datasets. The evaluation results show that SentiCite outperforms state-of-the-art methods for sentiment analysis in scientific publications, achieving an F1-measure of 0.71.

1 INTRODUCTION

Sentiment analysis is the process of computationally identifying and categorizing opinions present in a textual document or images. As a field, sentiment analysis has been gaining a lot of interest from the scientific community in recent years. The main motivation for this work comes from the author's observation that no system is available that can automatically analyze the sentiment present in citations of scientific publications.


A mathematical theory of cooperative communication

arXiv.org Machine Learning

Cooperative communication plays a central role in theories of human cognition, language, development, and culture, and is increasingly relevant in human-algorithm and robot interaction. Existing models are algorithmic in nature and do not shed light on the statistical problem solved in cooperation or on constraints imposed by violations of common ground. We present a mathematical theory of cooperative communication that unifies three broad classes of algorithmic models as approximations of Optimal Transport (OT). We derive a statistical interpretation for the problem approximated by existing models in terms of entropy-minimizing, or likelihood-maximizing, plans. We show that some models are provably robust to violations of common ground, even supporting online, approximate recovery from discovered violations, and derive conditions under which other models are provably not robust. We do so using gradient-based methods which introduce novel algorithmic-level perspectives on cooperative communication. Our mathematical approach complements and extends empirical research, providing strong theoretical tools for the derivation of a priori constraints on models and implications for cooperative communication in theory and practice.
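
Entropy-regularized OT is commonly solved with Sinkhorn scaling, which alternately rescales the rows and columns of a kernel matrix until the transport plan matches both marginals. The sketch below illustrates that generic technique only; it is not the paper's own model. The cost matrix, the two marginals, and the names `sinkhorn_plan`, `reg`, and `n_iter` are assumptions made for illustration.

```python
import numpy as np

def sinkhorn_plan(cost, row_marginal, col_marginal, reg=0.5, n_iter=1000):
    """Entropy-regularized optimal transport plan via Sinkhorn scaling.

    cost: (n, m) array of transport costs (hypothetical values).
    row_marginal, col_marginal: target row and column sums of the plan.
    reg: entropy regularization strength (larger means a smoother plan).
    """
    kernel = np.exp(-cost / reg)           # Gibbs kernel
    u = np.ones_like(row_marginal)
    v = np.ones_like(col_marginal)
    for _ in range(n_iter):
        u = row_marginal / (kernel @ v)    # rescale rows
        v = col_marginal / (kernel.T @ u)  # rescale columns
    return u[:, None] * kernel * v[None, :]

# Hypothetical 3x3 problem: rows could stand for hypotheses, columns for data.
cost = np.array([[0.0, 1.0, 2.0],
                 [1.0, 0.0, 1.0],
                 [2.0, 1.0, 0.0]])
row_marginal = np.array([0.5, 0.3, 0.2])
col_marginal = np.array([0.2, 0.3, 0.5])

plan = sinkhorn_plan(cost, row_marginal, col_marginal)
print(plan.round(4))
print(plan.sum(axis=1), plan.sum(axis=0))
```

Lowering `reg` moves the plan toward the unregularized OT solution, at the price of slower convergence of the scaling iterations.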


Kernel-based Approach to Handle Mixed Data for Inferring Causal Graphs

arXiv.org Artificial Intelligence

Causal learning is a beneficial approach to analyzing the cause-and-effect relationships among variables in a dataset. A causal graph can be generated from a dataset using a particular causal algorithm, for instance, the PC algorithm or Fast Causal Inference (FCI). Generating a causal graph from a dataset that contains different data types (mixed data) is not trivial. This research offers an easy way to handle mixed data so that it can be used to learn causal graphs using existing implementations of the PC algorithm and FCI. This research proposes using kernel functions and Kernel Alignment to handle mixed data. The two main steps of this approach are computing a kernel matrix for each variable and calculating a pseudo-correlation matrix using Kernel Alignment. The pseudo-correlation matrix is used as a substitute for the correlation matrix in the conditional independence test for Gaussian data in the PC algorithm and FCI. The advantage of this idea is that it is possible to handle any data type by using a suitable kernel function to compute a kernel matrix for an observed variable. The proposed method is successfully applied to learn a causal graph from mixed data containing categorical, binary, ordinal, and continuous variables.
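
The two steps named in the abstract (one kernel matrix per variable, then a pseudo-correlation matrix from Kernel Alignment) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the choice of an RBF kernel for continuous data, a match (delta) kernel for categorical data, kernel centering, the synthetic variables, and all function names are hypothetical.

```python
import numpy as np

def rbf_kernel(x, gamma=1.0):
    """RBF kernel matrix for a continuous variable (assumed kernel choice)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-gamma * diff ** 2)

def categorical_kernel(x):
    """Match kernel for a categorical variable: 1 if labels agree, else 0."""
    return (x[:, None] == x[None, :]).astype(float)

def center(kernel):
    """Center a kernel matrix in feature space (assumed preprocessing step)."""
    n = kernel.shape[0]
    h = np.eye(n) - np.ones((n, n)) / n
    return h @ kernel @ h

def kernel_alignment(k1, k2):
    """Frobenius inner product of two kernel matrices, normalized by their norms."""
    return np.sum(k1 * k2) / (np.linalg.norm(k1) * np.linalg.norm(k2))

# Synthetic mixed data: a categorical variable, a continuous variable that
# depends on it, and an unrelated continuous variable.
rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 3, size=n)
dependent = group + 0.3 * rng.normal(size=n)
unrelated = rng.normal(size=n)

kernels = [center(categorical_kernel(group)),
           center(rbf_kernel(dependent)),
           center(rbf_kernel(unrelated))]
p = len(kernels)
pseudo_corr = np.array([[kernel_alignment(kernels[i], kernels[j])
                         for j in range(p)] for i in range(p)])
print(pseudo_corr.round(3))
```

In this sketch the alignment between `group` and `dependent` comes out much larger than the alignment between `group` and `unrelated`, which is the behavior a correlation substitute needs before it is handed to a conditional independence test.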


What Is Machine Learning?

#artificialintelligence

Machine learning is one of the fastest-growing technological fields, but despite how often the words "machine learning" are tossed around, it can be difficult to understand precisely what machine learning is. Machine learning doesn't refer to just one thing; it's an umbrella term that can be applied to many different concepts and techniques. Understanding machine learning means being familiar with different forms of model analysis, variables, and algorithms. Let's take a close look at machine learning to better understand what it encompasses. While the term machine learning can be applied to many different things, in general it refers to enabling a computer to carry out tasks without receiving explicit line-by-line instructions to do so.


Operational Calibration: Debugging Confidence Errors for DNNs in the Field

arXiv.org Machine Learning

Trained DNN models are increasingly adopted as integral parts of software systems. However, they are often over-confident, especially in practical operation domains where slight divergence from their training data almost always exists. To minimize the loss due to inaccurate confidence, operational calibration, i.e., calibrating the confidence function of a DNN classifier against its operation domain, becomes a necessary debugging step in the engineering of the whole system. Operational calibration is difficult considering the limited budget for labeling operation data and the weak interpretability of DNN models. We propose a Bayesian approach to operational calibration that gradually corrects the confidence given by the model under calibration with a small number of labeled operational data deliberately selected from a larger set of unlabeled operational data. Exploiting the locality of the learned representation of the DNN model and modeling the calibration as Gaussian Process Regression, the approach achieves impressive efficacy and efficiency. Comprehensive experiments with various practical data sets and DNN models show that it significantly outperformed alternative methods, and in some difficult tasks it eliminated about 71% to 97% of high-confidence errors with only about 10% of the minimal amount of labeled operation data needed for practical learning techniques to barely work.


An Optimal Transport Formulation of the Ensemble Kalman Filter

arXiv.org Machine Learning

Controlled interacting particle systems such as the ensemble Kalman filter (EnKF) and the feedback particle filter (FPF) are numerical algorithms to approximate the solution of the nonlinear filtering problem in continuous time. The distinguishing feature of these algorithms is that the Bayesian update step is implemented using a feedback control law. It has been noted in the literature that the control law is not unique. This is the main problem addressed in this paper. To obtain a unique control law, the filtering problem is formulated here as an optimal transportation problem. An explicit formula for the (mean-field type) optimal control law is derived in the linear Gaussian setting. Comparisons are made with the control laws for different types of EnKF algorithms described in the literature. Via empirical approximation of the mean-field control law, a finite-$N$ controlled interacting particle algorithm is obtained. For this algorithm, the equations for empirical mean and covariance are derived and shown to be identical to the Kalman filter. This allows strong conclusions on convergence and error properties based on the classical filter stability theory for the Kalman filter. It is shown that, under certain technical conditions, the mean squared error (m.s.e.) converges to zero even with a finite number of particles. A detailed propagation of chaos analysis is carried out for the finite-$N$ algorithm. The analysis is used to prove weak convergence of the empirical distribution as $N\rightarrow\infty$. For a certain simplified filtering problem, analytical comparison of the m.s.e. with the importance sampling-based algorithms is described. The analysis helps explain the favorable scaling properties of the control-based algorithms reported in several numerical studies in recent literature.


Logistic Regressions and Rare Events

#artificialintelligence

I previously worked on designing some problem sets for a PhD class. One of the assignments dealt with a simple classification problem using data that I took from a Kaggle challenge on predicting fraudulent credit card transactions. The goal of the problem is to predict the probability that a specific credit card transaction is fraudulent. One unforeseen issue with the data was that the unconditional probability that a single credit card transaction is fraudulent is very small. This type of data is known as rare events data, and is common in many areas such as disease detection, conflict prediction and, of course, fraud detection.


A Gentle Introduction to Bayes Theorem for Machine Learning

#artificialintelligence

Bayes Theorem provides a principled way for calculating a conditional probability. It is a deceptively simple calculation, although it can be used to easily calculate the conditional probability of events where intuition often fails. Bayes Theorem also provides a way for thinking about the evaluation and selection of different models for a given dataset in applied machine learning. Maximizing the probability of a model fitting a dataset is more generally referred to as maximum a posteriori, or MAP for short, and provides a probabilistic framework for predictive modeling. In this post, you will discover Bayes Theorem for calculating conditional probabilities.
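
A small worked case shows why intuition often fails here. Bayes Theorem states P(A|B) = P(B|A) P(A) / P(B). The sketch below applies it to a hypothetical diagnostic test; the prevalence, sensitivity, and false positive rate are made-up numbers, and the function name `posterior` is an assumption for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(A | B) from Bayes Theorem for a binary event A and observation B.

    prior: P(A)
    likelihood: P(B | A)
    false_positive_rate: P(B | not A)
    """
    # P(B) by the law of total probability
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical test: 1% prevalence, 95% sensitivity, 5% false positive rate.
p_condition_given_positive = posterior(prior=0.01,
                                       likelihood=0.95,
                                       false_positive_rate=0.05)
print(round(p_condition_given_positive, 3))  # 0.161
```

Even with a test that is right 95% of the time, a positive result implies only about a 16% chance of having the condition, because the low prior means false positives outnumber true positives.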


Model Order Selection Based on Information Theoretic Criteria: Design of the Penalty

arXiv.org Machine Learning

Information theoretic criteria (ITC) have been widely adopted in engineering and statistics for selecting, among an ordered set of candidate models, the one that best fits the observed sample data. The selected model minimizes a penalized likelihood metric, where the penalty is determined by the criterion adopted. While rules for choosing a penalty that guarantees a consistent estimate of the model order are known, theoretical tools for its design with finite samples have never been provided in a general setting. In this paper, we study model order selection for finite samples from a design perspective, focusing on the generalized information criterion (GIC), which embraces the most common ITC. The theory is general, and as case studies we consider: a) the problem of estimating the number of signals embedded in additive white Gaussian noise (AWGN) by using multiple sensors; b) model selection for the general linear model (GLM), which includes, e.g., the problem of estimating the number of sinusoids in AWGN. The analysis reveals a trade-off between the probabilities of overestimating and underestimating the order of the model. We then propose to design the GIC penalty to minimize underestimation while keeping the overestimation probability below a specified level. For the considered problems, this method leads to analytical derivation of the optimal penalty for a given sample size. A performance comparison between the penalty-optimized GIC and the common AIC and BIC is provided, demonstrating the effectiveness of the proposed design strategy.
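
The penalized-likelihood selection described above can be sketched for a simple case. The example below picks a polynomial order by minimizing -2 log-likelihood plus a penalty per parameter, where a penalty of 2 corresponds to AIC and a penalty of log(n) to BIC. It does not implement the paper's optimized penalty; the polynomial model, the Gaussian-noise likelihood (used up to an additive constant), the synthetic data, and the name `gic_order` are all assumptions for illustration.

```python
import numpy as np

def gic_order(x, y, max_order, penalty):
    """Select a polynomial order by a generalized information criterion.

    Score for order k: n * log(RSS_k / n) + penalty * (k + 1), which equals
    -2 log-likelihood under Gaussian noise up to an additive constant, plus
    a penalty on the k + 1 polynomial coefficients.
    """
    n = len(y)
    scores = []
    for k in range(max_order + 1):
        coeffs = np.polyfit(x, y, deg=k)
        rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
        scores.append(n * np.log(rss / n) + penalty * (k + 1))
    return int(np.argmin(scores))

# Synthetic data from a true order-2 polynomial with Gaussian noise.
rng = np.random.default_rng(1)
n = 200
x = np.linspace(-1.0, 1.0, n)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + 0.1 * rng.normal(size=n)

aic_order = gic_order(x, y, max_order=6, penalty=2.0)
bic_order = gic_order(x, y, max_order=6, penalty=np.log(n))
print(aic_order, bic_order)
```

A larger penalty can never select a higher order than a smaller one on the same data, which is the trade-off the abstract refers to: raising the penalty lowers the overestimation probability while raising the risk of underestimation.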


Fused Gromov-Wasserstein Alignment for Hawkes Processes

arXiv.org Machine Learning

We propose a novel fused Gromov-Wasserstein alignment method to jointly learn the Hawkes processes in different event spaces, and align their event types. Given two Hawkes processes, we use fused Gromov-Wasserstein discrepancy to measure their dissimilarity, which considers both the Wasserstein discrepancy based on their base intensities and the Gromov-Wasserstein discrepancy based on their infectivity matrices. Accordingly, the learned optimal transport reflects the correspondence between the event types of these two Hawkes processes. The Hawkes processes and their optimal transport are learned jointly via maximum likelihood estimation, with a fused Gromov-Wasserstein regularizer. Experimental results show that the proposed method works well on synthetic and real-world data.