Goto

Collaborating Authors

 Bayesian Learning


Gaussian Measures Conditioned on Nonlinear Observations: Consistency, MAP Estimators, and Simulation

arXiv.org Machine Learning

The article presents a systematic study of the problem of conditioning a Gaussian random variable $\xi$ on nonlinear observations of the form $F \circ \phi(\xi)$ where $\phi: \mathcal{X} \to \mathbb{R}^N$ is a bounded linear operator and $F$ is nonlinear. Such problems arise in the context of Bayesian inference and recent machine learning-inspired PDE solvers. We give a representer theorem for the conditioned random variable $\xi \mid F\circ \phi(\xi)$, stating that it decomposes as the sum of an infinite-dimensional Gaussian (which is identified analytically) as well as a finite-dimensional non-Gaussian measure. We also introduce a novel notion of the mode of a conditional measure by taking the limit of the natural relaxation of the problem, to which we can apply the existing notion of maximum a posteriori estimators of posterior measures. Finally, we introduce a variant of the Laplace approximation for the efficient simulation of the aforementioned conditioned Gaussian random variables towards uncertainty quantification.


Thesis: Document Summarization with applications to Keyword extraction and Image Retrieval

arXiv.org Artificial Intelligence

Automatic summarization is the process of reducing a text document in order to generate a summary that retains the most important points of the original document. In this work, we study two problems - i) summarizing a text document as set of keywords/caption, for image recommedation, ii) generating opinion summary which good mix of relevancy and sentiment with the text document. Intially, we present our work on an recommending images for enhancing a substantial amount of existing plain text news articles. We use probabilistic models and word similarity heuristics to generate captions and extract Key-phrases which are re-ranked using a rank aggregation framework with relevance feedback mechanism. We show that such rank aggregation and relevant feedback which are typically used in Tagging Documents, Text Information Retrieval also helps in improving image retrieval. These queries are fed to the Yahoo Search Engine to obtain relevant images 1. Our proposed method is observed to perform better than all existing baselines. Additonally, We propose a set of submodular functions for opinion summarization. Opinion summarization has built in it the tasks of summarization and sentiment detection. However, it is not easy to detect sentiment and simultaneously extract summary. The two tasks conflict in the sense that the demand of compression may drop sentiment bearing sentences, and the demand of sentiment detection may bring in redundant sentences. However, using submodularity we show how to strike a balance between the two requirements. Our functions generate summaries such that there is good correlation between document sentiment and summary sentiment along with good ROUGE score. We also compare the performances of the proposed submodular functions.


Beyond MLE: Investigating SEARNN for Low-Resourced Neural Machine Translation

arXiv.org Artificial Intelligence

Structured prediction tasks, like machine translation, involve learning functions that map structured inputs to structured outputs. Recurrent Neural Networks (RNNs) have historically been a popular choice for such tasks, including in natural language processing (NLP) applications. However, training RNNs using Maximum Likelihood Estimation (MLE) has its limitations, including exposure bias and a mismatch between training and testing metrics. SEARNN, based on the learning to search (L2S) framework, has been proposed as an alternative to MLE for RNN training. This project explored the potential of SEARNN to improve machine translation for low-resourced African languages -- a challenging task characterized by limited training data availability and the morphological complexity of the languages. Through experiments conducted on translation for English to Igbo, French to \ewe, and French to \ghomala directions, this project evaluated the efficacy of SEARNN over MLE in addressing the unique challenges posed by these languages. With an average BLEU score improvement of $5.4$\% over the MLE objective, we proved that SEARNN is indeed a viable algorithm to effectively train RNNs on machine translation for low-resourced languages.


Particle swarm optimization with Applications to Maximum Likelihood Estimation and Penalized Negative Binomial Regression

arXiv.org Machine Learning

These authors contribute to the paper equally. Abstract General purpose optimization routines such as nlminb, optim (R) or nlmixed (SAS) are frequently used to estimate model parameters in nonstandard distributions. This paper presents Particle Swarm Optimization (PSO), as an alternative to many of the current algorithms used in statistics. We find that PSO can not only reproduce the same results as the above routines, it can also produce results that are more optimal or when others cannot converge. In the latter case, it can also identify the source of the problem or problems. We highlight advantages of using PSO using four examples, where: (1) some parameters in a generalized distribution are unidentified using PSO when it is not apparent or computationally manifested using routines in R or SAS; (2) PSO can produce estimation results for the log-binomial regressions when current routines may not; (3) PSO provides flexibility in the link function for binomial regression with LASSO penalty, which is unsupported by standard packages like GLM and GENMOD in Stata and SAS, respectively, and (4) PSO provides superior MLE estimates for an EE-IW distribution compared with those from the traditional statistical methods that rely on moments. Metaheuristics, and in particular, nature-inspired metaheuristic algorithms, is increasingly used across disciplines to tackle challenging optimization problems [11]. They may be broadly categorized swarm based or evolutionary based algorithms. Some examples of the former are particle swarm optimization and competitive swarm optimizer (CSO) and examples of the latter are genetic algorithm (GA) and the differential evolution. The statistical community is probably most aware of GA and simulated annealing (SA) but they are many others that have recently proven more popular in engineering and computer science.


Semantic Trajectory Data Mining with LLM-Informed POI Classification

arXiv.org Artificial Intelligence

Human travel trajectory mining is crucial for transportation systems, enhancing route optimization, traffic management, and the study of human travel patterns. Previous rule-based approaches without the integration of semantic information show a limitation in both efficiency and accuracy. Semantic information, such as activity types inferred from Points of Interest (POI) data, can significantly enhance the quality of trajectory mining. However, integrating these insights is challenging, as many POIs have incomplete feature information, and current learning-based POI algorithms require the integrity of datasets to do the classification. In this paper, we introduce a novel pipeline for human travel trajectory mining. Our approach first leverages the strong inferential and comprehension capabilities of large language models (LLMs) to annotate POI with activity types and then uses a Bayesian-based algorithm to infer activity for each stay point in a trajectory. In our evaluation using the OpenStreetMap (OSM) POI dataset, our approach achieves a 93.4% accuracy and a 96.1% F-1 score in POI classification, and a 91.7% accuracy with a 92.3% F-1 score in activity inference.


How to integrate cloud service, data analytic and machine learning technique to reduce cyber risks associated with the modern cloud based infrastructure

arXiv.org Artificial Intelligence

In today's dynamic and competitive digital era, companies are leveraging cloud technology, machine learning, and data visualization techniques to reinvent their business processes. The combination of cloud technology, machine learning, and data visualization techniques allows hybrid enterprise networks to hold massive volumes of data and provide employees and customers easy access to these cloud data. These massive collections of complex data sets are facing security challenges. While cloud platforms are more vulnerable to security threats and traditional security technologies are unable to cope with the rapid data explosion in cloud platforms, machine learning powered security solutions and data visualization techniques are playing instrumental roles in detecting security threat, data breaches, and automatic finding software vulnerabilities. The purpose of this paper is to present some of the widely used cloud services, machine learning techniques and data visualization approach and demonstrate how to integrate cloud service, data analytic and machine learning techniques that can be used to detect and reduce cyber risks associated with the modern cloud based infrastructure. In this paper I applied the machine learning supervised classifier to design a model based on wellknown UNSW-NB15 dataset to predict the network behavior metrics and demonstrated how data analytics techniques can be integrated to visualize network traffics.


Towards Optimal Beacon Placement for Range-Aided Localization

arXiv.org Artificial Intelligence

Range-based localization is ubiquitous: global navigation satellite systems (GNSS) power mobile phone-based navigation, and autonomous mobile robots can use range measurements from a variety of modalities including sonar, radar, and even WiFi signals. Many of these localization systems rely on fixed anchors or beacons with known positions acting as transmitters or receivers. In this work, we answer a fundamental question: given a set of positions we would like to localize, how should beacons be placed so as to minimize localization error? Specifically, we present an information theoretic method for optimally selecting an arrangement consisting of a few beacons from a large set of candidate positions. By formulating localization as maximum a posteriori (MAP) estimation, we can cast beacon arrangement as a submodular set function maximization problem. This approach is probabilistically rigorous, simple to implement, and extremely flexible. Furthermore, we prove that the submodular structure of our problem formulation ensures that a greedy algorithm for beacon arrangement has suboptimality guarantees. We compare our method with a number of benchmarks on simulated data and release an open source Python implementation of our algorithm and experiments.


Causal Customer Churn Analysis with Low-rank Tensor Block Hazard Model

arXiv.org Machine Learning

This study introduces an innovative method for analyzing the impact of various interventions on customer churn, using the potential outcomes framework. We present a new causal model, the tensorized latent factor block hazard model, which incorporates tensor completion methods for a principled causal analysis of customer churn. A crucial element of our approach is the formulation of a 1-bit tensor completion for the parameter tensor. This captures hidden customer characteristics and temporal elements from churn records, effectively addressing the binary nature of churn data and its time-monotonic trends. Our model also uniquely categorizes interventions by their similar impacts, enhancing the precision and practicality of implementing customer retention strategies. For computational efficiency, we apply a projected gradient descent algorithm combined with spectral clustering. We lay down the theoretical groundwork for our model, including its non-asymptotic properties. The efficacy and superiority of our model are further validated through comprehensive experiments on both simulated and real-world applications.


The Landscape of Unfolding with Machine Learning

arXiv.org Artificial Intelligence

Recent innovations from machine learning allow for data unfolding, without binning and including correlations across many dimensions. We describe a set of known, upgraded, and new methods for ML-based unfolding. The performance of these approaches are evaluated on the same two datasets. We find that all techniques are capable of accurately reproducing the particle-level spectra across complex observables. Given that these approaches are conceptually diverse, they offer an exciting toolkit for a new class of measurements that can probe the Standard Model with an unprecedented level of detail and may enable sensitivity to new phenomena.


Decoding Geometric Properties in Non-Random Data from First Information-Theoretic Principles

arXiv.org Artificial Intelligence

Based on the principles of information theory, measure theory, and theoretical computer science, we introduce a univariate signal deconvolution method with a wide range of applications to coding theory, particularly in zero-knowledge one-way communication channels, such as in deciphering messages from unknown generating sources about which no prior knowledge is available and to which no return message can be sent. Our multidimensional space reconstruction method from an arbitrary received signal is proven to be agnostic vis-a-vis the encoding-decoding scheme, computation model, programming language, formal theory, the computable (or semi-computable) method of approximation to algorithmic complexity, and any arbitrarily chosen (computable) probability measure of the events. The method derives from the principles of an approach to Artificial General Intelligence capable of building a general-purpose model of models independent of any arbitrarily assumed prior probability distribution. We argue that this optimal and universal method of decoding non-random data has applications to signal processing, causal deconvolution, topological and geometric properties encoding, cryptography, and bio- and technosignature detection.