Asia
A Case-Based Reasoning Framework to Choose Trust Models for Different E-Marketplace Environments
A. Irissappane, Athirai, Zhang, Jie
The performance of trust models highly depend on the characteristics of the environments where they are applied. Thus, it becomes challenging to choose a suitable trust model for a given e-marketplace environment, especially when ground truth about the agent (buyer and seller) behavior is unknown (called unknown environment). We propose a case-based reasoning framework to choose suitable trust models for unknown environments, based on the intuition that if a trust model performs well in one environment, it will do so in another similar environment. Firstly, we build a case base with a number of simulated environments (with known ground truth) along with the trust models most suitable for each of them. Given an unknown environment, case-based retrieval algorithms retrieve the most similar case(s), and the trust model of the most similar case(s) is chosen as the most suitable model for the unknown environment. Evaluation results confirm the effectiveness of our framework in choosing suitable trust models for different e-marketplace environments.
Asymptotic Accuracy of Bayesian Estimation for a Single Latent Variable
In data science and machine learning, hierarchical parametric models, such as mixture models, are often used. They contain two kinds of variables: observable variables, which represent the parts of the data that can be directly measured, and latent variables, which represent the underlying processes that generate the data. Although there has been an increase in research on the estimation accuracy for observable variables, the theoretical analysis of estimating latent variables has not been thoroughly investigated. In a previous study, we determined the accuracy of a Bayes estimation for the joint probability of the latent variables in a dataset, and we proved that the Bayes method is asymptotically more accurate than the maximum-likelihood method. However, the accuracy of the Bayes estimation for a single latent variable remains unknown. In the present paper, we derive the asymptotic expansions of the error functions, which are defined by the Kullback-Leibler divergence, for two types of single-variable estimations when the statistical regularity is satisfied. Our results indicate that the accuracies of the Bayes and maximum-likelihood methods are asymptotically equivalent and clarify that the Bayes method is only advantageous for multivariable estimations.
Actively Learning to Attract Followers on Twitter
Levine, Nir, Mann, Timothy A., Mannor, Shie
Twitter, a popular social network, presents great opportunities for on-line machine learning research. However, previous research has focused almost entirely on learning from passively collected data. We study the problem of learning to acquire followers through normative user behavior, as opposed to the mass following policies applied by many bots. We formalize the problem as a contextual bandit problem, in which we consider retweeting content to be the action chosen and each tweet (content) is accompanied by context. We design reward signals based on the change in followers. The result of our month long experiment with 60 agents suggests that (1) aggregating experience across agents can adversely impact prediction accuracy and (2) the Twitter community's response to different actions is non-stationary. Our findings suggest that actively learning on-line can provide deeper insights about how to attract followers than machine learning over passively collected data alone.
Generalized Correntropy for Robust Adaptive Filtering
Chen, Badong, Xing, Lei, Zhao, Haiquan, Zheng, Nanning, Prรญncipe, Josรฉ C.
As a robust nonlinear similarity measure in kernel space, correntropy has received increasing attention in domains of machine learning and signal processing. In particular, the maximum correntropy criterion (MCC) has recently been successfully applied in robust regression and filtering. The default kernel function in correntropy is the Gaussian kernel, which is, of course, not always the best choice. In this work, we propose a generalized correntropy that adopts the generalized Gaussian density (GGD) function as the kernel (not necessarily a Mercer kernel), and present some important properties. We further propose the generalized maximum correntropy criterion (GMCC), and apply it to adaptive filtering. An adaptive algorithm, called the GMCC algorithm, is derived, and the mean square convergence performance is studied. We show that the proposed algorithm is very stable and can achieve zero probability of divergence (POD). Simulation results confirm the theoretical expectations and demonstrate the desirable performance of the new algorithm.
Gradient of Probability Density Functions based Contrasts for Blind Source Separation (BSS)
The article derives some novel independence measures and contrast functions for Blind Source Separation (BSS) application. For the $k^{th}$ order differentiable multivariate functions with equal hyper-volumes (region bounded by hyper-surfaces) and with a constraint of bounded support for $k>1$, it proves that equality of any $k^{th}$ order derivatives implies equality of the functions. The difference between product of marginal Probability Density Functions (PDFs) and joint PDF of a random vector is defined as Function Difference (FD) of a random vector. Assuming the PDFs are $k^{th}$ order differentiable, the results on generalized functions are applied to the independence condition. This brings new sets of independence measures and BSS contrasts based on the $L^p$-Norm, $ p \geq 1$ of - FD, gradient of FD (GFD) and Hessian of FD (HFD). Instead of a conventional two stage indirect estimation method for joint PDF based BSS contrast estimation, a single stage direct estimation of the contrasts is desired. The article targets both the efficient estimation of the proposed contrasts and extension of the potential theory for an information field. The potential theory has a concept of reference potential and it is used to derive closed form expression for the relative analysis of potential field. Analogous to it, there are introduced concepts of Reference Information Potential (RIP) and Cross Reference Information Potential (CRIP) based on the potential due to kernel functions placed at selected sample points as basis in kernel methods. The quantities are used to derive closed form expressions for information field analysis using least squares. The expressions are used to estimate $L^2$-Norm of FD and $L^2$-Norm of GFD based contrasts.
Consistent Collective Matrix Completion under Joint Low Rank Structure
Gunasekar, Suriya, Yamada, Makoto, Yin, Dawei, Chang, Yi
We address the collective matrix completion problem of jointly recovering a collection of matrices with shared structure from partial (and potentially noisy) observations. To ensure well--posedness of the problem, we impose a joint low rank structure, wherein each component matrix is low rank and the latent space of the low rank factors corresponding to each entity is shared across the entire collection. We first develop a rigorous algebra for representing and manipulating collective--matrix structure, and identify sufficient conditions for consistent estimation of collective matrices. We then propose a tractable convex estimator for solving the collective matrix completion problem, and provide the first non--trivial theoretical guarantees for consistency of collective matrix completion. We show that under reasonable assumptions stated in Section 3.1, with high probability, the proposed estimator exactly recovers the true matrices whenever sample complexity requirements dictated by Theorem 1 are met. The sample complexity requirement derived in the paper are optimum up to logarithmic factors, and significantly improve upon the requirements obtained by trivial extensions of standard matrix completion. Finally, we propose a scalable approximate algorithm to solve the proposed convex program, and corroborate our results through simulated experiments.
The Gram-Charlier A Series based Extended Rule-of-Thumb for Bandwidth Selection in Univariate and Multivariate Kernel Density Estimations
The article derives a novel Gram-Charlier A (GCA) Series based Extended Rule-of-Thumb (ExROT) for bandwidth selection in Kernel Density Estimation (KDE). There are existing various bandwidth selection rules achieving minimization of the Asymptotic Mean Integrated Square Error (AMISE) between the estimated probability density function (PDF) and the actual PDF. The rules differ in a way to estimate the integration of the squared second order derivative of an unknown PDF $(f(\cdot))$, identified as the roughness $R(f''(\cdot))$. The simplest Rule-of-Thumb (ROT) estimates $R(f''(\cdot))$ with an assumption that the density being estimated is Gaussian. Intuitively, better estimation of $R(f''(\cdot))$ and consequently better bandwidth selection rules can be derived, if the unknown PDF is approximated through an infinite series expansion based on a more generalized density assumption. As a demonstration and verification to this concept, the ExROT derived in the article uses an extended assumption that the density being estimated is near Gaussian. This helps use of the GCA expansion as an approximation to the unknown near Gaussian PDF. The ExROT for univariate KDE is extended to that for multivariate KDE. The required multivariate AMISE criteria is re-derived using elementary calculus of several variables, instead of Tensor calculus. The derivation uses the Kronecker product and the vector differential operator to achieve the AMISE expression in vector notations. There is also derived ExROT for kernel based density derivative estimator.
Beta diffusion trees and hierarchical feature allocations
Heaukulani, Creighton, Knowles, David A., Ghahramani, Zoubin
We define the beta diffusion tree, a random tree structure with a set of leaves that defines a collection of overlapping subsets of objects, known as a feature allocation. A generative process for the tree structure is defined in terms of particles (representing the objects) diffusing in some continuous space, analogously to the Dirichlet diffusion tree (Neal, 2003b), which defines a tree structure over partitions (i.e., non-overlapping subsets) of the objects. Unlike in the Dirichlet diffusion tree, multiple copies of a particle may exist and diffuse along multiple branches in the beta diffusion tree, and an object may therefore belong to multiple subsets of particles. We demonstrate how to build a hierarchically-clustered factor analysis model with the beta diffusion tree and how to perform inference over the random tree structures with a Markov chain Monte Carlo algorithm. We conclude with several numerical experiments on missing data problems with data sets of gene expression microarrays, international development statistics, and intranational socioeconomic measurements.
Properties of the Least Squares Temporal Difference learning algorithm
This paper presents four different ways of looking at the well-known Least Squares Temporal Differences (LSTD) algorithm for computing the value function of a Markov Reward Process, each of them leading to different insights: the operator-theory approach via the Galerkin method, the statistical approach via instrumental variables, the linear dynamical system view as well as the limit of the TD iteration. We also give a geometric view of the algorithm as an oblique projection. Furthermore, there is an extensive comparison of the optimization problem solved by LSTD as compared to Bellman Residual Minimization (BRM). We then review several schemes for the regularization of the LSTD solution. We then proceed to treat the modification of LSTD for the case of episodic Markov Reward Processes.
Modeling the Lifespan of Discourse Entities with Application to Coreference Resolution
de Marneffe, Marie-Catherine, Recasens, Marta, Potts, Christopher
A discourse typically involves numerous entities, but few are mentioned more than once. Distinguishing those that die out after just one mention (singleton) from those that lead longer lives (coreferent) would dramatically simplify the hypothesis space for coreference resolution models, leading to increased performance. To realize these gains, we build a classifier for predicting the singleton/coreferent distinction. The models feature representations synthesize linguistic insights about the factors affecting discourse entity lifespans (especially negation, modality, and attitude predication) with existing results about the benefits of surface (part-of-speech and n-gram-based) features for coreference resolution. The model is effective in its own right, and the feature representations help to identify the anchor phrases in bridging anaphora as well. Furthermore, incorporating the model into two very different state-of-the-art coreference resolution systems, one rule-based and the other learning-based, yields significant performance improvements.