 Statistical Learning


Effective Question Recommendation Based on Multiple Features for Question Answering Communities

AAAI Conferences

We propose a new method of recommending questions to answerers that suit the answerers’ knowledge and interests in User-Interactive Question Answering (QA) communities. A question recommender can help answerers select the questions that interest them. This increases the number of answers, which in turn makes QA communities more active. An effective question recommender should satisfy three requirements. First, its accuracy should be higher than that of the existing category-based approach, since more than 50% of answerers choose questions to answer according to a fixed system of categories. Second, it should be able to recommend unanswered questions, because more than 2,000 questions are posted every day. Third, it should be able to support even users who have never answered a question, because more than 50% of users in current QA communities have never given any answer. To achieve this, we use each user's question history as well as their answer history, combining collaborative filtering and content-based filtering schemes. Experiments on real log data sets from Oshiete goo, a well-known Japanese QA community, show that our recommender satisfies all three requirements.
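
The abstract does not specify how the two filtering schemes are combined. The sketch below is a hypothetical hybrid: a user profile built from both asked and answered questions (so non-answerers still get one), a content-based cosine score against the new question, and a collaborative term that borrows the content scores of similar users. The names `bag`, `profile`, `hybrid_score` and the weight `alpha` are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def bag(text: str) -> Counter:
    """Bag-of-words vector for a question (whitespace tokenization, for illustration)."""
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def profile(asked: list[str], answered: list[str]) -> Counter:
    """User profile from asked AND answered questions, so non-answerers still get one."""
    p = Counter()
    for text in asked + answered:
        p.update(bag(text))
    return p


def hybrid_score(user: Counter, others: list[Counter], question: str, alpha: float = 0.5) -> float:
    """alpha * content-based score + (1 - alpha) * similarity-weighted neighbour score."""
    q = bag(question)
    content = cosine(user, q)
    sims = [cosine(user, o) for o in others]  # user-user similarity (collaborative part)
    total = sum(sims)
    collab = sum(s * cosine(o, q) for s, o in zip(sims, others)) / total if total else 0.0
    return alpha * content + (1 - alpha) * collab
```

Both terms need only the question text, so a freshly posted, unanswered question can be scored immediately, e.g. `hybrid_score(alice, [bob, carol], "sorting a list in python")` for a user `alice` who has asked but never answered.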


Empirical Analysis of User Participation in Online Communities: the Case of Wikipedia

AAAI Conferences

We study the distribution of the activity period of users in five of the largest localized versions of the free online encyclopedia Wikipedia. We find it to be consistent with a mixture of two truncated log-normal distributions. Using this model, the temporal evolution of these systems can be analyzed, showing that the statistical description is consistent over time.
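
A mixture of two truncated log-normals can be written down directly. The truncation bounds `lo`, `hi` (e.g. the observation window in days), the mixing weight `w`, and the component parameters are assumptions, since the abstract gives none of them.

```python
import math


def lognorm_pdf(x, mu, sigma):
    """Log-normal density with parameters mu, sigma of log(x)."""
    return math.exp(-(math.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (x * sigma * math.sqrt(2 * math.pi))


def lognorm_cdf(x, mu, sigma):
    """Log-normal cumulative distribution function via the error function."""
    return 0.5 * (1 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2))))


def truncated_lognorm_pdf(x, mu, sigma, lo, hi):
    """Log-normal density restricted to [lo, hi] and renormalized to integrate to 1."""
    if not lo <= x <= hi:
        return 0.0
    mass = lognorm_cdf(hi, mu, sigma) - lognorm_cdf(lo, mu, sigma)
    return lognorm_pdf(x, mu, sigma) / mass


def mixture_pdf(x, w, comp1, comp2, lo, hi):
    """w * first truncated component + (1 - w) * second; comp = (mu, sigma)."""
    return (w * truncated_lognorm_pdf(x, *comp1, lo, hi)
            + (1 - w) * truncated_lognorm_pdf(x, *comp2, lo, hi))
```

With hypothetical parameters such as `mixture_pdf(x, 0.6, (1.0, 1.0), (6.0, 1.0), 1.0, 2000.0)`, one component captures short-lived accounts and the other long-term contributors; fitting `w` and the four component parameters to observed activity periods would be done by maximum likelihood.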


To Be a Star Is Not Only Metaphoric: From Popularity to Social Linkage

AAAI Conferences

The emergence of online platforms that allow users to mix self-publishing activities and social networking offers new possibilities for building online reputation and visibility. In this paper we present a method for analyzing online popularity that takes into consideration both the success of the published content and the social network topology. First, we adapt Kohonen self-organizing maps in order to cluster the users of online platforms according to their audience and authority characteristics. Then, we perform a detailed analysis of the manner in which nodes are organized in the social network. Finally, we study the relationship between the local network structure around each node and the corresponding user’s popularity. We apply this method to the MySpace music social network. We observe that the most popular artists are centers of star-shaped social structures, and that there exists a fraction of artists who are involved in community and social activity dynamics independently of their popularity. This method, based on a learning algorithm and on network analysis, appears to be a robust and intuitive technique for a rich description of online behavior.
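
For readers unfamiliar with self-organizing maps, here is a minimal online Kohonen SOM on a small grid. The two input features (normalized audience and authority scores) and all hyperparameters are hypothetical; the paper's adapted SOM is not reproduced here.

```python
import math
import random


def best_matching_unit(weights, x):
    """Grid node whose weight vector is closest (squared Euclidean) to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Online Kohonen SOM on a rows x cols grid with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)] for r in range(rows) for c in range(cols)}
    radius0 = max(rows, cols) / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit samples in random order each epoch
            frac = step / total_steps
            lr = lr0 * (1 - frac)  # learning rate decays linearly to 0
            radius = max(radius0 * (1 - frac), 0.5)  # neighbourhood shrinks, floored at 0.5
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2 * radius ** 2))  # neighbourhood weight on the grid
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])  # pull node toward the sample
            step += 1
    return weights
```

After training, each user is assigned to the cluster of its best-matching unit, so users with similar audience/authority profiles land on the same or neighbouring grid nodes.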


Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets

arXiv.org Machine Learning

Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. "Structure" can be understood as symmetry, and a range of symmetries are expressed by hierarchy. Such symmetries directly point to invariants that pinpoint intrinsic properties of the data and of the background empirical domain of interest. We review many aspects of hierarchy here, including ultrametric topology, generalized ultrametrics, and linkages with lattices, other discrete algebraic structures, and p-adic number representations. By focusing on symmetries in data we have a powerful means of structuring and analyzing massive, high-dimensional data stores. We illustrate the power of hierarchical clustering in case studies in chemistry and finance, and we provide pointers to other published case studies.
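
The link between hierarchy and ultrametric topology can be made concrete: the merge heights of single-linkage clustering define a distance that satisfies the ultrametric (strong triangle) inequality $d(x,z) \le \max(d(x,y), d(y,z))$. The sketch below computes these heights with a Kruskal-style union-find; the example points are arbitrary.

```python
import itertools
import math


def single_linkage_ultrametric(points):
    """Cophenetic (merge-height) distances of single-linkage clustering."""
    n = len(points)
    parent = list(range(n))
    members = {i: {i} for i in range(n)}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in itertools.combinations(range(n), 2))
    u = [[0.0] * n for _ in range(n)]
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # already in the same cluster
        # Every pair split across the two merging clusters gets merge height d.
        for a in members[ri]:
            for b in members[rj]:
                u[a][b] = u[b][a] = d
        parent[rj] = ri
        members[ri].update(members.pop(rj))
    return u
```

The result is also the largest ultrametric that lies below the original Euclidean distances (the subdominant ultrametric), which is one reason single linkage is the standard bridge between clustering and ultrametric spaces.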


Ecological non-linear state space model selection via adaptive particle Markov chain Monte Carlo (AdPMCMC)

arXiv.org Machine Learning

We develop a novel Particle Markov chain Monte Carlo algorithm that is capable of sampling from the posterior distribution of non-linear state space models for both the unobserved latent states and the unknown model parameters. We apply this methodology to five population growth models, including models with strong and weak Allee effects, and test whether it can efficiently sample from the complex likelihood surface that is often associated with these models. Utilising both real and synthetically generated data sets, we examine the extent to which observation noise and process error may frustrate efforts to choose between these models. Our algorithm combines an Adaptive Metropolis proposal with an SIR Particle MCMC algorithm (AdPMCMC). We show that the AdPMCMC algorithm samples complex, high-dimensional spaces efficiently, and is therefore superior to standard Gibbs or Metropolis-Hastings algorithms, which are known to converge very slowly when applied to the non-linear state space ecological models considered in this paper. Additionally, we show how the AdPMCMC algorithm can be used to recursively estimate the Bayesian Cramér-Rao Lower Bound of Tichavský (1998). We derive expressions for these Cramér-Rao Bounds and estimate them for the models considered. Our results demonstrate a number of important features of common population growth models, most notably their multi-modal posterior surfaces and the dependence between the static and dynamic parameters. We conclude by sampling from the posterior distribution of each of the models, and use Bayes factors to highlight how observation noise significantly diminishes our ability to select among some of the models, particularly those designed to reproduce an Allee effect.
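
The two ingredients named above can be sketched in a deliberately reduced setting: a bootstrap SIR particle filter that estimates the marginal likelihood, plugged into a Metropolis-Hastings chain whose random-walk scale adapts to the chain's own history. Everything model-specific is an assumption for illustration: a one-parameter log-scale Ricker model with fixed carrying capacity `k`, fixed noise levels, and a uniform prior on `r` over (0, 3). The authors' AdPMCMC samples latent states and all static parameters of five models, which this sketch does not attempt.

```python
import math
import random


def ricker_step(x, r, k, sigma_proc, rng):
    """One step of a log-scale Ricker growth model with Gaussian process error."""
    return x + r * (1 - math.exp(x) / k) + rng.gauss(0, sigma_proc)


def simulate(r, k, sigma_proc, sigma_obs, x0, steps, rng):
    """Synthetic observations y_t = x_t + Gaussian observation noise."""
    x, ys = x0, []
    for _ in range(steps):
        x = ricker_step(x, r, k, sigma_proc, rng)
        ys.append(x + rng.gauss(0, sigma_obs))
    return ys


def sir_loglik(ys, r, k, sigma_proc, sigma_obs, x0, n_particles, rng):
    """Bootstrap SIR particle filter estimate of log p(y_1:T | r)."""
    particles = [x0] * n_particles
    loglik = 0.0
    log_norm = math.log(sigma_obs * math.sqrt(2 * math.pi))
    for y in ys:
        particles = [ricker_step(x, r, k, sigma_proc, rng) for x in particles]
        logw = [-0.5 * ((y - x) / sigma_obs) ** 2 for x in particles]
        m = max(logw)  # log-sum-exp shift for numerical stability
        w = [math.exp(lw - m) for lw in logw]
        loglik += m + math.log(sum(w) / n_particles) - log_norm
        particles = rng.choices(particles, weights=w, k=n_particles)  # multinomial resampling
    return loglik


def adaptive_pmmh(ys, n_iter, rng, k=100.0, sigma_proc=0.1, sigma_obs=0.2,
                  x0=math.log(50), n_particles=100, adapt_after=50):
    """Particle marginal Metropolis-Hastings for r with a 1-D adaptive random-walk proposal."""
    r = 1.0
    ll = sir_loglik(ys, r, k, sigma_proc, sigma_obs, x0, n_particles, rng)
    chain, accepted, scale = [], 0, 0.1
    for i in range(n_iter):
        if i >= adapt_after:
            # Haario-style adaptation in one dimension: proposal sd = 2.38 * chain sd (+ jitter).
            mean = sum(chain) / len(chain)
            var = sum((c - mean) ** 2 for c in chain) / (len(chain) - 1)
            scale = 2.38 * math.sqrt(var + 1e-4)
        prop = r + rng.gauss(0, scale)
        if 0.0 < prop < 3.0:  # uniform prior on (0, 3); proposals outside are rejected
            ll_prop = sir_loglik(ys, prop, k, sigma_proc, sigma_obs, x0, n_particles, rng)
            if rng.random() < math.exp(min(0.0, ll_prop - ll)):
                r, ll = prop, ll_prop
                accepted += 1
        chain.append(r)
    return chain, accepted / n_iter
```

Using the noisy particle estimate of the likelihood inside the acceptance ratio still leaves the exact posterior invariant (the pseudo-marginal argument), which is what makes particle MCMC practical for models whose likelihood cannot be evaluated in closed form.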


Scalable Probabilistic Databases with Factor Graphs and MCMC

arXiv.org Artificial Intelligence

Probabilistic databases play a crucial role in the management and understanding of uncertain data. However, incorporating probabilities into the semantics of incomplete databases has posed many challenges, forcing systems to sacrifice modeling power or scalability, or to restrict the class of relational algebra formulas under which they are closed. We propose an alternative approach in which the underlying relational database always represents a single world, and an external factor graph encodes a distribution over possible worlds; Markov chain Monte Carlo (MCMC) inference is then used to recover this uncertainty to a desired level of fidelity. Our approach allows the efficient evaluation of arbitrary queries over probabilistic databases with arbitrary dependencies expressed by graphical models whose structure changes during inference. MCMC sampling provides efficiency by hypothesizing modifications to possible worlds rather than generating entire worlds from scratch. Queries are then run only over the portions of the world that change, avoiding the onerous cost of running full queries over each sampled world. A significant innovation of this work is the connection between MCMC sampling and materialized view maintenance techniques: we find empirically that using view maintenance techniques is several orders of magnitude faster than naively querying each sampled world. We also demonstrate our system's ability to answer relational queries with aggregation, and demonstrate additional scalability through parallelization.
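
The core idea of pairing MCMC with view maintenance can be illustrated on a toy case. Assume, hypothetically, a single table whose rows are each present or absent, a factor graph reduced to independent per-row log-potentials, and one materialized `COUNT(*) WHERE predicate` view. Each Metropolis-Hastings step flips one row, and the view is updated from that single change instead of re-running the query. The real system supports arbitrary factor graphs and queries; this is only a sketch.

```python
import math
import random


def mcmc_count_query(rows, weights, predicate, n_samples, rng):
    """Estimate E[COUNT(*) WHERE predicate] over possible worlds by single-row MH flips.

    World probability is proportional to exp(sum of weights[i] over present rows i).
    """
    present = [False] * len(rows)  # start from the empty world
    view_count = 0  # materialized COUNT for the current world
    running_total = 0
    for _ in range(n_samples):
        i = rng.randrange(len(rows))  # symmetric proposal: flip one uniformly chosen row
        log_ratio = -weights[i] if present[i] else weights[i]
        if rng.random() < math.exp(min(0.0, log_ratio)):
            present[i] = not present[i]
            if predicate(rows[i]):  # view maintenance: inspect only the modified tuple
                view_count += 1 if present[i] else -1
        running_total += view_count
    return running_total / n_samples, present, view_count
```

Each step costs O(1) for the query regardless of table size, which is the source of the speedup the abstract reports over naively querying every sampled world.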


Reasoning with Logical Proportions

AAAI Conferences

By logical proportion, we mean a statement that expresses a semantic equivalence between two pairs of propositions. In these pairs, each element is compared to the other in terms of similarities and/or dissimilarities. An example of such a proportion is the well-known analogical proportion: a is to b as c is to d. Analogical proportions have recently been characterized in logical terms, but there are many other proportions that are worthy of interest. Some of them can be related to the analogical pattern; others are related to semantic equivalence between conditional objects and express statements such as "a resembles b and differs from b in the same way as c with respect to d". We show that there are 5 direct proportions, including the analogical one and 4 others having a conditional-object flavor, where the change (if any) from a to b goes in the same direction as the change (if any) from c to d, together with 5 reverse proportions obtained by switching c and d. Moreover, there exists only one auto-reverse proportion, called paralogy, stating that what a and b have in common, c and d have as well. It is then established that no proportion other than these (with the exception of 4 degenerate ones) satisfies a natural “full identity” requirement. The paper proposes a structured and unified view of these logical proportions and discusses their characteristic properties. It extends previous work where only proportions related to analogy were considered. It also explores the use of these logical proportions in transduction-like inference, where new items are classified on the basis of already classified items without trying to induce a generic model, considering only similarities and differences between items. Taking advantage of different proportions, a transduction procedure is proposed.
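
On Boolean features, the analogical proportion is commonly formalized as $a:b::c:d \equiv ((a \wedge \neg b) \equiv (c \wedge \neg d)) \wedge ((\neg a \wedge b) \equiv (\neg c \wedge d))$, i.e. a differs from b exactly as c differs from d. Paralogy, "what a and b have in common, c and d have as well", is written here as $((a \wedge b) \equiv (c \wedge d)) \wedge ((\neg a \wedge \neg b) \equiv (\neg c \wedge \neg d))$. Treat both formulas as our reading of the standard definitions rather than a quotation from the paper. The helper `solve_analogy` shows the transduction step of inferring an unknown fourth value from three known ones.

```python
from itertools import product


def analogy(a, b, c, d):
    """a : b :: c : d  -- a differs from b in the same way c differs from d."""
    a, b, c, d = map(bool, (a, b, c, d))
    return ((a and not b) == (c and not d)) and ((not a and b) == (not c and d))


def paralogy(a, b, c, d):
    """What a and b have in common (positively or negatively), c and d share too."""
    a, b, c, d = map(bool, (a, b, c, d))
    return ((a and b) == (c and d)) and ((not a and not b) == (not c and not d))


def solve_analogy(a, b, c):
    """Return d in {0, 1} such that a : b :: c : d holds, or None if no d works."""
    for d in (0, 1):
        if analogy(a, b, c, d):
            return d
    return None


# The six Boolean patterns that make the analogical proportion true.
ANALOGY_PATTERNS = {p for p in product((0, 1), repeat=4) if analogy(*p)}
```

Applied feature by feature (and to the class label), `solve_analogy` predicts an unseen item's value from three already classified items, without inducing a global model.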


Improving the Johnson-Lindenstrauss Lemma

arXiv.org Machine Learning

The Johnson-Lindenstrauss Lemma allows for the projection of $n$ points in $p$-dimensional Euclidean space onto a $k$-dimensional Euclidean space, with $k \ge \frac{24\ln n}{3\epsilon^2-2\epsilon^3}$, so that the pairwise distances are preserved within a factor of $1\pm\epsilon$. Here, by working directly with the distributions of the random distances rather than resorting to the moment generating function technique, an improved lower bound for $k$ is obtained. The additional reduction in dimension compared to bounds found in the literature is at least $13\%$, and in some cases reaches $30\%$. Using the moment generating function technique, we further provide a lower bound for $k$ using pairwise $L_2$ distances in the space of points to be projected and pairwise $L_1$ distances in the space of the projected points. Comparison with results in the literature shows that this bound provides an additional reduction of $36\%$ to $40\%$.
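
The classical bound quoted above can be evaluated directly, and a Gaussian random projection shows what it buys in practice. The paper's improved bound is not given in the abstract, so only the classical one is computed. The projection matrix with i.i.d. $N(0, 1/k)$ entries is one standard choice and is an assumption here, as are the example sizes.

```python
import math
import random


def jl_min_dim(n, eps):
    """Classical JL bound: smallest integer k with k >= 24 ln n / (3 eps^2 - 2 eps^3)."""
    return math.ceil(24 * math.log(n) / (3 * eps ** 2 - 2 * eps ** 3))


def gaussian_projection(points, k, rng):
    """Project p-dimensional points to k dimensions with a random N(0, 1/k) matrix."""
    p = len(points[0])
    # Scaling by 1/sqrt(k) makes squared norms preserved in expectation.
    R = [[rng.gauss(0, 1) / math.sqrt(k) for _ in range(p)] for _ in range(k)]
    return [[sum(r_j * x_j for r_j, x_j in zip(row, x)) for row in R] for x in points]


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))
```

For example, `jl_min_dim(1000, 0.1)` gives 5921 and `jl_min_dim(10, 0.5)` gives 111. The bound depends only on $n$ and $\epsilon$, not on the original dimension $p$, which is what makes the lemma useful for high-dimensional data.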


ECG Feature Extraction Techniques - A Survey Approach

arXiv.org Artificial Intelligence

ECG feature extraction plays a significant role in diagnosing most cardiac diseases. One cardiac cycle in an ECG signal consists of the P-QRS-T waves. Feature extraction determines the amplitudes and intervals in the ECG signal for subsequent analysis, and the amplitude and interval values of the P-QRS-T segment characterize how a person's heart is functioning. Numerous techniques for analyzing the ECG signal have recently been developed. The proposed schemes are mostly based on Fuzzy Logic Methods, Artificial Neural Networks (ANN), Genetic Algorithms (GA), Support Vector Machines (SVM), and other signal analysis techniques. All of these techniques and algorithms have their advantages and limitations. This paper discusses various techniques and transformations proposed earlier in the literature for extracting features from an ECG signal. In addition, it provides a comparative study of the methods researchers have proposed for ECG feature extraction.
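
A basic instance of interval extraction is detecting R peaks and deriving RR intervals and heart rate. The sketch uses a simple amplitude threshold with a refractory period, not any particular method from the survey. The threshold ratio, the 0.25 s refractory period, and the synthetic test signal are all hypothetical choices.

```python
import math


def synthetic_ecg(fs, seconds, rr_s, first_r_s=0.4):
    """Hypothetical test signal: sharp R spikes, smaller T waves, slow baseline wander."""
    n_beats = int((seconds - first_r_s) / rr_s) + 1
    centers = [first_r_s + rr_s * j for j in range(n_beats)]
    signal = []
    for i in range(int(seconds * fs)):
        t = i / fs
        v = 0.1 * math.sin(2 * math.pi * 0.3 * t)  # baseline wander
        for c in centers:
            v += math.exp(-((t - c) / 0.01) ** 2)  # R spike
            v += 0.3 * math.exp(-((t - c - 0.25) / 0.04) ** 2)  # T wave
        signal.append(v)
    return signal


def detect_r_peaks(signal, fs, threshold_ratio=0.6, refractory_s=0.25):
    """Local maxima above threshold_ratio * max(signal), at most one per refractory period."""
    thr = threshold_ratio * max(signal)
    refractory = int(refractory_s * fs)
    peaks, i = [], 1
    while i < len(signal) - 1:
        if signal[i] >= thr and signal[i] >= signal[i - 1] and signal[i] > signal[i + 1]:
            peaks.append(i)
            i += refractory  # skip ahead so the same beat is not detected twice
        else:
            i += 1
    return peaks


def rr_features(peaks, fs):
    """RR intervals (s), their mean, and heart rate in beats per minute (needs >= 2 peaks)."""
    rr = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]
    mean_rr = sum(rr) / len(rr)
    return {"rr_intervals_s": rr, "mean_rr_s": mean_rr, "heart_rate_bpm": 60 / mean_rr}
```

On real recordings a fixed global threshold is fragile (baseline drift, noise, tall T waves), which is why the surveyed methods add filtering, transforms, or learned classifiers before this step.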


The Production of Probabilistic Entropy in Structure/Action Contingency Relations

arXiv.org Artificial Intelligence

Luhmann (1984) defined society as a communication system which is structurally coupled to, but not an aggregate of, human action systems. The communication system is then considered as self-organizing ("autopoietic"), as are human actors. Communication systems can be studied by using Shannon's (1948) mathematical theory of communication. The update of a network by action at one of the local nodes is then a well-known problem in artificial intelligence (Pearl 1988). By combining these various theories, a general algorithm for probabilistic structure/action contingency can be derived. The consequences of this contingency for each system, its consequences for their further histories, and the stabilization on each side by counterbalancing mechanisms are discussed, in both mathematical and theoretical terms. An empirical example is elaborated.
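
Shannon's measures on which such an algorithm builds can be computed from a contingency table. The sketch computes the entropies $H(X)$, $H(Y)$, $H(X,Y)$ and the mutual information (transmission) $T = H(X) + H(Y) - H(X,Y)$ in bits. Reading one variable as "structure" and the other as "action" is an illustrative assumption; the paper's structure/action contingency algorithm itself is not reproduced.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability cells contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def transmission(table):
    """Mutual information T = H(X) + H(Y) - H(X,Y) in bits from a table of counts."""
    total = sum(sum(row) for row in table)
    joint = [[c / total for c in row] for row in table]
    px = [sum(row) for row in joint]  # row marginals
    py = [sum(col) for col in zip(*joint)]  # column marginals
    hxy = entropy([p for row in joint for p in row])
    return entropy(px) + entropy(py) - hxy
```

$T$ is zero when the two dimensions vary independently and reaches its maximum when one fully determines the other.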