AITopics | Statistical Learning

Statistical Learning

Effective Question Recommendation Based on Multiple Features for Question Answering Communities

Kabutoya, Yutaka (NTT Cyber Solutions Laboratories, NTT Corporation) | Iwata, Tomoharu (NTT Cyber Solutions Laboratories, NTT Corporation) | Shiohara, Hisako (NTT Cyber Solutions Laboratories, NTT Corporation) | Fujimura, Ko (NTT Cyber Solutions Laboratories, NTT Corporation)

AAAI Conferences, May-17-2010

We propose a new method of recommending questions to answerers so as to suit the answerers’ knowledge and interests in User-Interactive Question Answering (QA) communities. A question recommender can help answerers select the questions that interest them. This increases the number of answers, which in turn makes QA communities more active. An effective question recommender should satisfy the following three requirements. First, its accuracy should be higher than that of the existing category-based approach; more than 50% of answerers select the questions to answer according to a fixed system of categories. Second, it should be able to recommend unanswered questions, because more than 2,000 questions are posted every day. Third, it should be able to support even those people who have never answered a question previously, because more than 50% of users in current QA communities have never given any answer. To achieve an effective question recommender, we use the question histories as well as the answer histories of each user by combining collaborative filtering schemes and content-based filtering schemes. Experiments on real log data sets from a well-known Japanese QA community, Oshiete goo, show that our recommender satisfies the three requirements.

accuracy, machine learning, question answering, (20 more...)

AAAI Conferences

Fourth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > New York (0.05)
Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre:

Research Report > New Finding (0.30)
Questionnaire & Opinion Survey (0.30)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.70)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Empirical Analysis of User Participation in Online Communities: the Case of Wikipedia

Ciampaglia, Giovanni Luca (Università della Svizzera Italiana) | Vancheri, Alberto (Università della Svizzera Italiana)

AAAI Conferences, May-17-2010

We study the distribution of the activity period of users in five of the largest localized versions of the free, online encyclopedia Wikipedia. We find it to be consistent with a mixture of two truncated log-normal distributions. Using this model, the temporal evolution of these systems can be analyzed, showing that the statistical description is consistent over time.
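As a rough illustration of the model family named in this abstract, the sketch below evaluates the density of a mixture of two truncated log-normal distributions. The function names, the shared truncation interval `[lo, hi]`, and all parameter values are assumptions made for the example; the abstract does not give the paper's parametrization or fitted values.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """CDF of a log-normal distribution with log-mean mu and log-sd sigma."""
    if x <= 0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def truncated_lognormal_pdf(x, mu, sigma, lo, hi):
    """Log-normal density restricted to [lo, hi] and renormalized."""
    if x <= 0 or not lo <= x <= hi:
        return 0.0
    z = (math.log(x) - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))
    # Probability mass the untruncated distribution places on [lo, hi].
    mass = lognormal_cdf(hi, mu, sigma) - lognormal_cdf(lo, mu, sigma)
    return pdf / mass


def mixture_pdf(x, weight, comp1, comp2, lo, hi):
    """Two-component mixture; comp1 and comp2 are (mu, sigma) pairs."""
    mu1, sigma1 = comp1
    mu2, sigma2 = comp2
    return (weight * truncated_lognormal_pdf(x, mu1, sigma1, lo, hi)
            + (1.0 - weight) * truncated_lognormal_pdf(x, mu2, sigma2, lo, hi))


# Hypothetical parameters: a short-lived and a long-lived user group.
density_at_one = mixture_pdf(1.0, 0.6, (0.0, 1.0), (4.0, 0.5), 0.01, 1000.0)
```

Each component is renormalized by the mass it places on the truncation interval, so the mixture still integrates to one over `[lo, hi]`.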

artificial intelligence, machine learning, social media, (17 more...)

AAAI Conferences

Fourth International AAAI Conference on Weblogs and Social Media

Country:

Europe > Switzerland (0.05)
North America > United States > New York (0.04)

Genre: Research Report (0.48)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

To Be a Star Is Not Only Metaphoric: From Popularity to Social Linkage

Stoica, Alina Mihaela (Orange Labs and LIAFA, University Paris 7) | Couronne, Thomas (Orange Labs) | Beuscart, Jean-Samuel (Orange Labs)

AAAI Conferences, May-17-2010

The emergence of online platforms that combine self-publishing activities and social networking offers new possibilities for building online reputation and visibility. In this paper we present a method to analyze online popularity that takes into consideration both the success of the published content and the social network topology. First, we adapt Kohonen self-organizing maps in order to cluster the users of online platforms according to their audience and authority characteristics. Then, we perform a detailed analysis of how nodes are organized in the social network. Finally, we study the relationship between the local network structure around each node and the corresponding user’s popularity. We apply this method to the MySpace music social network. We observe that the most popular artists are centers of star-shaped social structures and that there exists a fraction of artists who are involved in community and social activity dynamics independently of their popularity. This method, based on a learning algorithm and on network analysis, appears to be a robust and intuitive technique for a rich description of online behavior.

artificial intelligence, machine learning, vertex, (20 more...)

AAAI Conferences

Fourth International AAAI Conference on Weblogs and Social Media

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
North America > United States > New York > New York County > New York City (0.04)

Industry: Information Technology > Services (0.77)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Hierarchical Clustering for Finding Symmetries and Other Patterns in Massive, High Dimensional Datasets

Murtagh, Fionn | Contreras, Pedro

arXiv.org Machine Learning, May-14-2010

Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. "Structure" can be understood as symmetry, and a range of symmetries are expressed by hierarchy. Such symmetries directly point to invariants that pinpoint intrinsic properties of the data and of the background empirical domain of interest. We review many aspects of hierarchy here, including ultrametric topology, generalized ultrametric, linkages with lattices and other discrete algebraic structures, and p-adic number representations. By focusing on symmetries in data we have a powerful means of structuring and analyzing massive, high dimensional data stores. We illustrate the power of hierarchical clustering in case studies in chemistry and finance, and we provide pointers to other published case studies.

artificial intelligence, data mining, machine learning, (21 more...)

arXiv.org Machine Learning

1005.2638

Country:

North America > United States (0.46)
Europe > United Kingdom (0.28)

Genre: Overview (0.87)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Ecological non-linear state space model selection via adaptive particle Markov chain Monte Carlo (AdPMCMC)

Peters, Gareth W. | Hosack, Geoff R. | Hayes, Keith R.

arXiv.org Machine Learning, May-12-2010

We develop a novel advanced Particle Markov chain Monte Carlo algorithm that is capable of sampling from the posterior distribution of non-linear state space models for both the unobserved latent states and the unknown model parameters. We apply this novel methodology to five population growth models, including models with strong and weak Allee effects, and test whether it can efficiently sample from the complex likelihood surface that is often associated with these models. Utilising real and synthetically generated data sets, we examine the extent to which observation noise and process error may frustrate efforts to choose between these models. Our novel algorithm involves an Adaptive Metropolis proposal combined with an SIR Particle MCMC algorithm (AdPMCMC). We show that the AdPMCMC algorithm samples complex, high-dimensional spaces efficiently, and is therefore superior to standard Gibbs or Metropolis-Hastings algorithms that are known to converge very slowly when applied to the non-linear state space ecological models considered in this paper. Additionally, we show how the AdPMCMC algorithm can be used to recursively estimate the Bayesian Cramér-Rao Lower Bound of Tichavský (1998). We derive expressions for these Cramér-Rao Bounds and estimate them for the models considered. Our results demonstrate a number of important features of common population growth models, most notably their multi-modal posterior surfaces and dependence between the static and dynamic parameters. We conclude by sampling from the posterior distribution of each of the models, and use Bayes factors to highlight how observation noise significantly diminishes our ability to select among some of the models, particularly those that are designed to reproduce an Allee effect.

artificial intelligence, bayesian inference, machine learning, (14 more...)

arXiv.org Machine Learning

1005.2238

Country:

North America > United States (0.67)
Europe > United Kingdom > England (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Scalable Probabilistic Databases with Factor Graphs and MCMC

Wick, Michael | McCallum, Andrew | Miklau, Gerome

arXiv.org Artificial Intelligence, May-11-2010

Probabilistic databases play a crucial role in the management and understanding of uncertain data. However, incorporating probabilities into the semantics of incomplete databases has posed many challenges, forcing systems to sacrifice modeling power or scalability, or to restrict the class of relational algebra formulas under which they are closed. We propose an alternative approach where the underlying relational database always represents a single world, and an external factor graph encodes a distribution over possible worlds; Markov chain Monte Carlo (MCMC) inference is then used to recover this uncertainty to a desired level of fidelity. Our approach allows the efficient evaluation of arbitrary queries over probabilistic databases with arbitrary dependencies expressed by graphical models with structure that changes during inference. MCMC sampling provides efficiency by hypothesizing modifications to possible worlds rather than generating entire worlds from scratch. Queries are then run over the portions of the world that change, avoiding the onerous cost of running full queries over each sampled world. A significant innovation of this work is the connection between MCMC sampling and materialized view maintenance techniques: we find empirically that using view maintenance techniques is several orders of magnitude faster than naively querying each sampled world. We also demonstrate our system's ability to answer relational queries with aggregation, and demonstrate additional scalability through the use of parallelization.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

1005.1934

Country: North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry: Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Reasoning with Logical Proportions

Prade, Henri (IRIT - University of Toulouse) | Richard, Gilles (British Institute of Technology and E-commerce)

AAAI Conferences, May-9-2010

By logical proportion, we mean a statement that expresses a semantical equivalence between two pairs of propositions. In these pairs, each element is compared to the other in terms of similarities and/or dissimilarities. An example of such a proportion is the well-known analogical proportion: a is to b as c is to d. Analogical proportions have recently been characterized in logical terms, but there are many other proportions that are worthy of interest. Some of them can be related to the analogical pattern; others are related to semantical equivalence between conditional objects and express statements such as "a resembles b and differs from b in the same way as c with respect to d". We show that there are 5 direct proportions, including the analogical one and 4 others having a conditional object flavor, where the change (if any) from a to b goes in the same direction as the change (if any) from c to d, together with 5 reverse proportions obtained by switching c and d. Moreover, there exists only one auto-reverse proportion, called paralogy, stating that what a and b have in common, c and d have as well. It is then established that there are no proportions other than these (with the exception of 4 degenerate ones) that satisfy a natural “full identity” requirement. The paper proposes a structured and unified view of these logical proportions and discusses their characteristic properties. It extends previous works where only proportions related to analogy were considered. It also explores the use of these logical proportions in transduction-like inference, where new items are classified on the basis of already classified items without trying to induce a generic model, considering similarities and differences between items only. Taking advantage of different proportions, a transduction procedure is proposed.
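To make the idea of a logical proportion concrete, the sketch below encodes two of them over Boolean values. The two formulas are an assumption drawn from the usual Boolean reading of these proportions, since the abstract does not spell them out: analogy as "a differs from b as c differs from d, and conversely", and paralogy as "what a and b share, c and d share, both positively and negatively". The helper names `solve_analogy` and `true_valuations` are hypothetical.

```python
from itertools import product


def analogy(a, b, c, d):
    """a : b :: c : d, read as (a and not b == c and not d) and (not a and b == not c and d)."""
    return ((a and not b) == (c and not d)) and ((not a and b) == (not c and d))


def paralogy(a, b, c, d):
    """What a and b have in common (positively or negatively), c and d have too."""
    return ((a and b) == (c and d)) and ((not a and not b) == (not c and not d))


def true_valuations(proportion):
    """All Boolean 4-tuples (a, b, c, d) for which the proportion holds."""
    return [v for v in product((False, True), repeat=4) if proportion(*v)]


def solve_analogy(a, b, c):
    """Transduction-style step: values of d that complete a : b :: c : d."""
    return [d for d in (False, True) if analogy(a, b, c, d)]


# Known items a, b, c determine the missing value d when a solution exists.
print(solve_analogy(True, False, True))
print(len(true_valuations(analogy)), len(true_valuations(paralogy)))
```

Under this reading, swapping c and d leaves paralogy unchanged but not analogy, which matches the abstract's description of paralogy as auto-reverse. `solve_analogy` returns an empty list when no value of d completes the proportion, so a transduction procedure built on it has to handle unsolvable triples.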

analogical proportion, proportion, reverse analogy, (14 more...)

AAAI Conferences

Twelfth International Conference on the Principles of Knowledge Representation and Reasoning

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Île-de-France > Paris > Paris (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Improving the Johnson-Lindenstrauss Lemma

Rojo, Javier | Nguyen, Tuan

arXiv.org Machine Learning, May-9-2010

The Johnson-Lindenstrauss Lemma allows for the projection of $n$ points in $p$-dimensional Euclidean space onto a $k$-dimensional Euclidean space, with $k \ge \frac{24\ln n}{3\epsilon^2-2\epsilon^3}$, so that the pairwise distances are preserved within a factor of $1\pm\epsilon$. Here, working directly with the distributions of the random distances rather than resorting to the moment generating function technique, an improvement on the lower bound for $k$ is obtained. The additional reduction in dimension, when compared to bounds found in the literature, is at least $13\%$, and in some cases up to $30\%$ additional reduction is achieved. Using the moment generating function technique, we further provide a lower bound for $k$ using pairwise $L_2$ distances in the space of points to be projected and pairwise $L_1$ distances in the space of the projected points. Comparison with the results obtained in the literature shows that the bound presented here provides an additional $36$-$40\%$ reduction.
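The baseline bound quoted in this abstract is easy to evaluate numerically. The sketch below computes the smallest integer $k$ satisfying $k \ge 24\ln n / (3\epsilon^2 - 2\epsilon^3)$; the function name `jl_min_dim` is an assumption for illustration, and the code covers only this classical bound, not the improved bounds the paper derives.

```python
import math


def jl_min_dim(n_points, eps):
    """Smallest integer k with k >= 24 ln(n) / (3 eps^2 - 2 eps^3)."""
    if n_points < 2:
        raise ValueError("need at least two points")
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    bound = 24.0 * math.log(n_points) / (3.0 * eps ** 2 - 2.0 * eps ** 3)
    return math.ceil(bound)


# The bound depends on n and eps only, not on the original dimension p.
for eps in (0.5, 0.25, 0.1):
    print(eps, jl_min_dim(1000, eps))
```

For 1000 points, tightening the distortion from 0.5 to 0.1 raises the required dimension from 332 to 5921, which is why percentage reductions in this bound matter in practice.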

dasgupta and gupta, probability, random matrix, (13 more...)

arXiv.org Machine Learning

1005.1440

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Afghanistan > Parwan Province > Charikar (0.05)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

ECG Feature Extraction Techniques - A Survey Approach

Karpagachelvi, S. | Arthanari, M. | Sivakumar, M.

arXiv.org Artificial Intelligence, May-6-2010

ECG feature extraction plays a significant role in diagnosing most cardiac diseases. One cardiac cycle in an ECG signal consists of the P-QRS-T waves. A feature extraction scheme determines the amplitudes and intervals in the ECG signal for subsequent analysis. The amplitude and interval values of the P-QRS-T segment characterize the functioning of the heart. Recently, numerous techniques have been developed for analyzing the ECG signal. The proposed schemes were mostly based on Fuzzy Logic Methods, Artificial Neural Networks (ANN), Genetic Algorithms (GA), Support Vector Machines (SVM), and other signal analysis techniques. All these techniques and algorithms have their advantages and limitations. This paper discusses various techniques and transformations proposed earlier in the literature for extracting features from an ECG signal. In addition, it provides a comparative study of the methods proposed by researchers for extracting features from the ECG signal.

artificial intelligence, data mining, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1005.0957

Country: Asia > India (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining > Feature Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.56)

The Production of Probabilistic Entropy in Structure/Action Contingency Relations

Leydesdorff, Loet

arXiv.org Artificial Intelligence, May-5-2010

Luhmann (1984) defined society as a communication system which is structurally coupled to, but not an aggregate of, human action systems. The communication system is then considered as self-organizing ("autopoietic"), as are human actors. Communication systems can be studied by using Shannon's (1948) mathematical theory of communication. The update of a network by action at one of the local nodes is then a well-known problem in artificial intelligence (Pearl 1988). By combining these various theories, a general algorithm for probabilistic structure/action contingency can be derived. The consequences of this contingency for each system, its consequences for their further histories, and the stabilization on each side by counterbalancing mechanisms are discussed, in both mathematical and theoretical terms. An empirical example is elaborated.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Artificial Intelligence

1005.0707

Country:

North America > United States (0.68)
Europe (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
