Sparse Non Gaussian Component Analysis by Semidefinite Programming
Diederichs, Elmar, Juditsky, Anatoli, Nemirovski, Arkadi, Spokoiny, Vladimir
Sparse non-Gaussian component analysis (SNGCA) is an unsupervised method for extracting a linear structure from high-dimensional data, based on estimating a low-dimensional non-Gaussian component of the data. In this paper we discuss a new approach to direct estimation of the projector onto the target space, based on semidefinite programming, which improves the sensitivity of the method to a broad variety of deviations from normality. We also discuss procedures that allow the structure to be recovered when its effective dimension is unknown.
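To make the semidefinite-programming step concrete, the following is a minimal sketch of a relaxation that estimates a projector onto an m-dimensional target space; it is illustrative only, not the authors' exact formulation. The matrix M standing in for the aggregated non-Gaussian structure, the dimension d, and the target dimension m are hypothetical placeholders (random data here), and cvxpy is an arbitrary solver choice.

    import numpy as np
    import cvxpy as cp

    d, m = 10, 3                       # ambient and assumed target dimension
    rng = np.random.default_rng(0)
    A = rng.standard_normal((d, d))
    M = A @ A.T                        # placeholder for an estimated second-moment
                                       # matrix of non-Gaussian directions

    # Relaxed projector: symmetric, 0 <= P <= I, trace(P) = m
    P = cp.Variable((d, d), symmetric=True)
    constraints = [P >> 0, np.eye(d) - P >> 0, cp.trace(P) == m]
    prob = cp.Problem(cp.Maximize(cp.trace(M @ P)), constraints)
    prob.solve()

    # Round the relaxed solution to a rank-m projector via its top eigenvectors
    w, V = np.linalg.eigh(P.value)
    P_hat = V[:, -m:] @ V[:, -m:].T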
Distance-Based Bias in Model-Directed Optimization of Additively Decomposable Problems
Pelikan, Martin, Hauschild, Mark W.
For many optimization problems it is possible to define a distance metric between problem variables that correlates with the likelihood and strength of interactions between the variables. For example, one may define a metric so that the dependencies between variables that are closer to each other with respect to the metric are expected to be stronger than the dependencies between variables that are further apart. The purpose of this paper is to describe a method that combines such a problem-specific distance metric with information mined from probabilistic models obtained in previous runs of estimation of distribution algorithms with the goal of solving future problem instances of similar type with increased speed, accuracy and reliability. While the focus of the paper is on additively decomposable problems and the hierarchical Bayesian optimization algorithm, it should be straightforward to generalize the approach to other model-directed optimization techniques and other problem classes. Compared to other techniques for learning from experience put forward in the past, the proposed technique is both more practical and more broadly applicable.
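As a rough illustration of how a distance metric and statistics mined from earlier runs might be combined, the sketch below turns per-distance dependency frequencies from previous models into a prior bias over candidate edges; the binning scheme, the kappa strength parameter, and the toy counts are hypothetical stand-ins for the paper's actual procedure.

    import numpy as np

    def edge_prior(dist, dep_counts, run_counts, kappa=1.0):
        """Prior bias for adding a dependency between variables at a given distance.

        dep_counts[b] / run_counts[b] is the fraction of previous-run models that
        contained a dependency whose endpoints fell into distance bin b.
        kappa controls how strongly the mined statistics bias model building.
        """
        b = int(dist)                                      # hypothetical unit-width bins
        freq = (dep_counts[b] + 1) / (run_counts[b] + 2)   # Laplace-smoothed frequency
        return freq ** kappa

    # Toy usage: dependencies were frequent at short distances in earlier runs
    dep_counts = np.array([40, 25, 5, 1, 0])
    run_counts = np.full(5, 50)
    for d in range(5):
        print(d, round(edge_prior(d, dep_counts, run_counts), 3))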
Sentence based semantic similarity measure for blog-posts
Blogs, the online digital-diary-like applications of Web 2.0, have opened a new and easy way for every Internet user to voice opinions, thoughts, likes and dislikes to the world. The blogosphere is without doubt the largest user-generated content repository, full of knowledge, and its potential is still to be explored. Knowledge discovery from this new genre is difficult and challenging because it differs substantially from other popular genres of web applications such as the World Wide Web (WWW). Blog posts, unlike web documents, are small in size, and therefore lack context and contain relaxed grammatical structures. Hence, standard text similarity measures fail to provide good results. In this paper, the specialized requirements for comparing a pair of blog posts are thoroughly investigated. Based on this, we propose a novel algorithm for a sentence-oriented semantic similarity measure between a pair of blog posts. We applied this algorithm to a subset of the political blogosphere of Pakistan, to cluster the blogs around different issues of the political realm and to identify the influential bloggers.
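A minimal sketch of a sentence-oriented similarity between two blog posts, assuming simple Jaccard word overlap as the sentence-level score; the authors' actual semantic measure is richer and not specified here, so this only illustrates the sentence-pairing structure.

    import re

    def sentences(post):
        return [s.strip().lower() for s in re.split(r'[.!?]+', post) if s.strip()]

    def sentence_sim(a, b):
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

    def post_similarity(post1, post2):
        s1, s2 = sentences(post1), sentences(post2)
        if not s1 or not s2:
            return 0.0
        # For every sentence in one post take its best match in the other,
        # then average the two directions.
        best1 = [max(sentence_sim(a, b) for b in s2) for a in s1]
        best2 = [max(sentence_sim(b, a) for a in s1) for b in s2]
        return 0.5 * (sum(best1) / len(best1) + sum(best2) / len(best2))

    print(post_similarity("The budget vote was delayed. Protests continued.",
                          "Protests continued in the capital. The vote is delayed."))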
Optimal Fuzzy Model Construction with Statistical Information using Genetic Algorithm
Hossain, Md. Amjad, Shill, Pintu Chandra, Sarker, Bishnu, Murase, Kazuyuki
Fuzzy rule-based models have the capability to approximate any continuous function to any degree of accuracy on a compact domain. The majority of FLC design processes rely on the heuristic knowledge of experienced operators. In order to make the design process automatic, we present a genetic approach to learn fuzzy rules as well as membership function parameters. Moreover, several statistical information criteria, such as the Akaike information criterion (AIC), the Bhansali-Downham information criterion (BDIC), and the Schwarz-Rissanen information criterion (SRIC), are used to construct optimal fuzzy models by reducing the number of fuzzy rules. A genetic scheme is used to design a Takagi-Sugeno-Kang (TSK) model, identifying both the antecedent rule parameters and the consequent parameters. Computer simulations are presented confirming the performance of the constructed fuzzy logic controller.
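As one concrete way to fold an information criterion into the genetic search, the sketch below scores a candidate TSK model by AIC computed from its residuals and its number of free parameters; AIC = n ln(MSE) + 2k is the standard regression form, while the per-rule parameter count and the toy data are assumptions, not the paper's exact accounting.

    import numpy as np

    def aic_score(y_true, y_pred, n_rules, params_per_rule):
        """AIC = n * ln(MSE) + 2k, lower is better (usable as a GA fitness penalty)."""
        n = len(y_true)
        mse = np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)
        k = n_rules * params_per_rule          # assumed free parameters per rule
        return n * np.log(mse) + 2 * k

    # Toy comparison: the 2k term discourages adding rules for marginal MSE gains.
    y = np.sin(np.linspace(0, 3, 200))
    small = y + np.random.default_rng(1).normal(0, 0.05, 200)   # 5-rule candidate's fit
    large = y + np.random.default_rng(2).normal(0, 0.05, 200)   # 9-rule candidate's fit
    print(aic_score(y, small, n_rules=5, params_per_rule=4))
    print(aic_score(y, large, n_rules=9, params_per_rule=4))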
A comparison of two suffix tree-based document clustering algorithms
Rafi, Muhammad, Maujood, M., Fazal, M. M., Ali, S. M.
Document clustering, as an unsupervised approach, is extensively used to navigate, filter, summarize, and manage large collections of document repositories such as the World Wide Web (WWW). Recently, the focus in this domain has shifted from traditional vector-based document similarity to suffix tree-based document similarity, as the latter offers a more semantic representation of the text present in the documents. In this paper, we compare and contrast two recently introduced approaches to document clustering based on the suffix tree data model. The first is an efficient phrase-based document clustering, which extracts phrases from documents to form a compact document representation and uses a similarity measure based on a common suffix tree to cluster the documents. The second is a frequent word/word-meaning sequence based document clustering: it similarly extracts common word sequences from the documents, uses the common sequence/common word-meaning sequence to build the compact representation, and finally clusters the compact documents. Both algorithms use agglomerative hierarchical clustering to perform the actual clustering step; the differences between the approaches lie mainly in the extraction of phrases, the compact document representation, and the similarity measures used for clustering. This paper investigates the computational aspects of the two algorithms and the quality of the results they produce.
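Neither algorithm is reproduced here, but the following sketch shows the shared skeleton both rely on: turn each document into a compact set of phrases (word bigrams in this toy version), score documents by phrase overlap, and feed the resulting distances to agglomerative hierarchical clustering; the n-gram extraction is only a stand-in for the suffix tree / frequent-sequence steps, and the documents are invented.

    from itertools import combinations
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def phrases(doc, n=2):
        words = doc.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    docs = ["the stock market fell sharply today",
            "stock market prices fell again",
            "the football team won the cup",
            "the team won the football cup final"]

    sets = [phrases(d) for d in docs]
    m = len(docs)
    dist = np.zeros((m, m))
    for i, j in combinations(range(m), 2):
        overlap = len(sets[i] & sets[j]) / len(sets[i] | sets[j])   # Jaccard on phrases
        dist[i, j] = dist[j, i] = 1.0 - overlap

    # Condensed distance vector -> agglomerative (average-link) clustering
    condensed = dist[np.triu_indices(m, k=1)]
    labels = fcluster(linkage(condensed, method="average"), t=2, criterion="maxclust")
    print(labels)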
On the equivalence of Hopfield Networks and Boltzmann Machines
Barra, Adriano, Bernacchia, Alberto, Santucci, Enrica, Contucci, Pierluigi
A specific type of neural network, the Restricted Boltzmann Machine (RBM), is implemented for classification and feature detection in machine learning. An RBM is characterized by separate layers of visible and hidden units, which are able to learn efficiently a generative model of the observed data. We study a "hybrid" version of RBMs, in which the hidden units are analog and the visible units are binary, and we show that the thermodynamics of the visible units is equivalent to that of a Hopfield network, in which the N visible units are the neurons and the P hidden units are the learned patterns. We apply the method of stochastic stability to derive the thermodynamics of the model, by considering a formal extension of this technique to the case of multiple sets of stored patterns, which may act as a benchmark for the study of correlated sets. Our results imply that simulating the dynamics of a Hopfield network, which requires updating N neurons and storing N(N-1)/2 synapses, can be accomplished by a hybrid Boltzmann Machine, which requires updating N+P neurons but storing only NP synapses. In addition, the well-known glass transition of the Hopfield network has a counterpart in the Boltzmann Machine: it corresponds to an optimality criterion for selecting the relative sizes of the hidden and visible layers, resolving the trade-off between the flexibility and the generality of the model. The low-storage phase of the Hopfield model corresponds to few hidden units and hence an overly constrained RBM, while the spin-glass phase (too many hidden units) corresponds to an unconstrained RBM prone to overfitting the observed data.
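The core of the equivalence can be seen in one line of Gaussian integration. Writing the hybrid machine with binary visible units $v_i=\pm 1$, analog hidden units $z_\mu$, couplings $\xi_i^\mu$ and a quadratic prior on the hidden layer (a standard convention, with the $1/\sqrt{N}$ scaling assumed here), marginalizing the hidden units reproduces the Hopfield Hamiltonian with Hebbian synapses:
\[
E(v,z) \;=\; \frac{1}{2}\sum_{\mu=1}^{P} z_\mu^2 \;-\; \frac{1}{\sqrt{N}}\sum_{i=1}^{N}\sum_{\mu=1}^{P} \xi_i^{\mu} v_i z_\mu ,
\qquad
\int \prod_{\mu} dz_\mu \, e^{-\beta E(v,z)} \;\propto\; \exp\!\Big(\frac{\beta}{2N}\sum_{i,j}\sum_{\mu}\xi_i^{\mu}\xi_j^{\mu} v_i v_j\Big).
\]
The $N(N-1)/2$ Hebbian couplings on the right are thus encoded implicitly by the $N\times P$ matrix $\xi$ on the left, which is the storage saving quoted in the abstract.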
The Interaction of Entropy-Based Discretization and Sample Size: An Empirical Study
An empirical investigation of the interaction between sample size and discretization, in this case the entropy-based method CAIM (Class-Attribute Interdependence Maximization), was undertaken to evaluate the impact and potential bias introduced into data mining performance metrics when variation in sample size affects the discretization process. Of particular interest was the effect of discretizing within cross-validation folds as opposed to discretizing outside the folds. Previous publications have suggested that discretizing externally can bias performance results; however, a thorough review of the literature found no empirical evidence to support such an assertion. This investigation involved the construction of over 117,000 models on seven distinct datasets from the UCI (University of California-Irvine) Machine Learning Library, using multiple modeling methods across a variety of configurations of sample size and discretization, with each unique "setup" being independently replicated ten times. The analysis revealed a significant optimistic bias as sample sizes decreased and discretization was employed. The study also revealed that there may be a relationship between the interaction that produces such bias and the number and types of predictor attributes, extending the "curse of dimensionality" concept from feature selection into the discretization realm. Directions for further exploration are laid out, as well as some general guidelines about the proper application of discretization in light of these results.
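The methodological point at issue, discretizing inside each cross-validation training fold versus once on the full dataset before splitting, can be made concrete with a short sketch. CAIM is not available in scikit-learn, so KBinsDiscretizer is used purely as a stand-in discretizer; the dataset and the naive Bayes model are likewise placeholders.

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import cross_val_score, StratifiedKFold
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import KBinsDiscretizer
    from sklearn.naive_bayes import MultinomialNB

    X, y = load_breast_cancer(return_X_y=True)
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

    # (a) Discretizer fitted inside each training fold (no information leaks across folds)
    pipe = make_pipeline(
        KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile"),
        MultinomialNB())
    inside = cross_val_score(pipe, X, y, cv=cv)

    # (b) Discretizer fitted once on all data before cross-validation
    X_out = KBinsDiscretizer(n_bins=5, encode="ordinal",
                             strategy="quantile").fit_transform(X)
    outside = cross_val_score(MultinomialNB(), X_out, y, cv=cv)

    print("inside folds :", inside.mean())
    print("outside folds:", outside.mean())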
Classifier Fusion for Sonar Image Classification (Fusion de classifieurs pour la classification d'images sonar)
In this paper, we present several high-level information fusion approaches for numeric and symbolic data. We study the interest of such methods, particularly for classifier fusion. A comparative study is carried out in the context of seabed characterization from sonar images. The classification of sediment type is a difficult problem because of the complexity of the data. We compare the high-level information fusion approaches and report the obtained performance.
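As one example of the kind of high-level combination typically considered in classifier fusion, the sketch below implements Dempster's rule for two sources giving mass functions over the same frame of sediment classes; the classes and mass values are made up for illustration, and the paper's comparison covers other combination rules as well.

    from itertools import product

    def dempster(m1, m2):
        """Combine two mass functions (dict: frozenset of classes -> mass)."""
        combined, conflict = {}, 0.0
        for (a, ma), (b, mb) in product(m1.items(), m2.items()):
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb                      # mass assigned to the empty set
        return {s: v / (1.0 - conflict) for s, v in combined.items()}, conflict

    # Two classifiers' (hypothetical) beliefs about one sonar image patch
    sand, rock, silt = frozenset({"sand"}), frozenset({"rock"}), frozenset({"silt"})
    theta = sand | rock | silt                           # total ignorance
    m1 = {sand: 0.6, rock: 0.3, theta: 0.1}
    m2 = {sand: 0.5, silt: 0.3, theta: 0.2}
    fused, k = dempster(m1, m2)
    print({tuple(s): round(v, 3) for s, v in fused.items()}, "conflict =", round(k, 3))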
Classification under Data Contamination with Application to Remote Sensing Image Mis-registration
Yan, Donghui, Gong, Peng, Chen, Aiyou, Zhong, Liheng
This work is motivated by the problem of image mis-registration in remote sensing, and we are interested in determining the resulting loss in the accuracy of pattern classification. A statistical formulation is given in which we propose to use data contamination to model and understand the phenomenon of image mis-registration. This model is widely applicable to many other types of errors as well, for example, measurement errors and gross errors. The impact of data contamination on classification is studied within a statistical learning theory framework. A closed-form asymptotic bound is established for the resulting loss in classification accuracy, which is less than $\epsilon/(1-\epsilon)$ for a contamination amount of $\epsilon$. Our bound is sharper than similar bounds in the domain adaptation literature and, unlike such bounds, it applies to classifiers with an infinite Vapnik-Chervonenkis (VC) dimension. Extensive simulations have been conducted on both synthetic and real datasets under various types of data contamination, including label flipping, feature swapping, and the replacement of feature values with data generated from a random source such as a Gaussian or Cauchy distribution. Our simulation results show that the bound we derive is fairly tight.
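A small simulation in the spirit of the label-flipping experiments: flip a fraction $\epsilon$ of training labels, measure the drop in test accuracy relative to a clean model, and compare it with the $\epsilon/(1-\epsilon)$ bound. The synthetic data and the logistic-regression classifier are arbitrary choices for illustration, not the paper's setup.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=4000, n_features=20, random_state=0)
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.5, random_state=0)
    clean_acc = LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte)

    rng = np.random.default_rng(0)
    for eps in (0.05, 0.10, 0.20, 0.30):
        flip = rng.random(len(ytr)) < eps          # contaminate a fraction eps of labels
        y_cont = np.where(flip, 1 - ytr, ytr)      # binary label flipping
        acc = LogisticRegression(max_iter=1000).fit(Xtr, y_cont).score(Xte, yte)
        print(f"eps={eps:.2f}  loss={clean_acc - acc:+.4f}  bound={eps/(1-eps):.4f}")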