AITopics | Statistical Learning


Statistical Learning


Ensembled Correlation Between Liver Analysis Outputs

Seker, Sadi Evren, Unal, Y., Erdem, Z., Kocer, H. Erdinc

arXiv.org Machine Learning, Jan-25-2014

Data mining techniques are increasingly applied to biological analysis in many areas, including health care and medical information. We have applied data mining techniques such as KNN, SVM, MLP, and decision trees to a unique dataset collected from 16,380 analysis results over one year. Furthermore, we have used meta-classifiers to examine the increased correlation rate between liver disorder and the liver analysis outputs. The results show a correlation among ALT, AST, Bilirubin Direct, and Bilirubin Total, with an error rate down to 15%. The correlation coefficient is up to 94%. This makes it possible to predict the analysis results from each other, or to apply disease patterns over the linear correlation of the parameters.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

1401.6597

Country:

Europe (1.00)
Asia > Middle East > Republic of Türkiye (0.69)

Genre: Research Report > New Finding (0.49)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
(2 more...)


The Stochastic Gradient Descent for the Primal L1-SVM Optimization Revisited

Panagiotakopoulos, Constantinos, Tsampouka, Petroula

arXiv.org Artificial Intelligence, Jan-25-2014

We reconsider the stochastic (sub)gradient approach to the unconstrained primal L1-SVM optimization. We observe that if the learning rate is inversely proportional to the number of steps, i.e., the number of times any training pattern is presented to the algorithm, the update rule may be transformed into that of the classical perceptron with margin, in which the margin threshold increases linearly with the number of steps. Moreover, if we cycle repeatedly through the possibly randomly permuted training set, the dual variables, defined naturally via the expansion of the weight vector as a linear combination of the patterns on which margin errors were made, are shown to obey automatically, at the end of each complete cycle, the box constraints arising in dual optimization. This renders the dual Lagrangian a running lower bound on the primal objective, tending to it at the optimum, and makes available an upper bound on the relative accuracy achieved, which provides a meaningful stopping criterion. In addition, we propose a mechanism for presenting the same pattern repeatedly to the algorithm that maintains the above properties. Finally, we give experimental evidence that algorithms constructed along these lines exhibit considerably improved performance.
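As a rough illustration of the setting in this abstract, the following is a minimal sketch of stochastic subgradient descent on the primal hinge-loss (L1-SVM) objective with a learning rate proportional to 1/t. It is a generic Pegasos-style update, not the authors' algorithm. The function name `sgd_l1_svm`, the regularization constant `lam`, the absence of a bias term, and the toy data are all assumptions made for illustration.

```python
import random

def sgd_l1_svm(X, y, lam=0.1, epochs=50, seed=0):
    """Stochastic subgradient descent on
    (lam/2)*||w||^2 + mean(max(0, 1 - y_i * w.x_i)), with eta_t = 1/(lam*t)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)  # cycle through a randomly permuted training set
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # learning rate inversely proportional to steps
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrinkage coming from the regularization term.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss subgradient step, taken only on a margin error.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

# Hypothetical, linearly separable toy data (labels in {+1, -1}).
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]]
y = [1, 1, -1, -1]
w = sgd_l1_svm(X, y)
```

With this step size, the rescaled vector `lam * t * w` changes only on margin errors, by adding `y[i] * x[i]`, and the error condition `margin < 1` becomes a comparison against a threshold `lam * t` that grows linearly with the step count. This is one way to read the perceptron-with-margin form mentioned in the abstract.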

algorithm, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1304.6383

Country: Europe (0.28)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.87)


The EM algorithm and the Laplace Approximation

Brümmer, Niko

arXiv.org Machine Learning, Jan-24-2014

The Laplace approximation calls for the computation of second derivatives at the likelihood maximum. When the maximum is found by the EM algorithm, there is a convenient way to compute these derivatives. The likelihood gradient can be obtained from the EM auxiliary, while the Hessian can be obtained from this gradient with the Pearlmutter trick. Let X denote the observed data, H some hidden variables and Θ the model parameters. The marginal $P(X, \Theta) = \int P(X, H, \Theta)\, dH$ (2) has a more complex form.
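To make the idea of obtaining Hessian information from a gradient more concrete, here is a minimal sketch that computes a Hessian-vector product from a gradient function alone. It uses a central finite difference as a stand-in: the Pearlmutter trick itself obtains the same product exactly by differentiating the gradient computation, which this sketch does not do. The toy quadratic objective, the matrix `A`, and the function names are assumptions for illustration and are not taken from the paper.

```python
def grad(theta):
    """Gradient of the toy objective f(theta) = 0.5 * theta^T A theta."""
    A = [[2.0, 1.0], [1.0, 3.0]]  # hypothetical symmetric matrix; the Hessian of f
    return [sum(a * t for a, t in zip(row, theta)) for row in A]

def hessian_vector_product(grad_fn, theta, v, eps=1e-5):
    """Approximate H(theta) @ v using only gradient evaluations:
    H v ~ (g(theta + eps*v) - g(theta - eps*v)) / (2*eps)."""
    plus = grad_fn([t + eps * vi for t, vi in zip(theta, v)])
    minus = grad_fn([t - eps * vi for t, vi in zip(theta, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]

# For the quadratic above, H v equals A v, i.e. the first column of A here.
hv = hessian_vector_product(grad, [0.5, -1.0], [1.0, 0.0])
```

Applying the function to each unit vector in turn recovers the full Hessian one column at a time, which is the quantity the Laplace approximation needs at the maximum.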

artificial intelligence, hessian, machine learning, (11 more...)

arXiv.org Machine Learning

1401.6276

Country: Africa (0.15)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.62)


Community Detection in Networks using Graph Distance

Bhattacharyya, Sharmodeep, Bickel, Peter J.

arXiv.org Machine Learning, Jan-24-2014

The study of networks has received increased attention recently, not only from the social sciences and statistics but also from physicists, computer scientists and mathematicians. One of the principal problems in networks is community detection. Many algorithms have been proposed for community finding, but most of them do not have theoretical guarantees for sparse networks and networks close to the phase transition boundary proposed by physicists. There are some exceptions, but all have an incomplete theoretical basis. Here we propose an algorithm based on the graph distance of vertices in the network. We give theoretical guarantees that our method works in identifying communities for block models and can be extended to degree-corrected block models and block models with the number of communities growing with the number of vertices. Despite favorable simulation results, we are not yet able to conclude that our method is satisfactory for the worst possible case. We illustrate it on a network of political blogs, Facebook networks and some other networks.
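The quantity this abstract builds on, the graph distance between vertices, can be sketched as follows. The code only computes all-pairs shortest-path distances in an unweighted graph by breadth-first search. It does not implement the authors' community detection procedure, and the function name and toy graph are assumptions for illustration.

```python
from collections import deque

def graph_distances(adj):
    """All-pairs shortest-path (hop) distances via breadth-first search.
    `adj` maps each vertex 0..n-1 to a list of its neighbours."""
    n = len(adj)
    inf = float("inf")
    dist = [[inf] * n for _ in range(n)]
    for s in range(n):
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] == inf:  # first visit gives the shortest distance
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist

# Hypothetical toy graph: two triangles {0,1,2} and {3,4,5} joined by edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
D = graph_distances(adj)
```

In this toy graph, vertices inside the same triangle are one hop apart while vertices in different triangles are up to three hops apart, which is the kind of separation a distance-based method can exploit.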

data mining, machine learning, vertex, (18 more...)

arXiv.org Machine Learning

1401.3915

Country: North America > United States (0.92)

Genre: Research Report (0.40)

Industry: Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)


Multimodal Distributional Semantics

Bruni, E., Tran, N. K., Baroni, M.

Journal of Artificial Intelligence Research, Jan-23-2014

Distributional semantic models derive computational representations of word meaning from the patterns of co-occurrence of words in text. Such models have been a success story of computational linguistics, being able to provide reliable estimates of semantic relatedness for the many semantic tasks requiring them. However, distributional models extract meaning information exclusively from text, which is an extremely impoverished basis compared to the rich perceptual sources that ground human semantic knowledge. We address the lack of perceptual grounding of distributional models by exploiting computer vision techniques that automatically identify discrete visual words in images, so that the distributional representation of a word can be extended to also encompass its co-occurrence with the visual words of images it is associated with. We propose a flexible architecture to integrate text- and image-based distributional information, and we show in a set of empirical tests that our integrated model is superior to the purely text-based approach, and it provides somewhat complementary semantic information with respect to the latter.

information, representation, vector, (17 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4135

AI Access Foundation

10857


Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(38 more...)

Genre: Research Report > New Finding (0.46)

Industry: Transportation > Air (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)


Asymptotic Accuracy of Bayes Estimation for Latent Variables with Redundancy

Yamazaki, Keisuke

arXiv.org Machine Learning, Jan-23-2014

Hierarchical parametric models consisting of observable and latent variables are widely used for unsupervised learning tasks. For example, a mixture model is a representative hierarchical model for clustering. From the statistical point of view, the models can be regular or singular depending on the distribution of the data. In the regular case, the models are identifiable: there is a one-to-one relation between the probability density function of the model expression and the parameter. The Fisher information matrix is positive definite, and the estimation accuracy of both observable and latent variables has been studied. In the singular case, on the other hand, the models are not identifiable and the Fisher matrix is not positive definite. Conventional statistical analysis based on the inverse Fisher matrix is not applicable. Recently, an algebraic geometrical analysis has been developed and used to elucidate the Bayes estimation of observable variables. The present paper applies this analysis to latent-variable estimation and determines its theoretical performance. Our results clarify the convergence behavior of the posterior distribution. It is found that the posterior of the observable-variable estimation can differ from that of the latent-variable estimation. Because of this difference, the Markov chain Monte Carlo method based on the parameter and the latent variable cannot construct the desired posterior distribution.

artificial intelligence, bayesian inference, machine learning, (20 more...)

arXiv.org Machine Learning

1205.3234

Country:

North America > United States (0.67)
Asia (0.46)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)


Finding the True Frequent Itemsets

Riondato, Matteo, Vandin, Fabio

arXiv.org Machine Learning, Jan-22-2014

Frequent Itemsets (FIs) mining is a fundamental primitive in data mining. It requires identifying all itemsets appearing in at least a fraction $\theta$ of a transactional dataset $\mathcal{D}$. Often, though, the ultimate goal of mining $\mathcal{D}$ is not an analysis of the dataset per se, but the understanding of the underlying process that generated it. Specifically, in many applications $\mathcal{D}$ is a collection of samples obtained from an unknown probability distribution $\pi$ on transactions, and by extracting the FIs in $\mathcal{D}$ one attempts to infer itemsets that are frequently (i.e., with probability at least $\theta$) generated by $\pi$, which we call the True Frequent Itemsets (TFIs). Due to the inherently stochastic nature of the generative process, the set of FIs is only a rough approximation of the set of TFIs, as it often contains a huge number of false positives, i.e., spurious itemsets that are not among the TFIs. In this work we design and analyze an algorithm to identify a threshold $\hat{\theta}$ such that the collection of itemsets with frequency at least $\hat{\theta}$ in $\mathcal{D}$ contains only TFIs with probability at least $1-\delta$, for some user-specified $\delta$. Our method uses results from statistical learning theory involving the (empirical) VC-dimension of the problem at hand. This allows us to identify almost all the TFIs without including any false positives. We also experimentally compare our method with the direct mining of $\mathcal{D}$ at frequency $\theta$ and with techniques based on widely used standard bounds (i.e., the Chernoff bounds) of the binomial distribution, and show that our algorithm outperforms these methods and achieves even better results than what is guaranteed by the theoretical analysis.
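The basic task defined at the start of this abstract, finding every itemset whose frequency in the dataset is at least $\theta$, can be sketched with a brute-force enumeration. This is only the plain FI mining step on a toy dataset. It does not compute the corrected threshold $\hat{\theta}$ or use any VC-dimension bound from the paper, and the function name and transactions are assumptions for illustration.

```python
from itertools import combinations

def frequent_itemsets(transactions, theta):
    """Return {itemset: frequency} for all itemsets appearing in at least
    a fraction `theta` of the transactions (brute force, small data only)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            support = sum(1 for t in transactions if set(candidate) <= t)
            freq = support / n
            if freq >= theta:
                result[candidate] = freq
                found_any = True
        if not found_any:
            # No frequent itemset of size k, so none of size k+1 can exist
            # (every subset of a frequent itemset is itself frequent).
            break
    return result

# Hypothetical toy dataset of four transactions.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}, {"b"}]
fis = frequent_itemsets(transactions, theta=0.5)
```

On this toy data the itemsets with frequency at least 0.5 are the three single items plus the pairs (a, b) and (a, c). A method such as the one described in the abstract would then mine at a higher threshold to exclude itemsets that are frequent in the sample only by chance.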

artificial intelligence, itemset, machine learning, (16 more...)

arXiv.org Machine Learning

1301.1218

Country:

North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.89)


Collaborative Regression

Gross, Samuel M., Tibshirani, Robert

arXiv.org Machine Learning, Jan-22-2014

We consider the scenario where one observes an outcome variable and sets of features from multiple assays, all measured on the same set of samples. One approach that has been proposed for dealing with this type of data is "sparse multiple canonical correlation analysis" (sparse mCCA). All of the current sparse mCCA techniques are biconvex and thus have no guarantees about reaching a global optimum. We propose a method for performing sparse supervised canonical correlation analysis (sparse sCCA), a specific case of sparse mCCA when one of the datasets is a vector. Our proposal for sparse sCCA is convex and thus does not face the same difficulties as the other methods. We derive efficient algorithms for this problem, and illustrate their use on simulated and real data.

artificial intelligence, machine learning, optimization problem, (14 more...)

arXiv.org Machine Learning

1401.5823

Genre: Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)


On the symmetrical Kullback-Leibler Jeffreys centroids

Nielsen, Frank

arXiv.org Machine Learning, Jan-22-2014

Due to the success of the bag-of-words modeling paradigm, clustering histograms has become an important ingredient of modern information processing. Clustering histograms can be performed using the celebrated $k$-means centroid-based algorithm. From the viewpoint of applications, it is usually required to deal with symmetric distances. In this letter, we consider the Jeffreys divergence, which symmetrizes the Kullback-Leibler divergence, and investigate the computation of Jeffreys centroids. We first prove that the Jeffreys centroid can be expressed analytically using the Lambert $W$ function for positive histograms. We then show how to obtain a fast guaranteed approximation when dealing with frequency histograms. Finally, we conclude with some remarks on $k$-means histogram clustering.
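For readers unfamiliar with the divergence named in this abstract, here is a minimal sketch of the Jeffreys divergence between two frequency histograms, written as the sum of the two directed Kullback-Leibler divergences. It only evaluates the divergence and does not compute a Jeffreys centroid or use the Lambert $W$ function. The function names and the example histograms are assumptions for illustration, and all bins are assumed strictly positive.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for strictly positive,
    normalized histograms of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def jeffreys(p, q):
    """Jeffreys divergence J(p, q) = sum_i (p_i - q_i) * log(p_i / q_i),
    which equals KL(p || q) + KL(q || p) and is symmetric in p and q."""
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical three-bin frequency histograms (each sums to 1).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
j = jeffreys(p, q)
```

Because the divergence is symmetric, it can be used where a $k$-means style assignment step should not depend on the order of its two arguments.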

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

doi: 10.1109/LSP.2013.2260538

1303.7286

Country: Asia > Japan (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)


Guaranteed Model Order Estimation and Sample Complexity Bounds for LDA

Gutiérrez, E. D.

arXiv.org Machine Learning, Jan-21-2014

The question of how to determine the number of independent latent factors, or topics, in Latent Dirichlet Allocation (LDA) is of great practical importance. In most applications, the exact number of topics is unknown, and depends on the application and the size of the data set. We introduce a spectral model selection procedure for topic number estimation that does not require learning the model's latent parameters beforehand and comes with probabilistic guarantees. The procedure is motivated by the spectral learning approach and relies on adaptations of results from random matrix theory. In a simulation experiment taken from the nonparametric Bayesian literature, this procedure outperforms the nonparametric Bayesian approach in both accuracy and speed. We also discuss some implications of our results for the sample complexity and accuracy of popular spectral learning algorithms for LDA. The principles underlying the procedure can be extended to spectral learning algorithms for other exchangeable mixture models with similar conditional independence properties.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

1312.2646

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
