Learning Graphical Models
Evaluating Variable Length Markov Chain Models for Analysis of User Web Navigation Sessions
Markov models have been widely used to represent and analyse user web navigation data. In previous work we have proposed a method to dynamically extend the order of a Markov chain model and a complimentary method for assessing the predictive power of such a variable length Markov chain. Herein, we review these two methods and propose a novel method for measuring the ability of a variable length Markov model to summarise user web navigation sessions up to a given length. While the summarisation ability of a model is important to enable the identification of user navigation patterns, the ability to make predictions is important in order to foresee the next link choice of a user after following a given trail so as, for example, to personalise a web site. We present an extensive experimental evaluation providing strong evidence that prediction accuracy increases linearly with summarisation ability.
New Millennium AI and the Convergence of History
Artificial Intelligence (AI) has recently become a real formal science: the new millennium brought the first mathematically sound, asymptotically optimal, universal problem solvers, providing a new, rigorous foundation for the previously largely heuristic field of General AI and embedded agents. At the same time there has been rapid progress in practical methods for learning true sequence-processing programs, as opposed to traditional methods limited to stationary pattern association. Here we will briefly review some of the new results, and speculate about future developments, pointing out that the time intervals between the most notable events in over 40,000 years or 2^9 lifetimes of human history have sped up exponentially, apparently converging to zero within the next few decades. Or is this impression just a by-product of the way humans allocate memory space to past events?
Semi-Supervised Learning -- A Statistical Physics Approach
Getz, Gad, Shental, Noam, Domany, Eytan
We present a novel approach to semi-supervised learning which is based on statistical physics. Most of the former work in the field of semi-supervised learning classifies the points by minimizing a certain energy function, which corresponds to a minimal k-way cut solution. In contrast to these methods, we estimate the distribution of classifications, instead of the sole minimal k-way cut, which yields more accurate and robust results. Our approach may be applied to all energy functions used for semi-supervised learning. The method is based on sampling using a Mul-ticanonical Markov chain Monte-Carlo algorithm, and has a straightforward probabilistic interpretation, which allows for soft assignments of points to classes, and also to cope with yet unseen class types. The suggested approach is demonstrated on a toy data set and on two real-life data sets of gene expression.
Asymptotic Learnability of Reinforcement Problems with Arbitrary Dependence
Ryabko, Daniil, Hutter, Marcus
We address the problem of reinforcement learning in which observations may exhibit an arbitrary form of stochastic dependence on past observations and actions. The task for an agent is to attain the best possible asymptotic reward where the true generating environment is unknown but belongs to a known countable family of environments. We find some sufficient conditions on the class of environments under which an agent exists which attains the best asymptotic reward for any environment in the class. We analyze how tight these conditions are and how they relate to different probabilistic assumptions known in reinforcement learning and related fields, such as Markov Decision Processes and mixing conditions.
Fast Lexically Constrained Viterbi Algorithm (FLCVA): Simultaneous Optimization of Speed and Memory
Lifchitz, Alain, Maire, Frederic, Revuz, Dominique
Lexical constraints on the input of speech and on-line handwriting systems improve the performance of such systems. A significant gain in speed can be achieved by integrating in a digraph structure the different Hidden Markov Models (HMM) corresponding to the words of the relevant lexicon. This integration avoids redundant computations by sharing intermediate results between HMM's corresponding to different words of the lexicon. In this paper, we introduce a token passing method to perform simultaneously the computation of the a posteriori probabilities of all the words of the lexicon. The coding scheme that we introduce for the tokens is optimal in the information theory sense. The tokens use the minimum possible number of bits. Overall, we optimize simultaneously the execution speed and the memory requirement of the recognition systems.
PageRank without hyperlinks: Structural re-ranking using links induced by language models
Inspired by the PageRank and HITS (hubs and authorities) algorithms for Web search, we propose a structural re-ranking approach to ad hoc information retrieval: we reorder the documents in an initially retrieved set by exploiting asymmetric relationships between them. Specifically, we consider generation links, which indicate that the language model induced from one document assigns high probability to the text of another; in doing so, we take care to prevent bias against long documents. We study a number of re-ranking criteria based on measures of centrality in the graphs formed by generation links, and show that integrating centrality into standard language-model-based retrieval is quite effective at improving precision at top ranks.
Parameter Estimation of Hidden Diffusion Processes: Particle Filter vs. Modified Baum-Welch Algorithm
We propose a new method for the estimation of parameters of hidden diffusion processes. Based on parametrization of the transition matrix, the Baum-Welch algorithm is improved. The algorithm is compared to the particle filter in application to the noisy periodic systems. It is shown that the modified Baum-Welch algorithm is capable of estimating the system parameters with better accuracy than particle filters.
An elitist approach for extracting automatically well-realized speech sounds with high confidence
Maj, Jean-Baptiste, Bonneau, Anne, Fohr, Dominique, Laprie, Yves
This paper presents an'elitist approach' for extracting au tomat-ically well-realized speech sounds with high confidence. Th e elitist approach uses a speech recognition system based on H id-den Markov Models (HMM). The HMM are trained on speech sounds which are systematically well-detected in an iterat ive procedure. The results show that, by using the HMM models defined in the training phase, the speech recognizer detects reliably specific speech sounds with a small rate of errors.
Identifying Interaction Sites in "Recalcitrant" Proteins: Predicted Protein and Rna Binding Sites in Rev Proteins of Hiv-1 and Eiav Agree with Experimental Data
Terribilini, Michael, Lee, Jae-Hyung, Yan, Changhui, Jernigan, Robert L., Carpenter, Susan, Honavar, Vasant, Dobbs, Drena
HIV-1 Rev is one of several clinically important proteins that are "experimentally recalcitrant," i.e., for which it has not been possible to obtain high resolution structural in formation. Identifying critic al functional residues in Rev is further complicated by the fact that Rev proteins have no significant sequence similarity to any protein with known structure, and that Rev sequences from different species have very little similarity to one another. Our comparison of predictions with experimental data on the Rev proteins from HIV-1 and EIAV demonstrates that sequence-based computational methods can identify residues in "recalcitrant" proteins that interact with other proteins or nucleic acids. When structural information is available for a protein of interest, enhanced prediction accuracy can be achieved (18, 29). Developing improved methods for predicting binding sites will contribute to our understanding of how proteins recognize their targets in cells and may significantly decrease the time needed to precisely map binding sites in the laboratory. The level of accuracy obtained using the sequence-based methods presented here suggests that they could expedite the design of experiments to explore the function of key regulatory proteins, even when no structural information is available, with obvious implications for developing new therapies for both genetic and infectious diseases. Acknowledgments This Research was supported in part by grants NIH, GM 066387 (VH, DD, & RLJ) and CA97936 (SC), by an ISU Center for Integrated Animal Genomics grant (DD, VH & RLJ), and by USDA Formula Funds (SC & DD). We thank Sijun Liu for technical assistance and Jeffrey Sander for useful comments.
When Ignorance is Bliss
Grunwald, Peter D., Halpern, Joseph Y.
It is commonly-accepted wisdom that more information is better, and that information should never be ignored. Here we argue, using both a Bayesian and a non-Bayesian analysis, that in some situations you are better off ignoring information if your uncertainty is represented by a set of probability measures. These include situations in which the information is relevant for the prediction task at hand. In the non-Bayesian analysis, we show how ignoring information avoids dilation, the phenomenon that additional pieces of information sometimes lead to an increase in uncertainty. In the Bayesian analysis, we show that for small sample sizes and certain prediction tasks, the Bayesian posterior based on a non-informative prior yields worse predictions than simply ignoring the given information.