Towards a Mathematical Foundation of Immunology and Amino Acid Chains

arXiv.org Machine Learning

We attempt to lay a mathematical foundation of immunology and amino acid chains. To measure the similarity of these chains, a kernel on strings is defined using only the sequence of the chains and a good amino acid substitution matrix (e.g., BLOSUM62). The kernel is used in learning machines to predict binding affinities of peptides to human leukocyte antigen DR (HLA-DR) molecules. On both the fixed-allele (Nielsen and Lund 2009) and pan-allele (Nielsen et al. 2010) benchmark databases, our algorithm achieves state-of-the-art performance. The kernel is also used to define a distance on an HLA-DR allele set, based on which a clustering analysis precisely recovers the serotype classifications assigned by WHO (Nielsen and Lund 2009; Marsh et al. 2010). These results suggest that our kernel relates the chain structure of both peptides and HLA-DR molecules well to their biological functions, and that it offers a simple, powerful and promising methodology for immunology and amino acid chain studies.
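As a concrete illustration of the kind of construction involved (a minimal sketch, not the paper's exact kernel, which also handles chains of different lengths), one can build a similarity for equal-length peptides from a substitution matrix by exponentiating and multiplying position-wise BLOSUM62 scores; the inverse temperature beta and the Biopython matrix loader are assumptions of this sketch.

import math
from Bio.Align import substitution_matrices

BLOSUM62 = substitution_matrices.load("BLOSUM62")

def peptide_kernel(p, q, beta=0.1):
    """Product over positions of exponentiated BLOSUM62 substitution scores."""
    assert len(p) == len(q), "this sketch assumes equal-length peptides"
    k = 1.0
    for a, b in zip(p, q):
        k *= math.exp(beta * BLOSUM62[a, b])
    return k

def normalized_kernel(p, q, beta=0.1):
    """Normalize so that k(p, p) == 1, giving a similarity in (0, 1]."""
    return peptide_kernel(p, q, beta) / math.sqrt(
        peptide_kernel(p, p, beta) * peptide_kernel(q, q, beta))

# Example: similarity between two 9-mer peptides differing in one residue.
print(normalized_kernel("SIINFEKLA", "SIINFEKLV"))

Such a normalized kernel can be plugged into a kernel machine for affinity prediction, or converted into a distance d(p, q) = sqrt(2 - 2 k(p, q)) for clustering alleles.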


Keyphrase Based Arabic Summarizer (KPAS)

arXiv.org Artificial Intelligence

This paper describes a computationally inexpensive and efficient generic summarization algorithm for Arabic texts. The algorithm belongs to the extractive summarization family, which reduces the problem to the sub-problems of identifying and extracting representative sentences. Important keyphrases of the document to be summarized are identified using combinations of statistical and linguistic features. The sentence extraction algorithm exploits keyphrases as the primary attributes for ranking a sentence. The present experimental work demonstrates different techniques for achieving various summarization goals, including informative richness, coverage of both main and auxiliary topics, and keeping redundancy to a minimum; a scoring scheme is then adopted that balances these goals. To evaluate the resulting Arabic summaries against well-established systems, aligned English/Arabic texts are used throughout the experiments.
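A minimal sketch of how keyphrase weights can drive sentence ranking in an extractive summarizer is shown below; the tokenizer, the keyphrase weights and the length normalization are illustrative assumptions, and KPAS's actual statistical and linguistic features are not reproduced here.

def score_sentence(sentence, keyphrase_weights):
    """Sum the weights of keyphrases occurring in the sentence, normalized
    by sentence length so long sentences are not automatically favoured."""
    tokens = sentence.split()
    hits = sum(w for phrase, w in keyphrase_weights.items() if phrase in sentence)
    return hits / max(len(tokens), 1)

def summarize(sentences, keyphrase_weights, max_sentences=3):
    """Rank sentences by keyphrase score and return the top ones in document order."""
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score_sentence(sentences[i], keyphrase_weights),
                    reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return [sentences[i] for i in chosen]

Redundancy control and topic coverage would be handled by adjusting the score, for example by penalizing overlap with already selected sentences, which is the balancing role the abstract attributes to its scoring scheme.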


Analysis of a Nature Inspired Firefly Algorithm based Back-propagation Neural Network Training

arXiv.org Artificial Intelligence

Optimization algorithms are often built on meta-heuristic approaches, and in recent years several hybrid methods have been developed to find better solutions. The proposed work applies a nature-inspired meta-heuristic, the firefly algorithm, together with the back-propagation method to train a feed-forward neural network. The firefly algorithm is incorporated into back-propagation to achieve a fast and improved convergence rate when training feed-forward neural networks. The proposed technique is tested on several standard data sets and is found to produce improved convergence within very few iterations. Its performance is also analyzed and compared with genetic-algorithm-based back-propagation; the proposed method is observed to take less time to converge and to provide an improved convergence rate with a minimal feed-forward network design.
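For reference, the canonical firefly update that such a hybrid typically relies on is sketched below; the loss function, the constants and the idea of seeding back-propagation with the best firefly are assumptions of this sketch rather than details taken from the paper.

import numpy as np

def firefly_step(population, loss, alpha=0.2, beta0=1.0, gamma=1.0):
    """One firefly iteration over a population of weight vectors:
    dimmer fireflies (higher loss) move towards brighter ones (lower loss)."""
    brightness = -np.array([loss(w) for w in population])
    new_pop = population.copy()
    for i in range(len(population)):
        for j in range(len(population)):
            if brightness[j] > brightness[i]:
                r2 = np.sum((population[j] - population[i]) ** 2)
                attract = beta0 * np.exp(-gamma * r2)
                new_pop[i] += (attract * (population[j] - population[i])
                               + alpha * (np.random.rand(*population[i].shape) - 0.5))
    return new_pop

In a typical hybrid, a few such iterations search the weight space globally, and the best candidate then initializes (or periodically restarts) standard gradient-based back-propagation.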


Hidden Markov Models with mixtures as emission distributions

arXiv.org Machine Learning

In unsupervised classification, Hidden Markov Models (HMM) are used to account for a neighborhood structure between observations. The emission distributions are often assumed to belong to some parametric family. In this paper, a semiparametric model in which the emission distributions are mixtures of parametric distributions is proposed to obtain greater flexibility. We show that the classical EM algorithm can be adapted to infer the model parameters. For the initialisation step, a hierarchical method is proposed that starts from a large number of components and combines them into the hidden states. Three likelihood-based criteria for selecting the components to be combined are discussed. To estimate the number of hidden states, BIC-like criteria are derived. A simulation study is carried out both to determine the best combination of merging criteria and model selection criteria and to evaluate the accuracy of classification. The proposed method is also illustrated using a biological dataset from the model plant Arabidopsis thaliana. An R package, HMMmix, is freely available on CRAN.
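To make the model family concrete, the sketch below fits an HMM with Gaussian-mixture emissions using hmmlearn's GMMHMM; the data, the numbers of states and mixture components, and the covariance structure are placeholders, and the paper's own contributions (the hierarchical combination of components into hidden states and the BIC-like state selection) are not reproduced.

import numpy as np
from hmmlearn.hmm import GMMHMM

X = np.random.randn(500, 1)            # placeholder one-dimensional observations
model = GMMHMM(n_components=3,         # hidden states
               n_mix=4,                # mixture components per state
               covariance_type="diag",
               n_iter=50)
model.fit(X)                           # EM adapted to mixture emissions
states = model.predict(X)              # unsupervised classification of the observations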


Bayesian Structure Learning for Markov Random Fields with a Spike and Slab Prior

arXiv.org Machine Learning

In recent years a number of methods have been developed for automatically learning the (sparse) connectivity structure of Markov Random Fields. These methods are mostly based on L1-regularized optimization, which has a number of disadvantages, such as the inability to assess model uncertainty and the expensive cross-validation needed to find the optimal regularization parameter. Moreover, the model's predictive performance may degrade dramatically under a suboptimal value of the regularization parameter (which is sometimes chosen deliberately to induce sparseness). We propose a fully Bayesian approach based on a "spike and slab" prior (similar to L0 regularization) that does not suffer from these shortcomings. We develop an approximate MCMC method combining Langevin dynamics and reversible jump MCMC to conduct inference in this model. Experiments show that the proposed model learns a good combination of structure and parameter values without the need for separate hyper-parameter tuning. Moreover, the model's predictive performance is much more robust than that of L1-based methods under hyper-parameter settings that induce highly sparse model structures.
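For concreteness, a spike-and-slab prior on each pairwise interaction weight w_ij can be written, under the usual Gaussian-slab assumption, as

    p(w_ij) = (1 - pi) * delta_0(w_ij) + pi * N(w_ij | 0, sigma^2),

where delta_0 is a point mass at zero (the "spike"), N(. | 0, sigma^2) is the "slab", and pi controls the expected fraction of non-zero edges. This is the standard form of such a prior rather than a quotation of the paper's notation.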


Leaf vein segmentation using Odd Gabor filters and morphological operations

arXiv.org Artificial Intelligence

Leaf veins form the basis of leaf characterization and classification, and different species have different leaf vein patterns. Leaf vein segmentation helps in maintaining a record of leaves according to their specific vein patterns: it provides an effective way to store and retrieve information about plant species in a database, and an effective means to characterize plants on the basis of leaf vein structure, which is unique to every species. The proposed algorithm offers a new way of segmenting leaf veins using Odd Gabor filters, followed by morphological operations to produce a better output. The Odd Gabor filter gives an efficient output and is more robust and scalable than existing techniques, as it detects the fine, fiber-like veins present in leaves much more efficiently.
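A rough sketch of such a pipeline, using the odd (antisymmetric, imaginary) part of a Gabor filter bank followed by morphological clean-up, is given below; the filter frequency, the number of orientations, the Otsu thresholding and the structuring-element sizes are assumptions for illustration, not the paper's tuned settings.

import numpy as np
from skimage import io, color, filters, morphology

leaf = color.rgb2gray(io.imread("leaf.png"))       # placeholder input image

responses = []
for theta in np.linspace(0, np.pi, 8, endpoint=False):
    _, odd = filters.gabor(leaf, frequency=0.2, theta=theta)  # imaginary part = odd Gabor
    responses.append(np.abs(odd))
vein_energy = np.max(responses, axis=0)            # strongest response across orientations

mask = vein_energy > filters.threshold_otsu(vein_energy)
mask = morphology.closing(mask, morphology.disk(1))            # bridge small gaps in veins
mask = morphology.remove_small_objects(mask, min_size=50)      # discard speckle noise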


Transfer Learning, Soft Distance-Based Bias, and the Hierarchical BOA

arXiv.org Artificial Intelligence

An automated technique has recently been proposed for transferring learning in the hierarchical Bayesian optimization algorithm (hBOA) using distance-based statistics. The technique enables practitioners to improve hBOA efficiency by collecting statistics from probabilistic models obtained in previous hBOA runs and using those statistics to bias future hBOA runs on similar problems. The purpose of this paper is threefold: (1) to test the technique on several classes of NP-complete problems, including MAXSAT, spin glasses and minimum vertex cover; (2) to demonstrate that the technique is effective even when previous runs were done on problems of a different size; (3) to provide empirical evidence that combining transfer learning with other efficiency enhancement techniques can often yield nearly multiplicative speedups.
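One way to picture the distance-based statistics behind the soft bias is sketched below; the representation of previous models as edge sets, the distance function and the use of the resulting frequencies as a prior over candidate edges are assumptions of this sketch, not details taken from the paper.

from collections import defaultdict

def collect_edge_stats(previous_models, n_vars, distance):
    """previous_models: list of edge sets {(i, j), ...} from earlier runs;
    distance(i, j): a problem-specific metric between variables i and j.
    Returns, for each distance value, the fraction of models containing
    a dependency between variables at that distance."""
    seen, total = defaultdict(int), defaultdict(int)
    for edges in previous_models:
        for i in range(n_vars):
            for j in range(i + 1, n_vars):
                d = distance(i, j)
                total[d] += 1
                if (i, j) in edges or (j, i) in edges:
                    seen[d] += 1
    return {d: seen[d] / total[d] for d in total}

Because the statistics are indexed by distance rather than by variable identity, they can in principle be reused on instances of a different size, which is the property tested in point (2) above.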


Multi-Level Error-Resilient Neural Networks with Learning

arXiv.org Artificial Intelligence

The problem of neural network association is to retrieve a previously memorized pattern from its noisy version using a network of neurons. An ideal neural associative memory should combine three properties simultaneously: a learning algorithm, a large pattern retrieval capacity, and resilience against noise. Prior works in this area usually improve one or two aspects at the cost of the third. Our work takes a step towards closing this gap. More specifically, we show that by enforcing natural constraints on the set of patterns to be learned, we can drastically improve the retrieval capacity of our neural network. Moreover, we devise a learning algorithm whose role is to learn the patterns satisfying these constraints. Finally, we show that our neural network can cope with a fair amount of noise.
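One way to picture the role of such constraints is the linear-algebra sketch below, in which patterns are required to lie in a low-dimensional subspace and recall removes the noise components that violate that constraint; this is an analogy chosen for illustration, not the paper's iterative neural algorithm.

import numpy as np

rng = np.random.default_rng(0)
basis = rng.standard_normal((100, 10))               # patterns live in a 10-D subspace of R^100
patterns = basis @ rng.standard_normal((10, 50))     # 50 memorized patterns

# "Learning" in this sketch is just recovering the subspace from the stored patterns.
U, _, _ = np.linalg.svd(patterns, full_matrices=False)
projector = U[:, :10] @ U[:, :10].T                  # projection onto the pattern subspace

noisy = patterns[:, 0] + 0.3 * rng.standard_normal(100)
recalled = projector @ noisy                         # noise orthogonal to the subspace is removed
print(np.linalg.norm(recalled - patterns[:, 0]) < np.linalg.norm(noisy - patterns[:, 0]))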


Foundations of Inference

arXiv.org Artificial Intelligence

We present a simple and clear foundation for finite inference that unites and significantly extends the approaches of Kolmogorov and Cox. Our approach is based on quantifying lattices of logical statements in a way that satisfies general lattice symmetries. With other applications such as measure theory in mind, our derivations assume minimal symmetries, relying on neither negation nor continuity nor differentiability. Each relevant symmetry corresponds to an axiom of quantification, and these axioms are used to derive a unique set of quantifying rules that form the familiar probability calculus. We also derive a unique quantification of divergence, entropy and information.
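For readers unfamiliar with the target of the derivation, the familiar probability calculus consists of rules such as the sum and product rules, stated here in standard notation as background rather than as a reproduction of the paper's derivation:

    p(x ∨ y | t) = p(x | t) + p(y | t) - p(x ∧ y | t)
    p(x ∧ y | t) = p(x | t) p(y | x ∧ t)

The unique divergence obtained in this style of derivation is of the relative-entropy (Kullback-Leibler) form.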


Statistical Translation, Heat Kernels and Expected Distances

arXiv.org Machine Learning

High-dimensional structured data such as text and images are often poorly understood and misrepresented in statistical modeling. The standard histogram representation suffers from high variance and performs poorly in general. We explore novel connections between statistical translation, heat kernels on manifolds and graphs, and expected distances. These connections provide a new framework for unsupervised metric learning for text documents. Experiments indicate that the resulting distances are generally superior to their more standard counterparts.
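A small sketch of the ingredients (a heat kernel on a word graph and an expected distance between word histograms) is given below; the toy graph, the diffusion time and the conversion of kernel values into word-to-word distances are illustrative assumptions rather than the paper's construction.

import numpy as np
from scipy.linalg import expm

def heat_kernel(adjacency, t=1.0):
    """K = exp(-t L) for the combinatorial graph Laplacian L = D - A."""
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    return expm(-t * laplacian)

def expected_distance(p, q, word_dist):
    """E_{i~p, j~q}[d(i, j)] for normalized word histograms p and q."""
    return float(p @ word_dist @ q)

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)             # toy word-similarity graph
K = heat_kernel(A, t=0.5)
# Kernel-induced word-to-word distances: d(i, j)^2 = K_ii + K_jj - 2 K_ij.
D = np.sqrt(np.maximum(K.diagonal()[:, None] + K.diagonal()[None, :] - 2 * K, 0.0))
p = np.array([0.7, 0.3, 0.0])
q = np.array([0.0, 0.4, 0.6])
print(expected_distance(p, q, D))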