Industry
Detecting lateral genetic material transfer
Calderón, C., Delaye, L., Mireles, V., Miramontes, P.
The bioinformatical methods to detect lateral gene transfer events are mainly based on functional coding DNA characteristics. In this paper, we propose the use of DNA traits not depending on protein coding requirements. We introduce several semilocal variables that depend on DNA primary sequence and that reflect thermodynamic as well as physico-chemical magnitudes that are able to tell apart the genome of different organisms. After combining these variables in a neural classificator, we obtain results whose power of resolution go as far as to detect the exchange of genomic material between bacteria that are phylogenetically close.
Evolutionary Computation in Astronomy and Astrophysics: A Review
Gutiérrez, José A. García, Cotta, Carlos, Fernández-Leiva, Antonio J.
In general Evolutionary Computation (EC) includes a number of optimization methods inspired by biological mechanisms of evolution. The methods catalogued in this area use the Darwinian principles of life evolution to produce algorithms that returns high quality solutions to hard-to-solve optimization problems. The main strength of EC is precisely that they provide good solutions even if the computational resources (e.g., running time) are limited. Astronomy and Astrophysics are two fields that often require optimizing problems of high complexity or analyzing a huge amount of data and the so-called complete optimization methods are inherently limited by the size of the problem/data. For instance, reliable analysis of large amounts of data is central to modern astrophysics and astronomical sciences in general. EC techniques perform well where other optimization methods are inherently limited (as complete methods applied to NP-hard problems), and in the last ten years, numerous proposals have come up that apply with greater or lesser success methodologies of evolutional computation to common engineering problems. Some of these problems, such as the estimation of non-lineal parameters, the development of automatic learning techniques, the implementation of control systems, or the resolution of multi-objective optimization problems, have had (and have) a special repercussion in the fields. For these reasons EC emerges as a feasible alternative for traditional methods. In this paper, we discuss some promising applications in this direction and a number of recent works in this area; the paper also includes a general description of EC to provide a global perspective to the reader and gives some guidelines of application of EC techniques for future research
Robust Spatio-Temporal Signal Recovery from Noisy Counts in Social Media
Xu, Jun-Ming, Bhargava, Aniruddha, Nowak, Robert, Zhu, Xiaojin
Many real-world phenomena can be represented by a spatio-temporal signal: where, when, and how much. Social media is a tantalizing data source for those who wish to monitor such signals. Unlike most prior work, we assume that the target phenomenon is known and we are given a method to count its occurrences in social media. However, counting is plagued by sample bias, incomplete data, and, paradoxically, data scarcity -- issues inadequately addressed by prior work. We formulate signal recovery as a Poisson point process estimation problem. We explicitly incorporate human population bias, time delays and spatial distortions, and spatio-temporal regularization into the model to address the noisy count issues. We present an efficient optimization algorithm and discuss its theoretical properties. We show that our model is more accurate than commonly-used baselines. Finally, we present a case study on wildlife roadkill monitoring, where our model produces qualitatively convincing results.
Coherence Functions with Applications in Large-Margin Classification Methods
Zhang, Zhihua, Dai, Guang, Jordan, Michael I.
Support vector machines (SVMs) naturally embody sparseness due to their use of hinge loss functions. However, SVMs can not directly estimate conditional class probabilities. In this paper we propose and study a family of coherence functions, which are convex and differentiable, as surrogates of the hinge function. The coherence function is derived by using the maximum-entropy principle and is characterized by a temperature parameter. It bridges the hinge function and the logit function in logistic regression. The limit of the coherence function at zero temperature corresponds to the hinge function, and the limit of the minimizer of its expected error is the minimizer of the expected error of the hinge loss. We refer to the use of the coherence function in large-margin classification as C-learning, and we present efficient coordinate descent algorithms for the training of regularized ${\cal C}$-learning models.
Publishing Identifiable Experiment Code And Configuration Is Important, Good and Easy
Vaughan, Richard, Wawerla, Jens
A few months ago, a graduate student in another country called me (Vaughan) to ask for the source code of one of my multi-robot simulation experiments. The student had an idea for a modification that she thought would improve the system's performance. By the standards of scientific practice this was a perfectly reasonable request and I felt obliged to give it to her. With our original code, the student could (i) rerun our experiments to verify that we reported the results correctly; (ii) inspect the code to make sure that it actually implements the algorithm described in our paper; (iii) change parameters and initial conditions to make sure our results were not a fluke of the particular experimental setting; (iv) modify the robot controllers and quantitatively compare her new method with our originals. It would cost me nothing to make her a copy of our code, and her methodology would be impeccable. Why then do we read so few papers using this methodology? It turned out to be impossible to identify exactly which code was used to perform the experiments in our years-old paper. We had not labeled the source code at that moment, and it had subsequently been modified. All the code was under version control, so we could obtain approximately the right code by looking at revision dates.
Applications of fuzzy logic to Case-Based Reasoning
Subbotin, Igor Ya., Voskoglou, Michael Gr.
Broadly construed Case-Based Reasoning (CBR) is the process of solving new problems based on the solution of past problems. The CBR systems' expertise is embodied in a collection (library) of past cases rather, than being encoded in classical rules. Each case typically contains a description of the problem plus a solution and/or the outcomes. When a problem is successfully solved, the experience is retained in order to solve similar problems in future. When an attempt to solve a problem fails, the reason for the failure is identified and remembered in order to avoid the same mistake in future. Thus CBR is a cyclic and integrated process of solving a problem, learning from this experience, solving a new problem, etc.
UCB Algorithm for Exponential Distributions
Jouini, Wassim, Moy, Christophe
Several sequential decision making problems face a dilemma between the exploration of a space of choices, or solutions, and the exploitation of the information available to the decision maker. The problem described herein is known as sequential decision making under uncertainty. In this paper we focus on a subclass of this problem, where the decision maker has a discrete set of stateless choices and the added information is a real valued sequence (of feedbacks, or rewards) that quantifies how well the decision maker behaved in the previous time steps. This particular instance of sequential decision making problems is generally known as the multi-armed bandit (MAB) problem [1], [2]. A common approach to solving the exploration versus exploitation dilemma within MAB problems consists in assigning an utility value to every arm.
The threshold EM algorithm for parameter learning in bayesian network with incomplete data
Lamine, Fradj Ben, Kalti, Karim, Mahjoub, Mohamed Ali
Bayesian networks (BN) are used in a big range of applications but they have one issue concerning parameter learning. In real application, training data are always incomplete or some nodes are hidden. To deal with this problem many learning parameter algorithms are suggested foreground EM, Gibbs sampling and RBE algorithms. In order to limit the search space and escape from local maxima produced by executing EM algorithm, this paper presents a learning parameter algorithm that is a fusion of EM and RBE algorithms. This algorithm incorporates the range of a parameter into the EM algorithm. This range is calculated by the first step of RBE algorithm allowing a regularization of each parameter in bayesian network after the maximization step of the EM algorithm. The threshold EM algorithm is applied in brain tumor diagnosis and show some advantages and disadvantages over the EM algorithm.
Clustering and Bayesian network for image of faces classification
Jayech, Khlifia, Mahjoub, Mohamed Ali
In a content based image classification system, target images are sorted by feature similarities with respect to the query (CBIR). In this paper, we propose to use new approach combining distance tangent, k-means algorithm and Bayesian network for image classification. First, we use the technique of tangent distance to calculate several tangent spaces representing the same image. The objective is to reduce the error in the classification phase. Second, we cut the image in a whole of blocks. For each block, we compute a vector of descriptors. Then, we use K-means to cluster the low-level features including color and texture information to build a vector of labels for each image. Finally, we apply five variants of Bayesian networks classifiers (Na\"ive Bayes, Global Tree Augmented Na\"ive Bayes (GTAN), Global Forest Augmented Na\"ive Bayes (GFAN), Tree Augmented Na\"ive Bayes for each class (TAN), and Forest Augmented Na\"ive Bayes for each class (FAN) to classify the image of faces using the vector of labels. In order to validate the feasibility and effectively, we compare the results of GFAN to FAN and to the others classifiers (NB, GTAN, TAN). The results demonstrate FAN outperforms than GFAN, NB, GTAN and TAN in the overall classification accuracy.
Machine Cognition Models: EPAM and GPS
Through history, the human being tried to relay its daily tasks to other creatures, which was the main reason behind the rise of civilizations. It started with deploying animals to automate tasks in the field of agriculture(bulls), transportation (e.g. horses and donkeys), and even communication (pigeons). Millenniums after, come the Golden age with "Al-jazari" and other Muslim inventors, which were the pioneers of automation, this has given birth to industrial revolution in Europe, centuries after. At the end of the nineteenth century, a new era was to begin, the computational era, the most advanced technological and scientific development that is driving the mankind and the reason behind all the evolutions of science; such as medicine, communication, education, and physics. At this edge of technology engineers and scientists are trying to model a machine that behaves the same as they do, which pushed us to think about designing and implementing "Things that-Thinks", then artificial intelligence was. In this work we will cover each of the major discoveries and studies in the field of machine cognition, which are the "Elementary Perceiver and Memorizer"(EPAM) and "The General Problem Solver"(GPS). The First one focus mainly on implementing the human-verbal learning behavior, while the second one tries to model an architecture that is able to solve problems generally (e.g. theorem proving, chess playing, and arithmetic). We will cover the major goals and the main ideas of each model, as well as comparing their strengths and weaknesses, and finally giving their fields of applications. And Finally, we will suggest a real life implementation of a cognitive machine.