Technology
A Multivariate Regression Approach to Association Analysis of Quantitative Trait Network
Kim, Seyoung, Sohn, Kyung-Ah, Xing, Eric P.
Many complex disease syndromes such as asthma consist of a large number of highly related, rather than independent, clinical phenotypes, raising a new technical challenge in identifying genetic variations associated simultaneously with correlated traits. In this study, we propose a new statistical framework called graph-guided fused lasso (GFlasso) to address this issue in a principled way. Our approach explicitly represents the dependency structure among the quantitative traits as a network, and leverages this trait network to encode structured regularizations in a multivariate regression model over the genotypes and traits, so that the genetic markers that jointly influence subgroups of highly correlated traits can be detected with high sensitivity and specificity. While most of the traditional methods examined each phenotype independently and combined the results afterwards, our approach analyzes all of the traits jointly in a single statistical method, and borrow information across correlated phenotypes to discover the genetic markers that perturbe a subset of correlated triats jointly rather than a single trait. Using simulated datasets based on the HapMap consortium data and an asthma dataset, we compare the performance of our method with the single-marker analysis, and other sparse regression methods such as the ridge regression and the lasso that do not use any structural information in the traits. Our results show that there is a significant advantage in detecting the true causal SNPs when we incorporate the correlation pattern in traits using our proposed methods.
Robustness and Regularization of Support Vector Machines
Xu, Huan, Caramanis, Constantine, Mannor, Shie
We consider regularized support vector machines (SVMs) and show that they are precisely equivalent to a new robust optimization formulation. We show that this equivalence of robust optimization and regularization has implications for both algorithms, and analysis. In terms of algorithms, the equivalence suggests more general SVM-like algorithms for classification that explicitly build in protection to noise, and at the same time control overfitting. On the analysis front, the equivalence of robustness and regularization, provides a robust optimization interpretation for the success of regularized SVMs. We use the this new robustness interpretation of SVMs to give a new proof of consistency of (kernelized) SVMs, thus establishing robustness as the reason regularized SVMs generalize well.
Artificial Intelligence Techniques for Steam Generator Modelling
Wright, Sarah, Marwala, Tshilidzi
This paper investigates the use of different Artificial Intelligence methods to predict the values of several continuous variables from a Steam Generator. The objective was to determine how the different artificial intelligence methods performed in making predictions on the given dataset. The artificial intelligence methods evaluated were Neural Networks, Support Vector Machines, and Adaptive Neuro-Fuzzy Inference Systems. The types of neural networks investigated were Multi-Layer Perceptions, and Radial Basis Function. Bayesian and committee techniques were applied to these neural networks. Each of the AI methods considered was simulated in Matlab. The results of the simulations showed that all the AI methods were capable of predicting the Steam Generator data reasonably accurately. However, the Adaptive Neuro-Fuzzy Inference system out performed the other methods in terms of accuracy and ease of implementation, while still achieving a fast execution time as well as a reasonable training time.
Airport Gate Assignment: New Model and Implementation
Airport gate assignment is of great importance in airport operations. In this paper, we study the Airport Gate Assignment Problem (AGAP), propose a new model and implement the model with Optimization Programming language (OPL). With the objective to minimize the number of conflicts of any two adjacent aircrafts assigned to the same gate, we build a mathematical model with logical constraints and the binary constraints, which can provide an efficient evaluation criterion for the Airlines to estimate the current gate assignment. To illustrate the feasibility of the model we construct experiments with the data obtained from Continental Airlines, Houston Gorge Bush Intercontinental Airport IAH, which indicate that our model is both energetic and effective. Moreover, we interpret experimental results, which further demonstrate that our proposed model can provide a powerful tool for airline companies to estimate the efficiency of their current work of gate assignment.
Improved Estimation of High-dimensional Ising Models
We consider the problem of jointly estimating the parameters as well as the structure of binary valued Markov Random Fields, in contrast to earlier work that focus on one of the two problems. We formulate the problem as a maximization of $\ell_1$-regularized surrogate likelihood that allows us to find a sparse solution. Our optimization technique efficiently incorporates the cutting-plane algorithm in order to obtain a tighter outer bound on the marginal polytope, which results in improvement of both parameter estimates and approximation to marginals. On synthetic data, we compare our algorithm on the two estimation tasks to the other existing methods. We analyze the method in the high-dimensional setting, where the number of dimensions $p$ is allowed to grow with the number of observations $n$. The rate of convergence of the estimate is demonstrated to depend explicitly on the sparsity of the underlying graph.
Classification dynamique d'un flux documentaire : une \'evaluation statique pr\'ealable de l'algorithme GERMEN
Lelu, Alain, Cuxac, Pascal, Johansson, Joel
Data-stream clustering is an ever-expanding subdomain of knowledge extraction. Most of the past and present research effort aims at efficient scaling up for the huge data repositories. Our approach focuses on qualitative improvement, mainly for "weak signals" detection and precise tracking of topical evolutions in the framework of information watch - though scalability is intrinsically guaranteed in a possibly distributed implementation. Our GERMEN algorithm exhaustively picks up the whole set of density peaks of the data at time t, by identifying the local perturbations induced by the current document vector, such as changing cluster borders, or new/vanishing clusters. Optimality yields from the uniqueness 1) of the density landscape for any value of our zoom parameter, 2) of the cluster allocation operated by our border propagation rule. This results in a rigorous independence from the data presentation ranking or any initialization parameter. We present here as a first step the only assessment of a static view resulting from one year of the CNRS/INIST Pascal database in the field of geotechnics.
Hierarchical structure and the prediction of missing links in networks
Clauset, Aaron, Moore, Cristopher, Newman, M. E. J.
Networks have in recent years emerged as an invaluable tool for describing and quantifying complex systems in many branches of science. Recent studies suggest that networks often exhibit hierarchical organization, where vertices divide into groups that further subdivide into groups of groups, and so forth over multiple scales. In many cases these groups are found to correspond to known functional units, such as ecological niches in food webs, modules in biochemical networks (protein interaction networks, metabolic networks, or genetic regulatory networks), or communities in social networks. Here we present a general technique for inferring hierarchical structure from network data and demonstrate that the existence of hierarchy can simultaneously explain and quantitatively reproduce many commonly observed topological properties of networks, such as right-skewed degree distributions, high clustering coefficients, and short path lengths. We further show that knowledge of hierarchical structure can be used to predict missing connections in partially known networks with high accuracy, and for more general network structures than competing techniques. Taken together, our results suggest that hierarchy is a central organizing principle of complex networks, capable of offering insight into many network phenomena.
Document stream clustering: experimenting an incremental algorithm and AR-based tools for highlighting dynamic trends
Lelu, Alain, Cadot, Martine, Cuxac, Pascal
We address here two major challenges presented by dynamic data mining: 1) the stability challenge: we have implemented a rigorous incremental density-based clustering algorithm, independent from any initial conditions and ordering of the data-vectors stream, 2) the cognitive challenge: we have implemented a stringent selection process of association rules between clusters at time t-1 and time t for directly generating the main conclusions about the dynamics of a data-stream. We illustrate these points with an application to a two years and 2600 documents scientific information database.
Cooperative interface of a swarm of UAVs
Saget, Sylvie, Legras, Francois, Coppin, Gilles
After presenting the broad context of authority sharing, we outline how introducing more natural interaction in the design of the ground operator interface of UV systems should help in allowing a single operator to manage the complexity of his/her task. Introducing new modalities is one one of the means in the realization of our vision of next- generation GOI. A more fundamental aspect resides in the interaction manager which should help balance the workload of the operator between mission and interaction, notably by applying a multi-strategy approach to generation and interpretation. We intend to apply these principles to the context of the Smaart prototype, and in this perspective, we illustrate how to characterize the workload associated with a particular operational situation.
Edhibou: a Customizable Interface for Decision Support in a Semantic Portal
Badra, Fadi, D'Aquin, Mathieu, Lieber, Jean, Meilender, Thomas
The Semantic Web is becoming more and more a reality, as the required technologies have reached an appropriate level of maturity. However, at this stage, it is important to provide tools facilitating the use and deployment of these technologies by end-users. In this paper, we describe EdHibou, an automatically generated, ontology-based graphical user interface that integrates in a semantic portal. The particularity of EdHibou is that it makes use of OWL reasoning capabilities to provide intelligent features, such as decision support, upon the underlying ontology. We present an application of EdHibou to medical decision support based on a formalization of clinical guidelines in OWL and show how it can be customized thanks to an ontology of graphical components.