Country
A multiagent urban traffic simulation. Part II: dealing with the extraordinary
Daudé, Eric, Tranouez, Pierrick, Langlois, Patrice
In Probabilistic Risk Management, risk is characterized by two quantities: the magnitude (or severity) of the adverse consequences that can potentially result from the given activity or action, and by the likelihood of occurrence of the given adverse consequences. But a risk seldom exists in isolation: chain of consequences must be examined, as the outcome of one risk can increase the likelihood of other risks. Systemic theory must complement classic PRM. Indeed these chains are composed of many different elements, all of which may have a critical importance at many different levels. Furthermore, when urban catastrophes are envisioned, space and time constraints are key determinants of the workings and dynamics of these chains of catastrophes: models must include a correct spatial topology of the studied risk. Finally, literature insists on the importance small events can have on the risk on a greater scale: urban risks management models belong to self-organized criticality theory. We chose multiagent systems to incorporate this property in our model: the behavior of an agent can transform the dynamics of important groups of them.
SparseCodePicking: feature extraction in mass spectrometry using sparse coding algorithms
Alexandrov, Theodore, Steinhorst, Klaus, Keszoecze, Oliver, Schiffler, Stefan
Mass spectrometry (MS) is an important technique for chemical profiling which calculates for a sample a high dimensional histogram-like spectrum. A crucial step of MS data processing is the peak picking which selects peaks containing information about molecules with high concentrations which are of interest in an MS investigation. We present a new procedure of the peak picking based on a sparse coding algorithm. Given a set of spectra of different classes, i.e. with different positions and heights of the peaks, this procedure can extract peaks by means of unsupervised learning. Instead of an $l_1$-regularization penalty term used in the original sparse coding algorithm we propose using an elastic-net penalty term for better regularization. The evaluation is done by means of simulation. We show that for a large region of parameters the proposed peak picking method based on the sparse coding features outperforms a mean spectrum-based method. Moreover, we demonstrate the procedure applying it to two real-life datasets.
Statistical Decision Making for Authentication and Intrusion Detection
Dimitrakakis, Christos, Mitrokotsa, Aikaterini
Classification is the problem of categorising data in one of two or more possible classes. In the classical supervised learning framework, examples of each class have already been obtained and the task of the decision maker is to accurately categorise new observations, whose class is unknown. The accuracy is either measured in terms of the rate of misclassification, or in terms of the average cost, for problems where different types of errors carry different costs. In that setting, the problem has three phases: (a) the collection of training data, (b) the estimation of a decision rule based on the training data and (c) the application 1 of the decision rule to new data. Typically, the decision rule remains fixed after the second step.
On the conditions used to prove oracle results for the Lasso
van de Geer, Sara A., Bühlmann, Peter
Oracle inequalities and variable selection properties for the Lasso in linear models have been established under a variety of different assumptions on the design matrix. We show in this paper how the different conditions and concepts relate to each other. The restricted eigenvalue condition (Bickel et al., 2009) or the slightly weaker compatibility condition (van de Geer, 2007) are sufficient for oracle results. We argue that both these conditions allow for a fairly general class of design matrices. Hence, optimality of the Lasso for prediction and estimation holds for more general situations than what it appears from coherence (Bunea et al., 2007b,c) or restricted isometry (Candès and Tao, 2005) assumptions. Keywords and phrases: Coherence, compatibility, irrepresentable condition, Lasso, restricted eigenvalue, restricted isometry, sparsity.
Pre-processing in AI based Prediction of QSARs
Patri, Om Prasad, Mishra, Amit Kumar
Machine learning, data mining and artificial intelligence (AI) based methods have been used to determine the relations between chemical structure and biological activity, called quantitative structure activity relationships (QSARs) for the compounds. Pre-processing of the dataset, which includes the mapping from a large number of molecular descriptors in the original high dimensional space to a small number of components in the lower dimensional space while retaining the features of the original data, is the first step in this process. A common practice is to use a mapping method for a dataset without prior analysis. This pre-analysis has been stressed in our work by applying it to two important classes of QSAR prediction problems: drug design (predicting anti-HIV-1 activity) and predictive toxicology (estimating hepatocarcinogenicity of chemicals). We apply one linear and two nonlinear mapping methods on each of the datasets. Based on this analysis, we conclude the nature of the inherent relationships between the elements of each dataset, and hence, the mapping method best suited for it. We also show that proper preprocessing can help us in choosing the right feature extraction tool as well as give an insight about the type of classifier pertinent for the given problem.
Expectation Propagation on the Maximum of Correlated Normal Variables
Many inference problems involving questions of optimality ask for the maximum or the minimum of a finite set of unknown quantities. This technical report derives the first two posterior moments of the maximum of two correlated Gaussian variables and the first two posterior moments of the two generating variables (corresponding to Gaussian approximations minimizing relative entropy). It is shown how this can be used to build a heuristic approximation to the maximum relationship over a finite set of Gaussian variables, allowing approximate inference by Expectation Propagation on such quantities.
Laplacian Support Vector Machines Trained in the Primal
Melacci, Stefano, Belkin, Mikhail
In the last few years, due to the growing ubiquity of unlabeled data, much effort has been spent by the machine learning community to develop better understanding and improve the quality of classifiers exploiting unlabeled data. Following the manifold regularization approach, Laplacian Support Vector Machines (LapSVMs) have shown the state of the art performance in semi--supervised classification. In this paper we present two strategies to solve the primal LapSVM problem, in order to overcome some issues of the original dual formulation. Whereas training a LapSVM in the dual requires two steps, using the primal form allows us to collapse training to a single step. Moreover, the computational complexity of the training algorithm is reduced from O(n^3) to O(n^2) using preconditioned conjugate gradient, where n is the combined number of labeled and unlabeled examples. We speed up training by using an early stopping strategy based on the prediction on unlabeled data or, if available, on labeled validation examples. This allows the algorithm to quickly compute approximate solutions with roughly the same classification accuracy as the optimal ones, considerably reducing the training time. Due to its simplicity, training LapSVM in the primal can be the starting point for additional enhancements of the original LapSVM formulation, such as those for dealing with large datasets. We present an extensive experimental evaluation on real world data showing the benefits of the proposed approach.
Initialization Free Graph Based Clustering
Galluccio, Laurent, Michel, Olivier J. J., Comon, Pierre, Slezak, Eric, Hero, Alfred O.
This paper proposes an original approach to cluster multi-component data sets, including an estimation of the number of clusters. From the construction of a minimal spanning tree with Prim's algorithm, and the assumption that the vertices are approximately distributed according to a Poisson distribution, the number of clusters is estimated by thresholding the Prim's trajectory. The corresponding cluster centroids are then computed in order to initialize the generalized Lloyd's algorithm, also known as $K$-means, which allows to circumvent initialization problems. Some results are derived for evaluating the false positive rate of our cluster detection algorithm, with the help of approximations relevant in Euclidean spaces. Metrics used for measuring similarity between multi-dimensional data points are based on symmetrical divergences. The use of these informational divergences together with the proposed method leads to better results, compared to other clustering methods for the problem of astrophysical data processing. Some applications of this method in the multi/hyper-spectral imagery domain to a satellite view of Paris and to an image of the Mars planet are also presented. In order to demonstrate the usefulness of divergences in our problem, the method with informational divergence as similarity measure is compared with the same method using classical metrics. In the astrophysics application, we also compare the method with the spectral clustering algorithms.
Telling cause from effect based on high-dimensional observations
Janzing, Dominik, Hoyer, Patrik O., Schoelkopf, Bernhard
We describe a method for inferring linear causal relations among multi-dimensional variables. The idea is to use an asymmetry between the distributions of cause and effect that occurs if both the covariance matrix of the cause and the structure matrix mapping cause to the effect are independently chosen. The method works for both stochastic and deterministic causal relations, provided that the dimensionality is sufficiently high (in some experiments, 5 was enough). It is applicable to Gaussian as well as non-Gaussian data.
Discrete MDL Predicts in Total Variation
The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result is completely general. No independence, ergodicity, stationarity, identifiability, or other assumption on the model class need to be made. More formally, we show that for any countable class of models, the distributions selected by MDL (or MAP) asymptotically predict (merge with) the true measure in the class in total variation distance. Implications for non-i.i.d. domains like time-series forecasting, discriminative learning, and reinforcement learning are discussed.