Statistical Decision Making for Authentication and Intrusion Detection
Dimitrakakis, Christos, Mitrokotsa, Aikaterini
User authentication and intrusion detection differ from standard classification problems in that, while we have data generated from legitimate users, impostor or intrusion data is scarce or non-existent. We review existing techniques for dealing with this problem and propose a novel alternative based on a principled statistical decision-making viewpoint. We examine the technique on a toy problem and validate it on complex real-world data from an RFID-based access control system. The results indicate that it can significantly outperform the classical world model approach. The method could be more generally useful in other decision-making scenarios where there is a lack of adversary data.
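As background for the comparison mentioned in the abstract, the following is a minimal sketch of a classical world-model baseline: a likelihood-ratio test between a model of the legitimate user and a generic "world" model fit to background data. The Gaussian mixture models, feature shapes, and threshold are illustrative assumptions, not details taken from the paper.

    # Sketch of a classical world-model likelihood-ratio baseline (not the
    # paper's proposed method). Models and threshold are illustrative.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_models(user_data, background_data, n_components=4):
        # Model of the legitimate user and a generic "world" model of everyone else.
        user_model = GaussianMixture(n_components=n_components).fit(user_data)
        world_model = GaussianMixture(n_components=n_components).fit(background_data)
        return user_model, world_model

    def authenticate(sample, user_model, world_model, threshold=0.0):
        # Accept if the log-likelihood ratio of user vs. world model clears a threshold.
        x = np.asarray(sample)[None, :]
        score = user_model.score_samples(x)[0] - world_model.score_samples(x)[0]
        return score > threshold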
Pre-processing in AI based Prediction of QSARs
Patri, Om Prasad, Mishra, Amit Kumar
Machine learning, data mining, and artificial intelligence (AI) based methods have been used to determine the relations between chemical structure and biological activity, called quantitative structure-activity relationships (QSARs), of compounds. Pre-processing of the dataset, which includes the mapping from a large number of molecular descriptors in the original high-dimensional space to a small number of components in a lower-dimensional space while retaining the features of the original data, is the first step in this process. A common practice is to use a mapping method on a dataset without prior analysis. Our work stresses this pre-analysis by applying it to two important classes of QSAR prediction problems: drug design (predicting anti-HIV-1 activity) and predictive toxicology (estimating hepatocarcinogenicity of chemicals). We apply one linear and two nonlinear mapping methods to each of the datasets. Based on this analysis, we characterize the nature of the inherent relationships between the elements of each dataset and, hence, the mapping method best suited to it. We also show that proper pre-processing can help in choosing the right feature-extraction tool as well as give insight into the type of classifier pertinent to the given problem.
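To illustrate the kind of pre-analysis described above, here is a brief sketch that compares one linear and one nonlinear mapping of a descriptor matrix before classification. The specific choices (PCA, Isomap, an SVM classifier, ten components) are assumptions for the example rather than the methods used in the paper.

    # Compare a linear and a nonlinear mapping of molecular descriptors by the
    # cross-validated accuracy of a simple classifier on the reduced data.
    from sklearn.decomposition import PCA
    from sklearn.manifold import Isomap
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def compare_mappings(X, y, n_components=10):
        mappings = {
            "linear (PCA)": PCA(n_components=n_components),
            "nonlinear (Isomap)": Isomap(n_components=n_components),
        }
        scores = {}
        for name, mapper in mappings.items():
            Z = mapper.fit_transform(X)          # map descriptors to the reduced space
            scores[name] = cross_val_score(SVC(), Z, y, cv=5).mean()
        return scores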
Expectation Propagation on the Maximum of Correlated Normal Variables
Many inference problems involving questions of optimality ask for the maximum or the minimum of a finite set of unknown quantities. This technical report derives the first two posterior moments of the maximum of two correlated Gaussian variables and the first two posterior moments of the two generating variables (corresponding to Gaussian approximations minimizing relative entropy). It is shown how this can be used to build a heuristic approximation to the maximum relationship over a finite set of Gaussian variables, allowing approximate inference by Expectation Propagation on such quantities.
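As a concrete instance of the quantity studied in the report, the first two moments of the maximum of two correlated Gaussians can be computed in closed form (Clark's formulas). The sketch below implements those moments only; it is not the report's full Expectation Propagation scheme.

    # Mean and variance of max(X1, X2) for Xi ~ N(mi, si^2) with Corr(X1, X2) = rho.
    # Assumes the variables are not perfectly correlated with equal variances (a > 0).
    import numpy as np
    from scipy.stats import norm

    def max_moments(m1, s1, m2, s2, rho):
        a = np.sqrt(s1**2 + s2**2 - 2.0 * rho * s1 * s2)
        alpha = (m1 - m2) / a
        mean = m1 * norm.cdf(alpha) + m2 * norm.cdf(-alpha) + a * norm.pdf(alpha)
        second = ((m1**2 + s1**2) * norm.cdf(alpha)
                  + (m2**2 + s2**2) * norm.cdf(-alpha)
                  + (m1 + m2) * a * norm.pdf(alpha))
        return mean, second - mean**2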
Laplacian Support Vector Machines Trained in the Primal
Melacci, Stefano, Belkin, Mikhail
In the last few years, due to the growing ubiquity of unlabeled data, much effort has been spent by the machine learning community to develop better understanding and improve the quality of classifiers exploiting unlabeled data. Following the manifold regularization approach, Laplacian Support Vector Machines (LapSVMs) have shown state-of-the-art performance in semi-supervised classification. In this paper we present two strategies to solve the primal LapSVM problem, in order to overcome some issues of the original dual formulation. Whereas training a LapSVM in the dual requires two steps, using the primal form allows us to collapse training to a single step. Moreover, the computational complexity of the training algorithm is reduced from O(n^3) to O(n^2) using preconditioned conjugate gradient, where n is the combined number of labeled and unlabeled examples. We speed up training by using an early stopping strategy based on the prediction on unlabeled data or, if available, on labeled validation examples. This allows the algorithm to quickly compute approximate solutions with roughly the same classification accuracy as the optimal ones, considerably reducing the training time. Due to its simplicity, training LapSVM in the primal can be the starting point for additional enhancements of the original LapSVM formulation, such as those for dealing with large datasets. We present an extensive experimental evaluation on real-world data showing the benefits of the proposed approach.
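For reference, the manifold-regularization objective that LapSVM optimizes has the standard form below; the notation and constants follow the manifold regularization literature and may differ slightly from the paper. Training in the primal means optimizing the kernel expansion coefficients of f directly, for example with preconditioned conjugate gradient, instead of going through the dual.

    \min_{f \in \mathcal{H}_K} \; \frac{1}{l}\sum_{i=1}^{l} \max\bigl(0,\, 1 - y_i f(x_i)\bigr)
      \;+\; \gamma_A \|f\|_K^2
      \;+\; \frac{\gamma_I}{(l+u)^2}\, \mathbf{f}^{\top} L \, \mathbf{f},
    \qquad
    \mathbf{f} = \bigl[f(x_1), \dots, f(x_{l+u})\bigr]^{\top},

where l and u are the numbers of labeled and unlabeled examples, L is the graph Laplacian built over all of them, and, by the representer theorem, f(x) = \sum_{j=1}^{l+u} \alpha_j K(x, x_j).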
Dealing with incomplete agents' preferences and an uncertain agenda in group decision making via sequential majority voting
Pini, Maria, Rossi, Francesca, Venable, Brent, Walsh, Toby
We consider multi-agent systems where agents' preferences are aggregated via sequential majority voting: each decision is taken by performing a sequence of pairwise comparisons where each comparison is a weighted majority vote among the agents. Incompleteness in the agents' preferences is common in many real-life settings due to privacy issues or an ongoing elicitation process. In addition, there may be uncertainty about how the preferences are aggregated. For example, the agenda (a tree whose leaves are labelled with the decisions being compared) may not yet be known or fixed. We therefore study how to determine collectively optimal decisions (also called winners) when preferences may be incomplete, and when the agenda may be uncertain. We show that it is computationally easy to determine if a candidate decision always wins, or may win, whatever the agenda. On the other hand, it is computationally hard to know whether a candidate decision wins in at least one agenda for at least one completion of the agents' preferences. These results hold even if the agenda must be balanced so that each candidate decision faces the same number of majority votes. Such results are useful for reasoning about preference elicitation. They help understand the complexity of tasks such as determining if a decision can be taken collectively, as well as knowing if the winner can be manipulated by appropriately ordering the agenda.
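For concreteness, here is a hedged sketch of sequential majority voting on a fixed agenda: the agenda is a nested pairing of candidate decisions, each internal comparison is decided by a weighted majority vote, and ties go to the first argument. Complete preferences are assumed; the paper's questions concern what happens when preferences are incomplete and the agenda is uncertain.

    def majority_winner(a, b, preferences, weights):
        # Weighted pairwise majority vote between candidates a and b.
        # preferences: list of dicts mapping candidate -> rank (lower is better).
        score = sum(w if prefs[a] < prefs[b] else -w
                    for prefs, w in zip(preferences, weights))
        return a if score > 0 else b   # ties broken in favor of a (an assumption)

    def run_agenda(agenda, preferences, weights):
        # Evaluate an agenda given as nested pairs, e.g. (("A", "B"), ("C", "D")).
        if isinstance(agenda, str):        # leaf: a candidate decision
            return agenda
        left = run_agenda(agenda[0], preferences, weights)
        right = run_agenda(agenda[1], preferences, weights)
        return majority_winner(left, right, preferences, weights)

    # Example: three agents with unit weights and a balanced agenda over A, B, C, D.
    prefs = [{"A": 0, "B": 1, "C": 2, "D": 3},
             {"B": 0, "C": 1, "D": 2, "A": 3},
             {"C": 0, "A": 1, "B": 2, "D": 3}]
    print(run_agenda((("A", "B"), ("C", "D")), prefs, [1, 1, 1]))   # prints 'C'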
Telling cause from effect based on high-dimensional observations
Janzing, Dominik, Hoyer, Patrik O., Schoelkopf, Bernhard
We describe a method for inferring linear causal relations among multi-dimensional variables. The idea is to use an asymmetry between the distributions of cause and effect that occurs if both the covariance matrix of the cause and the structure matrix mapping the cause to the effect are chosen independently. The method works for both stochastic and deterministic causal relations, provided that the dimensionality is sufficiently high (in some experiments, 5 was enough). It is applicable to Gaussian as well as non-Gaussian data.
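The sketch below is a rough illustration of the kind of trace-based asymmetry such an approach can exploit, assuming a noise-free linear relation y = Ax and equal dimensions for cause and effect; the paper's actual estimator, including its treatment of the stochastic case, may differ.

    # If A and the covariance of X are chosen independently, one expects the
    # trace condition tr(A Cov_X A^T)/d ~ (tr(A A^T)/d) * (tr(Cov_X)/d) to hold
    # in the causal direction but typically not in the reverse one.
    import numpy as np

    def trace_delta(X, Y):
        # Deviation from the trace condition for the hypothesis X -> Y.
        # X, Y: arrays of shape (n_samples, d), same dimensionality assumed.
        d = X.shape[1]
        coeffs, *_ = np.linalg.lstsq(X, Y, rcond=None)   # Y ~ X @ coeffs
        A = coeffs.T                                     # convention: y = A x
        cov_x = np.cov(X, rowvar=False)
        num = np.trace(A @ cov_x @ A.T) / d
        den = (np.trace(A @ A.T) / d) * (np.trace(cov_x) / d)
        return abs(np.log(num) - np.log(den))

    def infer_direction(X, Y):
        # The direction in which the trace condition holds more accurately.
        return "X -> Y" if trace_delta(X, Y) < trace_delta(Y, X) else "Y -> X"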
Initialization Free Graph Based Clustering
Galluccio, Laurent, Michel, Olivier J. J., Comon, Pierre, Slezak, Eric, Hero, Alfred O.
This paper proposes an original approach to clustering multi-component data sets, including an estimation of the number of clusters. From the construction of a minimal spanning tree with Prim's algorithm, and the assumption that the vertices are approximately distributed according to a Poisson distribution, the number of clusters is estimated by thresholding Prim's trajectory. The corresponding cluster centroids are then computed in order to initialize the generalized Lloyd's algorithm, also known as $K$-means, which circumvents its initialization problems. Some results are derived for evaluating the false positive rate of our cluster detection algorithm, with the help of approximations relevant in Euclidean spaces. Metrics used for measuring similarity between multi-dimensional data points are based on symmetrical divergences. The use of these informational divergences together with the proposed method leads to better results than other clustering methods on the problem of astrophysical data processing. Some applications of this method in the multi/hyper-spectral imagery domain, to a satellite view of Paris and to an image of the planet Mars, are also presented. In order to demonstrate the usefulness of divergences in our problem, the method with an informational divergence as similarity measure is compared with the same method using classical metrics. In the astrophysics application, we also compare the method with spectral clustering algorithms.
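As an illustration of the pipeline described above, the sketch below uses plain Euclidean distances rather than the informational divergences from the paper, assumes clusters separated well enough that Prim's algorithm exhausts one cluster before jumping to the next, and leaves the trajectory threshold as a user parameter instead of deriving it from the Poisson assumption.

    # Build the MST with Prim's algorithm, threshold the Prim trajectory (edge
    # lengths in insertion order) to estimate the number of clusters, and use
    # the resulting group centroids to initialize K-means.
    import numpy as np
    from scipy.spatial.distance import cdist
    from sklearn.cluster import KMeans

    def prim_trajectory(X):
        # Edge lengths added by Prim's algorithm, in insertion order.
        D = cdist(X, X)
        n = len(X)
        in_tree = np.zeros(n, dtype=bool)
        in_tree[0] = True
        best = D[0].copy()                 # distance from each vertex to the tree
        traj, order = [], [0]
        for _ in range(n - 1):
            best[in_tree] = np.inf
            j = int(np.argmin(best))
            traj.append(best[j])
            in_tree[j] = True
            order.append(j)
            best = np.minimum(best, D[j])
        return np.array(traj), np.array(order)

    def cluster(X, threshold):
        traj, order = prim_trajectory(X)
        breaks = np.where(traj > threshold)[0]        # long edges mark cluster jumps
        segments = np.split(order, breaks + 1)
        seeds = np.array([X[seg].mean(axis=0) for seg in segments])
        return KMeans(n_clusters=len(segments), init=seeds, n_init=1).fit_predict(X)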
Discrete MDL Predicts in Total Variation
The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result is completely general. No independence, ergodicity, stationarity, identifiability, or other assumption on the model class needs to be made. More formally, we show that for any countable class of models, the distributions selected by MDL (or MAP) asymptotically predict (merge with) the true measure in the class in total variation distance. Implications for non-i.i.d. domains like time-series forecasting, discriminative learning, and reinforcement learning are discussed.
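Schematically, and with illustrative notation (writing K(Q) for the codelength assigned to model Q in the countable class \mathcal{M}), the setting can be rendered as follows: after observing x_{1:n}, MDL selects

    \hat{Q}_n \;=\; \arg\min_{Q \in \mathcal{M}} \bigl\{ -\log Q(x_{1:n}) + K(Q) \bigr\},

and the merging claim of the abstract is that, if the data are generated by some P \in \mathcal{M},

    \sup_{A} \bigl| P(A \mid x_{1:n}) - \hat{Q}_n(A \mid x_{1:n}) \bigr| \;\longrightarrow\; 0
    \quad \text{with } P\text{-probability } 1.

This is a schematic rendering of the claim, not a verbatim statement of the paper's theorem.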
Using Physics- and Sensor-based Simulation for High-Fidelity Temporal Projection of Realistic Robot Behavior
Mösenlechner, Lorenz (Technische Universität München) | Beetz, Michael (Technische Universität München)
Planning means deciding on the future course of action based on predictions of what will happen when an activity is carried out in one way or another. As we apply action planning to autonomous, sensor-guided mobile robots with manipulators, or even to humanoid robots, we need very realistic and detailed predictions of the behavior generated by a plan in order to improve the robot's performance substantially. In this paper we investigate the high-fidelity temporal projection of realistic robot behavior based on physics- and sensor-based simulation systems. We equip a simulator and interpreter with means to log simulated plan executions into a database. A logic-based query and inference mechanism then retrieves and reconstructs the necessary information from the database and translates the information into a first-order representation of robot plans and the behavior they generate. The query language enables the robot planning system to infer the intentions, the beliefs, and the world state at any projected time. It also allows the planning system to recognize, diagnose, and analyze various plan failures typical of everyday manipulation tasks.
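Purely as a toy illustration of the log-then-query idea (not the paper's system, database schema, or query language), the sketch below appends simulated events to a timestamped log and reconstructs a believed world state at a projected time; the fluent and value names are hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class ExecutionLog:
        events: list = field(default_factory=list)   # (time, fluent, value) triples

        def record(self, time, fluent, value):
            self.events.append((time, fluent, value))

        def state_at(self, time):
            # Most recent value of every fluent at the given projected time.
            state = {}
            for t, fluent, value in sorted(self.events):
                if t <= time:
                    state[fluent] = value
            return state

    log = ExecutionLog()
    log.record(0.0, "gripper", "open")
    log.record(2.5, "gripper", "closed")
    log.record(2.5, "holding", "cup-1")
    print(log.state_at(3.0))   # {'gripper': 'closed', 'holding': 'cup-1'}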
Automatic Derivation of Memoryless Policies and Finite-State Controllers Using Classical Planners
Bonet, Blai (Universidad Simón Bolívar) | Palacios, Héctor (Universidad Simón Bolívar) | Geffner, Héctor (ICREA and Universitat Pompeu Fabra)
Finite-state and memoryless controllers are simple action selection mechanisms widely used in domains such as video games and mobile robotics. Memoryless controllers stand for functions that map observations into actions, while finite-state controllers generalize memoryless ones with a finite amount of memory. In contrast to the policies obtained from MDPs and POMDPs, finite-state controllers have two advantages: they are often extremely compact, involving a small number of controller states or none at all, and they are general, applying to many problems and not just one. A limitation of finite-state controllers is that they must be written by hand. In this work, we address this limitation and develop a method for deriving finite-state controllers automatically from models. These models represent a class of contingent problems where actions are deterministic and some fluents are observable. The problem of deriving a controller from such models is converted into a conformant planning problem that is solved using classical planners, taking advantage of a complete translation introduced recently. The controllers derived in this way are 'general' in the sense that they solve not only the original problem but many variations of it as well, including changes in the size of the problem or in the uncertainty of the initial situation and action effects. Experiments illustrating the derivation of such controllers are presented.
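To make the two controller types concrete, the sketch below writes them as plain data structures; the observations, actions, and controller states are hypothetical examples, and the paper's automatic derivation via a translation into conformant planning is not shown.

    # A memoryless controller maps each observation directly to an action.
    memoryless = {
        "obstacle-ahead": "turn-right",
        "clear": "move-forward",
    }

    # A finite-state controller adds a small amount of memory: transitions map
    # (controller state, observation) to (action, next controller state).
    finite_state = {
        ("q0", "mark-not-seen"): ("move-right", "q0"),
        ("q0", "mark-seen"): ("move-down", "q1"),
        ("q1", "mark-seen"): ("stop", "q1"),
    }

    def step(controller, state, observation):
        # Execute one step of a finite-state controller.
        action, next_state = controller[(state, observation)]
        return action, next_state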