PAC-Bayesian Majority Vote for Late Classifier Fusion
Morvant, Emilie, Habrard, Amaury, Ayache, Stéphane
A lot of attention has been devoted to multimedia indexing over the past few years. The literature usually considers two kinds of fusion schemes: early fusion and late fusion. In this paper we focus on late classifier fusion, where one combines the scores of each modality at the decision level. To tackle this problem, we investigate a recent, elegant, and well-founded quadratic program named MinCq, which comes from the PAC-Bayes theory of machine learning. MinCq looks for the weighted combination, over a set of real-valued functions seen as voters, leading to the lowest misclassification rate, while making use of the voters' diversity. We provide evidence that this method is naturally suited to the late fusion procedure. We propose an extension of MinCq that adds an order-preserving pairwise loss for ranking, which helps to improve the Mean Average Precision measure. We confirm the good behavior of the MinCq-based fusion approaches with experiments on a real image benchmark.
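The decision rule described above, a weighted combination of real-valued voters, can be sketched as follows. This is only a minimal illustration of the voting step: the scores and weights are hypothetical, the function names are not from the paper, and the weights are hand-picked rather than learned by the MinCq quadratic program.

```python
def weighted_vote(scores, weights):
    """Return the fused score: the weighted sum of the voters' real-valued outputs."""
    return sum(w * s for w, s in zip(weights, scores))


def fuse_and_classify(scores, weights):
    """Late fusion decision: +1 if the fused score is non-negative, else -1."""
    return 1 if weighted_vote(scores, weights) >= 0 else -1


# Hypothetical scores from three modality classifiers for one image.
scores = [0.9, -0.2, -0.3]
# Hand-picked weights for illustration only; MinCq would learn these.
weights = [0.5, 0.25, 0.25]
decision = fuse_and_classify(scores, weights)
```

Here one confident voter outweighs two weakly disagreeing ones, which is the kind of behavior a learned weighting is meant to control.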
Belief Updating and Learning in Semi-Qualitative Probabilistic Networks
de Campos, Cassio Polpo, Cozman, Fabio Gagliardi
This paper explores semi-qualitative probabilistic networks (SQPNs) that combine numeric and qualitative information. We first show that exact inferences with SQPNs are NP^PP-Complete. We then show that existing qualitative relations in SQPNs (plus probabilistic logic and imprecise assessments) can be dealt with effectively through multilinear programming. We then discuss learning: we consider a maximum likelihood method that generates point estimates given an SQPN and empirical data, and we describe a Bayesian-minded method that employs the Imprecise Dirichlet Model to generate set-valued estimates.
Inferring land use from mobile phone activity
Toole, Jameson L., Ulm, Michael, Bauer, Dietmar, Gonzalez, Marta C.
Understanding the spatiotemporal distribution of people within a city is crucial to many planning applications. Obtaining the data needed to create this knowledge currently involves costly survey methods. At the same time, ubiquitous mobile sensors, from personal GPS devices to mobile phones, are collecting massive amounts of data on urban systems. The locations, communications, and activities of millions of people are recorded and stored by new information technologies. This work utilizes novel dynamic data, generated by mobile phone users, to measure spatiotemporal changes in population. In the process, we identify the relationship between land use and dynamic population over the course of a typical week. A machine learning classification algorithm is used to identify clusters of locations with similar zoned uses and mobile phone activity patterns. It is shown that the mobile phone data is capable of delivering useful information on actual land use that supplements zoning regulations.
Relational Data Mining Through Extraction of Representative Exemplars
Blanchard, Frédéric, Herbin, Michel
With the growing interest in Network Analysis, Relational Data Mining is becoming a prominent domain of Data Mining. This paper addresses the problem of extracting representative elements from a relational dataset. After defining the notion of degree of representativeness, computed using the Borda aggregation procedure, we present the extraction of exemplars, which are the representative elements of the dataset. We use these concepts to build a network on the dataset. We present the main properties of these notions and propose two typical applications of our framework. The first application consists in summarizing and structuring a set of binary images and the second in mining co-authoring relations in a research team.
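A Borda-style degree of representativeness can be sketched as below. The abstract does not say how the individual rankings are obtained, so this sketch assumes each element ranks all the others by increasing dissimilarity and that Borda points are summed across rankers; the function name and the toy data are hypothetical.

```python
def borda_scores(dissim):
    """Sum Borda points over all rankers.

    dissim[i][j] is the dissimilarity between elements i and j.
    Each element i ranks the other n-1 elements from nearest to farthest;
    the nearest receives n-2 points and the farthest receives 0.
    """
    n = len(dissim)
    scores = [0] * n
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: dissim[i][j])
        for rank, j in enumerate(others):
            scores[j] += n - 2 - rank
    return scores


# Toy relational data: four points on a line, compared by absolute difference.
positions = [0.0, 1.0, 2.5, 10.0]
dissim = [[abs(a - b) for b in positions] for a in positions]
scores = borda_scores(dissim)
# The element with the highest score is taken as the exemplar.
exemplar = max(range(len(scores)), key=lambda i: scores[i])
```

In this toy example the isolated point at 10.0 is ranked last by everyone and gets the lowest score, while a central point is selected as the exemplar.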
Polarimetric SAR Image Smoothing with Stochastic Distances
Torres, Leonardo, Medeiros, Antonio C., Frery, Alejandro C.
Polarimetric Synthetic Aperture Radar (PolSAR) images are establishing themselves as an important source of information in remote sensing applications. The most complete format this type of imaging produces consists of complex-valued Hermitian matrices in every image coordinate and, as such, their visualization is challenging. They also suffer from speckle noise, which reduces the signal-to-noise ratio. Smoothing techniques have been proposed in the literature aiming at preserving different features and, analogously, projections from the cone of Hermitian positive matrices to different color representation spaces are used for enhancing certain characteristics. In this work we propose the use of stochastic distances between models that describe this type of data in a Nagao-Matsuyama-type smoothing technique. The resulting images are shown to present good visualization properties (noise reduction with preservation of fine details) in all the considered visualization spaces.
Generalized Statistical Complexity of SAR Imagery
de Almeida, Eliana S., de Medeiros, Antonio Carlos, Rosso, Osvaldo A., Frery, Alejandro C.
A new generalized Statistical Complexity Measure (SCM) was proposed by Rosso et al. in 2010. It is a functional that captures the notions of order/disorder and of distance to an equilibrium distribution. The former is computed by a measure of entropy, while the latter depends on the definition of a stochastic divergence. When the scene is illuminated by coherent radiation, image data is corrupted by speckle noise, as is the case of ultrasound-B, sonar, laser and Synthetic Aperture Radar (SAR) sensors. In the amplitude and intensity formats, this noise is multiplicative and non-Gaussian, and thus requires specialized techniques for image processing and understanding. One of the most successful families of models for describing these images is the Multiplicative Model which leads, among other probability distributions, to the G^0 law. This distribution has been validated in the literature as an expressive and tractable model, deserving the "universal" denomination for its ability to describe most types of targets. In order to compute the statistical complexity of a site in an image corrupted by speckle noise, we assume that the equilibrium distribution is that of fully developed speckle, namely the Gamma law in intensity format, which appears in areas with little or no texture. We use the Shannon entropy along with the Hellinger distance to measure the statistical complexity of intensity SAR images, and we show that it is an expressive feature capable of identifying many types of targets.
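The two ingredients named above, Shannon entropy and Hellinger distance, can be sketched for discrete histograms as follows. This is a simplified illustration under stated assumptions: the paper works with continuous laws (the Gamma law as equilibrium), whereas the sketch uses discrete probability vectors, and it assumes the complexity is the product of normalized entropy and distance. The function names and the example histograms are hypothetical.

```python
import math


def shannon_entropy(p):
    """Shannon entropy (natural log) of a discrete distribution; 0*log(0) is taken as 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))


def complexity(p, reference):
    """Assumed product form: normalized entropy of p times its distance to the reference."""
    h_norm = shannon_entropy(p) / math.log(len(p))
    return h_norm * hellinger(p, reference)


# Hypothetical histograms: a uniform reference and an observed distribution.
reference = [0.25, 0.25, 0.25, 0.25]
observed = [0.7, 0.1, 0.1, 0.1]
c = complexity(observed, reference)
```

Under this form the complexity vanishes both when the observed distribution coincides with the reference and when it is fully concentrated on one bin.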
Agnostic System Identification for Model-Based Reinforcement Learning
Ross, Stephane, Bagnell, J. Andrew
A fundamental problem in control is to learn a model of a system from observations that is useful for controller synthesis. To provide good performance guarantees, existing methods must assume that the real system is in the class of models considered during learning. We present an iterative method with strong guarantees even in the agnostic case where the system is not in the class. In particular, we show that any no-regret online learning algorithm can be used to obtain a near-optimal policy, provided some model achieves low training error and a good exploration distribution is available. Our approach applies to both discrete and continuous domains. We demonstrate its efficacy and scalability on a challenging helicopter domain from the literature.
Readouts for Echo-state Networks Built using Locally Regularized Orthogonal Forward Regression
Dolinský, Ján, Hirose, Kei, Konishi, Sadanori
An echo state network (ESN) is viewed as a temporal non-orthogonal expansion with pseudo-random parameters. Such expansions naturally give rise to regressors of various relevance to a teacher output. We illustrate that often only a certain number of the generated echo-regressors effectively explain the variance of the teacher output, and also that local regularization alone is not able to provide in-depth information concerning the importance of the generated regressors. The importance is therefore determined by a joint calculation of the individual variance contributions and Bayesian relevance using the locally regularized orthogonal forward regression (LROFR) algorithm. This information can be advantageously used in a variety of ways for an in-depth analysis of an ESN structure and its state-space parameters in relation to the unknown dynamics of the underlying problem. We present a locally regularized linear readout built using LROFR. The readout may have a different dimensionality than the ESN model itself, and besides improving the robustness and accuracy of an ESN, it relates the echo-regressors to different features of the training data and may determine what type of additional readout is suitable for the task at hand. Moreover, as the flexibility of the linear readout has limitations and might sometimes be insufficient for certain tasks, we also present a radial basis function (RBF) readout built using LROFR. It is a flexible and parsimonious readout with excellent generalization abilities and is a viable alternative to readouts based on a feed-forward neural network (FFNN) or an RBF net built using a relevance vector machine (RVM).
Introduction
ESNs are a novel class of recurrent neural networks (RNN) [1]. Their easy construction and simple training procedure are appealing and have attracted the attention of many researchers. Vector function f is applied element-wise to its arguments. The most common choice for f is either a vector of sigmoid or identity functions.
The expansion is carried out so that diverse echoes of an input and teacher signal are generated (hence the name echo-state). This diversity, which should appropriately "explain" a variance of a teacher signal, is the key to the successful training of an ESN.
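The state equation itself does not appear in this excerpt, so the following is only a sketch of a commonly used ESN update, assumed here to be x(t+1) = f(W_in u(t+1) + W x(t)) with f applied element-wise. The function names, the weight ranges, and the choice of tanh as the sigmoid are illustrative assumptions, not the authors' configuration.

```python
import math
import random


def make_reservoir(n_units, n_inputs, scale=0.1, seed=0):
    """Draw pseudo-random input and recurrent weights (ranges are assumptions)."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_units)]
    w = [[rng.uniform(-scale, scale) for _ in range(n_units)]
         for _ in range(n_units)]
    return w_in, w


def step(state, u, w_in, w, f=math.tanh):
    """One state update: f is applied element-wise to W_in u + W x."""
    new_state = []
    for i in range(len(state)):
        pre = sum(w_in[i][k] * u[k] for k in range(len(u)))
        pre += sum(w[i][j] * state[j] for j in range(len(state)))
        new_state.append(f(pre))
    return new_state


def run_reservoir(inputs, n_units=5, seed=0):
    """Drive the reservoir with a sequence of input vectors; return the final state."""
    w_in, w = make_reservoir(n_units, len(inputs[0]), seed=seed)
    state = [0.0] * n_units
    for u in inputs:
        state = step(state, u, w_in, w)
    return state


# Each component of the final state is one "echo" of the input signal.
final_state = run_reservoir([[0.5], [-0.2], [0.8]])
```

The components of the state vector are the echo-regressors that a readout, linear or RBF, would then be fitted on.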
Aggregating Content and Network Information to Curate Twitter User Lists
Greene, Derek, Sheridan, Gavin, Smyth, Barry, Cunningham, Pádraig
Twitter introduced user lists in late 2009, allowing users to be grouped according to meaningful topics or themes. Lists have since been adopted by media outlets as a means of organising content around news stories. Thus the curation of these lists is important - they should contain the key information gatekeepers and present a balanced perspective on a story. Here we address this list curation process from a recommender systems perspective. We propose a variety of criteria for generating user list recommendations, based on content analysis, network analysis, and the "crowdsourcing" of existing user lists. We demonstrate that these types of criteria are often only successful for datasets with certain characteristics. To resolve this issue, we propose the aggregation of these different "views" of a news story on Twitter to produce more accurate user recommendations to support the curation process.
Innovative Applications of Artificial Intelligence 2011: Introduction to the Special Issue
Shapiro, Daniel G. (Institute for the Study of Learning and Expertise) | Fromherz, Markus (Xerox)
Every year, AI Magazine devotes one fourth of its annual production to a special issue based on the Innovative Applications of Artificial Intelligence conference. Because IAAI is the premier venue for documenting the transition of AI technology into application, these special issues provide a snapshot of the state of the art in AI with the practical syllogism in mind; they present work that has value because it delivers value in use.