Discriminative Feature Selection for Uncertain Graph Classification

arXiv.org Machine Learning

Mining discriminative features from graph data has attracted much attention in recent years because of its important role in constructing graph classifiers, generating graph indices, etc. Most measures of interestingness for discriminative subgraph features are defined on certain graphs, where the structure of the graph objects is certain and the binary edges within each graph represent the "presence" of linkages among the nodes. In many real-world applications, however, the linkage structure of the graphs is inherently uncertain. Existing measures of interestingness based on certain graphs are therefore unable to capture the structural uncertainty in these applications effectively. In this paper, we study the problem of discriminative subgraph feature selection from uncertain graphs. This problem is challenging and differs from conventional subgraph mining problems because both the structure of the graph objects and the discrimination score of each subgraph feature are uncertain. To address these challenges, we propose a novel discriminative subgraph feature selection method, DUG, which finds discriminative subgraph features in uncertain graphs based on different statistical measures, including expectation, median, mode and phi-probability. We first compute the probability distribution of the discrimination scores for each subgraph feature using dynamic programming. We then propose a branch-and-bound algorithm to search for discriminative subgraphs efficiently. Extensive experiments on various neuroimaging applications (i.e., Alzheimer's Disease, ADHD and HIV) analyze the performance gained by taking structural uncertainty into account when identifying discriminative subgraph features for graph classification.
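The dynamic-programming step can be illustrated with a minimal sketch. It is not the DUG algorithm itself: it assumes the graphs are independent and that, for one subgraph feature, the probability that each uncertain graph contains it is already known. The function names and the probabilities below are hypothetical. Under those assumptions, the number of graphs containing the feature follows a Poisson-binomial distribution, which a simple dynamic program computes exactly. Statistics such as the expectation or the mode can then be read off the distribution.

```python
def containment_count_distribution(probs):
    """Return dist where dist[k] = P(exactly k graphs contain the feature).

    probs[i] is the (assumed known) probability that uncertain graph i
    contains the subgraph feature; graphs are assumed independent.
    """
    dist = [1.0]  # with zero graphs processed, the count is 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)   # this graph does not contain the feature
            nxt[k + 1] += mass * p       # this graph contains the feature
        dist = nxt
    return dist


def expected_count(dist):
    """Expectation of the containment count under dist."""
    return sum(k * mass for k, mass in enumerate(dist))


def mode_count(dist):
    """Most probable containment count under dist."""
    return max(range(len(dist)), key=lambda k: dist[k])


# Hypothetical containment probabilities for three uncertain graphs.
probs = [0.9, 0.5, 0.2]
dist = containment_count_distribution(probs)
```

The table has one entry per possible count, so the cost is quadratic in the number of graphs rather than exponential in the number of possible worlds.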


An alternative text representation to TF-IDF and Bag-of-Words

arXiv.org Machine Learning

In text mining, information retrieval, and machine learning, text documents are commonly represented through variants of sparse Bag of Words (sBoW) vectors (e.g. TF-IDF). Although simple and intuitive, sBoW-style representations suffer from their inherent over-sparsity and fail to capture word-level synonymy and polysemy. Especially when labeled data is limited (e.g. in document classification), or the text documents are short (e.g. emails or abstracts), many features are rarely observed within the training corpus. This leads to overfitting and reduced generalization accuracy. In this paper we propose Dense Cohort of Terms (dCoT), an unsupervised algorithm to learn improved sBoW document features. dCoT explicitly models absent words by removing and reconstructing random subsets of words in the unlabeled corpus. With this approach, dCoT learns to reconstruct frequent words from co-occurring infrequent words and maps the high-dimensional sparse sBoW vectors into a low-dimensional dense representation. We show that the feature removal can be marginalized out and that the reconstruction can be solved for in closed form. We demonstrate empirically, on several benchmark datasets, that dCoT features significantly improve the classification accuracy across several document classification tasks.


Linear-Nonlinear-Poisson Neuron Networks Perform Bayesian Inference On Boltzmann Machines

arXiv.org Artificial Intelligence

One conjecture in both deep learning and the classical connectionist viewpoint is that the biological brain implements certain kinds of deep networks as its back-end. However, to our knowledge, a detailed correspondence has not yet been established, which is important if we want to bridge neuroscience and machine learning. Recent research has emphasized the biological plausibility of the Linear-Nonlinear-Poisson (LNP) neuron model. We show that, with neurally plausible settings, the whole network is capable of representing any Boltzmann machine and performing a semi-stochastic Bayesian inference algorithm lying between Gibbs sampling and variational inference.


Transfer Topic Modeling with Ease and Scalability

arXiv.org Machine Learning

The increasing volume of short texts generated on social media sites, such as Twitter or Facebook, creates a great demand for effective and efficient topic modeling approaches. While latent Dirichlet allocation (LDA) can be applied, it is not optimal because of its weakness in handling short texts with fast-changing topics and its scalability concerns. In this paper, we propose a transfer learning approach that utilizes abundant labeled documents from other domains (such as Yahoo! News or Wikipedia) to improve topic modeling, with better model fitting and result interpretation. Specifically, we develop the Transfer Hierarchical LDA (thLDA) model, which incorporates the label information from other domains via informative priors. In addition, we develop a parallel implementation of our model for large-scale applications. We demonstrate the effectiveness of our thLDA model on both a microblogging dataset and standard text collections, including the AP and RCV1 datasets.


Generalized double Pareto shrinkage

arXiv.org Machine Learning

We propose a generalized double Pareto prior for Bayesian shrinkage estimation and inference in linear models. The prior can be obtained via a scale mixture of Laplace or normal distributions, forming a bridge between the Laplace and Normal-Jeffreys' priors. While it has a spike at zero like the Laplace density, it also has Student's $t$-like tail behavior. Bayesian computation is straightforward via a simple Gibbs sampling algorithm. Because sparse estimation plays an important role in many problems, we investigate the properties of the maximum a posteriori estimator, reveal connections with some well-established regularization procedures, and show some asymptotic results. The performance of the prior is tested through simulations and an application.
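For orientation, the generalized double Pareto density is usually written as below. The symbols are not given in the abstract and are assumptions here: $\xi > 0$ denotes a scale parameter and $\alpha > 0$ a shape parameter.

$$
f(\beta \mid \xi, \alpha) = \frac{1}{2\xi}\left(1 + \frac{|\beta|}{\alpha\xi}\right)^{-(\alpha + 1)}
$$

The $|\beta|$ term produces the spike at zero, and the polynomial decay of order $\alpha + 1$ gives the heavy, Student's $t$-like tails. As $\alpha \to \infty$ with $\xi$ fixed, the density approaches the Laplace density $\frac{1}{2\xi} e^{-|\beta|/\xi}$.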


Developing Parallel Dependency Graph In Improving Game Balancing

arXiv.org Artificial Intelligence

The dependency graph is a data architecture that models the dependencies between the different types of assets in a game. For example, a player must construct an arsenal before he can build weapons. The dependency graph of a game must be designed logically to ensure a logical sequence of game play. However, a logical dependency graph alone is not sufficient to sustain players' interest in a game, which brings the problem of game balancing into the picture. The issue of game balancing arises when players feel they have no chance of winning against AI opponents who are more skillful at the game. In the current state of research, the architecture of the dependency graph is monolithic: there is only a single dependency graph, so the sequence of asset possession is always foreseeable. Game balancing is impossible when the assets of AI players overwhelmingly outnumber those of human players. This paper proposes a parallel architecture of dependency graphs for AI players and human players. Instead of a single dependency graph, the dependency graph of the AI player is adjustable relative to that of the human player, using a support dependency as a game balancing mechanism. The paper shows that the parallel dependency graph helps to improve game balancing.
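A single (monolithic) dependency graph of the kind described above can be sketched as follows. This is only an illustration of the data structure, not the paper's parallel architecture or its support dependency. The asset names other than the arsenal and weapons example, and the helper `can_build`, are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical assets: each key maps to the set of assets it depends on.
dependencies = {
    "arsenal": set(),
    "weapons": {"arsenal"},               # an arsenal is required before weapons
    "barracks": set(),
    "soldiers": {"barracks", "weapons"},
}


def can_build(asset, owned, deps):
    """True if every prerequisite of `asset` is already in `owned`."""
    return deps[asset] <= owned


# One valid order of asset possession: prerequisites always come first.
build_order = list(TopologicalSorter(dependencies).static_order())
```

With only one such graph, every player walks through the same prerequisite structure, which is why the order of asset possession is foreseeable.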


The thermodynamic cost of fast thought

arXiv.org Artificial Intelligence

After more than sixty years, Shannon's research [1-3] continues to raise fundamental questions, such as the one formulated by Luce [4,5], which is still unanswered: "Why is information theory not very applicable to psychological problems, despite apparent similarities of concepts?" On this topic, Pinker [6], one of the foremost defenders of the computational theory of mind [6], has argued that thought is simply a type of computation, and that the gap between human cognition and computational models may be illusory. In this context, in his latest book, titled Thinking Fast and Slow [8], Kahneman [7,8] provides further theoretical interpretation by differentiating the two assumed systems of the cognitive functioning of the human mind. He calls them intuition (system 1), determined to be an associative (automatic, fast and perceptual) machine, and reasoning (system 2), required to be voluntary and to operate logico-deductively. In this paper, we propose an ansatz inspired by Ausubel's learning theory for investigating, from the constructivist perspective [9-12], information processing in the working memory of cognizers. Specifically, a thought experiment is performed using the mind of a dual-natured creature known as Maxwell's demon: a tiny "man-machine" equipped solely with the characteristics of system 1, which prevents it from reasoning. The calculation presented here shows that [...]. This result indicates that when system 2 is shut down, an intelligent being and a binary machine incur the same energy cost per unit of information processed, which mathematically proves the computational attribute of system 1, as Kahneman [7,8] theorized. This finding links information theory to human psychological features and opens a new path toward the conception of a multi-bit reasoning machine.


Explorative Data Analysis for Changes in Neural Activity

arXiv.org Machine Learning

Neural recordings are nonstationary time series, i.e. their properties typically change over time. Identifying specific changes, e.g. those induced by a learning task, can shed light on the underlying neural processes. However, such changes of interest are often masked by strong unrelated changes, which can be of physiological origin or due to measurement artifacts. We propose a novel algorithm for disentangling such different causes of non-stationarity and in this manner enable better neurophysiological interpretation for a wider set of experimental paradigms. A key ingredient is the repeated application of Stationary Subspace Analysis (SSA) using different temporal scales. The usefulness of our explorative approach is demonstrated in simulations, theory and EEG experiments with 80 Brain-Computer-Interfacing (BCI) subjects.


Mixture Gaussian Process Conditional Heteroscedasticity

arXiv.org Machine Learning

Generalized autoregressive conditional heteroscedasticity (GARCH) models have long been considered one of the most successful families of approaches for volatility modeling in financial return series. In this paper, we propose an alternative approach based on methodologies widely used in the field of statistical machine learning. Specifically, we propose a novel nonparametric Bayesian mixture of Gaussian process regression models, each component of which models the noise variance process that contaminates the observed data as a separate latent Gaussian process driven by the observed data. In this way, we essentially obtain a mixture Gaussian process conditional heteroscedasticity (MGPCH) model for volatility modeling in financial return series. We impose a nonparametric prior with power-law nature over the distribution of the model mixture components, namely the Pitman-Yor process prior, to better capture modeled data distributions with heavy tails and skewness. Finally, we provide a copula-based approach for obtaining a predictive posterior for the covariances over the asset returns modeled by means of a postulated MGPCH model. We evaluate the efficacy of our approach in a number of benchmark scenarios, and compare its performance to state-of-the-art methodologies.


A Framework for Intelligent Medical Diagnosis using Rough Set with Formal Concept Analysis

arXiv.org Artificial Intelligence

Medical diagnosis processes vary in the degree to which they attempt to deal with different complicating aspects of diagnosis, such as the relative importance of symptoms, varied symptom patterns and the relations between the diseases themselves. Based on decision theory, many mathematical models, such as crisp sets, probability distributions, fuzzy sets and intuitionistic fuzzy sets, have been developed in the past to deal with these complicating aspects. However, many such models fail to include important aspects of expert decisions. Pawlak therefore introduced rough set theory to process inconsistencies in the data under consideration. Although rough set theory has major advantages over the other methods, it generates too many rules, which creates difficulties in decision making. It is therefore essential to minimize the decision rules. In this paper, we use two processes, a pre-process and a post-process, to mine suitable rules and to explore the relationships among the attributes. In the pre-process we use rough set theory to mine suitable rules, whereas in the post-process we apply formal concept analysis to these rules to extract better knowledge and the most important factors affecting decision making.