A Structural Approach to Coordinate-Free Statistics

arXiv.org Machine Learning

We consider the question of learning in general topological vector spaces. By exploiting known (or parametrized) covariance structures, our Main Theorem demonstrates that any continuous linear map corresponds to a certain isomorphism of embedded Hilbert spaces. By inverting this isomorphism and extending continuously, we construct a version of the Ordinary Least Squares estimator in absolute generality. Our Gauss-Markov theorem demonstrates that OLS is a "best linear unbiased estimator", extending the classical result. We construct a stochastic version of the OLS estimator, which is a continuous disintegration exactly for the class of "uncorrelated implies independent" (UII) measures. As a consequence, Gaussian measures always exhibit continuous disintegrations through continuous linear maps, extending a theorem of the first author. Applying this framework to some problems in machine learning, we prove a useful representation theorem for covariance tensors, and show that OLS defines a good kriging predictor for vector-valued arrays on general index spaces. We also construct a support-vector machine classifier in this setting. We hope that our article sheds light on some deeper connections between probability theory, statistics and machine learning, and may serve as a point of intersection for these three communities.
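
For orientation, here is the classical finite-dimensional statement that the paper generalizes, written as a minimal LaTeX sketch; the notation is ours, not taken from the abstract.

    % Classical linear model and OLS (finite-dimensional special case):
    y = X\beta + \varepsilon, \qquad
    \mathbb{E}[\varepsilon] = 0, \qquad
    \operatorname{Cov}(\varepsilon) = \sigma^2 I,
    \qquad
    \hat{\beta}_{\mathrm{OLS}} = (X^\top X)^{-1} X^\top y .
    % Gauss-Markov: for every linear unbiased estimator \tilde{\beta},
    % \operatorname{Cov}(\tilde{\beta}) - \operatorname{Cov}(\hat{\beta}_{\mathrm{OLS}})
    % is positive semidefinite, so OLS is a "best linear unbiased estimator".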


Ridge Fusion in Statistical Learning

arXiv.org Machine Learning

We propose a penalized likelihood method to jointly estimate multiple precision matrices for use in quadratic discriminant analysis and model-based clustering. A ridge penalty and a ridge fusion penalty are used to introduce shrinkage and promote similarity between precision matrix estimates. Block-wise coordinate descent is used for optimization, and validation likelihood is used for tuning-parameter selection. Our method is applied in quadratic discriminant analysis and semi-supervised model-based clustering.
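
The abstract does not state the objective explicitly, but a plausible form of the jointly penalized likelihood, in our own notation, is the following sketch (S_k is the k-th sample covariance matrix and Omega_k the corresponding precision matrix):

    \min_{\Omega_1, \dots, \Omega_K \succ 0} \;
    \sum_{k=1}^{K} \Big[ \operatorname{tr}(S_k \Omega_k) - \log \det \Omega_k \Big]
    + \lambda_1 \sum_{k=1}^{K} \lVert \Omega_k \rVert_F^2
    + \lambda_2 \sum_{k < m} \lVert \Omega_k - \Omega_m \rVert_F^2

Here lambda_1 plays the role of the ridge penalty (shrinkage) and lambda_2 that of the ridge fusion penalty (similarity between estimates), matching the two roles described above.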


Nested Hierarchical Dirichlet Processes

arXiv.org Machine Learning

We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP is a generalization of the nested Chinese restaurant process (nCRP) that allows each word to follow its own path to a topic node according to a document-specific distribution on a shared tree. This alleviates the rigid, single-path formulation of the nCRP, allowing a document to more easily express thematic borrowings as a random effect. We derive a stochastic variational inference algorithm for the model, along with a greedy subtree selection method for each document, which enables efficient inference on massive collections of text documents. We demonstrate our algorithm on 1.8 million documents from The New York Times and 3.3 million documents from Wikipedia.
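
A schematic Python sketch of the per-word path mechanism the abstract describes; the names and the tree representation are ours, and actual inference uses the stochastic variational algorithm rather than this forward sampler.

    import random

    # Unlike the nCRP, where a whole document commits to one root-to-leaf
    # path, each word in the nHDP walks its own path down a shared tree,
    # guided by document-specific stop and branching probabilities.

    def sample_word_path(root, stop_prob, child_weights):
        """Walk from the root until the document-specific switch says stop."""
        node, path = root, [root]
        while node["children"] and random.random() > stop_prob[id(node)]:
            node = random.choices(node["children"],
                                  weights=child_weights[id(node)])[0]
            path.append(node)
        return path  # the word is then drawn from the topic at path[-1]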


Markov Blanket Ranking using Kernel-based Conditional Dependence Measures

arXiv.org Machine Learning

Developing feature selection algorithms that move beyond a purely correlational to a more causal analysis of observational data is an important problem in the sciences. Several algorithms attempt to do so by discovering the Markov blanket of a target, but they all contain a forward selection step that variables must pass in order to be included in the conditioning set. As a result, these algorithms may not consider all possible conditional multivariate combinations. We address this limitation by proposing a backward elimination method that uses a kernel-based conditional dependence measure to identify the Markov blanket in a fully multivariate fashion. The algorithm is easy to implement and compares favorably to other methods on synthetic and real datasets.
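
A minimal sketch of the backward-elimination idea, assuming a 2-D NumPy feature matrix X and a hypothetical kernel-based measure cond_dep(A, y, Z) returning the dependence of y on A given Z (e.g. an HSIC-style statistic); none of these names come from the paper.

    def markov_blanket(X, y, cond_dep, threshold=1e-3):
        # Start from all variables and peel off the one that is most
        # redundant given the rest, until every survivor still matters.
        remaining = list(range(X.shape[1]))
        while len(remaining) > 1:
            scores = {
                j: cond_dep(X[:, [j]], y,
                            X[:, [k for k in remaining if k != j]])
                for j in remaining
            }
            j_min = min(scores, key=scores.get)
            if scores[j_min] > threshold:   # every remaining variable matters
                break
            remaining.remove(j_min)         # j_min is redundant given the rest
        return remaining

Because each step conditions on all other remaining variables at once, the procedure stays fully multivariate, which is exactly the limitation of forward-selection approaches that the abstract targets.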


A Rank-SVM Approach to Anomaly Detection

arXiv.org Machine Learning

We propose a novel non-parametric adaptive anomaly detection algorithm for high-dimensional data based on rank-SVM. Data points are first ranked based on scores derived from nearest-neighbor graphs on n-point nominal data. We then train a rank-SVM using this ranked data. A test point is declared an anomaly at false-alarm level alpha if its predicted score falls in the alpha-percentile. The resulting anomaly detector is shown to be asymptotically optimal and adaptive, in that for any false-alarm rate alpha its decision region converges to the alpha-percentile level set of the unknown underlying density. In addition, we illustrate through a number of synthetic and real-data experiments both the statistical performance and the computational efficiency of our anomaly detector.
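
A sketch of the detection rule in the last step, assuming a trained ranker score() and the scores of the nominal training points; the function names are ours, not the paper's.

    import numpy as np

    def is_anomaly(x_test, nominal_scores, score, alpha=0.05):
        # Flag the test point when its predicted rank score falls in the
        # lowest alpha-fraction of nominal scores, mirroring the
        # alpha-percentile decision rule described above.
        cutoff = np.quantile(nominal_scores, alpha)
        return score(x_test) <= cutoff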


Extension-based Semantics of Abstract Dialectical Frameworks

arXiv.org Artificial Intelligence

One of the most prominent tools for abstract argumentation is Dung's framework, AF for short. It is accompanied by a variety of semantics, including grounded, complete, preferred and stable. Although powerful, AFs have their shortcomings, which has led to the development of numerous enrichments. Among the most general of these are abstract dialectical frameworks, also known as ADFs. They make use of so-called acceptance conditions to represent arbitrary relations. This level of abstraction brings not only new challenges, but also requires addressing existing problems in the field. One of the most controversial issues, recognized not only in argumentation, concerns support cycles. In this paper we introduce a new method to ensure acyclicity of the chosen arguments and present a family of extension-based semantics built on it. We also continue our research on the semantics that permit cycles and fill the gaps left by previous work. Moreover, we provide ADF versions of the properties known from the Dung setting. Finally, we introduce a classification of the developed sub-semantics and relate them to the existing labeling-based approaches.
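
To make "acceptance conditions represent arbitrary relations" concrete, here is an illustrative Python rendering of a tiny ADF; the encoding is ours, not the paper's.

    # Each argument is paired with an acceptance condition: an arbitrary
    # Boolean function of the other arguments' statuses. A plain Dung AF is
    # the special case where the condition is "no attacker is accepted".
    adf = {
        "a": lambda s: True,            # a is unconditionally acceptable
        "b": lambda s: s["a"],          # b is supported by a
        "c": lambda s: not s["b"],      # c is attacked by b
    }

    status = {"a": True, "b": True, "c": False}
    # status is a model of the ADF if every condition agrees with it:
    consistent = all(cond(status) == status[arg] for arg, cond in adf.items())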


A Cookbook for Temporal Conceptual Data Modelling with Description Logics

arXiv.org Artificial Intelligence

We design temporal description logics suitable for reasoning about temporal conceptual data models and investigate their computational complexity. Our formalisms are based on DL-Lite logics with three types of concept inclusions (ranging from atomic concept inclusions and disjointness to the full Booleans), as well as cardinality constraints and role inclusions. In the temporal dimension, they capture future and past temporal operators on concepts, flexible and rigid roles, the operators `always' and `some time' on roles, data assertions for particular moments of time, and global concept inclusions. The logics are interpreted over the Cartesian products of object domains and the flow of time (Z,<), satisfying the constant domain assumption. We prove that the most expressive of our temporal description logics (those that can capture lifespan cardinalities and either qualitative or quantitative evolution constraints) turn out to be undecidable. However, by omitting some of the temporal operators on concepts/roles or by restricting the form of concept inclusions, we obtain logics whose complexity ranges between NLogSpace and PSpace. These positive results are obtained by reduction to various clausal fragments of propositional temporal logic, which opens the way to employing propositional or first-order temporal provers for reasoning about temporal data models.
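
A few illustrative axioms, in our own notation, of the kind of temporal concept inclusions such logics can express; the example is ours, not drawn from the paper.

    \mathit{Student} \sqsubseteq \Diamond_F \mathit{Graduate}
        % qualitative evolution: every student is eventually a graduate
    \mathit{Graduate} \sqsubseteq \Box_F \neg \mathit{Student}
        % ... and is never a student again at any future moment
    \geq 2\, \mathit{supervisedBy} \sqsubseteq \bot
        % a cardinality constraint: at most one supervisor at any moment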


Exchangeable Variable Models

arXiv.org Artificial Intelligence

A sequence of random variables is exchangeable if its joint distribution is invariant under variable permutations. We introduce exchangeable variable models (EVMs) as a novel class of probabilistic models whose basic building blocks are partially exchangeable sequences, a generalization of exchangeable sequences. We prove that a family of tractable EVMs is optimal under zero-one loss for a large class of functions, including parity and threshold functions, and strictly subsumes existing tractable independence-based model families. Extensive experiments show that EVMs outperform state-of-the-art classifiers such as SVMs and probabilistic models that are based solely on independence assumptions.
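
A standard fact, in our notation, that suggests why exchangeability buys tractability here: the joint distribution of an exchangeable binary sequence depends only on a count.

    % If (X_1, ..., X_n) is exchangeable and binary, permutation invariance
    % forces the joint distribution to depend only on the number of ones:
    P(X_1 = x_1, \dots, X_n = x_n) \;=\; f\Big( \sum_{i=1}^{n} x_i \Big)
    % So a block of n exchangeable variables needs O(n) parameters rather
    % than 2^n, and functions of the count, such as parity and threshold
    % functions, are captured exactly.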


Representation of a Sentence using a Polar Fuzzy Neutrosophic Semantic Net

arXiv.org Artificial Intelligence

A semantic net can be used to represent a sentence. A sentence in a language carries semantics that are polar in nature, that is, positive, neutral or negative. Neutrosophy is a relatively new field of science that can be used to mathematically represent triads of concepts. These triads include truth, indeterminacy and falsehood, as well as positivity, neutrality and negativity. In this paper, a conventional semantic net is therefore extended using neutrosophy into a Polar Fuzzy Neutrosophic Semantic Net. The Polar Fuzzy Neutrosophic Semantic Net has been implemented in MATLAB and used to illustrate a polar sentence in the English language. The paper demonstrates a method for representing polarity in a computer's memory. Polar concepts can thus be applied to imbue a machine, such as a robot, with emotions, making machine emotion representation possible.
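
The paper's implementation is in MATLAB; the following Python fragment is our own illustration of the core idea, a semantic net whose edges carry neutrosophic (truth, indeterminacy, falsehood) triples.

    # Each edge of the net carries a (t, i, f) triple, so a single edge can
    # encode positive, neutral, and negative degrees simultaneously.
    semantic_net = {
        ("movie", "plot"):   (0.8, 0.1, 0.1),  # mostly positive association
        ("movie", "ending"): (0.2, 0.2, 0.6),  # mostly negative association
    }

    def polarity(edge):
        t, i, f = semantic_net[edge]
        return t - f   # signed summary; i records the indeterminate part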


Cover Tree Bayesian Reinforcement Learning

arXiv.org Machine Learning

This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with least squares policy iteration.
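
A Python skeleton of the Thompson-sampling loop the abstract combines with the cover tree model; posterior, plan, and env are hypothetical placeholders, not the paper's API.

    def thompson_episode(posterior, plan, env, horizon):
        model = posterior.sample()      # one draw from the tree posterior
        policy = plan(model)            # approximate dynamic programming
        state = env.reset()
        for _ in range(horizon):
            action = policy(state)
            state, reward, done = env.step(action)
            posterior.update(state, action, reward)  # closed-form update
            if done:
                break

Sampling a single model per episode and acting greedily against that sample is what yields the exploration behaviour described above.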