P-values for high-dimensional regression

arXiv.org Machine Learning

Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against the inclusion of noise variables, and asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder (2008), which splits the data into two parts: the number of variables is reduced to a manageable size using the first part, and classical variable selection techniques are then applied to the remaining variables using the data from the second part. This yields asymptotic error control under minimal conditions, but it involves a single random split of the data. Results are sensitive to this arbitrary choice: it amounts to a 'p-value lottery' and makes results difficult to reproduce. Here, we show that inference across multiple random splits can be aggregated while keeping asymptotic control over the inclusion of noise variables. We show that the resulting p-values can be used to control both the family-wise error rate (FWER) and the false discovery rate (FDR). In addition, the proposed aggregation is shown to improve power while substantially reducing the number of falsely selected variables.
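
As a rough illustration of the procedure the abstract describes, the sketch below screens variables on one half of the data, computes classical OLS p-values on the other half, repeats over many random splits, and aggregates the per-variable p-values with a quantile rule. The lasso screener, the Bonferroni correction, and the fixed-quantile aggregation are illustrative assumptions, not the paper's exact adaptive rule.

    # Multi-split p-values, sketched: screen on half A, test on half B,
    # aggregate over many random splits.
    import numpy as np
    from scipy import stats
    from sklearn.linear_model import LassoCV

    def single_split_pvalues(X, y, rng):
        n, p = X.shape
        idx = rng.permutation(n)
        half_a, half_b = idx[: n // 2], idx[n // 2 :]
        # Screening step: keep variables with nonzero lasso coefficients.
        selected = np.flatnonzero(LassoCV(cv=5).fit(X[half_a], y[half_a]).coef_)
        pvals = np.ones(p)
        if selected.size == 0 or selected.size + 2 > len(half_b):
            return pvals  # nothing selected, or too few test points
        # Classical OLS t-tests on half B, Bonferroni-corrected over |S|.
        Xb = np.column_stack([np.ones(len(half_b)), X[half_b][:, selected]])
        beta, *_ = np.linalg.lstsq(Xb, y[half_b], rcond=None)
        resid = y[half_b] - Xb @ beta
        dof = len(half_b) - Xb.shape[1]
        se = np.sqrt(resid @ resid / dof * np.diag(np.linalg.inv(Xb.T @ Xb)))
        raw = 2 * stats.t.sf(np.abs(beta / se), dof)
        pvals[selected] = np.minimum(raw[1:] * selected.size, 1.0)  # drop intercept
        return pvals

    def multi_split_pvalues(X, y, n_splits=50, gamma=0.5, seed=0):
        rng = np.random.default_rng(seed)
        P = np.stack([single_split_pvalues(X, y, rng) for _ in range(n_splits)])
        # Aggregate: scaled empirical gamma-quantile per variable.
        return np.minimum(np.quantile(P, gamma, axis=0) / gamma, 1.0)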


Regularization methods for learning incomplete matrices

arXiv.org Machine Learning

We use convex relaxation techniques to provide a sequence of solutions to the matrix completion problem. Using the nuclear norm as a regularizer, we provide simple and very efficient algorithms for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm iteratively replaces the missing elements with those obtained from a thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions.
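The iteration just described fits in a few lines. Below is a minimal sketch of the thresholded-SVD loop under assumed conventions (NaN marks a missing entry; lam is a fixed regularization weight), not the paper's full algorithm.

    # Fill the missing entries, soft-threshold the singular values, repeat.
    import numpy as np

    def soft_impute(X, lam, n_iters=100, tol=1e-4):
        mask = ~np.isnan(X)              # observed entries
        Z = np.where(mask, X, 0.0)       # zeros in the holes to start
        low_rank = Z
        for _ in range(n_iters):
            U, s, Vt = np.linalg.svd(Z, full_matrices=False)
            s = np.maximum(s - lam, 0.0)         # soft-threshold the spectrum
            low_rank = (U * s) @ Vt              # nuclear-norm-regularized fit
            Z_new = np.where(mask, X, low_rank)  # keep observed values fixed
            if np.linalg.norm(Z_new - Z) <= tol * max(1.0, np.linalg.norm(Z)):
                break
            Z = Z_new
        return low_rank

Warm starts then amount to sweeping lam from large to small and initializing each solve at the previous low_rank, which is how an entire regularization path can be computed cheaply.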


Towards Improving Validation, Verification, Crash Investigations, and Event Reconstruction of Flight-Critical Systems with Self-Forensics

arXiv.org Artificial Intelligence

This paper introduces a novel concept of self-forensics, to be specified in the Forensic Lucid language, that complements the standard autonomic self-CHOP properties of self-managed systems. We argue that self-forensics, with forensics taken out of the cybercrime domain, is applicable to the 'self-dissection' of autonomous software and hardware in flight-critical systems: it enables automated incident and anomaly analysis and event reconstruction by engineering teams, for verification in a variety of incident scenarios during design and testing as well as on actual flight data.


On Maximum a Posteriori Estimation of Hidden Markov Processes

arXiv.org Artificial Intelligence

We present a theoretical analysis of Maximum a Posteriori (MAP) sequence estimation for binary symmetric hidden Markov processes. We reduce MAP estimation to the energy minimization of an appropriately defined Ising spin model, and focus on the performance of MAP as characterized by its accuracy and the number of solutions corresponding to a typical observed sequence. It is shown that for a finite range of sufficiently low noise levels, the solution is uniquely related to the observed sequence, while the accuracy degrades linearly with increasing noise strength. For intermediate noise values, the accuracy is nearly noise-independent, but there are now exponentially many solutions to the estimation problem, which is reflected in a non-zero ground-state entropy for the Ising model. Finally, for even larger noise intensities, the number of solutions decreases again, but the accuracy is poor. It is shown that these regimes are different thermodynamic phases of the Ising model, related to each other via first-order phase transitions.
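
For concreteness, the MAP sequence estimator in this binary symmetric setting can be computed exactly by the Viterbi dynamic program (equivalently, by finding a ground state of the associated Ising energy). The sketch below assumes a symmetric chain that flips state with probability q and a channel that flips the observed bit with probability eps; these names are ours, not the paper's notation.

    # Viterbi MAP decoding for a binary symmetric hidden Markov process.
    import numpy as np

    def viterbi_binary(obs, q, eps):
        obs = np.asarray(obs, dtype=int)   # observed bits in {0, 1}
        n = len(obs)
        logT = np.log(np.array([[1 - q, q], [q, 1 - q]]))          # transitions
        logE = np.log(np.array([[1 - eps, eps], [eps, 1 - eps]]))  # emissions
        delta = np.full((n, 2), -np.inf)   # best log-prob ending in each state
        back = np.zeros((n, 2), dtype=int)
        delta[0] = np.log(0.5) + logE[:, obs[0]]                   # uniform prior
        for t in range(1, n):
            for s in (0, 1):
                cand = delta[t - 1] + logT[:, s]
                back[t, s] = np.argmax(cand)
                delta[t, s] = cand[back[t, s]] + logE[s, obs[t]]
        # Backtrack the most probable (MAP) hidden sequence.
        path = np.zeros(n, dtype=int)
        path[-1] = np.argmax(delta[-1])
        for t in range(n - 2, -1, -1):
            path[t] = back[t + 1, path[t + 1]]
        return path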


Feature Reinforcement Learning: Part I: Unstructured MDPs

arXiv.org Artificial Intelligence

General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well developed for small finite-state Markov decision processes (MDPs). Up to now, extracting the right state representation out of bare observations, that is, reducing the general agent setup to the MDP framework, has been an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. We also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II, where the role of POMDPs is also considered.


Managing Requirement Volatility in an Ontology-Driven Clinical LIMS Using Category Theory (International Journal of Telemedicine and Applications)

arXiv.org Artificial Intelligence

Requirement volatility is an issue in software engineering in general, and in Web-based clinical applications in particular; it often originates from incomplete knowledge of the domain of interest. With advances in the health sciences, many features and functionalities need to be added to, or removed from, existing software applications in the biomedical domain. At the same time, the increasing complexity of biomedical systems makes them more difficult to understand, and consequently harder to specify, which contributes considerably to the volatility of their requirements. In this paper, we present a novel agent-based approach for analyzing and managing volatile and dynamic requirements in an ontology-driven laboratory information management system (LIMS) designed for Web-based case reporting in medical mycology. The proposed framework is empowered with ontologies and formalized using category theory to provide a deep and common understanding of the functional and nonfunctional requirement hierarchies and their interrelations, and to trace the effects of a change on the conceptual framework.


Toward a Category Theory Design of Ontological Knowledge Bases

arXiv.org Artificial Intelligence

I discuss the (ontologies_and_ontological_knowledge_bases / formal_methods_and_theories) duality and its category-theory extensions as a step toward a solution to Knowledge-Based Systems Theory. In particular, I focus on the example of designing elements of ontologies and ontological knowledge bases for the following three electronic courses: Foundations of Research Activities, Virtual Modeling of Complex Systems, and Introduction to String Theory.


Large-Margin kNN Classification Using a Deep Encoder Network

arXiv.org Artificial Intelligence

kNN is one of the most popular classification methods, but it often fails to work well with an inappropriate choice of distance metric or in the presence of numerous class-irrelevant features. Linear feature transformation methods have been widely applied to extract class-relevant information and improve kNN classification, but linear transformations are very limited in many applications. Kernels have been used to learn powerful non-linear feature transformations, but these methods fail to scale to large datasets. In this paper, we present a scalable non-linear feature mapping method based on a deep neural network pretrained with restricted Boltzmann machines for improving kNN classification in a large-margin framework, which we call DNet-kNN. DNet-kNN can be used both for classification and for supervised dimensionality reduction. Experimental results on two benchmark handwritten digit datasets show that DNet-kNN performs much better than large-margin kNN using a linear mapping and kNN based on a deep autoencoder pretrained with restricted Boltzmann machines.
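
As a rough sketch of the kind of large-margin objective such methods optimize on encoded points (not the paper's exact DNet-kNN loss, and with an arbitrary encoding in place of its RBM-pretrained network), the hinge below penalizes any differently labeled 'impostor' that comes closer to a point than its k same-class target neighbors plus a margin.

    # LMNN-style hinge on encoded points Z with labels y.
    import numpy as np

    def large_margin_knn_loss(Z, y, k=3, margin=1.0):
        """Z: (n, d) encoded points; y: (n,) integer labels."""
        D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1) ** 2
        np.fill_diagonal(D, np.inf)        # a point is not its own neighbor
        loss = 0.0
        for i in range(len(Z)):
            same = np.flatnonzero(y == y[i])
            same = same[same != i]
            targets = same[np.argsort(D[i, same])[:k]]   # k nearest same-class
            impostors = np.flatnonzero(y != y[i])
            for j in targets:
                # Impostors closer than target + margin incur hinge cost.
                viol = margin + D[i, j] - D[i, impostors]
                loss += np.maximum(viol, 0.0).sum()
        return loss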


Knowledge Management in Economic Intelligence with Reasoning on Temporal Attributes

arXiv.org Artificial Intelligence

People have to make important decisions within a time frame, so it is imperative to employ means and strategies that aid effective decision making. Economic Intelligence (EI) has consequently emerged as a field that supports strategic and timely decision making in an organization. In pursuing this goal, it is essential to provide for the conservation of the intellectual resources invested in decision making: the knowledge of the actors as well as of the various processes involved. Knowledge has been recognized as a strategic economic resource for enhancing productivity and as a key to innovation in any organization or community, so managing it adequately, with attention to its temporal properties, is indispensable. The temporal properties of knowledge are the date and time (the timestamp) at which it is created, as well as the duration of, or interval between, related pieces of knowledge. This paper focuses on the need for a user-centered knowledge management approach and for the exploitation of these temporal properties. Our perspective on knowledge is with respect to decision-problem projects in EI. Our hypothesis is that the ability to reason about temporal properties when exploiting knowledge in EI projects should foster timely decision making by generating useful inferences from available, reusable knowledge for a new project.
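
A minimal illustration, under an assumed data model that is not the paper's, of timestamped knowledge items and one such temporal inference: retrieving reusable knowledge whose validity interval overlaps a new project's time frame.

    # Timestamped knowledge items and a simple interval-overlap query.
    from dataclasses import dataclass
    from datetime import datetime

    @dataclass
    class KnowledgeItem:
        content: str
        start: datetime   # when the knowledge was created / became valid
        end: datetime     # end of its validity interval

    def overlapping(items, frame_start, frame_end):
        """Keep items whose validity interval overlaps the query frame."""
        return [k for k in items
                if k.start <= frame_end and k.end >= frame_start]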


Introduction to Semi-Supervised Learning

Morgan & Claypool Publishers

In this introductory book, we present some popular semi-supervised learning models, including self-training, mixture models, co-training and multiview learning, graph-based methods, and semi-supervised support vector machines. ISBN 9781598295474, 130 pages.
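
As a taste of the first model in that list, here is a minimal self-training loop under illustrative assumptions (a logistic-regression base learner and a fixed confidence cutoff): fit on the labeled data, pseudo-label the most confident unlabeled points, and refit.

    # Self-training sketch: grow the labeled set with confident pseudo-labels.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_lab, y_lab, X_unl, threshold=0.95, max_rounds=10):
        X, y = X_lab.copy(), y_lab.copy()
        pool = X_unl.copy()
        for _ in range(max_rounds):
            if len(pool) == 0:
                break
            clf = LogisticRegression(max_iter=1000).fit(X, y)
            proba = clf.predict_proba(pool)
            sure = proba.max(axis=1) >= threshold
            if not sure.any():
                break  # no confident predictions left to adopt
            # Adopt confident pseudo-labels, shrink the unlabeled pool.
            X = np.vstack([X, pool[sure]])
            y = np.concatenate([y, clf.classes_[proba[sure].argmax(axis=1)]])
            pool = pool[~sure]
        return LogisticRegression(max_iter=1000).fit(X, y)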