Industry
P-values for high-dimensional regression
Meinshausen, Nicolai, Meier, Lukas, Bühlmann, Peter
Assigning significance in high-dimensional regression is challenging. Most computationally efficient selection algorithms cannot guard against inclusion of noise variables. Asymptotically valid p-values are not available. An exception is a recent proposal by Wasserman and Roeder (2008) which splits the data into two parts. The number of variables is then reduced to a manageable size using the first split, while classical variable selection techniques can be applied to the remaining variables, using the data from the second split. This yields asymptotic error control under minimal conditions. It involves, however, a one-time random split of the data. Results are sensitive to this arbitrary choice: it amounts to a `p-value lottery' and makes it difficult to reproduce results. Here, we show that inference across multiple random splits can be aggregated, while keeping asymptotic control over the inclusion of noise variables. We show that the resulting p-values can be used for control of both family-wise error (FWER) and false discovery rate (FDR). In addition, the proposed aggregation is shown to improve power while reducing the number of falsely selected variables substantially.
Neural networks in 3D medical scan visualization
Zukić, Dženan, Elsner, Andreas, Avdagić, Zikrija, Domik, Gitta
For medical volume visualization, one of the most important tasks is to reveal clinically relevant details from the 3D scan (CT, MRI ...), e.g. the coronary arteries, without obscuring them with less significant parts. These volume datasets contain different materials which are difficult to extract and visualize with 1D transfer functions based solely on the attenuation coefficient. Multi-dimensional transfer functions allow a much more precise classification of data which makes it easier to separate different surfaces from each other. Unfortunately, setting up multi-dimensional transfer functions can become a fairly complex task, generally accomplished by trial and error. This paper explains neural networks, and then presents an efficient way to speed up visualization process by semi-automatic transfer function generation. We describe how to use neural networks to detect distinctive features shown in the 2D histogram of the volume data and how to use this information for data classification.
Regularization methods for learning incomplete matrices
Mazumder, Rahul, Hastie, Trevor, Tibshirani, Rob
We use convex relaxation techniques to provide a sequence of solutions to the matrix completion problem. Using the nuclear norm as a regularizer, we provide simple and very efficient algorithms for minimizing the reconstruction error subject to a bound on the nuclear norm. Our algorithm iteratively replaces the missing elements with those obtained from a thresholded SVD. With warm starts this allows us to efficiently compute an entire regularization path of solutions.
Towards Improving Validation, Verification, Crash Investigations, and Event Reconstruction of Flight-Critical Systems with Self-Forensics
In this paper we introduce a new concept for flight-critical integrated software and hardware systems to analyze themselves forensically as needed as well as keeping forensics data for further automated analysis in cases of reports of anomalies, failures, and crashes. We insist this should be a part of the protocol for each system, (even not only flight systems), but any large and/or critical self-managed system. This proposition is a rehash of the related work of the author during his PhD studies [1, 2] for the NASA spacecraft self-forensics concept as well as a work towards improving the safety and crash investigation of read vehicles with similar means. We review some of the related work that these ideas are built upon prior describing the requirements for self-forensics components. We describe the general requirements as well as limitations and advantages. This is a draft sketch.
Managing Requirement Volatility in an Ontology-Driven Clinical LIMS Using Category Theory. International Journal of Telemedicine and Applications
Shaban-Nejad, Arash, Ormandjieva, Olga, Kassab, Mohamad, Haarslev, Volker
Requirement volatility is an issue in software engineering in general, and in Web-based clinical applications in particular, which often originates from an incomplete knowledge of the domain of interest. With advances in the health science, many features and functionalities need to be added to, or removed from, existing software applications in the biomedical domain. At the same time, the increasing complexity of biomedical systems makes them more difficult to understand, and consequently it is more difficult to define their requirements, which contributes considerably to their volatility. In this paper, we present a novel agentbased approach for analyzing and managing volatile and dynamic requirements in an ontology-driven laboratory information management system (LIMS) designed for Web-based case reporting in medical mycology. The proposed framework is empowered with ontologies and formalized using category theory to provide a deep and common understanding of the 1 functional and nonfunctional requirement hierarchies and their interrelations, and to trace the effects of a change on the conceptual framework. Keywords: LIMS, requirement volatility, requirement change management, ontology, category theory, intelligent agents 1. Introduction The life sciences constitute a challenging domain in knowledge representation. Biological data are highly dynamic, and bioinformatics applications are large and there are complex interrelationships between their elements with various levels of interpretation for each concept. In an ideal situation, the requirements for a software system should be completely and unambiguously determined before design, coding, and testing take place. The complexity of bioinformatics applications and their constant evolution lead to frequent changes in their requirements: often new requirements are added and existing requirements are modified or deleted, causing parts of the software system to be redesigned, deleted, or added. Such changes lead to volatility in the requirements of bioinformatics applications. In this paper, we deal with an important problem of requirements volatility in the context of an ontology-driven clinical laboratory information management system (LIMS)[1, 2].
Knowledge Management in Economic Intelligence with Reasoning on Temporal Attributes
Oladejo, Bolanle, Osofisan, Adenike, Odumuyiwa, Victor
People have to make important decisions within a time frame. Hence, it is imperative to employ means or strategy to aid effective decision making. Consequently, Economic Intelligence (EI) has emerged as a field to aid strategic and timely decision making in an organization. In the course of attaining this goal: it is indispensable to be more optimistic towards provision for conservation of intellectual resource invested into the process of decision making. This intellectual resource is nothing else but the knowledge of the actors as well as that of the various processes for effecting decision making. Knowledge has been recognized as a strategic economic resource for enhancing productivity and a key for innovation in any organization or community. Thus, its adequate management with cognizance of its temporal properties is highly indispensable. Temporal properties of knowledge refer to the date and time (known as timestamp) such knowledge is created as well as the duration or interval between related knowledge. This paper focuses on the needs for a user-centered knowledge management approach as well as exploitation of associated temporal properties. Our perspective of knowledge is with respect to decision-problems projects in EI. Our hypothesis is that the possibility of reasoning about temporal properties in exploitation of knowledge in EI projects should foster timely decision making through generation of useful inferences from available and reusable knowledge for a new project.
Feature Reinforcement Learning: Part I: Unstructured MDPs
General-purpose, intelligent, learning agents cycle through sequences of observations, actions, and rewards that are complex, uncertain, unknown, and non-Markovian. On the other hand, reinforcement learning is well-developed for small finite state Markov decision processes (MDPs). Up to now, extracting the right state representations out of bare observations, that is, reducing the general agent setup to the MDP framework, is an art that involves significant effort by designers. The primary goal of this work is to automate the reduction process and thereby significantly expand the scope of many existing reinforcement learning algorithms and the agents that employ them. Before we can think of mechanizing this search for suitable MDPs, we need a formal objective criterion. The main contribution of this article is to develop such a criterion. I also integrate the various parts into one learning algorithm. Extensions to more realistic dynamic Bayesian networks are developed in Part II [Hut09c]. The role of POMDPs is also considered there.
Symmetry in Data Mining and Analysis: A Unifying View based on Hierarchy
Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. The data sets themselves are explicitly linked as a form of representation to an observational or otherwise empirical domain of interest. "Structure" has long been understood as symmetry which can take many forms with respect to any transformation, including point, translational, rotational, and many others. Beginning with the role of number theory in expressing data, we show how we can naturally proceed to hierarchical structures. We show how this both encapsulates traditional paradigms in data analysis, and also opens up new perspectives towards issues that are on the order of the day, including data mining of massive, high dimensional, heterogeneous data sets. Linkages with other fields are also discussed including computational logic and symbolic dynamics. The structures in data surveyed here are based on hierarchy, represented as p-adic numbers or an ultrametric topology.
Solar radiation forecasting using ad-hoc time series preprocessing and neural networks
Paoli, Christophe, Voyant, Cyril, Muselli, Marc, Nivet, Marie-Laure
In this paper, we present an application of neural networks in the renewable energy domain. We have developed a methodology for the daily prediction of global solar radiation on a horizontal surface. We use an ad-hoc time series preprocessing and a Multi-Layer Perceptron (MLP) in order to predict solar radiation at daily horizon. First results are promising with nRMSE < 21% and RMSE < 998 Wh/m2. Our optimized MLP presents prediction similar to or even better than conventional methods such as ARIMA techniques, Bayesian inference, Markov chains and k-Nearest-Neighbors approximators. Moreover we found that our data preprocessing approach can reduce significantly forecasting errors.