Industry
Bayesian Biosurveillance of Disease Outbreaks
Cooper, Gregory F., Dash, Denver, Levander, John, Wong, Weng-Keen, Hogan, William, Wagner, Michael
Early, reliable detection of disease outbreaks is a critical problem today. This paper reports an investigation of the use of causal Bayesian networks to model spatio-temporal patterns of a non-contagious disease (respiratory anthrax infection) in a population of people. The number of parameters in such a network can become enormous, if not carefully managed. Also, inference needs to be performed in real time as population data stream in. We describe techniques we have applied to address both the modeling and inference challenges. A key contribution of this paper is the explication of assumptions and techniques that are sufficient to allow the scaling of Bayesian network modeling and inference to millions of nodes for real-time surveillance applications. The results reported here provide a proof-of-concept that Bayesian networks can serve as the foundation of a system that effectively performs Bayesian biosurveillance of disease outbreaks.
Dynamic Programming for Structured Continuous Markov Decision Problems
Feng, Zhengzhu, Dearden, Richard, Meuleau, Nicolas, Washington, Richard
We describe an approach for exploiting structure in Markov Decision Processes with continuous state variables. At each step of the dynamic programming, the state space is dynamically partitioned into regions where the value function is the same throughout the region. We first describe the algorithm for piecewise constant representations. We then extend it to piecewise linear representations, using techniques from POMDPs to represent and reason about linear surfaces efficiently. We show that for complex, structured problems, our approach exploits the natural structure so that optimal solutions can be computed efficiently.
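A core operation in this kind of dynamic programming is taking the maximum over actions of piecewise-constant Q-functions. The two partitions have to be overlaid, and adjacent regions that end up with equal values can then be merged again. The sketch below shows this for a one-dimensional state only. It uses an assumed representation chosen for illustration: a sorted list of breakpoints plus one value per interval. This is not the paper's data structure, and the names `evaluate` and `pointwise_max` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical 1-D representation: breakpoints b_0 < ... < b_n and values
# v_0 .. v_{n-1}, where v_i holds on the half-open interval [b_i, b_{i+1}).
PiecewiseConstant = Tuple[List[float], List[float]]


def evaluate(f: PiecewiseConstant, x: float) -> float:
    """Return f(x) for x in [b_0, b_n)."""
    bps, vals = f
    i = bisect_right(bps, x) - 1
    if i < 0 or i >= len(vals):
        raise ValueError("x lies outside the domain")
    return vals[i]


def pointwise_max(f: PiecewiseConstant, g: PiecewiseConstant) -> PiecewiseConstant:
    """Max of two piecewise-constant functions on the same domain.

    The result's partition is the overlay of both partitions, with
    neighbouring intervals merged whenever they carry the same value.
    """
    if f[0][0] != g[0][0] or f[0][-1] != g[0][-1]:
        raise ValueError("functions must share the same domain")
    cuts = sorted(set(f[0]) | set(g[0]))
    bps: List[float] = [cuts[0]]
    vals: List[float] = []
    for left, right in zip(cuts, cuts[1:]):
        # Both inputs are constant on [left, right), so sampling at left suffices.
        v = max(evaluate(f, left), evaluate(g, left))
        if vals and vals[-1] == v:
            bps[-1] = right  # same value: extend the previous region
        else:
            vals.append(v)
            bps.append(right)
    return bps, vals


# Q-values of two hypothetical actions over the state interval [0, 10).
q_a = ([0.0, 4.0, 10.0], [1.0, 3.0])
q_b = ([0.0, 6.0, 10.0], [2.0, 0.0])
v = pointwise_max(q_a, q_b)
print(v)  # ([0.0, 4.0, 10.0], [2.0, 3.0])
```

The overlay first produces three intervals, [0, 4), [4, 6), and [6, 10). The last two both take the value 3.0 and are merged. This merging step keeps the number of regions small when the value function has natural structure.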
A Study of Bayesian Network Models to Aid in the Diagnosis of Brain Tumors
Lamine, Fradj Ben, Kalti, Karim, Mahjoub, Mohamed Ali
This article describes different models based on Bayesian networks (RB) that capture expert knowledge in the diagnosis of brain tumors. Bayesian networks are well suited to representing the uncertainty involved in diagnosing these tumors. In our work, we first tested several Bayesian network structures: some derived from the reasoning performed by doctors, and others generated automatically. This step aims to find the structure that yields the highest diagnostic accuracy. The learning algorithms used are MWST-EM, SEM, and SEM+T. To estimate the parameters of the Bayesian network from an incomplete database, we propose an extension of the EM algorithm that adds a priori knowledge in the form of thresholds computed by the first phase of the RBE algorithm. The very encouraging results obtained are discussed at the end of the paper.
Arabic CALL system based on pedagogically indexed text
Mohamed, Mohamed Achraf Ben, Ghoul, Dhaou El, Nahdi, Mohamed Amine, Mars, Mourad, Zrigui, Mounir
This article introduces the benefits of using computers as a tool for foreign language teaching and learning. It describes the effect of using Natural Language Processing (NLP) tools for learning Arabic. The technique explored in this particular case is the use of pedagogically indexed corpora. This text-based method lets teachers build activities from texts adapted to a particular pedagogical situation. This paper also presents ARAC, a platform that allows language educators to create activities within their own pedagogical area of interest.
An Introduction to Artificial Prediction Markets for Classification
Prediction markets are used in real life to predict outcomes of interest such as presidential elections. This paper presents a mathematical theory of artificial prediction markets for supervised learning of conditional probability estimators. The artificial prediction market is a novel method for fusing the prediction information of features or trained classifiers, where the fusion result is the contract price on the possible outcomes. The market can be trained online by updating the participants' budgets using training examples. By analogy with real prediction markets, the equations that govern the market are derived from simple and reasonable assumptions. Efficient numerical algorithms are presented for solving these equations. The resulting artificial prediction market is shown to be a maximum likelihood estimator. It generalizes linear aggregation, as used in boosting and random forests, as well as logistic regression and some kernel methods. Furthermore, the market mechanism allows the aggregation of specialized classifiers that participate only in specific instances. Experimental comparisons show that artificial prediction markets often outperform random forests and implicit online learning on synthetic data and real UCI datasets. Moreover, an extensive evaluation for pelvic and abdominal lymph node detection in CT data shows that the prediction market improves AdaBoost's detection rate from 79.6% to 81.2% at 3 false positives per volume.
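To make the fusion mechanism concrete, here is a minimal sketch that assumes the simplest kind of participant. Each participant bets a fixed fraction `eta` of its budget, split across classes in proportion to its own class probabilities; this is a constant betting function. Under that assumption, requiring the total payout for the realized class to equal the total amount bet yields a closed-form price. That price is the budget-weighted average of the participants' probabilities, which corresponds to the linear-aggregation case mentioned above. General betting functions instead need the numerical solvers the paper presents. The function names and the parameter `eta` are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def market_price(budgets: Sequence[float], probs: Sequence[Sequence[float]]) -> List[float]:
    """Equilibrium price per class, assuming constant betting functions.

    Participant m bets eta * budgets[m] * probs[m][k] on class k. Setting the
    payout for any realized class equal to the total amount bet gives
    price[k] = sum_m budgets[m] * probs[m][k] / sum_m budgets[m]
    (eta cancels out).
    """
    total = sum(budgets)
    n_classes = len(probs[0])
    return [
        sum(b * p[k] for b, p in zip(budgets, probs)) / total
        for k in range(n_classes)
    ]


def update_budgets(
    budgets: Sequence[float],
    probs: Sequence[Sequence[float]],
    label: int,
    eta: float,
) -> List[float]:
    """One online training step: every participant pays its bets, then is paid
    1 / price[label] per unit it bet on the true label."""
    price = market_price(budgets, probs)
    new_budgets = []
    for b, p in zip(budgets, probs):
        spent = eta * b  # probs[m] sums to 1, so the total bet is eta * b
        payout = eta * b * p[label] / price[label]
        new_budgets.append(b - spent + payout)
    return new_budgets


# Two hypothetical classifiers on a binary problem, equal starting budgets.
budgets = [1.0, 1.0]
probs = [[0.8, 0.2], [0.3, 0.7]]
print(market_price(budgets, probs))  # fused class probabilities
budgets = update_budgets(budgets, probs, label=0, eta=0.1)
print(budgets)  # the classifier that favoured class 0 gains budget
```

Each step conserves the total budget, so training simply moves money toward participants that assigned probability to the observed labels. Their influence on future prices grows accordingly.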
Forecasting electricity consumption by aggregating specialized experts
Devaine, Marie, Gaillard, Pierre, Goude, Yannig, Stoltz, Gilles
We consider the setting of sequential prediction of arbitrary sequences based on specialized experts. We first review the relevant literature and present two theoretical contributions: a general analysis of the specialist aggregation rule of Freund et al. (1997) and an adaptation of the fixed-share rules of Herbster and Warmuth (1998) to this setting. We then apply these rules to sequential short-term (one-day-ahead) forecasting of electricity consumption. To do so, we consider two data sets, a Slovakian one and a French one, concerned with hourly and half-hourly predictions, respectively. We follow a general methodology for these empirical studies and discuss in detail the issues involved in tuning the learning parameters. The introduced aggregation rules improve accuracy on the data sets at hand: they reduce the mean squared error and also behave more robustly with respect to large occasional errors.
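The specialist (sleeping-experts) setting can be illustrated with exponentially weighted averaging. At each round, only the awake experts contribute to the forecast and only their weights are updated. The updated weights are rescaled so that the awake experts' total weight is unchanged, while asleep experts keep their weights. This is a minimal sketch that assumes square loss and a fixed learning rate `eta`. It follows the spirit of the Freund et al. (1997) specialist rule and does not reproduce the paper's analyzed or tuned variants. The `fixed_share` step mixes a fraction `alpha` of the total weight uniformly across all experts, which is the basic Herbster and Warmuth (1998) idea; the paper's adaptation to specialists may differ. Marking a sleeping expert with a prediction of `None` is an assumed convention.

```python
import math
from typing import List, Optional, Sequence


def awake(preds: Sequence[Optional[float]]) -> List[int]:
    """Indices of experts that issue a prediction this round."""
    return [i for i, p in enumerate(preds) if p is not None]


def forecast(weights: Sequence[float], preds: Sequence[Optional[float]]) -> float:
    """Weighted average of the awake experts' predictions."""
    idx = awake(preds)
    if not idx:
        raise ValueError("at least one expert must be awake")
    mass = sum(weights[i] for i in idx)
    return sum(weights[i] * preds[i] for i in idx) / mass


def specialist_update(
    weights: Sequence[float],
    preds: Sequence[Optional[float]],
    outcome: float,
    eta: float,
) -> List[float]:
    """Exponential-weights update restricted to awake experts (square loss)."""
    idx = awake(preds)
    new = list(weights)
    old_mass = sum(weights[i] for i in idx)
    for i in idx:
        new[i] = weights[i] * math.exp(-eta * (preds[i] - outcome) ** 2)
    new_mass = sum(new[i] for i in idx)
    for i in idx:
        new[i] *= old_mass / new_mass  # awake experts keep their total weight
    return new


def fixed_share(weights: Sequence[float], alpha: float) -> List[float]:
    """Redistribute a fraction alpha of the total weight uniformly."""
    total = sum(weights)
    n = len(weights)
    return [(1 - alpha) * w + alpha * total / n for w in weights]


# Hypothetical round with three experts, where expert 2 is asleep.
weights = [1.0, 1.0, 1.0]
preds = [10.0, 12.0, None]
print(forecast(weights, preds))  # 11.0
weights = fixed_share(specialist_update(weights, preds, outcome=10.0, eta=0.1), alpha=0.05)
```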
Rule Based Expert System for Diagnosis of Neuromuscular Disorders
Borgohain, Rajdeep, Sanyal, Sugata
In this paper, we discuss the implementation of a rule-based expert system for diagnosing neuromuscular diseases. The system is implemented in JESS and covers Cerebral Palsy, Multiple Sclerosis, Muscular Dystrophy and Parkinson's disease. The user is presented with a list of questions about the patient's symptoms; based on the answers, the system diagnoses the patient's disease and suggests a possible treatment. The system can help patients suffering from neuromuscular diseases get an idea of their disease and its possible treatment.
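The questionnaire-then-rules workflow can be illustrated with naive forward chaining. The user's yes/no answers become initial facts, and rules fire until no new conclusion can be derived. This is a minimal sketch in Python rather than JESS, and it uses a simple fixed-point loop instead of the Rete-based matching a JESS engine performs. The rule contents and fact names (`symptom_a`, `disorder_x`, and so on) are hypothetical placeholders, not the paper's medical knowledge base.

```python
from typing import FrozenSet, List, Set, Tuple

# A rule fires when all of its required facts are known; it then asserts
# its conclusion as a new fact. All names below are hypothetical.
Rule = Tuple[FrozenSet[str], str]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Fire rules repeatedly until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


RULES: List[Rule] = [
    (frozenset({"symptom_a", "symptom_b"}), "disorder_x"),
    (frozenset({"symptom_c"}), "disorder_y"),
    (frozenset({"disorder_x"}), "suggest_treatment_x"),  # chained rule
]

# Answers collected from a hypothetical questionnaire.
answers = {"symptom_a": True, "symptom_b": True, "symptom_c": False}
facts = {s for s, yes in answers.items() if yes}
derived = forward_chain(facts, RULES)
print(sorted(derived - facts))  # ['disorder_x', 'suggest_treatment_x']
```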
Minimal Proof Search for Modal Logic K Model Checking
Most modal logics, such as S5, LTL, or ATL, are extensions of Modal Logic K. The model checking problems for LTL and, to a lesser extent, ATL have been very active research areas for the past decades. The model checking problem for the more basic Multi-agent Modal Logic K (MMLK), however, has important applications in its own right, because MMLK serves as a formal framework for perfect-information multi-player games. We present Minimal Proof Search (MPS), an effort-number-based algorithm that solves the model checking problem for MMLK. We prove two important properties of MPS beyond its correctness. First, the (dis)proof exhibited by MPS is of minimal cost for a general definition of cost. Second, MPS is an optimal algorithm for finding (dis)proofs of minimal cost. Optimality means that any comparable algorithm either needs to explore a state space at least as large as the one MPS explores, or is not guaranteed to find a (dis)proof of minimal cost on every input. As such, our work relates to A* and AO* in heuristic search, to Proof Number Search and DFPN+ in two-player games, and to counterexample minimization in software model checking.
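For readers unfamiliar with MMLK, the model checking problem can be stated as recursive evaluation of a formula at a state of a multi-agent Kripke structure. The diamond of agent a means "some move of a leads to a state satisfying the subformula", and the box means "every move of a does". The sketch below is that naive recursive checker. It assumes a tuple-based formula encoding and a tiny hypothetical game, and the function name `holds` is illustrative. It only decides truth and is not MPS, which additionally searches for a (dis)proof of minimal cost.

```python
from typing import Dict, FrozenSet, List

# Hypothetical encoding: labels[state] is the set of atomic propositions true
# there; moves[agent][state] lists the states that agent can move to.
Labels = Dict[str, FrozenSet[str]]
Moves = Dict[str, Dict[str, List[str]]]


def holds(labels: Labels, moves: Moves, state: str, phi: tuple) -> bool:
    """Decide whether formula phi is true at state (naive recursion)."""
    op = phi[0]
    if op == "atom":
        return phi[1] in labels[state]
    if op == "not":
        return not holds(labels, moves, state, phi[1])
    if op == "and":
        return holds(labels, moves, state, phi[1]) and holds(labels, moves, state, phi[2])
    if op == "or":
        return holds(labels, moves, state, phi[1]) or holds(labels, moves, state, phi[2])
    if op in ("box", "dia"):
        agent, sub = phi[1], phi[2]
        successors = moves.get(agent, {}).get(state, [])
        results = (holds(labels, moves, s, sub) for s in successors)
        # box: true for every successor (vacuously true if there are none);
        # dia: true for at least one successor.
        return all(results) if op == "box" else any(results)
    raise ValueError(f"unknown operator {op!r}")


# Tiny hypothetical game: player A moves from s0, then player B replies.
labels = {
    "s0": frozenset(), "s1": frozenset(), "s2": frozenset(),
    "s3": frozenset({"win"}), "s4": frozenset(),
}
moves = {"A": {"s0": ["s1", "s2"]}, "B": {"s1": ["s3"], "s2": ["s3", "s4"]}}
# "A has a move after which every reply of B reaches a winning state."
a_wins = ("dia", "A", ("box", "B", ("atom", "win")))
print(holds(labels, moves, "s0", a_wins))  # True, via s1
```

In game terms, a proof of `a_wins` at s0 is the single move to s1 together with B's only reply. MPS is concerned with finding such witnesses at minimal cost, whereas this checker may explore more of the game than necessary.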
Keeping greed good: sparse regression under design uncertainty with application to biomass characterization
Biagioni, David J., Elmore, Ryan, Jones, Wesley
This paper is motivated by the practical problem of how to meaningfully perform sparse regression when the predictor variables are observed with measurement error or some source of uncertainty. We will refer to this error or noise as design uncertainty to emphasize that perturbations in the design matrix may arise from a number of random sources unrelated to experimental or measurement error per se. Recent work in this area has just begun to address the issue of sparse regression under design uncertainty from a theoretical point of view. We are primarily interested in describing an approach that, while theoretically justifiable, is essentially pragmatic and broadly applicable. In short, we argue that greed - a basic feature of many sparsity promoting algorithms - is indeed good [Tropp, 2004], so long as the design data is scaled by the uncertainty variances. We demonstrate the efficacy of scaling from several points of view and validate it empirically with a biomass characterization data set using two of the most widely used sparse algorithms: least angle regression (LARS) and the Dantzig selector (DS). Our work was motivated by an example from a biomass characterization experiment related to work at the National Renewable Energy Laboratory. The example is described in detail in Section 4 and contains repeated measurements of mass spectral (design, or predictor) and sugar mass fraction (response) values within each switchgrass sample. The domain scientists' goal was to find a small subset of masses in the spectrum that could be used to predict sugar mass fraction.
A Spectral Algorithm for Learning Hidden Markov Models
Hsu, Daniel, Kakade, Sham M., Zhang, Tong
Hidden Markov Models (HMMs) (Baum and Eagon, 1967; Rabiner, 1989) are the workhorse statistical model for discrete time series, with widely diverse applications including automatic speech recognition, natural language processing (NLP), and genomic sequence modeling. In this model, a discrete hidden state evolves according to some Markovian dynamics, and observations at a particular time depend only on the hidden state at that time. The learning problem is to estimate the model using only observation samples from the underlying distribution. Thus far, the predominant learning algorithms have been local search heuristics, such as the Baum-Welch / EM algorithm (Baum et al., 1970; Dempster et al., 1977). It is not surprising that practical algorithms have resorted to heuristics, as the general learning problem has been shown to be hard under cryptographic assumptions (Terwijn, 2002). Fortunately, the hardness results are for HMMs that seem divorced from those that we are likely to encounter in practical applications. The situation is in many ways analogous to learning mixture distributions with samples from the underlying distribution. There, the general problem is also believed to be hard. However, much recent progress has been made when certain separation assumptions are made with respect to the component mixture distributions (e.g.