A critical analysis of metrics used for measuring progress in artificial intelligence
Blagec, Kathrin, Dorffner, Georg, Moradi, Milad, Samwald, Matthias
Comparing model performances on benchmark datasets is an integral part of measuring and driving progress in artificial intelligence. A model's performance on a benchmark dataset is commonly assessed based on a single or a small set of performance metrics. While this enables quick comparisons, it may also entail the risk of inadequately reflecting model performance if the metric does not sufficiently cover all performance characteristics. Currently, it is unknown to what extent this might impact current benchmarking efforts. To address this question, we analysed the current landscape of performance metrics based on data covering 3867 machine learning model performance results from the web-based open platform 'Papers with Code'. Our results suggest that the large majority of metrics currently used to evaluate classification AI benchmark tasks have properties that may result in an inadequate reflection of a classifier's performance, especially when used with imbalanced datasets. While alternative metrics that address problematic properties have been proposed, they are currently rarely applied as performance metrics in benchmarking tasks. Finally, we noticed that the reporting of metrics was partly inconsistent and partly unspecific, which may lead to ambiguities when comparing model performances.
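The imbalanced-dataset problem the abstract raises can be made concrete with a small sketch. The confusion-matrix counts below are hypothetical (not from the paper): a classifier that labels almost everything as the majority class scores high accuracy, while the Matthews correlation coefficient, one of the alternative metrics discussed in this line of work, reveals near-chance performance.

```python
# Hedged sketch: why accuracy can misrepresent performance on imbalanced data.
# Hypothetical confusion-matrix counts for a classifier that labels almost
# everything as the majority class (95% negatives, 5% positives).
import math

tp, fn = 1, 49      # positives: almost all missed
tn, fp = 949, 1     # negatives: almost all correct

accuracy = (tp + tn) / (tp + tn + fp + fn)

# Matthews correlation coefficient (MCC) uses all four confusion-matrix
# cells, so it is not dominated by the majority class.
mcc = (tp * tn - fp * fn) / math.sqrt(
    (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
)

print(f"accuracy = {accuracy:.3f}")  # 0.950 -- looks strong
print(f"MCC      = {mcc:.3f}")       # ~0.09 -- barely better than chance
```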
Bayesian deep neural networks for low-cost neurophysiological markers of Alzheimer's disease severity
Fruehwirt, Wolfgang, Cobb, Adam D., Mairhofer, Martin, Weydemann, Leonard, Garn, Heinrich, Schmidt, Reinhold, Benke, Thomas, Dal-Bianco, Peter, Ransmayr, Gerhard, Waser, Markus, Grossegger, Dieter, Zhang, Pengfei, Dorffner, Georg, Roberts, Stephen
As societies around the world are ageing, the number of Alzheimer's disease (AD) patients is rapidly increasing. To date, no low-cost, non-invasive biomarkers have been established to advance the objectivization of AD diagnosis and progression assessment. Here, we utilize Bayesian neural networks to develop a multivariate predictor for AD severity using a wide range of quantitative EEG (QEEG) markers. The Bayesian treatment of neural networks both automatically controls model complexity and provides a predictive distribution over the target function, giving uncertainty bounds for our regression task. It is therefore well suited to clinical neuroscience, where data sets are typically sparse and practitioners require a precise assessment of the predictive uncertainty. We use data of one of the largest prospective AD EEG trials ever conducted to demonstrate the potential of Bayesian deep learning in this domain, while comparing two distinct Bayesian neural network approaches, i.e., Monte Carlo dropout and Hamiltonian Monte Carlo.
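One of the two Bayesian approaches the abstract compares, Monte Carlo dropout, can be sketched in a few lines: keeping dropout active at test time and averaging many stochastic forward passes yields a predictive mean and an uncertainty estimate. The tiny one-layer "network" and its weights below are made-up illustrations, not the paper's model.

```python
# Minimal sketch of Monte Carlo dropout for predictive uncertainty.
# The activations and weights are illustrative, standing in for a trained net.
import random
import statistics

random.seed(0)

hidden = [0.8, -0.3, 1.2, 0.5, -0.7]   # hypothetical hidden activations
w_out = [0.6, 0.9, -0.4, 1.1, 0.3]     # hypothetical output weights
p_drop = 0.5                           # dropout kept active at test time

def mc_forward():
    # Sample a dropout mask; scale kept units by 1 / (1 - p_drop).
    kept = [h / (1 - p_drop) if random.random() > p_drop else 0.0
            for h in hidden]
    return sum(k * w for k, w in zip(kept, w_out))

# T stochastic forward passes approximate a predictive distribution,
# giving the uncertainty bounds the abstract refers to.
samples = [mc_forward() for _ in range(1000)]
mean = statistics.fmean(samples)
std = statistics.stdev(samples)
print(f"prediction = {mean:.2f} +/- {std:.2f}")
```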
Riemannian tangent space mapping and elastic net regularization for cost-effective EEG markers of brain atrophy in Alzheimer's disease
Fruehwirt, Wolfgang, Gerstgrasser, Matthias, Zhang, Pengfei, Weydemann, Leonard, Waser, Markus, Schmidt, Reinhold, Benke, Thomas, Dal-Bianco, Peter, Ransmayr, Gerhard, Grossegger, Dieter, Garn, Heinrich, Peters, Gareth W., Roberts, Stephen, Dorffner, Georg
The diagnosis of Alzheimer's disease (AD) in routine clinical practice is most commonly based on subjective clinical interpretations. Quantitative electroencephalography (QEEG) measures have been shown to reflect neurodegenerative processes in AD and might qualify as affordable and thereby widely available markers to facilitate the objectivization of AD assessment. Here, we present a novel framework combining Riemannian tangent space mapping and elastic net regression for the development of brain atrophy markers. While most AD QEEG studies are based on small sample sizes and psychological test scores as outcome measures, here we train and test our models using data of one of the largest prospective EEG AD trials ever conducted, including MRI biomarkers of brain atrophy.
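The elastic net half of the framework can be illustrated by its per-coefficient closed-form update, which combines lasso-style soft-thresholding (driving weak QEEG features to exactly zero) with ridge-style shrinkage. The function and its inputs below are a generic textbook sketch under simplifying assumptions (standardized features), not the paper's implementation.

```python
# Sketch of the elastic net coordinate update for one coefficient:
# soft-thresholding (l1 part) plus shrinkage (l2 part). Illustrative only.

def elastic_net_update(rho, lam, alpha):
    """Closed-form update for a standardized feature.

    rho   : correlation of the feature with the current residual
    lam   : overall regularization strength
    alpha : mix between lasso (alpha=1) and ridge (alpha=0)
    """
    l1, l2 = lam * alpha, lam * (1 - alpha)
    if rho > l1:
        return (rho - l1) / (1 + l2)
    if rho < -l1:
        return (rho + l1) / (1 + l2)
    return 0.0  # weak features are zeroed out: sparse marker selection

# Weakly correlated features are dropped entirely...
print(elastic_net_update(0.05, lam=0.2, alpha=0.5))            # 0.0
# ...while strong ones survive, shrunk toward zero.
print(round(elastic_net_update(0.80, lam=0.2, alpha=0.5), 3))  # 0.636
```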
Graded Grammaticality in Prediction Fractal Machines
Parfitt, Shan, Tiño, Peter, Dorffner, Georg
We introduce a novel method of constructing language models, which avoids some of the problems associated with recurrent neural networks. The method of creating a Prediction Fractal Machine (PFM) [1] is briefly described and some experiments are presented which demonstrate the suitability of PFMs for language modeling. PFMs distinguish reliably between minimal pairs, and their behavior is consistent with the hypothesis [4] that well-formedness is 'graded', not absolute. A discussion of their potential to offer fresh insights into language acquisition and processing follows.
Building Predictive Models from Fractal Representations of Symbolic Sequences
Tiño, Peter, Dorffner, Georg
We propose a novel approach for building finite memory predictive models similar in spirit to variable memory length Markov models (VLMMs). The models are constructed by first transforming the n-block structure of the training sequence into a spatial structure of points in a unit hypercube, such that the longer the common suffix shared by any two n-blocks, the closer lie their point representations. Such a transformation embodies a Markov assumption: n-blocks with long common suffixes are likely to produce similar continuations. Finding a set of prediction contexts is formulated as a resource allocation problem solved by vector quantizing the spatial n-block representation. We compare our model with both the classical and variable memory length Markov models on three data sets with different memory and stochastic components. Our models have superior performance, yet their construction is fully automatic, which is shown to be problematic in the case of VLMMs.
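The abstract's transformation of n-blocks into points of a unit hypercube is an iterated function system (chaos-game) construction: each symbol contracts the current point toward its own corner, so the most recent symbols dominate the final position and blocks sharing a longer suffix land closer together. The alphabet, corner assignment, and contraction rate below are illustrative choices, not the paper's exact setup.

```python
# Sketch of the fractal (iterated function system) representation of
# symbol blocks on the unit square. Alphabet and contraction rate are
# illustrative assumptions.

CORNERS = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0), "d": (1.0, 1.0)}
K = 0.5  # contraction coefficient

def block_to_point(block):
    x, y = 0.5, 0.5  # start at the hypercube centre
    for sym in block:  # oldest symbol first; the last symbols dominate
        cx, cy = CORNERS[sym]
        x, y = K * x + (1 - K) * cx, K * y + (1 - K) * cy
    return x, y

def dist(p, q):
    return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

# "abab" and "cbab" share the 3-symbol suffix "bab"; "abab" and "abac"
# share no suffix, so their representations lie much further apart.
close = dist(block_to_point("abab"), block_to_point("cbab"))
far = dist(block_to_point("abab"), block_to_point("abac"))
print(close < far)  # True
```

Vector quantizing these points then groups n-blocks with similar suffixes into shared prediction contexts, as the abstract describes.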
Experiences with Bayesian Learning in a Real World Application
Sykacek, Peter, Dorffner, Georg, Rappelsberger, Peter, Zeitlhofer, Josef
This paper reports on an application of Bayesian-inferred neural network classifiers in the field of automatic sleep staging. The reason for using Bayesian learning for this task is twofold. First, Bayesian inference is known to embody regularization automatically. Second, a side effect of Bayesian learning leads to larger variance of network outputs in regions without training data. This results in well-known moderation effects, which can be used to detect outliers. In a 5-fold cross-validation experiment, the full Bayesian solution found with R. Neal's hybrid Monte Carlo algorithm was not better than a single maximum a posteriori (MAP) solution found with D. J. MacKay's evidence approximation. In a second experiment we studied the properties of both solutions in rejecting classifications of movement artefacts.
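The moderation effect the abstract exploits can be sketched as a rejection rule: averaging outputs of posterior-sampled networks pulls predictions toward 0.5 for inputs far from the training data, and low-confidence averages are rejected as likely artefacts. The sampled probabilities and the threshold below are made-up illustrations, not the paper's values.

```python
# Sketch of outlier rejection via moderated Bayesian outputs.
# Sampled output probabilities and threshold are illustrative assumptions.

def moderated_decision(sampled_probs, reject_below=0.8):
    p = sum(sampled_probs) / len(sampled_probs)  # posterior-averaged output
    confidence = max(p, 1 - p)
    if confidence < reject_below:
        return "reject"  # moderated output: likely artefact / outlier
    return "class 1" if p >= 0.5 else "class 0"

# Near the training data, posterior samples agree...
print(moderated_decision([0.97, 0.95, 0.99, 0.96]))  # class 1
# ...far from it they disagree, the average moderates, and we reject.
print(moderated_decision([0.95, 0.20, 0.70, 0.40]))  # reject
```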
Experiences with Bayesian Learning in a Real World Application
Sykacek, Peter, Dorffner, Georg, Rappelsberger, Peter, Zeitlhofer, Josef
Sleep staging is usually based on the rules defined by Rechtschaffen and Kales (see [8]), which define four sleep stages, stage one to four, as well as rapid eye movement (REM) sleep and wakefulness. In [1], J. Bentrup and S. Ray report that every year nearly one million US citizens consult their physicians concerning their sleep. Since sleep staging is a tedious task (one all-night recording takes on average about 3 hours to score manually), much effort has been spent on designing automatic sleep stagers. Sleep staging is a classification problem that has been approached with classical statistical techniques as well as techniques from the field of artificial intelligence (AI).