Collaborating Authors

 Large, James


HIVE-COTE 2.0: a new meta ensemble for time series classification

arXiv.org Artificial Intelligence

The Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is a heterogeneous meta ensemble for time series classification. HIVE-COTE forms its ensemble from classifiers of multiple domains, including phase-independent shapelets, bag-of-words based dictionaries and phase-dependent intervals. Since it was first proposed in 2016, the algorithm has remained state of the art for accuracy on the UCR time series classification archive. Over time it has been incrementally updated, culminating in its current state, HIVE-COTE 1.0. During this time a number of algorithms have been proposed which match the accuracy of HIVE-COTE. We propose comprehensive changes to the HIVE-COTE algorithm which significantly improve its accuracy and usability, presenting this upgrade as HIVE-COTE 2.0. We introduce two novel classifiers, the Temporal Dictionary Ensemble (TDE) and Diverse Representation Canonical Interval Forest (DrCIF), which replace existing ensemble members. Additionally, we introduce the Arsenal, an ensemble of ROCKET classifiers as a new HIVE-COTE 2.0 constituent. We demonstrate that HIVE-COTE 2.0 is significantly more accurate than the current state of the art on 112 univariate UCR archive datasets and 26 multivariate UEA archive datasets.
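
As a concrete illustration of the meta-ensemble idea, the sketch below combines the class probabilities of several constituent classifiers, weighting each by a cross-validated accuracy estimate. The class name and weighting scheme are assumptions for illustration only; this is not the actual HIVE-COTE 2.0 implementation, which builds on dedicated TSC modules such as TDE, DrCIF and the Arsenal.

    # Minimal sketch of a heterogeneous, probability-weighted meta ensemble.
    # Illustrative only: HIVE-COTE 2.0 itself uses dedicated TSC modules.
    import numpy as np
    from sklearn.model_selection import cross_val_score

    class HeterogeneousEnsemble:
        def __init__(self, modules):
            # modules: list of (name, classifier) pairs, nominally one per
            # representation (dictionary, interval, convolution, ...)
            self.modules = modules

        def fit(self, X, y):
            self.weights_ = []
            for _, clf in self.modules:
                # weight each module by an estimate of its accuracy
                self.weights_.append(cross_val_score(clf, X, y, cv=5).mean())
                clf.fit(X, y)
            self.weights_ = np.asarray(self.weights_)
            return self

        def predict_proba(self, X):
            p = sum(w * clf.predict_proba(X)
                    for w, (_, clf) in zip(self.weights_, self.modules))
            return p / p.sum(axis=1, keepdims=True)

        def predict(self, X):
            # sklearn orders classes_ identically across fitted components
            return self.modules[0][1].classes_[
                self.predict_proba(X).argmax(axis=1)]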


Can automated smoothing significantly improve benchmark time series classification algorithms?

arXiv.org Machine Learning

We assess whether six smoothing algorithms (moving average, exponential smoothing, Gaussian filter, Savitzky-Golay filter, Fourier approximation and a recursive median sieve) can be automatically applied to time series classification problems as a preprocessing step to improve the performance of three benchmark classifiers (1-Nearest Neighbour with Euclidean and Dynamic Time Warping distances, and Rotation Forest). We find no significant improvement over unsmoothed data, even when we set the smoothing parameter through cross-validation. We are not claiming smoothing has no worth: it has an important role in exploratory analysis and helps with specific classification problems where domain knowledge can be exploited. What we observe is that automatic application does not help, and that the improvement of other time series classification algorithms over the baseline classifiers cannot be explained simply as a function of the absence of smoothing.
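
For concreteness, a sketch of the setup described above: a moving-average smoother whose window is chosen by cross-validation on the train data, ahead of a 1-NN Euclidean classifier. The function names and the candidate windows are illustrative assumptions, not the study's code.

    # Sketch: moving-average smoothing as a preprocessing step, with the
    # window size set by cross-validation on the train data.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    def moving_average(X, w):
        # X: (n_cases, series_length); same-length smoothing of each series
        kernel = np.ones(w) / w
        return np.apply_along_axis(
            lambda s: np.convolve(s, kernel, mode="same"), 1, X)

    def fit_with_smoothing(X_train, y_train, windows=(1, 3, 5, 9, 15)):
        # pick the window that maximises cross-validated 1-NN accuracy
        best_w = max(windows, key=lambda w: cross_val_score(
            KNeighborsClassifier(n_neighbors=1),
            moving_average(X_train, w), y_train, cv=5).mean())
        clf = KNeighborsClassifier(n_neighbors=1)
        clf.fit(moving_average(X_train, best_w), y_train)
        return clf, best_w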


The UEA multivariate time series classification archive, 2018

arXiv.org Machine Learning

In 2002, the UCR time series classification archive was first released with sixteen datasets. It gradually expanded until 2015, when it grew from 45 to 85 datasets. In October 2018 more datasets were added, bringing the total to 128. The new archive contains a wide range of problems, including variable-length series, but it still contains only univariate time series classification problems. One of the motivations for introducing the archive was to encourage researchers to perform a more rigorous evaluation of newly proposed time series classification (TSC) algorithms. It has worked: most recent research into TSC uses all 85 datasets to evaluate algorithmic advances. Research into multivariate time series classification, where more than one series is associated with each class label, is where univariate TSC research was a decade ago: algorithms are evaluated using very few datasets, and claims of improvement are not based on statistical comparisons. We aim to address this problem by forming the first iteration of the MTSC archive, to be hosted at the website www.timeseriesclassification.com. Like the univariate archive, this formulation was a collaborative effort between researchers at the University of East Anglia (UEA) and the University of California, Riverside (UCR). The 2018 vintage consists of 30 datasets with a wide range of cases, dimensions and series lengths. For this first iteration of the archive we format all data to be of equal length, include no series with missing data, and provide train/test splits.
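
The formatting rules in the final sentence translate directly into array shapes. The sketch below uses the common (n_cases, n_dimensions, series_length) layout, which is an assumed convention for illustration rather than a mandated file format.

    # Sketch of the shape constraints implied by the archive's formatting
    # rules: equal-length series, no missing values, fixed train/test splits.
    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test, n_dims, length = 40, 40, 6, 100

    X_train = rng.normal(size=(n_train, n_dims, length))  # multivariate cases
    y_train = rng.integers(0, 2, size=n_train)            # one label per case
    X_test = rng.normal(size=(n_test, n_dims, length))

    assert not np.isnan(X_train).any()            # no missing data
    assert X_train.shape[1:] == X_test.shape[1:]  # equal dimensions and length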


From BOP to BOSS and Beyond: Time Series Classification with Dictionary Based Classifiers

arXiv.org Machine Learning

A family of algorithms for time series classification (TSC) involves running a sliding window across each series, discretising the window to form a word, forming a histogram of word counts over the dictionary, then constructing a classifier on the histograms. A recent evaluation of two algorithms of this type, Bag of Patterns (BOP) and Bag of Symbolic Fourier Approximation Symbols (BOSS), found a significant difference in accuracy between these seemingly similar algorithms. We investigate this phenomenon by deconstructing the classifiers and measuring the relative importance of their four key components. We find that whilst ensembling is a key component for both algorithms, the effect of the other components is mixed and more complex. We conclude that BOSS represents the state of the art for dictionary-based TSC. Both BOP and BOSS can be classed as bag-of-words approaches, which are particularly popular in computer vision for tasks such as image classification. Converting approaches from vision requires careful engineering. We adapt three techniques used in computer vision for TSC: Scale Invariant Feature Transform; Spatial Pyramids; and Histogram Intersection. We find that using Spatial Pyramids in conjunction with BOSS (SP) produces a significantly more accurate classifier. SP is significantly more accurate than standard benchmarks and the original BOSS algorithm. It is not significantly worse than the best shapelet-based approach, and is only outperformed by HIVE-COTE, an ensemble that includes BOSS as a constituent module.
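
The pipeline in the first sentence can be made concrete in a few lines. The sketch below uses a simplified SAX-style discretisation (z-normalise the window, piecewise aggregate approximation, then fixed breakpoints); the exact BOP and BOSS transforms differ, so treat the details as illustrative.

    # Sketch of the dictionary pipeline: slide a window, discretise it into a
    # word, and build a histogram of word counts. Simplified SAX-style scheme,
    # not the exact BOP/BOSS transforms.
    import numpy as np
    from collections import Counter

    def series_to_histogram(s, window=24, word_len=4):
        # breakpoints for a 4-symbol alphabet under a standard normal
        breakpoints = np.array([-0.67, 0.0, 0.67])
        words = []
        for start in range(len(s) - window + 1):
            w = s[start:start + window]
            w = (w - w.mean()) / (w.std() + 1e-8)       # z-normalise window
            paa = w.reshape(word_len, -1).mean(axis=1)  # piecewise aggregate
            words.append(tuple(np.searchsorted(breakpoints, paa)))
        return Counter(words)  # a classifier is then built on these counts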


The Heterogeneous Ensembles of Standard Classification Algorithms (HESCA): the Whole is Greater than the Sum of its Parts

arXiv.org Machine Learning

Building classification models is an intrinsically practical exercise that requires many design decisions prior to deployment. We aim to provide some guidance in this decision-making process. Specifically, given a classification problem with real-valued attributes, we consider which classifier, or family of classifiers, one should use. Strong contenders are tree-based homogeneous ensembles, support vector machines and deep neural networks. All three families of models could claim to be state of the art, and yet it is not clear when one is preferable to the others. Our extensive experiments with over 200 datasets from two distinct archives demonstrate that, rather than choosing a single family and expending computing resources on optimising that model, it is significantly better to build simpler versions of classifiers from each family and ensemble them. We show that the Heterogeneous Ensembles of Standard Classification Algorithms (HESCA), which weights its components using error estimates formed on the train data, is significantly better (in terms of error, balanced error, negative log likelihood and area under the ROC curve) than its individual components, than picking the component that is best on the train data, and than a support vector machine tuned over 1089 different parameter configurations. We demonstrate that HESCA+, which contains a deep neural network, a support vector machine and two decision tree forests, is significantly better than its components, than picking the best component, and than HESCA. We analyse the results further and find that HESCA and HESCA+ are of particular value when the train set size is relatively small and the problem has multiple classes. HESCA is a fast approach that is, on average, as good as state-of-the-art classifiers, whereas HESCA+ is significantly better than average and represents a strong benchmark for future research.
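
A sketch of the weighting step described above: accuracy estimates formed on the train data, raised to a power so that stronger components dominate the vote. The component list and the exponent are illustrative assumptions, not necessarily the paper's exact configuration.

    # Sketch of HESCA-style weighting from train-data error estimates; the
    # exponent alpha is an assumption for illustration.
    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC
    from sklearn.ensemble import RandomForestClassifier

    def hesca_weights(components, X_train, y_train, alpha=4):
        # estimate each component's accuracy on the train data only
        accs = np.array([cross_val_score(c, X_train, y_train, cv=10).mean()
                         for c in components])
        return accs ** alpha  # exponentiation amplifies accuracy differences

    components = [LogisticRegression(max_iter=1000),
                  KNeighborsClassifier(),
                  SVC(probability=True),
                  RandomForestClassifier(n_estimators=200)]

The weighted combination of the components' probability estimates then proceeds as in the meta-ensemble sketch given earlier.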


Simulated Data Experiments for Time Series Classification Part 1: Accuracy Comparison with Default Settings

arXiv.org Machine Learning

There is now a broad range of time series classification (TSC) algorithms designed to exploit different representations of the data. These have been evaluated on a range of problems hosted at the UCR-UEA TSC Archive (www.timeseriesclassification.com), and there have been extensive comparative studies. However, our understanding of why one algorithm outperforms another is still anecdotal at best. This series of experiments is meant to provide insights into which discriminatory features in the data lead algorithms that exploit a particular representation to outperform the others. We categorise five different feature spaces exploited by TSC algorithms, then design data simulators to generate randomised data from each representation. We describe what results we expected from each class of algorithm and data representation, then observe whether these prior beliefs are supported by the experimental evidence. We provide an open-source implementation of all the simulators to allow for the controlled testing of hypotheses relating to classifier performance on different data representations. We identify many surprising results that confounded our expectations, and use these results to highlight how an oversimplified view of classifier structure can often lead to erroneous prior beliefs. We believe ensembling can often overcome prior bias, and our results support this belief by showing that the ensemble approach adopted by the Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) is significantly better than the alternatives when the data representation is unknown, and is significantly better than, or not significantly worse than, the best other approach on three out of five of the individual simulators.
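
As an indication of what such a simulator looks like, the sketch below generates data whose only discriminatory feature is a mean shift within a fixed interval, the kind of structure interval-based algorithms are designed to exploit. All parameters are illustrative; the paper's open-source simulators cover five representations.

    # Sketch of an interval-based data simulator: class 1 differs from class 0
    # only by a level shift inside a fixed interval (parameters illustrative).
    import numpy as np

    def simulate_interval_data(n_cases=100, length=100, interval=(30, 50),
                               shift=1.0, seed=0):
        rng = np.random.default_rng(seed)
        X = rng.normal(size=(n_cases, length))  # noise baseline
        y = rng.integers(0, 2, size=n_cases)    # binary class labels
        s, e = interval
        X[y == 1, s:e] += shift                 # discriminatory interval
        return X, y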