AITopics | data mining and knowledge discovery

data mining and knowledge discovery

Statistical Analysis of Nearest Neighbor Methods for Anomaly Detection

Xiaoyi Gu, Leman Akoglu, Alessandro Rinaldo

Neural Information Processing Systems, Feb-12-2026, 18:05:49 GMT

In this paper we are concerned with investigating the performance of NN-based methods for anomaly detection. We first show through extensive simulations that NN methods compare favorably to some of the other state-of-the-art algorithms for anomaly detection based on a set of benchmark synthetic datasets.
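As a rough illustration of what an NN-based anomaly score can look like, the sketch below scores each point by its Euclidean distance to its k-th nearest neighbor. The abstract does not say which NN variant the authors analyze, so the function name `knn_anomaly_scores`, the choice of `k`, and the toy data are all assumptions made for this example.

```python
import math


def knn_anomaly_scores(points, k=3):
    """Score each point by the distance to its k-th nearest neighbor.

    Illustrative only: a common kNN-distance score, not necessarily the
    variant studied in the paper.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Sorted distances from p to every other point (p itself excluded).
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Toy data (made up): a 4x4 grid cluster plus one far-away point.
cluster = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(4)]
points = cluster + [(5.0, 5.0)]

scores = knn_anomaly_scores(points, k=3)
# Index of the point with the largest score, i.e. the most isolated one.
most_anomalous = max(range(len(points)), key=scores.__getitem__)
```

Points inside the cluster have close neighbors and therefore low scores, while the isolated point receives a much larger score. The brute-force loop is quadratic in the number of points, which is fine for a sketch but not for large datasets.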

data mining, detection, machine learning, (16 more...)

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > New York (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.95)

Structural Classification of Locally Stationary Time Series Based on Second-order Characteristics

Qian, Chen; Ding, Xiucai; Li, Lexin

arXiv.org Machine Learning, Jul-11-2025

Time series classification is crucial for numerous scientific and engineering applications. In this article, we present a numerically efficient, practically competitive, and theoretically rigorous classification method for distinguishing between two classes of locally stationary time series based on their time-domain, second-order characteristics. Our approach builds on the autoregressive approximation for locally stationary time series, combined with an ensemble aggregation and a distance-based threshold for classification. It imposes no requirement on the training sample size, and is shown to achieve zero misclassification error rate asymptotically when the underlying time series differ only mildly in their second-order characteristics. The new method is demonstrated to outperform a variety of state-of-the-art solutions, including wavelet-based, tree-based, convolution-based methods, as well as modern deep learning methods, through intensive numerical simulations and a real EEG data analysis for epilepsy classification.

artificial intelligence, machine learning, time series, (18 more...)

arXiv:2507.04237

Country: North America > United States > California (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

MONSTER: Monash Scalable Time Series Evaluation Repository

Dempster, Angus; Foumani, Navid Mohammadi; Tan, Chang Wei; Miller, Lynn; Mishra, Amish; Salehi, Mahsa; Pelletier, Charlotte; Schmidt, Daniel F.; Webb, Geoffrey I.

arXiv.org Artificial Intelligence, Feb-20-2025

We introduce MONSTER--the MONash Scalable Time Series Evaluation Repository--a collection of large datasets for time series classification. The field of time series classification has benefitted from common benchmarks set by the UCR and UEA time series classification repositories. However, the datasets in these benchmarks are small, with median sizes of 217 and 255 examples, respectively. In consequence, they favour a narrow subspace of models that are optimised to achieve low classification error on a wide variety of smaller datasets, that is, models that minimise variance, and give little weight to computational issues such as scalability. Our hope is to diversify the field by introducing benchmarks using larger datasets. We believe that there is enormous potential for new progress in the field by engaging with the theoretical and practical challenges of learning effectively from larger quantities of data.

cross-validation fold, dataset, time series classification, (9 more...)

arXiv:2502.15122

Country:

Oceania > Australia > Victoria > Melbourne (0.14)
Africa > La Réunion (0.04)
North America > Canada > Yukon (0.04)
(7 more...)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Pressing Intensity: An Intuitive Measure for Pressing in Soccer

Bekkers, Joris

arXiv.org Artificial Intelligence, Dec-30-2024

Pressing is a fundamental defensive strategy in football, characterized by applying pressure on the ball-owning team to regain possession. Despite its significance, existing metrics for measuring pressing often lack precision or comprehensive consideration of positional data, player movement and speed. This research introduces an innovative framework for quantifying pressing intensity, leveraging advancements in positional tracking data and components from Spearman's Pitch Control model. Our method integrates player velocities, movement directions, and reaction times to compute the time required for a defender to intercept an attacker or the ball. This time-to-intercept measure is then transformed into probabilistic values using a logistic function, enabling dynamic and intuitive analysis of pressing situations at the individual frame level. The model captures how every player's movement influences pressure on the field, offering actionable insights for coaches, analysts, and decision-makers. By providing a robust and interpretable metric, our approach facilitates the identification of pressing strategies, advanced situational analyses, and the derivation of metrics, advancing the analytical capabilities for modern football.
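The two steps described in the abstract, a time-to-intercept followed by a logistic transform, can be sketched as follows. This is a simplified illustration and not the author's implementation: the function names, the "coast during the reaction time, then run at top speed" motion model, and every parameter value (0.7 s reaction time, 7 m/s top speed, 1.5 s threshold, 0.45 s scale) are assumptions chosen for the example.

```python
import math


def time_to_intercept(pos, vel, target, reaction_time=0.7, max_speed=7.0):
    """Seconds for a defender to reach `target` under a simple motion model.

    Assumed model: the defender keeps the current velocity during the
    reaction time, then runs straight at max_speed. Units: metres, m/s.
    """
    # Position after coasting at the current velocity for reaction_time.
    rx = pos[0] + vel[0] * reaction_time
    ry = pos[1] + vel[1] * reaction_time
    remaining = math.hypot(target[0] - rx, target[1] - ry)
    return reaction_time + remaining / max_speed


def pressing_probability(tti, threshold=1.5, scale=0.45):
    """Map a time-to-intercept to (0, 1) with a logistic curve.

    tti == threshold gives 0.5; shorter times give values closer to 1.
    """
    return 1.0 / (1.0 + math.exp((tti - threshold) / scale))


attacker = (7.0, 0.0)

# Defender standing still 7 m away.
tti_static = time_to_intercept((0.0, 0.0), (0.0, 0.0), attacker)
# Same defender already moving towards the attacker at 5 m/s.
tti_moving = time_to_intercept((0.0, 0.0), (5.0, 0.0), attacker)

p_static = pressing_probability(tti_static)
p_moving = pressing_probability(tti_moving)
```

Because the defender who is already moving towards the attacker arrives sooner, that defender gets the higher pressing value, which reflects the abstract's point that movement, and not only distance, drives the measure.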

artificial intelligence, data mining, pressing intensity, (13 more...)

arXiv:2501.04712

Country: Europe > Netherlands (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Sports > Soccer (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Data Science > Data Mining (0.48)

aeon: a Python toolkit for learning from time series

Middlehurst, Matthew; Ismail-Fawaz, Ali; Guillaume, Antoine; Holder, Christopher; Rubio, David Guijo; Bulatova, Guzal; Tsaprounis, Leonidas; Mentel, Lukasz; Walter, Martin; Schäfer, Patrick; Bagnall, Anthony

arXiv.org Artificial Intelligence, Jun-20-2024

aeon is a unified Python 3 library for all machine learning tasks involving time series. The package contains modules for time series forecasting, classification, extrinsic regression and clustering, as well as a variety of utilities, transformations and distance measures designed for time series data. aeon also has a number of experimental modules for tasks such as anomaly detection, similarity search and segmentation. aeon follows the scikit-learn API as much as possible to help new users and enable easy integration of aeon estimators with useful tools such as model selection and pipelines. It provides a broad library of time series algorithms, including efficient implementations of the very latest advances in research. Using a system of optional dependencies, aeon integrates a wide variety of packages into a single interface while keeping the core framework with minimal dependencies. The package is distributed under the 3-Clause BSD license and is available at https://github.com/aeon-toolkit/aeon. This version was submitted to the JMLR journal on 02 Nov 2023 for v0.5.0 of aeon. At the time of this preprint aeon has released v0.9.0, and has had substantial changes.

classification, data mining and knowledge discovery, time series classification, (9 more...)

arXiv:2406.14231

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Europe > United Kingdom > England > Hampshire > Southampton (0.04)
Europe > Spain > Andalusia > Córdoba Province > Córdoba (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.56)

Task and Explanation Network

Sipper, Moshe

arXiv.org Artificial Intelligence, Jan-3-2024

Explainability in deep networks has gained increased importance in recent years. We argue herein that an AI must be tasked not just with a task but also with an explanation of why said task was accomplished as such. We present a basic framework--Task and Explanation Network (TENet)--which fully integrates task completion and its explanation. We believe that the field of AI as a whole should insist--quite emphatically--on explainability. With the meteoric rise of AI over the past decade, and in particular deep learning, an issue that has been gaining more and more traction is that of explainability.

explanation, task and explanation network, tenet, (13 more...)

arXiv:2401.01732

Country:

North America > United States (0.14)
Europe > Slovenia > Drava > Municipality of Benedikt > Benedikt (0.04)
Asia > Middle East > Israel > Southern District > Beer-Sheva (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Sports > Tennis (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

An Approach to Multiple Comparison Benchmark Evaluations that is Stable Under Manipulation of the Comparate Set

Ismail-Fawaz, Ali; Dempster, Angus; Tan, Chang Wei; Herrmann, Matthieu; Miller, Lynn; Schmidt, Daniel F.; Berretti, Stefano; Weber, Jonathan; Devanne, Maxime; Forestier, Germain; Webb, Geoffrey I.

arXiv.org Artificial Intelligence, May-19-2023

The measurement of progress using benchmark evaluations is ubiquitous in computer science and machine learning. However, common approaches to analyzing and presenting the results of benchmark comparisons of multiple algorithms over multiple datasets, such as the critical difference diagram introduced by Demšar (2006), have important shortcomings and, we show, are open to both inadvertent and intentional manipulation. To address these issues, we propose a new approach to presenting the results of benchmark comparisons, the Multiple Comparison Matrix (MCM), that prioritizes pairwise comparisons and precludes the means of manipulating experimental results in existing approaches. MCM can be used to show the results of an all-pairs comparison, or to show the results of a comparison between one or more selected algorithms and the state of the art. MCM is implemented in Python and is publicly available.
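To see why a pairwise presentation can be stable when the set of compared algorithms changes, consider the sketch below. It is not the authors' MCM implementation; `pairwise_summary`, the algorithm names, and the accuracy values are hypothetical, and the statistics shown (mean difference and win/tie/loss counts) are simply examples of quantities that depend only on the two algorithms being compared.

```python
from itertools import combinations


def pairwise_summary(results):
    """Summarise every pair of algorithms over a shared list of datasets.

    results: {algorithm_name: [score on dataset 0, score on dataset 1, ...]}
    Returns {(a, b): (mean_diff, wins, ties, losses)} from a's perspective.
    """
    summary = {}
    for a, b in combinations(sorted(results), 2):
        diffs = [x - y for x, y in zip(results[a], results[b])]
        wins = sum(d > 0 for d in diffs)
        ties = sum(d == 0 for d in diffs)
        losses = sum(d < 0 for d in diffs)
        summary[(a, b)] = (sum(diffs) / len(diffs), wins, ties, losses)
    return summary


# Made-up accuracies on four datasets.
two_algorithms = {
    "A": [0.90, 0.80, 0.70, 0.60],
    "B": [0.85, 0.80, 0.75, 0.50],
}
# The same results with a third comparate added.
three_algorithms = dict(two_algorithms, C=[0.95, 0.60, 0.72, 0.55])

before = pairwise_summary(two_algorithms)
after = pairwise_summary(three_algorithms)
```

The entry for the pair `("A", "B")` is identical in `before` and `after`: adding or removing `C` cannot change it. Summaries built from ranks across all comparates do not have this property, since each algorithm's rank on a dataset depends on which other algorithms are included.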

artificial intelligence, data mining, machine learning, (14 more...)

arXiv:2305.11921

Country: Europe > United Kingdom > England > Greater Manchester > Manchester (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Proximity Forest 2.0: A new effective and scalable similarity-based classifier for time series

Herrmann, Matthieu; Tan, Chang Wei; Salehi, Mahsa; Webb, Geoffrey I.

arXiv.org Artificial Intelligence, Apr-13-2023

Time series classification (TSC) is a challenging task due to the diversity of types of feature that may be relevant for different classification tasks, including trends, variance, frequency, magnitude, and various patterns. To address this challenge, several alternative classes of approach have been developed, including similarity-based, features and intervals, shapelets, dictionary, kernel, neural network, and hybrid approaches. While kernel, neural network, and hybrid approaches perform well overall, some specialized approaches are better suited for specific tasks. In this paper, we propose a new similarity-based classifier, Proximity Forest version 2.0 (PF 2.0), which outperforms previous state-of-the-art similarity-based classifiers across the UCR benchmark and outperforms state-of-the-art kernel, neural network, and hybrid methods on specific datasets in the benchmark that are best addressed by similarity-based methods. PF 2.0 incorporates three recent advances in time series similarity measures -- (1) computationally efficient early abandoning and pruning to speed up elastic similarity computations; (2) a new elastic similarity measure, Amerced Dynamic Time Warping (ADTW); and (3) cost function tuning. It rationalizes the set of similarity measures employed, reducing the eight base measures of the original PF to three and using the first derivative transform with all similarity measures, rather than a limited subset. We have implemented both PF 1.0 and PF 2.0 in a single C++ framework, making the PF framework more efficient.
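The abstract names ADTW without defining it. A minimal sketch follows, assuming ADTW is standard dynamic time warping with a fixed additive penalty `omega` charged for every non-diagonal (warping) step, and assuming a squared-difference cost between points. It uses plain dynamic programming with none of the early abandoning or pruning mentioned above, and it is unrelated to the authors' C++ code.

```python
import math


def adtw(a, b, omega):
    """Amerced DTW distance between two univariate series (sketch).

    Assumption: each non-diagonal step in the warping path costs an extra
    `omega`; diagonal steps are free of penalty.
    """
    n, m = len(a), len(b)
    # D[i][j] = cheapest cost of aligning a[:i] with b[:j].
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(
                D[i - 1][j - 1],      # diagonal step: no penalty
                D[i - 1][j] + omega,  # warping step: pay omega
                D[i][j - 1] + omega,  # warping step: pay omega
            )
    return D[n][m]


# Two made-up series with the same shape, shifted by one time step.
a = [0.0, 0.0, 1.0, 2.0]
b = [0.0, 1.0, 2.0, 2.0]

unconstrained = adtw(a, b, omega=0.0)   # behaves like plain DTW
penalised = adtw(a, b, omega=0.5)       # warping allowed, but at a price
rigid = adtw(a, b, omega=math.inf)      # no warping: squared Euclidean
```

Under these assumptions the single parameter `omega` moves the measure between two familiar extremes: with `omega=0` warping is free, and with an infinite penalty only the diagonal path remains, so the result equals the squared Euclidean distance for equal-length series.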

data mining, machine learning, similarity measure, (16 more...)

arXiv:2304.058

Country:

Oceania > Australia (0.04)
North America > United States > California > Riverside County > Riverside (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Semi-Supervised Constrained Clustering: An In-Depth Overview, Ranked Taxonomy and Future Research Directions

González-Almagro, Germán; Peralta, Daniel; De Poorter, Eli; Cano, José-Ramón; García, Salvador

arXiv.org Artificial Intelligence, Feb-28-2023

Clustering is a well-known unsupervised machine learning approach capable of automatically grouping discrete sets of instances with similar characteristics. Constrained clustering is a semi-supervised extension to this process that can be used when expert knowledge is available to indicate constraints that can be exploited. Well-known examples of such constraints are must-link (indicating that two instances belong to the same group) and cannot-link (two instances definitely do not belong together). The research area of constrained clustering has grown significantly over the years with a large variety of new algorithms and more advanced types of constraints being proposed. However, no unifying overview is available to easily understand the wide variety of available methods, constraints and benchmarks. To remedy this, this study presents in detail the background of constrained clustering and provides a novel ranked taxonomy of the types of constraints that can be used in constrained clustering. In addition, it focuses on the instance-level pairwise constraints, and gives an overview of their applications and their historical context. It then presents a statistical analysis covering 307 constrained clustering methods, categorizes them according to their features, and provides a ranking score indicating which methods have the most potential based on their popularity and validation quality. Finally, based upon this analysis, potential pitfalls and future research directions are provided.
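Must-link and cannot-link constraints are easy to express as pairs of instance indices. The sketch below checks a given cluster assignment against such pairs; the function name `constraint_violations` and the toy labels are hypothetical, and the code only evaluates constraints rather than implementing any of the surveyed clustering algorithms.

```python
def constraint_violations(labels, must_link, cannot_link):
    """Return the constraints that a cluster assignment breaks.

    labels: cluster id for each instance, indexed by instance number.
    must_link / cannot_link: iterables of (i, j) index pairs.
    """
    # A must-link pair is violated when the two instances are separated.
    broken_ml = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is violated when the two instances share a cluster.
    broken_cl = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return broken_ml, broken_cl


# Toy assignment (made up): five instances in two clusters.
labels = [0, 0, 1, 1, 1]
must_link = [(0, 1), (1, 2)]
cannot_link = [(0, 4), (2, 3)]

broken_ml, broken_cl = constraint_violations(labels, must_link, cannot_link)
```

Here the pair `(1, 2)` breaks a must-link constraint because the two instances sit in different clusters, and `(2, 3)` breaks a cannot-link constraint because they share one. A count of such violations is one simple way to compare candidate clusterings against expert knowledge.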

artificial intelligence, evolutionary algorithm, pattern analysis and machine intelligence, (19 more...)

arXiv:2303.00522

Country:

Asia > Middle East > Jordan (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Europe > Belgium > Flanders > East Flanders > Ghent (0.04)
(7 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (1.00)
(5 more...)

Composite model of seismic monitoring data analysis during mining operations on the example of the Kukisvumchorrskoye deposit of JSC Apatit

Revin, Ilia

arXiv.org Artificial Intelligence, Jan-13-2023

Geomechanical monitoring of a rock massif is an actively developing branch of geomechanics. It is almost impossible to single out a methodology and approaches for data collection and analysis in developing seismic monitoring systems. In the process of mining in a rock massif, changes in the state of structural inhomogeneities are most clearly manifested. Existing natural structural inhomogeneities are revealed, there are movements in discontinuous disturbances, and new technogenic disturbances are formed, which are accompanied by a change in the natural stress state of various blocks of the massif. An important task is to develop a mining forecasting model that can take into account the structural heterogeneity of the rock massif and select the necessary forecast horizon depending on monitoring data. The developed method of evaluating the results of monitoring geomechanical processes in the rock massif allowed us to forecast zones of possible rock bursts.

artificial intelligence, evolutionary algorithm, machine learning, (12 more...)

arXiv:2301.05701

Genre: Research Report (0.40)

Industry:

Materials > Metals & Mining (1.00)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.93)
Information Technology > Sensing and Signal Processing (0.89)
Information Technology > Data Science (0.83)
