Arzamasov, Vadim
Generalizability of experimental studies
Matteucci, Federico, Arzamasov, Vadim, Cribeiro-Ramallo, Jose, Heyden, Marco, Ntounas, Konstantin, Böhm, Klemens
Experimental studies are a cornerstone of machine learning (ML) research. A common but often implicit assumption is that the results of a study will generalize beyond the study itself, e.g., to new data; that is, there is a high probability that repeating the study under different conditions will yield similar results. Despite the importance of the concept, the problem of measuring generalizability remains open. This is probably due to the lack of a mathematical formalization of experimental studies. In this paper, we propose such a formalization and develop a quantifiable notion of generalizability. This notion allows one to explore the generalizability of existing studies and to estimate the number of experiments needed for new studies to be generalizable. To demonstrate its usefulness, we apply it to two recently published benchmarks to discern generalizable and non-generalizable results. We also publish a Python module that allows our analysis to be repeated for other experimental studies.
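To make the idea concrete, the minimal Python sketch below checks how often a toy study's conclusion ("which of three methods has the highest mean score") survives bootstrap resampling of the datasets. This is an illustration only, not the paper's formalization; the method scores and the resampling scheme are assumptions made for the example.

```python
# Minimal sketch (not the paper's formalization): estimate how stable a
# study's conclusion ("which method ranks best on average") is when the
# set of datasets is resampled. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical study results: rows = datasets, columns = methods,
# entries = some performance score.
scores = rng.normal(loc=[0.80, 0.78, 0.75], scale=0.05, size=(50, 3))

def best_method(score_matrix):
    """Index of the method with the highest mean score."""
    return int(np.argmax(score_matrix.mean(axis=0)))

original_winner = best_method(scores)

# Repeat the "study" on bootstrap samples of datasets and check how often
# the same method wins -- a crude proxy for generalizability to new data.
n_repeats = 1000
agreements = 0
for _ in range(n_repeats):
    sample = scores[rng.integers(0, len(scores), size=len(scores))]
    agreements += best_method(sample) == original_winner

print(f"Conclusion reproduced in {agreements / n_repeats:.1%} of resampled studies")
```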
Generative Subspace Adversarial Active Learning for Outlier Detection in Multiple Views of High-dimensional Data
Cribeiro-Ramallo, Jose, Arzamasov, Vadim, Matteucci, Federico, Wambold, Denis, Böhm, Klemens
Outlier detection (OD) in high-dimensional tabular data is an important task in data mining, essential for many downstream tasks and applications. Existing unsupervised OD algorithms face one or more problems, including the inlier assumption (IA), the curse of dimensionality (CD), and multiple views (MV). To address these issues, we introduce Generative Subspace Adversarial Active Learning (GSAAL), a novel approach that uses a Generative Adversarial Network with multiple adversaries. These adversaries learn the marginal class probability functions over different data subspaces, while a single generator in the full space models the entire distribution of the inlier class. GSAAL is specifically designed to address the MV limitation while also handling IA and CD, and is the only method to do so. We provide a comprehensive mathematical formulation of MV, convergence guarantees for the discriminators, and scalability results for GSAAL. Our extensive experiments demonstrate the effectiveness and scalability of GSAAL, highlighting its superior performance compared to other popular OD methods, especially in MV scenarios.
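The sketch below illustrates the architectural idea described above, i.e., one generator acting in the full feature space and several discriminators that each observe only a feature subspace. It is not GSAAL itself; the layer sizes, the number and choice of subspaces, and the random data are assumptions made for the example.

```python
# Minimal sketch of the architectural idea (one full-space generator, several
# discriminators that each see only a random feature subspace); layer sizes
# and the subspace choice are illustrative, not GSAAL's actual setup.
import torch
import torch.nn as nn

d, latent_dim, n_discriminators = 20, 8, 5
rng = torch.Generator().manual_seed(0)

# Random feature subspaces, one per discriminator.
subspaces = [torch.randperm(d, generator=rng)[: d // 2] for _ in range(n_discriminators)]

generator = nn.Sequential(nn.Linear(latent_dim, 32), nn.ReLU(), nn.Linear(32, d))

discriminators = [
    nn.Sequential(nn.Linear(len(s), 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())
    for s in subspaces
]

def discriminator_scores(x):
    """Average 'inlier' score over all subspace discriminators."""
    return torch.stack([net(x[:, s]) for net, s in zip(discriminators, subspaces)]).mean(0)

# One illustrative forward pass: real (inlier) batch vs. generated batch.
real = torch.randn(64, d)
fake = generator(torch.randn(64, latent_dim))
print(discriminator_scores(real).mean().item(), discriminator_scores(fake).mean().item())
```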
A benchmark of categorical encoders for binary classification
Matteucci, Federico, Arzamasov, Vadim, Böhm, Klemens
Categorical encoders transform categorical features into numerical representations that are indispensable for a wide range of machine learning models. Existing encoder benchmark studies lack generalizability because of their limited choice of (1) encoders, (2) experimental factors, and (3) datasets. Additionally, inconsistencies arise from the adoption of varying aggregation strategies. This paper presents the most comprehensive benchmark of categorical encoders to date: an extensive evaluation of 32 encoder configurations from diverse families under 36 combinations of experimental factors on 50 datasets. The study shows the profound influence of dataset selection, experimental factors, and aggregation strategies on the benchmark's conclusions -- aspects disregarded in previous encoder benchmarks.
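For readers unfamiliar with encoder families, here is a small illustration of two common categorical encoders, one-hot and target (mean) encoding, on a toy column. The data and the choice of encoders are assumptions for the example and do not correspond to the benchmark's 32 configurations.

```python
# Minimal illustration of two encoder families on a toy column; the data and
# the choice of encoders are illustrative, not the benchmark's configurations.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({"color": ["red", "blue", "red", "green", "blue"],
                   "target": [1, 0, 1, 0, 1]})

# One-hot encoding: one binary column per category.
onehot = OneHotEncoder().fit_transform(df[["color"]]).toarray()

# Target (mean) encoding: replace each category by the mean target it co-occurs with.
target_means = df.groupby("color")["target"].mean()
target_encoded = df["color"].map(target_means)

print(onehot)
print(target_encoded.to_list())
```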
Adaptive Bernstein Change Detector for High-Dimensional Data Streams
Heyden, Marco, Fouché, Edouard, Arzamasov, Vadim, Fenn, Tanja, Kalinke, Florian, Böhm, Klemens
Change detection is of fundamental importance when analyzing data streams. Detecting changes both quickly and accurately enables monitoring and prediction systems to react, e.g., by issuing an alarm or by updating a learning algorithm. However, detecting changes is challenging when observations are high-dimensional. In high-dimensional data, change detectors should not only identify when changes happen, but also in which subspace they occur; ideally, they should also quantify how severe the changes are. Our approach, ABCD, has these properties. ABCD learns an encoder-decoder model and monitors its accuracy over a window of adaptive size. It derives a change score based on Bernstein's inequality to detect deviations in accuracy, which indicate changes. Our experiments demonstrate that ABCD outperforms its best competitor by at least 8% and up to 23% in F1-score on average. It can also accurately estimate the subspace in which a change occurs, together with a severity measure that correlates with the ground truth.
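To illustrate the kind of Bernstein-based reasoning such a detector can rely on (not ABCD's exact score), the sketch below bounds the probability of the observed gap between mean reconstruction errors before and after a candidate change point under the assumption of no change. The data, the split point, and the error range are illustrative assumptions.

```python
# Sketch of a Bernstein-bound test for a gap in mean reconstruction error
# around a candidate change point (not ABCD's exact change score).
import numpy as np

def bernstein_bound(eps, n, var, abs_bound):
    """Two-sided Bernstein bound on P(|sample mean - expectation| >= eps)
    for n i.i.d. variables with the given variance and absolute range bound."""
    return 2 * np.exp(-n * eps**2 / (2 * var + 2 * abs_bound * eps / 3))

rng = np.random.default_rng(0)
# Reconstruction errors in [0, 1]: low before the change, higher afterwards.
errors = np.concatenate([rng.uniform(0.0, 0.2, 200), rng.uniform(0.3, 0.6, 100)])

split = 200                      # candidate change point inside the window
before, after = errors[:split], errors[split:]
gap = abs(after.mean() - before.mean())

# If both halves shared the same mean, a gap of this size would require at
# least one half to deviate from that mean by gap/2; union bound over both.
p = bernstein_bound(gap / 2, len(before), before.var(), 1.0) + \
    bernstein_bound(gap / 2, len(after), after.var(), 1.0)
print(f"gap={gap:.3f}, Bernstein bound on p-value ~ {p:.2e}")
```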
Scenario Discovery via Rule Extraction
Arzamasov, Vadim, Böhm, Klemens
Scenario discovery is the process of finding areas of interest, commonly referred to as scenarios, in data spaces resulting from simulations. For instance, one might search for conditions, i.e., inputs of the simulation model, under which the system under investigation is unstable. A commonly used algorithm for scenario discovery is PRIM. It yields scenarios in the form of human-comprehensible hyper-rectangles. When the simulation model has many inputs and the simulations are computationally expensive, PRIM may not produce good results given the affordable volume of data. We therefore propose a new procedure for scenario discovery: we train an intermediate statistical model that generalizes fast and use it to label large amounts of data for PRIM. We provide the statistical intuition behind our idea. Our experimental study shows that this method performs much better than PRIM alone. Specifically, it reduces the number of simulation runs required by 75% on average.
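The core surrogate-labeling step can be sketched as follows: fit a statistical model on a small number of expensive simulation runs, then use it to cheaply label many more input points that a box-finding method such as PRIM can consume. The toy simulator, the choice of a random forest as the intermediate model, and all sample sizes are assumptions for illustration.

```python
# Minimal sketch of the surrogate-labeling idea (not the paper's exact setup):
# fit a statistical model on a small number of "expensive" simulation runs,
# then use it to cheaply label many more input points for a rule/box-finding
# step such as PRIM. The toy simulator and model choice are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def expensive_simulation(x):
    """Stand-in for the simulation model: 'unstable' when both inputs are large."""
    return int(x[0] > 0.6 and x[1] > 0.7)

# Few affordable simulation runs.
X_small = rng.uniform(size=(100, 5))
y_small = np.array([expensive_simulation(x) for x in X_small])

# Intermediate statistical model that generalizes from the small sample.
surrogate = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_small, y_small)

# Cheaply label a much larger sample of inputs with the surrogate;
# this labeled set would then be handed to PRIM to find hyper-rectangles.
X_large = rng.uniform(size=(50_000, 5))
y_large = surrogate.predict(X_large)
print(f"{y_large.sum()} of {len(y_large)} surrogate-labeled points are 'unstable'")
```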