AITopics | Somogyvári, Zoltán

Collaborating Authors

Somogyvári, Zoltán

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Feature space reduction method for ultrahigh-dimensional, multiclass data: Random forest-based multiround screening (RFMS)

Hanczár, Gergely, Stippinger, Marcell, Hanák, Dávid, Kurbucz, Marcell T., Törteli, Olivér M., Chripkó, Ágnes, Somogyvári, Zoltán

arXiv.org Artificial IntelligenceMay-25-2023

In recent years, numerous screening methods have been published for ultrahigh-dimensional data that contain hundreds of thousands of features; however, most of these features cannot handle data with thousands of classes. Prediction models built to authenticate users based on multichannel biometric data result in this type of problem. In this study, we present a novel method known as random forest-based multiround screening (RFMS) that can be effectively applied under such circumstances. The proposed algorithm divides the feature space into small subsets and executes a series of partial model builds. These partial models are used to implement tournament-based sorting and the selection of features based on their importance. To benchmark RFMS, a synthetic biometric feature space generator known as BiometricBlender is employed. Based on the results, the RFMS is on par with industry-standard feature screening methods while simultaneously possessing many advantages over these methods.

artificial intelligence, machine learning, random forest-based multiround screening, (3 more...)

arXiv.org Artificial Intelligence

doi: 10.1088/2632-2153/ad020e

2305.15793

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.53)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.60)

Add feedback

BiometricBlender: Ultra-high dimensional, multi-class synthetic data generator to imitate biometric feature space

Stippinger, Marcell, Hanák, Dávid, Kurbucz, Marcell T., Hanczár, Gergely, Törteli, Olivér M., Somogyvári, Zoltán

arXiv.org Artificial IntelligenceApr-25-2023

The lack of freely available (real-life or synthetic) high or ultra-high dimensional, multi-class datasets may hamper the rapidly growing research on feature screening, especially in the field of biometrics, where the usage of such datasets is common. This paper reports a Python package called BiometricBlender, which is an ultra-high dimensional, multi-class synthetic data generator to benchmark a wide range of feature screening methods. During the data generation process, the overall usefulness and the intercorrelations of blended features can be controlled by the user, thus the synthetic feature space is able to imitate the key properties of a real biometric dataset. C5 Code versioning system Git used C6 Software code languages, Python tools, and services used C7, Compilation and Python 3.7.1+, Since these datasets typically contain only a relatively few relevant, non-redundant predictors, a screening step that removes irrelevant features prior to the main analysis is often employed for reaching a better prediction accuracy and much faster computation [2].

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.softx.2023.101366

2206.10747

Country: Europe > United Kingdom (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Manifold-adaptive dimension estimation revisited

Benkő, Zsigmond, Stippinger, Marcell, Rehus, Roberta, Bencze, Attila, Fabó, Dániel, Hajnal, Boglárka, Erőss, Loránd, Telcs, András, Somogyvári, Zoltán

arXiv.org Machine LearningAug-10-2020

Data dimensionality informs us about data complexity and sets limit on the structure of successful signal processing pipelines. In this work we revisit and improve the manifold-adaptive Farahmand-Szepesv\'ari-Audibert (FSA) dimension estimator, making it one of the best nearest neighbor-based dimension estimators available. We compute the probability density function of local FSA estimates, if the local manifold density is uniform. Based on the probability density function, we propose to use the median of local estimates as a basic global measure of intrinsic dimensionality, and we demonstrate the advantages of this asymptotically unbiased estimator over the previously proposed statistics: the mode and the mean. Additionally, from the probability density function, we derive the maximum likelihood formula for global intrinsic dimensionality, if i.i.d. holds. We tackle edge and finite-sample effects with an exponential correction formula, calibrated on hypercube datasets. We compare the performance of the corrected-median-FSA estimator with kNN estimators: maximum likelihood (ML, Levina-Bickel) and two implementations of DANCo (R and matlab). We show that corrected-median-FSA estimator beats the ML estimator and it is on equal footing with DANCo for standard synthetic benchmarks according to mean percentage error and error rate metrics. With the median-FSA algorithm, we reveal diverse changes in the neural dynamics while resting state and during epileptic seizures. We identify brain areas with lower-dimensional dynamics that are possible causal sources and candidates for being seizure onset zones.

epilepsy, estimator, survey article, (21 more...)

arXiv.org Machine Learning

2008.03221

Country:

North America > United States > New York (0.14)
North America > United States > Massachusetts (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

How to find a unicorn: a novel model-free, unsupervised anomaly detection method for time series

Benkő, Zsigmond, Bábel, Tamás, Somogyvári, Zoltán

arXiv.org Machine LearningMay-5-2020

Recognition of anomalous events is a challenging but critical task in many scientific and industrial fields, especially when the properties of anomalies are unknown. In this paper, we present a new anomaly concept called "unicorn" or unique event and present a new, model-independent, unsupervised detection algorithm to detect unicorns. The Temporal Outlier Factor (TOF) is introduced to measure the uniqueness of events in continuous data sets from dynamic systems. The concept of unique events differs significantly from traditional outliers in many aspects: while repetitive outliers are no longer unique events, a unique event is not necessarily outlier in either pointwise or collective sense; it does not necessarily fall out from the distribution of normal activity. The performance of our algorithm was examined in recognizing unique events on different types of simulated data sets with anomalies and it was compared with the standard Local Outlier Factor (LOF). TOF had superior performance compared to LOF even in recognizing traditional outliers and it also recognized unique events that LOF did not. Benefits of the unicorn concept and the new detection method were illustrated by example data sets from very different scientific fields. Our algorithm successfully recognized unique events in those cases where they were already known such as the gravitational waves of a black hole merger on LIGO detector data and the signs of respiratory failure on ECG data series. Furthermore, unique events were found on the LIBOR data set of the last 30 years.

time series, us government, vascular disease, (23 more...)

arXiv.org Machine Learning

2004.11468

Country: North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report > Promising Solution (0.40)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Banking & Finance (1.00)
Health & Medicine > Diagnostic Medicine (0.88)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback