Intrinsic Dimension Estimation
Intrinsic Dimension Estimation for Radio Galaxy Zoo using Diffusion Models
Roset, Joan Font-Quer, Mohan, Devina, Scaife, Anna
In this work, we estimate the intrinsic dimension (iD) of the Radio Galaxy Zoo (RGZ) dataset using a score-based diffusion model. We examine how the iD estimates vary as a function of Bayesian neural network (BNN) energy scores, which measure how similar the radio sources are to the MiraBest subset of the RGZ dataset. We find that out-of-distribution sources exhibit higher iD values, and that the overall iD for RGZ exceeds those typically reported for natural image datasets. Furthermore, we analyse how iD varies across Fanaroff-Riley (FR) morphological classes and as a function of the signal-to-noise ratio (SNR). While no clear difference in iD is found between the FR I and FR II classes, a weak trend toward higher SNR at lower iD is observed. Future work using the RGZ dataset could make use of the relationship between iD and energy scores to quantitatively study and improve the representations learned by various self-supervised learning algorithms.
A Novel Approach for Intrinsic Dimension Estimation
Özçoban, Kadir, Manguoğlu, Murat, Yetkin, Emrullah Fatih
Dimensionality reduction approaches are crucial in many machine learning applications such as computer vision, robotics, natural language processing, medical diagnosis, recommendation systems, and industrial IoT settings such as predictive maintenance, all of which generate and process large amounts of data and variables. In general, dimensionality reduction improves the performance of machine learning tasks by removing redundant features. In this regard, both linear and non-linear dimensionality reduction methods, in particular manifold learning techniques, are especially effective since they are based on preserving the geometric structure of the original feature space. Several such approaches are already available and have been studied extensively in the literature, such as principal component analysis (PCA), multidimensional scaling (MDS), Laplacian eigenmaps (LE), and others. We refer the reader to (Lee and Verleysen, 2007) for a comprehensive survey of the available methods.
Intrinsic Dimension Estimation for Robust Detection of AI-Generated Texts
Tulchinskii, Eduard, Kuznetsov, Kristian, Kushnareva, Laida, Cherniavskii, Daniil, Barannikov, Serguei, Piontkovskaya, Irina, Nikolenko, Sergey, Burnaev, Evgeny
The rapidly increasing quality of AI-generated content makes it difficult to distinguish between human-written and AI-generated texts, which may lead to undesirable consequences for society. It therefore becomes increasingly important to study properties of human texts that are invariant over different text domains and varying proficiency of human writers, can be easily calculated for any language, and can robustly separate natural and AI-generated texts regardless of the generation model and sampling method. In this work, we propose such an invariant for human-written texts, namely the intrinsic dimensionality of the manifold underlying the set of embeddings for a given text sample. We show that the average intrinsic dimensionality of fluent texts in a natural language hovers around $9$ for several alphabet-based languages and around $7$ for Chinese, while the average intrinsic dimensionality of AI-generated texts for each language is $\approx 1.5$ lower, with a clear statistical separation between the human-generated and AI-generated distributions. This property allows us to build a score-based artificial text detector. The proposed detector's accuracy is stable over text domains, generator models, and human writer proficiency levels, outperforming SOTA detectors in model-agnostic and cross-domain scenarios by a significant margin.
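The kind of quantity this abstract is built on can be illustrated with a minimal sketch: a maximum-likelihood intrinsic-dimension estimator (Levina-Bickel, with inverse-averaging in the spirit of the MacKay-Ghahramani correction) applied to a point cloud standing in for text embeddings. This is a generic illustration on synthetic data, not the paper's own estimator; the sample sizes and `k` are illustrative choices.

```python
import numpy as np

def mle_id(X, k=10):
    """Levina-Bickel MLE intrinsic dimension of a point cloud X of shape (n, d),
    averaging the inverse per-point estimates before inverting."""
    # pairwise Euclidean distances, sorted row-wise (column 0 is the point itself)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    D.sort(axis=1)
    Tk = D[:, k:k + 1]              # distance to the k-th nearest neighbour
    logs = np.log(Tk / D[:, 1:k])   # log(T_k / T_j) for j = 1..k-1
    return 1.0 / logs.mean(axis=1).mean()

# sanity check: a 2-D Gaussian cloud embedded in 5-D should give an ID close to 2
rng = np.random.default_rng(0)
X = np.zeros((1000, 5))
X[:, :2] = rng.standard_normal((1000, 2))
print(round(mle_id(X, k=15), 2))
```

In the paper's setting, `X` would be the contextual embeddings of the tokens of one text sample, and the per-sample estimate would serve as the detection score.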
CA-PCA: Manifold Dimension Estimation, Adapted for Curvature
Gilbert, Anna C., O'Neill, Kevin
Much of modern data analysis in high dimensions relies on the premise that data, while embedded in a high-dimensional space, lie on or near a submanifold of lower dimension. This allows one to embed the data in a space of lower dimension while preserving much of the essential structure, with benefits including faster computation and data visualization. This lower dimension, hereafter referred to as the intrinsic dimension (ID) of the underlying manifold, often enters as a parameter of the dimension-reduction scheme. For instance, in each of the Johnson-Lindenstrauss-type results for manifolds by [13] and [4], the target dimension depends on the ID. Furthermore, the ID is a parameter of popular dimension reduction methods such as t-SNE [28] and multidimensional scaling [12, 16]. Therefore, it may be beneficial to estimate the ID before running further analysis, since compressing the data too much may destroy underlying structure, and it may be computationally expensive to re-run algorithms with a new dimension parameter, if such an error is even detectable.
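The estimator that CA-PCA adapts can be sketched as plain local PCA: fit principal components to a neighbourhood of a point and count how many are needed to reach a variance threshold. The sketch below is the uncorrected baseline only (the curvature adaptation itself is not reproduced), and the neighbourhood size and threshold are illustrative choices.

```python
import numpy as np

def local_pca_id(X, center_idx, k=50, var_threshold=0.95):
    """Baseline local-PCA ID estimate: the number of principal components
    needed to explain `var_threshold` of the variance in the k-neighbourhood."""
    dists = np.linalg.norm(X - X[center_idx], axis=1)
    nbrs = X[np.argsort(dists)[:k]]          # k nearest neighbours (incl. centre)
    nbrs = nbrs - nbrs.mean(axis=0)          # centre the local patch
    s = np.linalg.svd(nbrs, compute_uv=False)        # singular values
    var_ratio = s ** 2 / (s ** 2).sum()              # explained-variance shares
    return int(np.searchsorted(np.cumsum(var_ratio), var_threshold) + 1)

# sanity check: a noisy 2-D plane embedded in 6-D should give 2
rng = np.random.default_rng(1)
X = np.zeros((500, 6))
X[:, :2] = rng.standard_normal((500, 2))
X += 1e-3 * rng.standard_normal(X.shape)
print(local_pca_id(X, center_idx=0))
```

On curved manifolds this baseline tends to over-count at large neighbourhood radii, because curvature leaks variance into normal directions; correcting for that effect is exactly the point of CA-PCA.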
Intrinsic Dimension Estimation Using Packing Numbers
We propose a new algorithm to estimate the intrinsic dimension of data sets. The method is based on geometric properties of the data and requires neither parametric assumptions on the data generating model nor input parameters to set. The method is compared to a similar, widely used algorithm from the same family of geometric techniques. Experiments show that our method is more robust in terms of the data generating distribution and more reliable in the presence of noise.
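The packing-number idea can be sketched in a few lines: greedily build r-separated subsets of the data at two scales and read the dimension off the slope of log packing number versus log radius. This is a toy sketch, not the paper's exact algorithm; greedy packing is order-dependent, and the two radii are illustrative choices.

```python
import numpy as np

def packing_number(X, r):
    """Size of a greedy maximal r-separated subset of X (order-dependent)."""
    centers = []
    for x in X:
        if not centers or min(np.linalg.norm(x - c) for c in centers) > r:
            centers.append(x)
    return len(centers)

def packing_id(X, r1, r2):
    """Capacity-dimension estimate from packing numbers at two radii:
    D = -(log M(r2) - log M(r1)) / (log r2 - log r1)."""
    m1, m2 = packing_number(X, r1), packing_number(X, r2)
    return -(np.log(m2) - np.log(m1)) / (np.log(r2) - np.log(r1))

# sanity check: a unit circle embedded in 3-D is one-dimensional
t = np.linspace(0, 2 * np.pi, 2000, endpoint=False)
X = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=1)
print(round(packing_id(X, 0.1, 0.2), 2))
```

In practice one would average over several radius pairs (and permutations of the data) to smooth out the order-dependence of the greedy packing.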
Intrinsic dimension estimation for discrete metrics
Macocco, Iuri, Glielmo, Aldo, Grilli, Jacopo, Laio, Alessandro
Real-world datasets characterized by discrete features are ubiquitous: from categorical surveys to clinical questionnaires, from unweighted networks to DNA sequences. Nevertheless, the most common unsupervised dimensionality reduction methods are designed for continuous spaces, and their use on discrete spaces can lead to errors and biases. In this letter we introduce an algorithm to infer the intrinsic dimension (ID) of datasets embedded in discrete spaces. We demonstrate its accuracy on benchmark datasets, and we apply it to analyze a metagenomic dataset for species fingerprinting, finding a surprisingly small ID, of order 2. This suggests that evolutionary pressure acts on a low-dimensional manifold despite the high dimensionality of sequence space.
Scikit-dimension: a Python package for intrinsic dimension estimation
Bac, Jonathan, Mirkes, Evgeny M., Gorban, Alexander N., Tyukin, Ivan, Zinovyev, Andrei
Dealing with uncertainty in applications of machine learning to real-life data critically depends on knowledge of the intrinsic dimensionality (ID). A number of methods have been suggested for estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces \texttt{scikit-dimension}, an open-source Python package for intrinsic dimension estimation. The \texttt{scikit-dimension} package provides a uniform implementation of most of the known ID estimators, based on the scikit-learn application programming interface, to evaluate global and local intrinsic dimension, as well as generators of synthetic toy and benchmark datasets widespread in the literature. The package is developed with tools for assessing code quality, coverage, unit testing, and continuous integration. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation on real-life and synthetic data. The source code is available from https://github.com/j-bac/scikit-dimension and the documentation from https://scikit-dimension.readthedocs.io .
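To make concrete what one of the packaged estimators computes, here is the TwoNN estimator of Facco et al. (in its maximum-likelihood form) written in plain NumPy; TwoNN is among the estimators \texttt{scikit-dimension} implements, but this sketch is not the package's API.

```python
import numpy as np

def twonn_id(X):
    """TwoNN intrinsic-dimension estimate: uses the ratio of each point's
    second- to first-nearest-neighbour distance (Facco et al., 2017)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    D.sort(axis=1)            # row-wise ascending; column 0 is the point itself
    mu = D[:, 2] / D[:, 1]    # ratio of 2nd to 1st neighbour distance
    return len(X) / np.log(mu).sum()

# sanity check: a 3-D Gaussian cloud embedded in 8-D should give an ID close to 3
rng = np.random.default_rng(2)
X = np.zeros((1000, 8))
X[:, :3] = rng.standard_normal((1000, 3))
print(round(twonn_id(X), 2))
```

With the package itself, the equivalent estimate should be obtainable through its scikit-learn-style interface, presumably something along the lines of `skdim.id.TwoNN().fit(X)` followed by reading the fitted `dimension_` attribute; consult the package documentation for the exact call.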