Data Complexity
Data Complexity of Querying Description Logic Knowledge Bases under Cost-Based Semantics
Bienvenu, Meghyn, Manière, Quentin
In this paper, we study the data complexity of querying inconsistent weighted description logic (DL) knowledge bases under recently introduced cost-based semantics. In a nutshell, the idea is to assign each interpretation a cost based on the weights of the violated axioms and assertions; certain and possible query answers are then determined by considering all (resp. some) interpretations having optimal or bounded cost. Whereas the initial study of cost-based semantics focused on DLs between $\mathcal{EL}_\bot$ and $\mathcal{ALCO}$, we consider DLs that may contain inverse roles and role inclusions, thus covering prominent DL-Lite dialects. Our data complexity analysis goes significantly beyond existing results by sharpening several lower bounds and pinpointing the precise complexity of optimal-cost certain answer semantics (no non-trivial upper bound was previously known). Moreover, while all existing results show the intractability of cost-based semantics, our most challenging and surprising result establishes that, for $\text{DL-Lite}^\mathcal{H}_\mathsf{bool}$ ontologies and a fixed cost bound, certain answers for instance queries and possible answers for conjunctive queries can be computed using first-order rewriting and thus enjoy the lowest possible data complexity ($\mathsf{TC}^0$).
- Europe > Germany > Saxony > Leipzig (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > France > Nouvelle-Aquitaine > Gironde > Bordeaux (0.04)
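As a schematic rendering of the cost-based semantics described above (the weight function $w$ and the notation are ours, not necessarily the paper's): each interpretation $\mathcal{I}$ is charged the weights of the axioms and assertions of the weighted KB $\mathcal{K}$ that it violates,

$$\mathrm{cost}(\mathcal{I}) \;=\; \sum_{\alpha \in \mathcal{K} \,:\, \mathcal{I} \not\models \alpha} w(\alpha),$$

so that, under cost bound $k$, a tuple $\vec{a}$ is a certain answer to $q$ if $\mathcal{I} \models q(\vec{a})$ for every interpretation $\mathcal{I}$ with $\mathrm{cost}(\mathcal{I}) \le k$, and a possible answer if this holds for some such $\mathcal{I}$; optimal-cost semantics replaces the fixed bound $k$ with $\min_{\mathcal{J}} \mathrm{cost}(\mathcal{J})$.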
Curriculum-Guided Layer Scaling for Language Model Pretraining
Singh, Karanpartap, Band, Neil, Adeli, Ehsan
As the cost of pretraining large language models grows, there is continued interest in strategies to improve learning efficiency during this core training stage. Motivated by cognitive development, where humans gradually build knowledge as their brains mature, we propose Curriculum-Guided Layer Scaling (CGLS), a framework for compute-efficient pretraining that synchronizes increasing data difficulty with model growth through progressive layer stacking (i.e., gradually adding layers during training). At the 100M parameter scale, using a curriculum transitioning from synthetic short stories to general web data, CGLS outperforms baseline methods on the question-answering benchmarks PIQA and ARC. Our results show that progressively increasing model depth alongside sample difficulty leads to better generalization and zero-shot performance on various downstream benchmarks. Altogether, our findings demonstrate that CGLS unlocks the potential of progressive stacking, offering a simple yet effective strategy for improving generalization on knowledge-intensive and reasoning tasks.

Large language models (LLMs) are typically pretrained in a single, continuous pass, processing all tokens with a uniform amount of computation regardless of their complexity or relevance to downstream tasks of interest. While this approach has shown remarkable success in large-scale models like GPT-4 (OpenAI et al., 2023) and Llama 3 (Dubey et al., 2024), it differs significantly from how humans learn, often leading to models that excel at generating coherent text but struggle with long-context reasoning across varied tasks (Schnabel et al., 2025). Recent works like Phi-3 (Abdin et al., 2024), MiniCPM (Hu et al., 2024), and others (Feng et al., 2024) have explored mid-training, adjusting the training data distribution partway through training by incorporating higher-quality, multilingual, or long-form text. However, this coarse-grained curriculum is applied to fixed model architectures. Inspired by how humans progressively build knowledge alongside their physically growing brains, we explore whether gradually scaling a model in tandem with increasingly complex data can enable more efficient and effective learning.
- North America > United States (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Middle East > Jordan (0.04)
- Education (1.00)
- Health & Medicine (0.68)
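A minimal sketch of the progressive layer stacking idea behind CGLS, under our own simplifying assumptions (fresh layers appended at stage boundaries; class, function, and dataset names are illustrative, not the authors' code):

```python
# Sketch of curriculum-guided layer scaling: each curriculum stage appends
# fresh transformer layers and switches to a harder data source.
import torch.nn as nn

class GrowingLM(nn.Module):
    def __init__(self, d_model=256, n_heads=4, vocab_size=32000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.layers = nn.ModuleList()   # grown over the course of training
        self.head = nn.Linear(d_model, vocab_size)
        self.d_model, self.n_heads = d_model, n_heads

    def grow(self, n_new):
        # Progressive stacking: append fresh layers while earlier layers
        # keep the weights they have already learned.
        for _ in range(n_new):
            self.layers.append(nn.TransformerEncoderLayer(
                self.d_model, self.n_heads, batch_first=True))

    def forward(self, token_ids):
        x = self.embed(token_ids)
        for layer in self.layers:
            x = layer(x)
        return self.head(x)

# Hypothetical curriculum: (data source, layers to add) per stage.
stages = [("synthetic_short_stories", 4), ("general_web_text", 4)]
model = GrowingLM()
for source, n_new in stages:
    model.grow(n_new)
    # train(model, load_dataset(source))  # ordinary LM training loop
```

Keeping the earlier layers' trained weights while appending new ones is what makes the stacking "progressive"; the paper's contribution is synchronizing each growth step with a jump in data difficulty.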
Knowledge Distillation for Variational Quantum Convolutional Neural Networks on Heterogeneous Data
Yu, Kai, Cai, Binbin, Lin, Song
Distributed quantum machine learning faces significant challenges due to heterogeneous client data and variations in local model structures, which hinder global model aggregation. To address these challenges, we propose a knowledge distillation framework for variational quantum convolutional neural networks (VQCNNs) on heterogeneous data. The framework features a quantum gate number estimation mechanism based on client data, which guides the construction of resource-adaptive VQCNN circuits. Particle swarm optimization is employed to efficiently generate personalized quantum models tailored to local data characteristics. During aggregation, a knowledge distillation strategy integrating both soft-label and hard-label supervision consolidates knowledge from heterogeneous clients using a public dataset, forming a global model while avoiding parameter exposure and privacy leakage. Theoretical analysis shows that the proposed framework benefits from high-dimensional quantum representations, offering advantages over classical approaches, and minimizes communication by exchanging only model indices and test outputs. Extensive simulations on the PennyLane platform validate the effectiveness of the gate number estimation and distillation-based aggregation. Experimental results demonstrate that the aggregated global model achieves accuracy close to that of fully supervised centralized training. These results show that the proposed method can effectively handle heterogeneity, reduce resource consumption, and maintain performance, highlighting its potential for scalable and privacy-preserving distributed quantum learning.
- Asia > China > Fujian Province > Fuzhou (0.04)
- Asia > China > Beijing > Beijing (0.04)
- South America > Colombia > Huila Department > Neiva (0.04)
- Europe > Switzerland (0.04)
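A minimal sketch of the soft-plus-hard-label distillation step described above, in plain PyTorch (the VQCNN clients are abstracted as per-client logits on the shared public dataset; the temperature and mixing weight are our assumptions, not the paper's hyperparameters):

```python
import torch
import torch.nn.functional as F

def distill_loss(student_logits, client_logits_list, hard_labels,
                 temperature=2.0, alpha=0.5):
    # alpha balances soft-label supervision (KL to the averaged client
    # predictions) against hard-label cross-entropy.
    teacher_probs = torch.stack(
        [F.softmax(logits / temperature, dim=-1)
         for logits in client_logits_list]).mean(dim=0)
    soft = F.kl_div(F.log_softmax(student_logits / temperature, dim=-1),
                    teacher_probs, reduction="batchmean") * temperature ** 2
    hard = F.cross_entropy(student_logits, hard_labels)
    return alpha * soft + (1 - alpha) * hard
```

Note that only client outputs on the public dataset enter the loss, which is consistent with the abstract's claim that no model parameters are exchanged.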
Analysing Temporal Reasoning in Description Logics Using Formal Grammars
Bourgaux, Camille, Gnatenko, Anton, Thomazo, Michaël
We establish a correspondence between (fragments of) $\mathcal{TEL}^\bigcirc$, a temporal extension of the $\mathcal{EL}$ description logic with the LTL operator $\bigcirc^k$, and certain kinds of formal grammars, in particular conjunctive grammars (context-free grammars equipped with the operation of intersection). This connection implies that $\mathcal{TEL}^\bigcirc$ does not possess the property of ultimate periodicity of models, and further leads to undecidability of query answering in $\mathcal{TEL}^\bigcirc$, closing a question left open since the introduction of $\mathcal{TEL}^\bigcirc$. Moreover, it also allows us to establish decidability of query answering for some interesting new fragments of $\mathcal{TEL}^\bigcirc$, and to reuse for this purpose existing tools and algorithms for conjunctive grammars.
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
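For readers unfamiliar with conjunctive grammars, here is the standard textbook example (due to Okhotin, not taken from this paper): the grammar

$$S \to AB \,\&\, DC, \qquad A \to aA \mid \varepsilon, \quad B \to bBc \mid \varepsilon, \quad C \to cC \mid \varepsilon, \quad D \to aDb \mid \varepsilon$$

generates the non-context-free language $\{a^n b^n c^n \mid n \ge 0\}$: the first conjunct enforces equally many $b$'s and $c$'s, the second equally many $a$'s and $b$'s, and a word is derived only if it satisfies both.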
Tractable Responsibility Measures for Ontology-Mediated Query Answering
Bienvenu, Meghyn, Figueira, Diego, Lafourcade, Pierre
Recent work on quantitative approaches to explaining query answers employs responsibility measures to assign scores to facts in order to quantify their respective contributions to obtaining a given answer. In this paper, we study the complexity of computing such responsibility scores in the setting of ontology-mediated query answering, focusing on a recently introduced family of Shapley-value-based responsibility measures defined in terms of weighted sums of minimal supports (WSMS). By exploiting results from the database setting, we show that such measures enjoy polynomial data complexity for classes of ontology-mediated queries that are first-order rewritable, whereas the problem becomes $\#\mathsf{P}$-hard when the ontology language can encode reachability queries (via axioms like $\exists R.A \sqsubseteq A$). To better understand the tractability frontier, we next explore the combined complexity of WSMS computation. We prove that intractability arises already for atomic queries if the ontology language supports conjunction, as well as for unions of "well-behaved" conjunctive queries, even in the absence of an ontology. By contrast, our study yields positive results for common DL-Lite dialects: by means of careful analysis, we identify classes of structurally restricted conjunctive queries (which intuitively disallow undesirable interactions between query atoms) that admit tractable WSMS computation.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France (0.04)
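For orientation, recall the generic Shapley value of a fact, standard in the database literature (the WSMS family studied in the paper is a specific Shapley-value-based variant whose exact definition we do not reproduce here): for a Boolean query $q$ and a database $D$, the responsibility of a fact $f \in D$ is

$$\mathrm{Sh}_q(f) \;=\; \sum_{S \subseteq D \setminus \{f\}} \frac{|S|!\,(|D|-|S|-1)!}{|D|!}\,\bigl(q(S \cup \{f\}) - q(S)\bigr),$$

i.e. $f$'s average marginal contribution to making the query true, over all orders in which facts can be added.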
CQE under Epistemic Dependencies: Algorithms and Experiments (extended version)
Marconi, Lorenzo, Ricci, Flavia, Rosati, Riccardo
We investigate Controlled Query Evaluation (CQE) over ontologies, where information disclosure is regulated by epistemic dependencies (EDs), a family of logical rules recently proposed for the CQE framework. In particular, we combine EDs with the notion of optimal GA censors, i.e. maximal sets of ground atoms that are entailed by the ontology and can be safely revealed. We focus on answering Boolean unions of conjunctive queries (BUCQs) with respect to the intersection of all optimal GA censors, an approach that has been shown in other contexts to ensure strong security guarantees with favorable computational behavior. First, we characterize the security of this intersection-based approach and identify a class of EDs (namely, full EDs) for which it remains safe. Then, for a subclass of EDs and for $\text{DL-Lite}_{\mathcal{R}}$ ontologies, we show that answering BUCQs under the above CQE semantics is in $\mathsf{AC}^0$ in data complexity, by presenting a suitable, detailed first-order rewriting algorithm. Finally, we report on experiments conducted in two different evaluation scenarios, showing the practical feasibility of our rewriting function.
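As a generic illustration of the first-order rewriting paradigm invoked here (a toy example of ours, not the paper's censor-aware algorithm): given the DL-Lite axiom $\mathsf{Professor} \sqsubseteq \mathsf{Teacher}$, the Boolean query $\exists x\, \mathsf{Teacher}(x)$ rewrites to

$$\exists x\, \bigl(\mathsf{Teacher}(x) \lor \mathsf{Professor}(x)\bigr),$$

which is evaluated directly over the stored data with no further reasoning; since evaluating a fixed first-order query is feasible in $\mathsf{AC}^0$ in data complexity, exhibiting such a rewriting yields the stated upper bound.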
A Geometric View of Data Complexity: Efficient Local Intrinsic Dimension Estimation with Diffusion Models
High-dimensional data commonly lies on low-dimensional submanifolds, and estimating the local intrinsic dimension (LID) of a datum -- i.e. the dimension of the submanifold it belongs to -- is a longstanding problem. LID can be understood as the number of local factors of variation: the more factors of variation a datum has, the more complex it tends to be. Estimating this quantity has proven useful in contexts ranging from generalization in neural networks to detection of out-of-distribution data, adversarial examples, and AI-generated text. The recent successes of deep generative models present an opportunity to leverage them for LID estimation, but current methods based on generative models produce inaccurate estimates, require more than a single pre-trained model, are computationally intensive, or do not exploit the best available deep generative models: diffusion models (DMs). In this work, we show that the Fokker-Planck equation associated with a DM can provide an LID estimator which addresses the aforementioned deficiencies.
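For background (standard diffusion-model material, not a contribution of this paper): if the forward noising process follows the SDE $\mathrm{d}x_t = f(x_t, t)\,\mathrm{d}t + g(t)\,\mathrm{d}w_t$, then the marginal densities $p_t$ satisfy the Fokker-Planck equation

$$\frac{\partial p_t(x)}{\partial t} \;=\; -\nabla \cdot \bigl(f(x,t)\, p_t(x)\bigr) + \frac{g(t)^2}{2}\,\Delta p_t(x),$$

and it is from this PDE, evaluated with a single pre-trained DM, that the paper derives its LID estimator.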
Uncovering Fairness through Data Complexity as an Early Indicator
Ferreira, Juliett Suárez, Slavkovik, Marija, Casillas, Jorge
Fairness is a central concern in machine learning (ML) applications. To date, there has been no study of how disparities in classification complexity between privileged and unprivileged groups influence the fairness of solutions, even though such disparities can serve as a preliminary indicator of potential unfairness. In this work, we investigate this gap. Specifically, we focus on synthetic datasets designed to capture a variety of biases, ranging from historical bias to measurement and representational bias, and evaluate how differences in various complexity metrics correlate with group fairness metrics. We then apply association rule mining to identify patterns that link disproportionate complexity differences between groups with fairness-related outcomes, offering data-centric indicators to guide bias mitigation. Our findings are further validated by applying them to real-world problems, providing evidence that quantifying group-wise classification complexity can uncover early indicators of potential fairness challenges. This investigation helps practitioners proactively address bias in classification tasks.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Switzerland (0.04)
- Europe > Spain > Catalonia (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Education (1.00)
- Health & Medicine > Therapeutic Area (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
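A minimal sketch of the group-wise complexity comparison described in the abstract above, using Fisher's discriminant ratio as an example metric (the metric choice and all names are our illustration, not the authors' pipeline):

```python
import numpy as np

def fisher_ratio(X, y):
    # Max over features of between-class / within-class variance; higher
    # values mean the classes are easier to separate (lower complexity).
    X0, X1 = X[y == 0], X[y == 1]
    num = (X0.mean(axis=0) - X1.mean(axis=0)) ** 2
    den = X0.var(axis=0) + X1.var(axis=0) + 1e-12
    return float(np.max(num / den))

def complexity_gap(X, y, group):
    # Difference in classification complexity between the privileged
    # (group == 1) and unprivileged (group == 0) subsets; a large gap is
    # the kind of early, data-centric fairness indicator the paper studies.
    return (fisher_ratio(X[group == 1], y[group == 1])
            - fisher_ratio(X[group == 0], y[group == 0]))
```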
Your Model is Overconfident, and Other Lies We Tell Ourselves
Mickus, Timothee, Sinha, Aman, Vázquez, Raúl
The difficulty intrinsic to a given example, rooted in its inherent ambiguity, is a key yet often overlooked factor in evaluating neural NLP models. We investigate the interplay and divergence among various metrics for assessing intrinsic difficulty, including annotator dissensus, training dynamics, and model confidence. Through a comprehensive analysis using 29 models on three datasets, we reveal that while correlations exist among these metrics, their relationships are neither linear nor monotonic. By disentangling these dimensions of uncertainty, we aim to refine our understanding of data complexity and its implications for evaluating and improving NLP models.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
- Europe > Portugal > Lisbon > Lisbon (0.04)
- (14 more...)
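A minimal sketch of how two of the difficulty proxies named in the abstract above can be compared (the data layout and names are our assumptions):

```python
import numpy as np
from scipy.stats import spearmanr

def label_entropy(annotations):
    # Dissensus per example: entropy of the annotator label distribution.
    out = []
    for labels in annotations:            # one list of labels per example
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        out.append(float(-(p * np.log(p)).sum()))
    return np.array(out)

# Given per-example `confidence` (probability of the predicted class),
# a rank correlation probes the relationship; note the paper finds it is
# neither linear nor monotonic, so a single rho cannot capture it fully.
# rho, _ = spearmanr(label_entropy(annotations), confidence)
```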
On Deciding the Data Complexity of Answering Linear Monadic Datalog Queries with LTL Operators (Extended Version)
Artale, Alessandro, Gnatenko, Anton, Ryzhikov, Vladislav, Zakharyaschev, Michael
Our concern is the data complexity of answering linear monadic datalog queries whose atoms in the rule bodies can be prefixed by operators of the linear temporal logic LTL. We first observe that, for data complexity, answering any connected query with operators $\bigcirc/\bigcirc^-$ (at the next/previous moment) is either in $\mathsf{AC}^0$, or in $\mathsf{ACC}^0 \setminus \mathsf{AC}^0$, or $\mathsf{NC}^1$-complete, or $\mathsf{LogSpace}$-hard and in $\mathsf{NLogSpace}$. Then we show that the problem of deciding $\mathsf{LogSpace}$-hardness of answering such queries is $\mathsf{PSpace}$-complete, while checking membership in the classes $\mathsf{AC}^0$ and $\mathsf{ACC}^0$, as well as $\mathsf{NC}^1$-completeness, can be done in $\mathsf{ExpSpace}$. Finally, we prove that membership in $\mathsf{AC}^0$ or in $\mathsf{ACC}^0$, $\mathsf{NC}^1$-completeness, and $\mathsf{LogSpace}$-hardness are undecidable for queries with operators $\Diamond_f/\Diamond_p$ (sometime in the future/past), provided that $\mathsf{NC}^1 \ne \mathsf{NLogSpace}$ and $\mathsf{LogSpace} \ne \mathsf{NLogSpace}$.
- Europe > Italy (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > China (0.04)
- (14 more...)
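A schematic example of the query class in question (our illustration, not drawn from the paper): the program

$$G(x) \leftarrow A(x), \qquad G(x) \leftarrow B(x) \wedge \bigcirc G(x)$$

is linear (at most one IDB atom per rule body) and monadic (the IDB predicate $G$ is unary), and makes $G(x)$ hold at a timestamp $n$ exactly when $x$ satisfies $B$ from $n$ onwards until some moment at which it satisfies $A$.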