AITopics | knowledge capacity

Collaborating Authors

knowledge capacity

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations

Neural Information Processing SystemsMar-18-2026, 08:28:06 GMT

Expert-designed close-ended benchmarks are indispensable in assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To rectify this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through knowledge-invariant perturbations

artificial intelligence, large language model, natural language, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

149578235c90954e4f10e40fa181918f-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-8-2026, 07:06:20 GMT

knowledge capacity, llm, perturbation, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
(10 more...)

Genre:

Research Report > New Finding (1.00)
Overview (0.67)

Industry:

Education (0.94)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Scaling Latent Reasoning via Looped Language Models

Zhu, Rui-Jie, Wang, Zixuan, Hua, Kai, Zhang, Tianyu, Li, Ziniu, Que, Haoran, Wei, Boyi, Wen, Zixin, Yin, Fan, Xing, He, Li, Lu, Shi, Jiajun, Ma, Kaijing, Li, Shanda, Kergan, Taylor, Smith, Andrew, Qu, Xingwei, Hui, Mude, Wu, Bohong, Min, Qiyang, Huang, Hongzhi, Zhou, Xun, Ye, Wei, Liu, Jiaheng, Yang, Jian, Shi, Yunfeng, Lin, Chenghua, Zhao, Enduo, Cai, Tianle, Zhang, Ge, Huang, Wenhao, Bengio, Yoshua, Eshraghian, Jason

arXiv.org Artificial IntelligenceNov-19-2025

Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We present and open-source Ouro, named after the recursive Ouroboros, a family of pre-trained Looped Language Models (LoopLM) that instead build reasoning into the pre-training phase through (i) iterative computation in latent space, (ii) an entropy-regularized objective for learned depth allocation, and (iii) scaling to 7.7T tokens. Ouro 1.4B and 2.6B models enjoy superior performance that match the results of up to 12B SOTA LLMs across a wide range of benchmarks. Through controlled experiments, we show this advantage stems not from increased knowledge capacity, but from superior knowledge manipulation capabilities. We also show that LoopLM yields reasoning traces more aligned with final outputs than explicit CoT. We hope our results show the potential of LoopLM as a novel scaling direction in the reasoning era. Our model is available here: http://ouro-llm.github.io.

large language model, machine learning, recurrent step, (19 more...)

arXiv.org Artificial Intelligence

2510.25741

Country: North America > United States > Minnesota (0.27)

Genre: Research Report > New Finding (1.00)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Beyond Parameters: Exploring Virtual Logic Depth for Scaling Laws

Zhu, Ruike, Zhang, Hanwen, Li, Kevin, Shi, Tianyu, Duan, Yiqun, Wang, Chi, Zhou, Tianyi, Banerjee, Arindam, Qin, Zengyi

arXiv.org Artificial IntelligenceOct-14-2025

Scaling large language models typically involves three dimensions: depth, width, and parameter count. In this work, we explore a fourth dimension, \textbf{virtual logical depth} (VLD), which increases effective algorithmic depth without changing parameter count by reusing weights. While parameter reuse is not new, its role in scaling has been underexplored. Unlike recent test-time methods that scale token-wise, VLD alters the internal computation graph during training and inference. Through controlled experiments, we obtain three key insights. (1) \textit{Knowledge capacity vs. parameters}: at fixed parameter count, VLD leaves knowledge capacity nearly unchanged, while across models capacity still scales with parameters. (2) \textit{Reasoning vs. reuse}: properly implemented VLD substantially improves reasoning ability \emph{without} more parameters, decoupling reasoning from size. This suggests a new scaling path beyond token-wise test-time methods. (3) \textit{Robustness and generality}: reasoning gains persist across architectures and reuse schedules, showing VLD captures a general scaling behavior. These results provide insight into future scaling strategies and raise a deeper question: does superintelligence require ever-larger models, or can it be achieved by reusing parameters and increasing logical depth? We argue many unknown dynamics in scaling remain to be explored. Code is available at https://anonymous.4open.science/r/virtual_logical_depth-8024/.

large language model, machine learning, reasoning capability, (18 more...)

arXiv.org Artificial Intelligence

2506.18233

Country: North America > United States (0.67)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.95)

Add feedback

149578235c90954e4f10e40fa181918f-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsOct-9-2025, 19:03:22 GMT

knowledge capacity, llm, perturbation, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)
(10 more...)

Genre:

Research Report > New Finding (1.00)
Overview (0.67)

Industry:

Education (0.94)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

PertEval: Unveiling Real Knowledge Capacity of LLMs with Knowledge-Invariant Perturbations

Li, Jiatong, Hu, Renjun, Huang, Kunzhe, Zhuang, Yan, Liu, Qi, Zhu, Mengxiao, Shi, Xing, Lin, Wei

arXiv.org Artificial IntelligenceMay-30-2024

Expert-designed close-ended benchmarks serve as vital tools in assessing the knowledge capacity of large language models (LLMs). Despite their widespread use, concerns have mounted regarding their reliability due to limited test scenarios and an unavoidable risk of data contamination. To rectify this, we present PertEval, a toolkit devised for in-depth probing of LLMs' knowledge capacity through knowledge-invariant perturbations. These perturbations employ human-like restatement techniques to generate on-the-fly test samples from static benchmarks, meticulously retaining knowledge-critical content while altering irrelevant details. Our toolkit further includes a suite of transition analyses that compare performance on raw vs. perturbed test sets to precisely assess LLMs' genuine knowledge capacity. Six state-of-the-art LLMs are re-evaluated using PertEval. Results reveal significantly inflated performance of the LLMs on raw benchmarks, including an absolute 21% overestimation for GPT-4. Additionally, through a nuanced response pattern analysis, we discover that PertEval retains LLMs' uncertainty to specious knowledge, potentially being resolved through rote memorization and leading to inflated performance. We also find that the detailed transition analyses by PertEval could illuminate weaknesses in existing LLMs' knowledge mastery and guide the development of refinement. Given these insights, we posit that PertEval can act as an essential tool that, when applied alongside any close-ended benchmark, unveils the true knowledge capacity of LLMs, marking a significant step toward more trustworthy LLM evaluation.

knowledge capacity, llm, perturbation, (13 more...)

arXiv.org Artificial Intelligence

2405.1974

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > Singapore (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(6 more...)

Genre: Research Report (0.82)

Industry: Education (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

Do Language Models Perform Generalizable Commonsense Inference?

Wang, Peifeng, Ilievski, Filip, Chen, Muhao, Ren, Xiang

arXiv.org Artificial IntelligenceJun-22-2021

Inspired by evidence that pretrained language models (LMs) encode commonsense knowledge, recent work has applied LMs to automatically populate commonsense knowledge graphs (CKGs). However, there is a lack of understanding on their generalization to multiple CKGs, unseen relations, and novel entities. This paper analyzes the ability of LMs to perform generalizable commonsense inference, in terms of knowledge capacity, transferability, and induction. Our experiments with these three aspects show that: (1) LMs can adapt to different schemas defined by multiple CKGs but fail to reuse the knowledge to generalize to new relations. (2) Adapted LMs generalize well to unseen subjects, but less so on novel objects. Future work should investigate how to improve the transferability and induction of commonsense mining from LMs.

ckg, computational linguistic, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2106.11533

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.14)
(10 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback