AITopics | proficiency

Collaborating Authors

proficiency

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Architecture

Neural Information Processing Systems, Apr-24-2026, 04:16:49 GMT

In this section, we provide comprehensive details about the Transformer model architectures considered in this work. We implement all models in PyTorch [61] and adapt the Transformer-XL implementation from VPT [4].

A.1 Observation Encoding

Experiments on both DMLab and RoboMimic include RGB image observations. For models trained on DMLab, we use a ConvNet [29] similar to the one used by Espeholt et al. [20]. For models trained on RoboMimic, we follow Mandlekar et al. [53] and use a ResNet-18 network [29] followed by a spatial-softmax layer [23].
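
The spatial-softmax layer [23] that follows ResNet-18 converts each convolutional channel into a pair of expected image coordinates. The minimal NumPy sketch below illustrates the idea; it is not the authors' implementation, and the (C, H, W) layout, the [-1, 1] coordinate normalization, and the fixed `temperature` argument are assumptions (some implementations learn the temperature instead).

```python
import numpy as np


def spatial_softmax(features: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Map a (C, H, W) feature map to a (C, 2) array of expected (x, y) keypoints."""
    c, h, w = features.shape
    # Softmax over the H*W spatial positions of each channel independently.
    logits = features.reshape(c, h * w) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)

    # Pixel-coordinate grids normalized to [-1, 1], flattened in the same
    # row-major order as the features (rows index y, columns index x).
    ys, xs = np.meshgrid(np.linspace(-1.0, 1.0, h), np.linspace(-1.0, 1.0, w), indexing="ij")
    expected_x = probs @ xs.reshape(-1)
    expected_y = probs @ ys.reshape(-1)
    return np.stack([expected_x, expected_y], axis=1)


# A single sharp activation at row 4, column 0 of a 5x5 map.
fmap = np.zeros((1, 5, 5))
fmap[0, 4, 0] = 50.0
print(spatial_softmax(fmap))  # approximately [[-1.  1.]]
```

Because the output is an expectation rather than an argmax, the layer stays differentiable and yields a compact keypoint-like summary of each channel.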

artificial intelligence, curriculum, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Generalized Mixed-Effects Models

Slavutsky, Yuli, Salazar, Sebastian, Blei, David M.

arXiv.org Machine Learning, Apr-14-2026

Generalized linear mixed-effects models (GLMMs) are widely used to analyze grouped and hierarchical data. In a GLMM, each response is assumed to follow an exponential-family distribution where the natural parameter is given by a linear function of observed covariates and a latent group-specific random effect. Since exact marginalization over the random effects is typically intractable, model parameters are estimated by maximizing an approximate marginal likelihood. In this paper, we replace the linear function with neural networks. The result is a more flexible model, the neural generalized mixed-effects model (NGMM), which captures complex relationships between covariates and responses. To fit NGMM to data, we introduce an efficient optimization procedure that maximizes the approximate marginal likelihood and is differentiable with respect to network parameters. We show that the approximation error of our objective decays at a Gaussian-tail rate in a user-chosen parameter. On synthetic data, NGMM improves over GLMMs when covariate-response relationships are nonlinear, and on real-world datasets it outperforms prior methods. Finally, we analyze a large dataset of student proficiency to demonstrate how NGMM can be extended to more complex latent-variable models.
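
The abstract does not say which approximation of the marginal likelihood NGMM uses. As an illustration only, the sketch below swaps in Gauss-Hermite quadrature, a standard way to integrate out a Gaussian random intercept, for one group with a Bernoulli response and a small neural network in place of the linear predictor. The network `mlp`, its random weights, the random-intercept structure, and `n_nodes` are all hypothetical; fitting would additionally require porting the objective to an autodiff framework to obtain gradients with respect to the network parameters.

```python
import numpy as np


def mlp(x, w1, b1, w2, b2):
    """Hypothetical one-hidden-layer network replacing the GLMM's linear predictor."""
    return np.tanh(x @ w1 + b1) @ w2 + b2


def log_bernoulli(y, eta):
    """Bernoulli log-likelihood with logit eta: y * eta - log(1 + exp(eta))."""
    return y * eta - np.logaddexp(0.0, eta)


def group_marginal_loglik(y, x, params, sigma, n_nodes=40):
    """Approximate log of the integral of prod_i p(y_i | f(x_i) + b) * N(b; 0, sigma^2) db.

    Gauss-Hermite quadrature: substituting b = sqrt(2) * sigma * t turns the
    Gaussian into the exp(-t^2) weight, leaving a 1 / sqrt(pi) prefactor.
    """
    nodes, weights = np.polynomial.hermite.hermgauss(n_nodes)
    b = np.sqrt(2.0) * sigma * nodes
    eta = mlp(x, *params)[:, None] + b[None, :]                # (n_obs, n_nodes)
    log_terms = np.log(weights) + log_bernoulli(y[:, None], eta).sum(axis=0)
    m = log_terms.max()                                        # log-sum-exp for stability
    return m + np.log(np.exp(log_terms - m).sum()) - 0.5 * np.log(np.pi)


rng = np.random.default_rng(0)
params = (rng.normal(size=(3, 8)), rng.normal(size=8), rng.normal(size=8), 0.1)
x = rng.normal(size=(6, 3))                  # one group with six observations
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 1.0])
print(group_marginal_loglik(y, x, params, sigma=0.8))
```

Summing this quantity over groups gives an approximate marginal log-likelihood; increasing `n_nodes` trades computation for accuracy.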

artificial intelligence, machine learning, mixed-effect model, (18 more...)

arXiv.org Machine Learning

2604.10976

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Tennessee (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (1.00)
Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

PTADisc: A Cross-Course Dataset Supporting Personalized Learning in Cold-Start Scenarios

Neural Information Processing Systems, Feb-15-2026, 19:20:45 GMT

The focus of our work is on diagnostic tasks in personalized learning, such as cognitive diagnosis and knowledge tracing. The goal of these tasks is to assess students'
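
The snippet above is truncated, but knowledge tracing, one of the tasks it names, can be illustrated with classic Bayesian Knowledge Tracing (BKT). The sketch below is that textbook model for a single skill, not a method from PTADisc; the slip, guess, learn, and initial-knowledge probabilities are hypothetical values.

```python
def bkt_update(p_know, correct, p_slip=0.1, p_guess=0.2, p_learn=0.15):
    """One BKT step: condition on the observed response, then apply learning."""
    if correct:
        known_and_obs = p_know * (1.0 - p_slip)             # knew it and did not slip
        obs = known_and_obs + (1.0 - p_know) * p_guess      # ... or guessed correctly
    else:
        known_and_obs = p_know * p_slip                     # knew it but slipped
        obs = known_and_obs + (1.0 - p_know) * (1.0 - p_guess)
    posterior = known_and_obs / obs
    # Chance of learning the skill between this attempt and the next one.
    return posterior + (1.0 - posterior) * p_learn


def trace(responses, p_init=0.3, **params):
    """Return P(skill known) after each response in a student's record."""
    p, history = p_init, []
    for correct in responses:
        p = bkt_update(p, correct, **params)
        history.append(p)
    return history


print(trace([True, False, True, True]))
```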

artificial intelligence, machine learning, programming language, (19 more...)

Neural Information Processing Systems

Country:

Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Instructional Material > Course Syllabus & Notes (0.47)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.85)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Disentangling Cognitive Diagnosis with Limited Exercise Labels

Chen, Xiangzhi, Wu, Le

Neural Information Processing Systems, Feb-10-2026, 10:59:32 GMT

Specifically, we utilize students' response records to model student

artificial intelligence, knowledge concept, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Anhui Province > Hefei (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Michigan (0.04)

Industry: Education > Educational Setting > Online (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

CryptoBench: A Dynamic Benchmark for Expert-Level Evaluation of LLM Agents in Cryptocurrency

Guo, Jiacheng, Huang, Suozhi, Yao, Zixin, Zhang, Yifan, Lu, Yifu, Liu, Jiashuo, Li, Zihao, Deng, Nicholas, Xiao, Qixin, Tian, Jia, Zhan, Kanghong, Li, Tianyi, Liu, Xiaochen, Ge, Jason, He, Chaoyang, Huang, Kaixuan, Yang, Lin, Huang, Wenhao, Wang, Mengdi

arXiv.org Artificial Intelligence, Dec-11-2025

This paper introduces CryptoBench, the first expert-curated, dynamic benchmark designed to rigorously evaluate the real-world capabilities of Large Language Model (LLM) agents in the uniquely demanding and fast-paced cryptocurrency domain. Unlike general-purpose agent benchmarks for search and prediction, professional crypto analysis presents specific challenges: extreme time-sensitivity, a highly adversarial information environment, and the critical need to synthesize data from diverse, specialized sources, such as on-chain intelligence platforms and real-time Decentralized Finance (DeFi) dashboards. CryptoBench thus serves as a much more challenging and valuable scenario for LLM agent assessment. To address these challenges, we constructed a live, dynamic benchmark featuring 50 questions per month, expertly designed by crypto-native professionals to mirror actual analyst workflows. These tasks are rigorously categorized within a four-quadrant system: Simple Retrieval, Complex Retrieval, Simple Prediction, and Complex Prediction. This granular categorization enables a precise assessment of an LLM agent's foundational data-gathering capabilities alongside its advanced analytical and forecasting skills. Our evaluation of ten LLMs, both directly and within an agentic framework, reveals a performance hierarchy and uncovers a failure mode. We observe a retrieval-prediction imbalance, where many leading models, despite being proficient at data retrieval, demonstrate a pronounced weakness in tasks requiring predictive analysis. This highlights a problematic tendency for agents to appear factually grounded while lacking the deeper analytical capabilities to synthesize information.
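
The four-quadrant categorization lends itself to per-quadrant scoring, and the retrieval-prediction imbalance can be summarized as a gap between retrieval and prediction accuracy. A minimal sketch, assuming each answer is graded simply as correct or incorrect; the abstract does not describe CryptoBench's grading, and `retrieval_prediction_gap` is a hypothetical summary statistic.

```python
from collections import defaultdict

QUADRANTS = ("Simple Retrieval", "Complex Retrieval", "Simple Prediction", "Complex Prediction")


def quadrant_accuracy(results):
    """results: iterable of (quadrant, is_correct) pairs for one model."""
    totals, hits = defaultdict(int), defaultdict(int)
    for quadrant, ok in results:
        if quadrant not in QUADRANTS:
            raise ValueError(f"unknown quadrant: {quadrant}")
        totals[quadrant] += 1
        hits[quadrant] += int(ok)
    return {q: hits[q] / totals[q] for q in QUADRANTS if totals[q]}


def retrieval_prediction_gap(acc):
    """Mean retrieval accuracy minus mean prediction accuracy (positive = imbalance)."""
    retrieval = [v for q, v in acc.items() if q.endswith("Retrieval")]
    prediction = [v for q, v in acc.items() if q.endswith("Prediction")]
    if not retrieval or not prediction:
        raise ValueError("need at least one retrieval and one prediction quadrant")
    return sum(retrieval) / len(retrieval) - sum(prediction) / len(prediction)


# Hypothetical graded answers for one model.
results = [
    ("Simple Retrieval", True), ("Simple Retrieval", True),
    ("Complex Retrieval", True), ("Complex Retrieval", False),
    ("Simple Prediction", False), ("Simple Prediction", True),
    ("Complex Prediction", False), ("Complex Prediction", False),
]
acc = quadrant_accuracy(results)
print(acc, retrieval_prediction_gap(acc))
```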

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2512.00417

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance > Trading (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

A Definition of AGI

Hendrycks, Dan, Song, Dawn, Szegedy, Christian, Lee, Honglak, Gal, Yarin, Brynjolfsson, Erik, Li, Sharon, Zou, Andy, Levine, Lionel, Han, Bo, Fu, Jie, Liu, Ziwei, Shin, Jinwoo, Lee, Kimin, Mazeika, Mantas, Phan, Long, Ingebretsen, George, Khoja, Adam, Xie, Cihang, Salaudeen, Olawale, Hein, Matthias, Zhao, Kevin, Pan, Alexander, Duvenaud, David, Li, Bo, Omohundro, Steve, Alfour, Gabriel, Tegmark, Max, McGrew, Kevin, Marcus, Gary, Tallinn, Jaan, Schmidt, Eric, Bengio, Yoshua

arXiv.org Artificial Intelligence, Dec-4-2025

The lack of a concrete definition for Artificial General Intelligence (AGI) obscures the gap between today's specialized AI and human-level cognition. This paper introduces a quantifiable framework to address this, defining AGI as matching the cognitive versatility and proficiency of a well-educated adult. To operationalize this, we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition. The framework dissects general intelligence into ten core cognitive domains, including reasoning, memory, and perception, and adapts established human psychometric batteries to evaluate AI systems. Application of this framework reveals a highly "jagged" cognitive profile in contemporary models. While proficient in knowledge-intensive domains, current AI systems have critical deficits in foundational cognitive machinery, particularly long-term memory storage. The resulting AGI scores (e.g., GPT-4 at 27%, GPT-5 at 57%) concretely quantify both rapid progress and the substantial gap remaining before AGI.
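
To make the aggregation concrete: if each of the ten domains is scored in [0, 1] and weighted equally (an assumption here, so each domain contributes at most 10 percentage points), the AGI score is a scaled arithmetic mean, and a jagged profile shows up as a few near-zero domains. The domain labels and scores in the usage example are hypothetical.

```python
def agi_score(domain_scores):
    """Equal-weight aggregate of ten per-domain scores in [0, 1], on a 0-100 scale."""
    if len(domain_scores) != 10:
        raise ValueError("expected scores for exactly ten domains")
    if any(not 0.0 <= s <= 1.0 for s in domain_scores.values()):
        raise ValueError("domain scores must lie in [0, 1]")
    # Each domain contributes at most 100 / 10 = 10 points.
    return 100.0 * sum(domain_scores.values()) / 10


def weakest_domains(domain_scores, k=2):
    """Names of the k lowest-scoring domains (the bottlenecks of a jagged profile)."""
    return sorted(domain_scores, key=domain_scores.get)[:k]


# Hypothetical jagged profile: strong on most domains, near zero on two.
profile = {f"domain_{i}": s for i, s in enumerate([0.9, 0.9, 0.8, 0.8, 0.7, 0.6, 0.5, 0.3, 0.1, 0.0])}
print(agi_score(profile))        # about 56
print(weakest_domains(profile))  # ['domain_9', 'domain_8']
```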

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.18212

Country:

North America > United States > California (0.28)
Europe > United Kingdom > England (0.27)

Genre: Research Report (0.51)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

A Coherence-Based Measure of AGI

Fourati, Fares

arXiv.org Artificial Intelligence, Dec-1-2025

Recent approaches to evaluating Artificial General Intelligence (AGI) typically summarize a system's capability using the arithmetic mean of its proficiencies across multiple cognitive domains. While simple, this implicitly assumes compensability: exceptional performance in some areas can offset severe deficiencies in others. Genuine general intelligence, however, requires coherent sufficiency: balanced competence across all essential faculties. We introduce a coherence-based measure of AGI that integrates the generalized mean over a continuum of compensability exponents. This yields an area-under-the-curve (AUC) metric spanning arithmetic, geometric, and harmonic regimes, quantifying how robust an evaluated capability remains as compensability assumptions become stricter. Unlike the arithmetic mean, which rewards specialization, the AUC penalizes imbalance and exposes bottlenecks that constrain performance. To illustrate the framework, we apply it to cognitive profiles derived from the Cattell-Horn-Carroll (CHC) model, showing how coherence-based aggregation highlights imbalances that are obscured by arithmetic averaging. As a second, independent example, we apply the same methodology to a set of 17 heterogeneous benchmarks, demonstrating how coherence-based evaluation can reveal unevenness even in narrower task collections. These examples show that the proposed approach offers a principled, interpretable, and stricter foundation for measuring progress toward AGI.
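
The power (generalized) mean M_p(x) = (mean of x_i^p)^(1/p) gives the arithmetic mean at p = 1, the geometric mean as p approaches 0, and the harmonic mean at p = -1, and it never increases as p decreases. The sketch below averages M_p over p in [-1, 1] with the trapezoidal rule; that integration range, the uniform weighting over p, and normalizing by the range length are assumptions for illustration and may differ from the paper's exact definition.

```python
import numpy as np


def generalized_mean(scores, p):
    """Power mean M_p of non-negative scores (p=1 arithmetic, p->0 geometric, p=-1 harmonic)."""
    x = np.asarray(scores, dtype=float)
    if np.any(x < 0):
        raise ValueError("scores must be non-negative")
    if abs(p) < 1e-12:  # the p -> 0 limit is the geometric mean
        return 0.0 if np.any(x == 0) else float(np.exp(np.mean(np.log(x))))
    if p < 0 and np.any(x == 0):
        return 0.0  # a zero score drives every negative-order mean to zero
    return float(np.mean(x ** p) ** (1.0 / p))


def coherence_auc(scores, p_min=-1.0, p_max=1.0, n=2001):
    """Mean of M_p over p in [p_min, p_max], via the trapezoidal rule."""
    ps = np.linspace(p_min, p_max, n)
    vals = np.array([generalized_mean(scores, p) for p in ps])
    area = np.sum((vals[1:] + vals[:-1]) * np.diff(ps)) / 2.0
    return float(area / (p_max - p_min))


balanced = [0.65, 0.65, 0.6]
jagged = [0.9, 0.9, 0.1]  # same arithmetic mean, one severe deficit
for name, s in [("balanced", balanced), ("jagged", jagged)]:
    print(name, round(generalized_mean(s, 1.0), 3), round(coherence_auc(s), 3))
```

Both profiles share an arithmetic mean, but the jagged one loses far more area once the harmonic and geometric regimes are included, which is the compensability effect the abstract describes.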

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.20784

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.74)

Zero-Shot Grammar Competency Estimation Using Large Language Model Generated Pseudo Labels

Das, Sourya Dipta, Kumar, Shubham, Yadav, Kuldeep

arXiv.org Artificial Intelligence, Nov-18-2025

Grammar competency estimation is essential for assessing linguistic proficiency in both written and spoken language; however, the spoken modality presents additional challenges due to its spontaneous, unstructured, and disfluent nature. Developing accurate grammar scoring models further requires extensive expert annotation, making large-scale data creation impractical. To address these limitations, we propose a zero-shot grammar competency estimation framework that leverages unlabeled data and Large Language Models (LLMs) without relying on manual labels. During training, we employ LLM-generated predictions on unlabeled data by using grammar competency rubric-based prompts. These predictions, treated as pseudo labels, are utilized to train a transformer-based model through a novel training framework designed to handle label noise effectively. We show that the choice of LLM for pseudo-label generation critically affects model performance and that the ratio of clean-to-noisy samples during training strongly influences stability and accuracy. Finally, a qualitative analysis of error intensity and score prediction confirms the robustness and interpretability of our approach. Experimental results demonstrate the efficacy of our approach in estimating grammar competency scores with high accuracy, paving the way for scalable, low-resource grammar assessment systems.
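
The abstract does not describe the noise-robust training framework itself. One common way to operationalize a clean-to-noisy ratio, shown here purely as a swapped-in illustration, is the small-loss trick: treat the fraction of pseudo-labeled samples with the lowest current loss as clean and train only on those. `select_clean`, `clean_loss`, and `clean_ratio` are hypothetical names.

```python
import numpy as np


def select_clean(losses, clean_ratio):
    """Indices of the lowest-loss samples, treated as the 'clean' pseudo labels."""
    if not 0.0 < clean_ratio <= 1.0:
        raise ValueError("clean_ratio must be in (0, 1]")
    losses = np.asarray(losses, dtype=float)
    n_keep = max(1, int(round(clean_ratio * losses.size)))
    return np.argsort(losses, kind="stable")[:n_keep]


def clean_loss(losses, clean_ratio):
    """Mean loss over the selected subset; the rest of the batch contributes nothing."""
    losses = np.asarray(losses, dtype=float)
    return float(losses[select_clean(losses, clean_ratio)].mean())


# Hypothetical batch: model-predicted grammar scores vs. LLM pseudo labels.
pred = np.array([3.0, 2.5, 4.0, 1.0])
pseudo = np.array([3.2, 4.5, 4.1, 1.0])
per_sample = (pred - pseudo) ** 2       # squared error per sample
print(select_clean(per_sample, 0.75))   # [3 2 0]: the 4.5 pseudo label is dropped
```

Lowering `clean_ratio` discards more suspect labels at the cost of less training signal, which is one way the clean-to-noisy ratio can affect stability.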

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.13152

Genre: Research Report > New Finding (0.66)

Industry:

Education > Assessment & Standards (1.00)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MetaGDPO: Alleviating Catastrophic Forgetting with Metacognitive Knowledge through Group Direct Preference Optimization

Zhang, Lanxue, Xie, Yuqiang, Fang, Fang, Dong, Fanglong, Liu, Rui, Cao, Yanan

arXiv.org Artificial Intelligence, Nov-18-2025

Large Language Models demonstrate strong reasoning capabilities, which can be effectively compressed into smaller models. However, existing datasets and fine-tuning approaches still face challenges that lead to catastrophic forgetting, particularly for models smaller than 8B. First, most datasets ignore the relationship between training data knowledge and the model's inherent abilities, making it difficult to preserve prior knowledge. Second, conventional training objectives often fail to constrain inherent knowledge preservation, which can result in forgetting of previously learned skills. To address these issues, we propose a comprehensive solution that alleviates catastrophic forgetting from both the data and fine-tuning approach perspectives. On the data side, we construct a dataset of 5K instances that covers multiple reasoning tasks and incorporates metacognitive knowledge, making it more tolerant and effective for distillation into smaller models. We annotate the metacognitive knowledge required to solve each question and filter the data based on task knowledge and the model's inherent skills. On the training side, we introduce GDPO (Group Direct Preference Optimization), which is better suited for resource-limited scenarios and can efficiently approximate the performance of GRPO. Guided by the large model and by implicitly constraining the optimization path through a reference model, GDPO enables more effective knowledge transfer from the large model and constrains excessive parameter drift. Extensive experiments demonstrate that our approach significantly alleviates catastrophic forgetting and improves reasoning performance on smaller models.
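
GDPO's exact objective is not given in the abstract. For orientation, the sketch below computes the standard Direct Preference Optimization (DPO) loss for one preference pair, which shows how a frozen reference model anchors the policy: the loss depends only on log-probability changes relative to the reference. It is plain DPO, not GDPO; inputs are assumed to be summed sequence log-probabilities, and `beta` = 0.1 is an illustrative value.

```python
import math


def softplus(z):
    """log(1 + exp(z)) without overflow."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one preference pair: -log sigmoid(beta * margin).

    The margin compares how much the policy raised the chosen response's log-prob
    (relative to the frozen reference model) against the rejected one's.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    return softplus(-beta * margin)  # -log(sigmoid(x)) == softplus(-x)


# Policy identical to the reference: the loss is log(2) regardless of beta.
print(dpo_loss(-12.0, -15.0, -12.0, -15.0))  # 0.6931...
```

Because only differences from the reference enter the loss, large moves away from the reference are needed to keep reducing it, which is the sense in which a reference model restrains parameter drift.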

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.12113

Country:

Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
