AITopics | difficulty level

The ability to maintain and manipulate information over time is a fundamental aspect of living beings and Artificial Intelligence. While modern models have achieved remarkable success in tasks like natural language processing, evaluating the capacity of novel architectures to process sequential information remains computationally expensive and time-consuming. Testing a new architecture often requires scaling up to massive datasets and models, leading to vast computational costs and slow iteration cycles. In this paper, we propose CogScale, a benchmark of 14 scalable synthetic tasks designed to isolate and evaluate specific cognitive and memory abilities at different parametrizable scales. By providing a standardized, lightweight framework, CogScale allows researchers to rapidly validate architectural innovations before committing to large-scale training. To establish a solid baseline, we evaluate seven distinct architectures: Gated Recurrent Unit (GRU), Long Short-Term Memory (LSTM), xLSTM, Echo State Network (ESN), Mamba, Transformer Decoder, and Transformer Encoder-Decoder. These evaluations are conducted under strict parameter budgets (1k, 10k, and 100k) and across different difficulty levels and scales. Our results show that while classical RNNs and Echo State Networks excel at basic retention within strict parameter budgets, only attention mechanisms and modern state-space models consistently maintain high performance as reasoning complexity and task difficulty scale.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2605.19758

Country: Europe > France (0.28)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Appendix Reinforcement Learning Baselines

Neural Information Processing SystemsApr-25-2026, 05:28:40 GMT

DrQ: This model-free, off-policy reinforcement learning algorithm, is based on Soft Actor-Critic (SAC) [19]. DrQ enhances training stability via applying data augmentation to regularize the Q value of state-action pairs. The key of DrQ is to promote similarity between augmented state-action pairs. The Q-regularization technique is shown in Eq 1, where K is the number of samples, T is the collection of augmentation. Q(f (s,νk),ak) where νk T and ak π( | f (s,νk)) (1) DrQ-v2: An improved version of DrQ. DrQ-v2 fuses essential elements from the DDPG algorithm with data augmentation to strengthen visual RL agents' performance. DrQ-v2 also incorporates techniques such as n-step return and target critic, leading to commendable results in most of the medium and hard level DM-Control tasks. The auxiliary contrastive loss (Eq 3) allows the agent to obtain better image representation during training, thus mitigating the optimization difficulty under high-dimensional inputs.

artificial intelligence, machine learning, reinforcement learning, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

15c9f64ec172b046470d2a4d2b7669fc-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-25-2026, 05:28:37 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Genre:

Research Report (0.46)
Overview (0.46)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.99)

Add feedback

architectures

Neural Information Processing SystemsApr-25-2026, 04:58:01 GMT

A.1 Face experiments For the encoder, we use a resnet-50 backbone followed by projection heads that output pointwise, lower and upper quantile predictions. Each projection head consists of a convolution layer followed by a Leaky-Relu activation and a global average pooling layer. The input to each projection head is the output of the backbone network - a feature map of size 512 4 4 and the output dimension is the number of style dimensions - in the case of the pretrained FFHQ styleGAN2 used in our experiments, this value is 9088. For the generator, we use a FFHQ pretrained styleGAN2 trained to output faces of resolution 1024 1024 obtained from the official implementation. No discriminator is used during training.

artificial intelligence, dimension, experiment, (17 more...)

Neural Information Processing Systems

Genre: Research Report (0.35)

Technology: Information Technology > Artificial Intelligence (0.70)

Add feedback

218344619d8fb95d504ccfa11804073f-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 02:07:54 GMT

agent, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Industry: Transportation (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Architecture

Neural Information Processing SystemsApr-24-2026, 04:16:49 GMT

In this section, we provide comprehensive details about the Transformer model architectures considered in this work. We implement all models in PyTorch [61] and adapt the implementation of Transformer-XL from VPT [4]. A.1 Observation Encoding Experiments conducted on both DMLab and RoboMimic include RGB image observations. For models trained on DMLab, we use a ConvNet [29] similar to the one used in Espeholt et al. [20]. For models trained on RoboMimic, we follow Mandlekar et al. [53] to use a ResNet-18 network [29] followed by a spatial-softmax layer [23].

artificial intelligence, curriculum, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

IDGen: ItemDiscriminationInduced PromptGenerationforLLMEvaluation

Neural Information Processing SystemsFeb-17-2026, 01:57:56 GMT

Item Discrimination (ID) theory, which is widely used in educational assessment, measures the ability of individual test items to differentiate between high and low performers. Inspired by this theory, wepropose anID-induced prompt synthesis frameworkforevaluating LLMs to ensure the evaluation set can continually update and refine according to model abilities.

justification, large language model, machine learning, (22 more...)

Neural Information Processing Systems

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Add feedback

5687bf1b132e2e26200c53d04d398e86-Paper-Conference.pdf

Neural Information Processing SystemsFeb-13-2026, 17:57:41 GMT

artificial intelligence, learning, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Jordan (0.04)
Oceania > Australia > Western Australia > Perth (0.04)
North America > United States > Texas > Dallas County > Richardson (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization

Neural Information Processing SystemsFeb-12-2026, 20:57:21 GMT

Aiming to address this limitation, we present Easy2Hard-Bench, a consistently formatted collection of 6 benchmark datasets spanning various domains, such as mathematics and programming problems, chess puzzles, and reasoning questions.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland (0.04)
North America > United States > California (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.92)
Questionnaire & Opinion Survey (0.67)

Industry:

Law (1.00)
Education (1.00)
Information Technology > Security & Privacy (0.93)
(2 more...)

Technology:

Information Technology > Software (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

MeasuringSystematicGeneralization inNeuralProofGenerationwithTransformers

Neural Information Processing SystemsFeb-11-2026, 05:53:55 GMT

Specifically,weperform soft theorem-proving by leveraging TLMs to generate natural language proofs. We test the generated proofs for logical consistency, along with the accuracy of the final inference.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Quebec (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
(3 more...)

Genre: Research Report (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.31)

Add feedback

Filters

Collaborating Authors

difficulty level

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

CogScale: Scalable Benchmark for Sequence Processing

Appendix Reinforcement Learning Baselines

15c9f64ec172b046470d2a4d2b7669fc-Paper-Datasets_and_Benchmarks.pdf

architectures

218344619d8fb95d504ccfa11804073f-Supplemental.pdf

Architecture

IDGen: ItemDiscriminationInduced PromptGenerationforLLMEvaluation

5687bf1b132e2e26200c53d04d398e86-Paper-Conference.pdf

Easy2Hard-Bench: Standardized Difficulty Labels for Profiling LLM Performance and Generalization

MeasuringSystematicGeneralization inNeuralProofGenerationwithTransformers