World-aware Planning Narratives Enhance Large Vision-Language Model Planner
Shi, Junhao, Fei, Zhaoye, Wang, Siyin, Guo, Qipeng, Gong, Jingjing, Qiu, Xipeng
Large Vision-Language Models (LVLMs) show promise for embodied planning tasks but struggle with complex scenarios involving unfamiliar environments and multi-step goals. Current approaches rely on environment-agnostic imitation learning that disconnects instructions from environmental contexts, causing models to struggle with context-sensitive instructions and to rely on supplementary cues rather than visual reasoning during long-horizon interactions. In this work, we propose World-Aware Planning Narrative Enhancement (WAP), a framework that infuses LVLMs with comprehensive environmental understanding through four cognitive capabilities (visual appearance modeling, spatial reasoning, functional abstraction, and syntactic grounding), while developing and evaluating models using only raw visual observations through curriculum learning. Evaluations on the EB-ALFRED benchmark demonstrate substantial improvements, with Qwen2.5-VL achieving an absolute improvement of 60.7 points in task success rate, particularly in commonsense reasoning (+60.0) and long-horizon planning (+70.0). Notably, our enhanced open-source models outperform proprietary systems like GPT-4o and Claude-3.5-Sonnet by a large margin.
Line Goes Up? Inherent Limitations of Benchmarks for Evaluating Large Language Models
Large language models (LLMs) regularly demonstrate new and impressive performance on a wide range of language, knowledge, and reasoning benchmarks. Such rapid progress has led many commentators to argue that LLM general cognitive capabilities have likewise rapidly improved, with the implication that such models are becoming progressively more capable on various real-world tasks. Here I summarise theoretical and empirical considerations to challenge this narrative. I argue that inherent limitations with the benchmarking paradigm, along with specific limitations of existing benchmarks, render benchmark performance highly unsuitable as a metric for generalisable competence over cognitive tasks. I also contend that alternative methods for assessing LLM capabilities, including adversarial stimuli and interpretability techniques, have shown that LLMs do not have robust competence in many language and reasoning tasks, and often fail to learn representations which facilitate generalisable inferences. I conclude that benchmark performance should not be used as a reliable indicator of general LLM cognitive capabilities.
The Parameters of Educability
The educability model is a computational model that has been recently proposed to describe the cognitive capability that makes humans unique among existing biological species on Earth in being able to create advanced civilizations. Educability is defined as a capability for acquiring and applying knowledge. It is intended both to describe human capabilities and, equally, as an aspirational description of what can be usefully realized by machines. While the intention is to have a mathematically well-defined computational model, in constructing an instance of the model there are a number of decisions to make. We call these decisions "parameters". In a standard computer, two parameters are the memory capacity and clock rate. There is no universally optimal choice for either one, or even for their ratio. Similarly, in a standard machine learning system, two parameters are the learning algorithm and the dataset used for training. Again, there are no universally optimal choices known for either. An educable system has many more parameters than either of these two kinds of system. This short paper discusses some of the main parameters of educable systems, and the broader implications of their existence.
The Cognitive Capabilities of Generative AI: A Comparative Analysis with Human Benchmarks
Galatzer-Levy, Isaac R., Munday, David, McGiffin, Jed, Liu, Xin, Karmon, Danny, Labzovsky, Ilia, Moroshko, Rivka, Zait, Amir, McDuff, Daniel
There is increasing interest in tracking the capabilities of general intelligence foundation models. This study benchmarks leading large language models and vision language models against human performance on the Wechsler Adult Intelligence Scale (WAIS-IV), a comprehensive, population-normed assessment of underlying human cognition and intellectual abilities, with a focus on the domains of Verbal Comprehension (VCI), Working Memory (WMI), and Perceptual Reasoning (PRI). Most models demonstrated exceptional capabilities in the storage, retrieval, and manipulation of tokens such as arbitrary sequences of letters and numbers, with performance on the Working Memory Index (WMI) at or above the 99.5th percentile when compared to human population normative ability. Performance on the Verbal Comprehension Index (VCI), which measures retrieval of acquired information and linguistic understanding of the meaning of words and their relationships to each other, was also consistently at or above the 98th percentile. Despite these broad strengths, we observed consistently poor performance on the Perceptual Reasoning Index (PRI; range 0.1-10th percentile) from multimodal models, indicating a profound inability to interpret and reason on visual information. Smaller and older model versions consistently performed worse, indicating that training data, parameter count and advances in tuning are resulting in significant advances in cognitive ability.
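The percentile comparisons in this abstract rest on the WAIS-IV's population norming, under which index scores are scaled to mean 100 and standard deviation 15. A minimal sketch of how a normed index score maps to a population percentile via the normal CDF (the constants and function name here are illustrative, not from the paper):

```python
from statistics import NormalDist

# WAIS-IV index scores are normed to mean 100, SD 15 in the adult population.
WAIS_MEAN, WAIS_SD = 100.0, 15.0

def index_to_percentile(score: float) -> float:
    """Map a WAIS-IV-style index score to its population percentile (0-100)."""
    return NormalDist(mu=WAIS_MEAN, sigma=WAIS_SD).cdf(score) * 100
```

By construction a score of 100 sits at the 50th percentile, and the 99.5th percentile reported for WMI corresponds to an index score of roughly 139 (z = 2.6).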
Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study
Zhang, Yichi, Huang, Yao, Sun, Yitong, Liu, Chang, Zhao, Zhe, Fang, Zhengwei, Wang, Yifan, Chen, Huanran, Yang, Xiao, Wei, Xingxing, Su, Hang, Dong, Yinpeng, Zhu, Jun
Despite the superior capabilities of Multimodal Large Language Models (MLLMs) across diverse tasks, they still face significant trustworthiness challenges. Yet, current literature on the assessment of trustworthy MLLMs remains limited, lacking a holistic evaluation to offer thorough insights into future improvements. In this work, we establish MultiTrust, the first comprehensive and unified benchmark on the trustworthiness of MLLMs across five primary aspects: truthfulness, safety, robustness, fairness, and privacy. Our benchmark employs a rigorous evaluation strategy that addresses both multimodal risks and cross-modal impacts, encompassing 32 diverse tasks with self-curated datasets. Extensive experiments with 21 modern MLLMs reveal some previously unexplored trustworthiness issues and risks, highlighting the complexities introduced by the multimodality and underscoring the necessity for advanced methodologies to enhance their reliability. For instance, typical proprietary models still struggle with the perception of visually confusing images and are vulnerable to multimodal jailbreaking and adversarial attacks; MLLMs are more inclined to disclose privacy in text and reveal ideological and cultural biases even when paired with irrelevant images in inference, indicating that the multimodality amplifies the internal risks from base LLMs. Additionally, we release a scalable toolbox for standardized trustworthiness research, aiming to facilitate future advancements in this important field. Code and resources are publicly available at: https://multi-trust.github.io/.
Exploring the LLM Journey from Cognition to Expression with Linear Representations
Yan, Yuzi, Li, Jialian, Zhang, Yipin, Yan, Dong
This paper presents an in-depth examination of the evolution and interplay of cognitive and expressive capabilities in large language models (LLMs), with a specific focus on Baichuan-7B and Baichuan-33B, an advanced bilingual (Chinese and English) LLM series. We define and explore the model's cognitive and expressive capabilities through linear representations across three critical phases: Pretraining, Supervised Fine-Tuning (SFT), and Reinforcement Learning from Human Feedback (RLHF). Cognitive capability is defined as the quantity and quality of information conveyed by the neuron output vectors within the network, similar to the neural signal processing in human cognition. Expressive capability is defined as the model's capability to produce word-level output. Our findings unveil a sequential development pattern, where cognitive abilities are largely established during Pretraining, whereas expressive abilities predominantly advance during SFT and RLHF. Statistical analyses confirm a significant correlation between the two capabilities, suggesting that cognitive capacity may limit expressive potential. The paper also explores the theoretical underpinnings of these divergent developmental trajectories and their connection to the LLMs' architectural design. Moreover, we evaluate various optimization-independent strategies, such as few-shot learning and repeated sampling, which bridge the gap between cognitive and expressive capabilities. This research reveals the potential connection between the hidden space and the output space, contributing valuable insights into the interpretability and controllability of their training processes.
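The "linear representations" the paper analyzes can be illustrated, schematically and not as the authors' exact method, by a linear probe: a closed-form ridge regression that tests whether a property is linearly decodable from hidden-state vectors. All names and data here are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for hidden states: n examples of d-dimensional neuron outputs.
n, d = 200, 32
H = rng.normal(size=(n, d))                  # hidden-state matrix
w_true = rng.normal(size=d)                  # direction encoding some property
y = H @ w_true + 0.1 * rng.normal(size=n)    # property value to be probed

# Ridge-regularised linear probe, solved in closed form.
lam = 1e-2
w_probe = np.linalg.solve(H.T @ H + lam * np.eye(d), H.T @ y)

# High R^2 means the property is linearly decodable from the representation.
resid = y - H @ w_probe
r2 = 1 - resid.var() / y.var()
```

In this reading, "cognitive capability" corresponds to how much task-relevant information such probes can recover from the hidden space, separate from what the model expresses at the word level.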
Assessing the nature of large language models: A caution against anthropocentrism
Generative AI models garnered a large amount of public attention and speculation with the release of OpenAI's chatbot, ChatGPT. At least two opinion camps exist: one excited about the possibilities these models offer for fundamental changes to human tasks, and another highly concerned about the power these models seem to have. To address these concerns, we assessed several LLMs, primarily GPT-3.5, using standard, normed, and validated cognitive and personality measures. For this seedling project, we developed a battery of tests that allowed us to estimate the boundaries of some of these models' capabilities, how stable those capabilities are over a short period of time, and how they compare to humans. Our results indicate that LLMs are unlikely to have developed sentience, although their ability to respond to personality inventories is interesting. GPT-3.5 did display large variability in both cognitive and personality measures over repeated observations, which is not expected if it had a human-like personality. Variability notwithstanding, LLMs display what in a human would be considered poor mental health, including low self-esteem, marked dissociation from reality, and in some cases narcissism and psychopathy, despite upbeat and helpful responses.
CognitiveOS: Large Multimodal Model based System to Endow Any Type of Robot with Generative AI
Lykov, Artem, Konenkov, Mikhail, Gbagbe, Koffivi Fidèle, Litvinov, Mikhail, Peter, Robinroy, Davletshin, Denis, Fedoseev, Aleksey, Kobzarev, Oleg, Alabbas, Ali, Alyounes, Oussama, Cabrera, Miguel Altamirano, Tsetserukou, Dzmitry
In cognitive robotics, the scientific community recognized the high generalization capability of these large models as a key to developing a robot that could perform new tasks based on generalized knowledge derived from familiar actions expressed in natural language. However, efforts to apply LLMs in robotics faced challenges, particularly in understanding and processing the external world. Previous attempts to convey the model's understanding of the world through text-only approaches [1], [20], [8] struggled with ambiguities and the assumption of static objects unless interacted with. The introduction of multi-modal transformer-based models such as GPT-4 [16] and Gemini [18], capable of processing images, opened up new possibilities for robotics [5], allowing robots to comprehend their environment and enhancing their 'Embodied Experience' [15]. Cognitive robots have been developed on various platforms, ranging from mobile manipulators [5], [3] to bio-inspired humanoid robots [21] and quadrupedal robots [6]. In the latter, cognitive abilities were developed using an 'Inner Monologue' approach [10], with improvements inspired by the 'Autogen' concept [25]. The cognition of the robot is facilitated through internal communication between agent models, leveraging their strengths to provide different cognitive capabilities to the system.
What can we know about that which we cannot even imagine?
It is often argued that the underlying reason for this aversion to thinking is to reduce the associated fitness costs [15, 108]. Indeed, such costs to thinking are not difficult to find. In particular, it turns out that brains are extraordinarily expensive metabolically on a per-unit-mass basis, far more than almost all other organs (the heart and liver being the sole exceptions -- see [29, 108, 79, 16]). Consistent with this, it is not just the software comprising our minds that seems tailored to reduce metabolic costs; the hardware supporting that software -- the physical architecture of our brains -- also seems tailored to reduce metabolic costs. We do not have a good understanding of exactly how our hardware is used to provide the ability of humans to engage in activities requiring high levels of abstract intelligence.
Define, Evaluate, and Improve Task-Oriented Cognitive Capabilities for Instruction Generation Models
Zhao, Lingjun, Nguyen, Khanh, Daumé, Hal III
Recent work studies the cognitive capabilities of language models through psychological tests designed for humans. While these studies are helpful for understanding the general capabilities of these models, there is no guarantee that a model possessing sufficient capabilities to pass those tests would actually use those capabilities in performing real-life tasks. In this work, we formulate task-oriented cognitive capabilities, which are human-like cognitive capabilities that language models leverage to perform tasks. These capabilities are (i) the ability to quickly generate good candidate utterances (the search capability) and (ii) the ability to predict how a listener interprets those utterances and choose the most appropriate one (the pragmatic capability). We design an evaluation scheme for comparing these capabilities of a language model with those of a human. Applying this scheme to examine various models in a navigation instruction generation problem, we find that their pragmatic capability is severely lacking. This insight leads us to augment them with better models of the listener and obtain a significant boost of 11% in success rate in guiding real humans. Our work advocates for having a principled procedure for aligning language models with humans that involves (i) formulating task-oriented capabilities, (ii) devising a method to quantify their deficiency, and (iii) iteratively improving them.
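The two capabilities in this abstract compose naturally: a search step proposes candidate instructions, and a pragmatic step reranks them by a listener model's chance of acting correctly. A hedged sketch of that reranking pattern (the function names and toy listener below are illustrative, not the authors' implementation):

```python
def pragmatic_rerank(candidates, listener_prob, goal):
    """Pick the utterance the modelled listener is most likely to follow correctly."""
    return max(candidates, key=lambda u: listener_prob(u, goal))

# Toy listener: scores an instruction by how much of the goal phrase it covers.
def toy_listener(utterance, goal):
    goal_words = set(goal.split())
    return len(set(utterance.split()) & goal_words) / len(goal_words)

best = pragmatic_rerank(
    ["go to the door", "go to the red door", "walk forward"],
    toy_listener,
    "red door",
)
# best == "go to the red door": the most disambiguating candidate wins.
```

A real listener model would replace `toy_listener` with a learned predictor of the follower's behaviour, which is the augmentation the paper reports improving human success rates.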