Tang, Zhisheng
Humanlike Cognitive Patterns as Emergent Phenomena in Large Language Models
Tang, Zhisheng, Kejriwal, Mayank
Research on emergent patterns in Large Language Models (LLMs) has gained significant traction in both psychology and artificial intelligence, motivating the need for a comprehensive review that synthesizes this complex landscape. In this article, we systematically review LLMs' capabilities across three important cognitive domains: decision-making biases, reasoning, and creativity. Drawing on empirical studies that use established psychological tests, we compare LLMs' performance to human benchmarks. On decision-making, our synthesis reveals that while LLMs demonstrate several human-like biases, some biases observed in humans are absent, indicating cognitive patterns that only partially align with human decision-making. On reasoning, advanced LLMs like GPT-4 exhibit deliberative reasoning akin to human System-2 thinking, while smaller models fall short of human-level performance. A distinct dichotomy emerges in creativity: while LLMs excel in language-based creative tasks, such as storytelling, they struggle with divergent thinking tasks that require real-world context. Nonetheless, studies suggest that LLMs hold considerable potential as collaborators, augmenting creativity in human-machine problem-solving settings. Discussing key limitations, we also offer guidance for future research in areas such as memory, attention, and open-source model development.
GRASP: A Grid-Based Benchmark for Evaluating Commonsense Spatial Reasoning
Tang, Zhisheng, Kejriwal, Mayank
Spatial reasoning, an important faculty of human cognition with many practical applications, is one of the core commonsense skills that is not purely language-based and, for satisficing (as opposed to optimal) solutions, requires some minimum degree of planning. Existing benchmarks of Commonsense Spatial Reasoning (CSR) tend to evaluate how Large Language Models (LLMs) interpret text-based spatial descriptions rather than directly evaluate a plan produced by the LLM in response to a spatial reasoning scenario. In this paper, we construct a large-scale benchmark called GRASP, which consists of 16,000 grid-based environments where the agent is tasked with an energy collection problem. These environments comprise 100 grid instances instantiated under each of 160 different grid settings, involving five different energy distributions, two modes of agent starting position, and two distinct obstacle configurations, as well as three kinds of agent constraints. Using GRASP, we compare classic baseline approaches, such as random walk and greedy search methods, with advanced LLMs like GPT-3.5-Turbo and GPT-4o. The experimental results indicate that even these advanced LLMs struggle to consistently achieve satisfactory solutions.
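To make the kind of non-LLM baseline mentioned above concrete, the following is a minimal sketch, under our own assumptions rather than the paper's actual implementation, of a greedy agent on a toy GRASP-style grid: it repeatedly moves toward the nearest remaining energy token within a fixed step budget. The names `make_grid` and `greedy_collect`, the grid encoding, and the budget are hypothetical illustrations only.

```python
import random

def make_grid(size=5, n_energy=6, seed=0):
    """Toy stand-in for a GRASP-style environment: a square grid with
    randomly placed energy tokens (no obstacles or agent constraints)."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(size) for c in range(size)]
    return set(rng.sample(cells, n_energy))

def greedy_collect(energy, start=(0, 0), budget=20):
    """Greedy baseline: repeatedly walk (Manhattan moves) to the nearest
    remaining energy token until the step budget is exhausted."""
    pos, steps, collected = start, 0, 0
    remaining = set(energy)
    while remaining and steps < budget:
        target = min(remaining, key=lambda e: abs(e[0] - pos[0]) + abs(e[1] - pos[1]))
        dist = abs(target[0] - pos[0]) + abs(target[1] - pos[1])
        if steps + dist > budget:
            break
        pos, steps = target, steps + dist
        remaining.discard(target)
        collected += 1
    return collected, steps

if __name__ == "__main__":
    grid = make_grid()
    collected, steps = greedy_collect(grid)
    print(f"Greedy agent collected {collected} tokens in {steps} steps")
```

An LLM baseline, by contrast, would be prompted with a serialized description of the same grid and asked to produce a move sequence, which is then scored by the same energy-collection criterion; this is what is meant above by directly evaluating the plan rather than the text interpretation.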
An Evaluation of Estimative Uncertainty in Large Language Models
Tang, Zhisheng, Shen, Ke, Kejriwal, Mayank
Words of estimative probability (WEPs), such as ''maybe'' or ''probably not'' are ubiquitous in natural language for communicating estimative uncertainty, compared with direct statements involving numerical probability. Human estimative uncertainty, and its calibration with numerical estimates, has long been an area of study -- including by intelligence agencies like the CIA. This study compares estimative uncertainty in commonly used large language models (LLMs) like GPT-4 and ERNIE-4 to that of humans, and to each other. Here we show that LLMs like GPT-3.5 and GPT-4 align with human estimates for some, but not all, WEPs presented in English. Divergence is also observed when the LLM is presented with gendered roles and Chinese contexts. Further study shows that an advanced LLM like GPT-4 can consistently map between statistical and estimative uncertainty, but a significant performance gap remains. The results contribute to a growing body of research on human-LLM alignment.
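To illustrate the alignment check in the simplest possible terms, the comparison can be thought of as asking whether a model's numeric reading of a WEP falls inside a human-derived reference interval. The sketch below uses an approximate Kent-style scale; the specific phrases, intervals, and the `within_human_range` helper are illustrative assumptions, not the study's actual protocol or data.

```python
# Approximate Kent-style reference intervals for a few WEPs (illustrative only).
HUMAN_RANGES = {
    "almost certain":       (0.87, 0.99),
    "probable":             (0.63, 0.87),
    "chances about even":   (0.40, 0.60),
    "probably not":         (0.20, 0.40),
    "almost certainly not": (0.02, 0.12),
}

def within_human_range(wep: str, model_estimate: float) -> bool:
    """Return True if the model's numeric probability for a WEP falls
    inside the human reference interval for that phrase."""
    lo, hi = HUMAN_RANGES[wep]
    return lo <= model_estimate <= hi

# Example: if an LLM, asked what "probably not" means numerically,
# answers 0.25, that counts as aligned with the human interval.
print(within_human_range("probably not", 0.25))    # True
print(within_human_range("almost certain", 0.70))  # False
```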
A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning
Tang, Zhisheng, Kejriwal, Mayank
An early popular example is the Bidirectional Encoder Representations from Transformers (BERT) model [2], which soon led to many domain-specific variants, as well as a more optimized version that was able to yield significant improvements without major changes to the original BERT architecture [3]. Perhaps because of its success, researchers have been attempting to empirically understand the properties (including biases and blind spots [4]) of even early transformer models such as BERT, along multiple dimensions [5-7]. While these tests, some of which have been adversarial by design, have revealed some problems, a growing body of research also shows that these models have achieved truly impressive, non-incremental performance advances on various natural language understanding problems [8]. While it can be convenient to overweight mistakes by the models, especially if the mistakes are 'un-humanlike' and made in seemingly simple situations, and to dismiss the models as incapable of semantics or symbolic processing, such commentary potentially opens the door to confirmation bias. We are not denying the utility of critical and adversarial testing of such models [9, 10]; however, we do caution that there is a danger of the results of such tests being interpreted out of context. Arguably, the latest transformer models, such as ChatGPT and DALL-E, captured the public spotlight by being able to process relatively complex human inputs with unprecedented skill [11]. They have also ignited an AI arms race of sorts between large technology corporations. Some of this discourse is hyped, but some could be argued to be justified as correctly describing a major leap in AI progress, at least in an empirical sense [12, 13].