A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning

Feb-15-2023–arXiv.org Artificial Intelligence

An early popular example of a transformer-based language model is the Bidirectional Encoder Representations from Transformers (BERT) model [2], which soon led to many domain-specific variants, as well as a more optimized version that yielded significant improvements without major changes to the original BERT architecture [3]. Perhaps because of this success, researchers have been attempting to empirically understand the properties (including biases and blind spots [4]) of even early transformer models such as BERT along multiple dimensions [5-7]. Although these tests, some of them adversarial by design, have revealed some problems, a growing body of research also shows that these models have achieved truly impressive, non-incremental performance advances on various natural language understanding problems [8]. It can be convenient to overweight the models' mistakes, especially when they are 'un-humanlike' and occur in seemingly simple situations, and to dismiss the models as incapable of semantics or symbolic processing, but such commentary potentially opens the door to confirmation bias. We are not denying the utility of critical and adversarial testing of such models [9, 10]; however, we do caution that the results of such testing risk being interpreted out of context. Arguably, the latest transformer models, such as ChatGPT and DALL-E, captured the public spotlight by processing relatively complex human inputs with unprecedented skill [11]. They have also ignited an AI arms race of sorts between large technology corporations. Some of the surrounding discourse is hype, but some of it can arguably be justified as correctly describing a major leap in AI progress, at least in an empirical sense [12, 13].

large language model, machine learning, natural language

Country:
- North America > United States > California (0.14)

Genre:
- Research Report (1.00)

Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning > Generative AI (1.00)
