A Pilot Evaluation of ChatGPT and DALL-E 2 on Decision Making and Spatial Reasoning
Tang, Zhisheng, Kejriwal, Mayank
arXiv.org Artificial Intelligence
An early popular example is the Bidirectional Encoder Representations from Transformers (BERT) model [2], which soon led to many domain-specific variants, as well as a more optimized version that yielded significant improvements without major changes to the original BERT architecture [3]. Perhaps because of this success, researchers have been attempting to empirically understand the properties (including biases and blind spots [4]) of even early transformer models such as BERT along multiple dimensions [5-7]. Although these tests, some of them adversarial by design, have revealed some problems, a growing body of research also shows that these models have achieved truly impressive, non-incremental performance advances on various natural language understanding problems [8]. While it can be tempting to overweight the models' mistakes, especially when those mistakes are 'un-humanlike' and occur in seemingly simple situations, and to dismiss the models as incapable of semantics or symbolic processing, such commentary potentially opens the door to confirmation bias. We are not denying the utility of critical and adversarial testing of such models [9,10]; however, we caution that the findings of such testing risk being interpreted out of context. Arguably, the latest transformer models, such as ChatGPT and DALL-E, captured the public spotlight by processing relatively complex human inputs with unprecedented skill [11]. They have also ignited an AI arms race of sorts between large technology corporations. Some of this discourse is hype, but some can arguably be justified as correctly describing a major leap in AI progress, at least in an empirical sense [12, 13].
Feb-15-2023
- Country:
- North America > United States > California (0.14)
- Genre:
- Research Report (1.00)
- Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.68)
- Technology: