On LLM Wizards: Identifying Large Language Models' Behaviors for Wizard of Oz Experiments

Fang, Jingchao; Arechiga, Nikos; Namaoshi, Keiichi; Bravo, Nayeli; Hogan, Candice; Shamma, David A.

Jul-10-2024–arXiv.org Artificial Intelligence

The Wizard of Oz (WoZ) method is a widely adopted research approach where a human Wizard "role-plays" a not readily available technology and interacts with participants to elicit user behaviors and probe the design space. With the growing ability for modern large language models (LLMs) to role-play, one can apply LLMs as Wizards in WoZ experiments with better scalability and lower cost than the traditional approach. However, methodological guidance on responsibly applying LLMs in WoZ experiments and a systematic evaluation of LLMs' role-playing ability are lacking. Through two LLM-powered WoZ studies, we take the first step towards identifying an experiment lifecycle for researchers to safely integrate LLMs into WoZ experiments and interpret data generated from settings that involve Wizards role-played by LLMs. We also contribute a heuristic-based evaluation framework that allows the estimation of LLMs' role-playing ability in WoZ experiments and reveals LLMs' behavior patterns at scale.

Figure 1: An overview of our proposed experiment lifecycle compared to traditional Wizard of Oz experiments. We ask GPT-4 empowered agents to play the role of "Wizards" in conversation-based Wizard of Oz experiments. The agents talk to either Simulacrums powered by GPT-4 (in Study 1) or ...

Country:
- North America
  - United States
    - Kansas (0.04)
    - Florida > Orange County
      - Orlando (0.04)
    - Colorado > Denver County
      - Denver (0.04)
    - Massachusetts > Suffolk County
      - Boston (0.04)
    - Hawaii > Honolulu County
      - Honolulu (0.04)
    - Virginia > Arlington County
      - Arlington (0.04)
    - Pennsylvania > Philadelphia County
      - Philadelphia (0.04)
    - Oregon > Multnomah County
      - Portland (0.04)
    - California
      - San Francisco County > San Francisco (0.14)
      - Santa Clara County > Los Altos (0.04)
      - San Diego County > San Diego (0.04)
    - New York > New York County
      - New York City (0.05)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Germany > Hamburg (0.04)
  - United Kingdom
    - Scotland > City of Glasgow
      - Glasgow (0.04)
    - England > Tyne and Wear
      - Sunderland (0.04)
  - Sweden > Vaestra Goetaland
    - Gothenburg (0.04)
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Netherlands > North Holland
    - Amsterdam (0.04)
  - Middle East > Malta
    - Port Region > Southern Harbour District > Valletta (0.04)
- Asia
  - China > Hong Kong (0.04)
  - Myanmar > Tanintharyi Region
    - Dawei (0.04)
  - Japan
    - Kyūshū & Okinawa > Kyūshū
      - Fukuoka Prefecture > Fukuoka (0.04)
    - Honshū > Kantō
      - Kanagawa Prefecture > Yokohama (0.04)
  - India > Maharashtra
    - Mumbai (0.04)

Genre:
- Research Report > Experimental Study (0.93)
- Personal > Interview (0.93)

Industry:
- Health & Medicine (1.00)
- Energy > Renewable (1.00)
- Automobiles & Trucks (0.94)
- Government (0.92)
- Education (0.92)
- Transportation
  - Ground > Road (1.00)
  - Electric Vehicle (0.69)
  - Passenger (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
