Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning

Liu, Michael Xieyang; Petridis, Savvas; Tsai, Vivian; Fiannaca, Alexander J.; Olwal, Alex; Terry, Michael; Cai, Carrie J.

Jan-26-2025, arXiv.org Artificial Intelligence

Multimodal large language models (MLLMs), with their expansive world knowledge and reasoning capabilities, present a unique opportunity for end-users to create personalized AI sensors capable of reasoning about complex situations. A user could describe a desired sensing task in natural language (e.g., "alert if my toddler is getting into mischief"), with the MLLM analyzing the camera feed and responding within seconds. In a formative study, we found that users saw substantial value in defining their own sensors, yet struggled to articulate their unique personal requirements and debug the sensors through prompting alone. To address these challenges, we developed Gensors, a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. Gensors 1) assists users in eliciting requirements through both automatically generated and manually created sensor criteria, 2) facilitates debugging by allowing users to isolate and test individual criteria in parallel, 3) suggests additional criteria based on user-provided images, and 4) proposes test cases to help users "stress test" sensors on potentially unforeseen scenarios. In a user study, participants reported a significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors. Beyond addressing model limitations, Gensors supported users in debugging, eliciting requirements, and expressing unique personal requirements to the sensor through criteria-based reasoning; it also helped uncover users' "blind spots" by exposing overlooked criteria and revealing unanticipated failure modes. Finally, we discuss how unique characteristics of MLLMs, such as hallucinations and inconsistent responses, can impact the sensor-creation process. These findings contribute to the design of future intelligent sensing systems that are intuitive and customizable by everyday users.
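The idea of breaking a sensor into individual criteria that can be isolated and tested in parallel can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Criterion`, `evaluate_criteria`, `sensor_fires`, and `stub_model` are hypothetical, the "frame" is a plain text description standing in for a camera image, the model call is replaced by a keyword-matching stub, and the all-criteria-must-hold aggregation rule is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical model interface: (frame, yes/no question) -> bool.
# A real system would query a multimodal model here; this sketch injects a stub.
AskModel = Callable[[str, str], bool]


@dataclass(frozen=True)
class Criterion:
    name: str
    question: str  # yes/no question posed to the model about one frame


def evaluate_criteria(
    frame: str, criteria: List[Criterion], ask_model: AskModel
) -> Dict[str, bool]:
    """Check each criterion independently and in parallel, so that any
    single criterion's answer can be inspected in isolation."""
    with ThreadPoolExecutor(max_workers=max(1, len(criteria))) as pool:
        answers = list(pool.map(lambda c: ask_model(frame, c.question), criteria))
    return {c.name: a for c, a in zip(criteria, answers)}


def sensor_fires(results: Dict[str, bool]) -> bool:
    """Assumed aggregation rule: fire only when every criterion holds."""
    return bool(results) and all(results.values())


def stub_model(frame: str, question: str) -> bool:
    """Toy stand-in for an MLLM: answers yes when the last word of the
    question appears in the textual frame description."""
    keyword = question.rstrip("?").split()[-1].lower()
    return keyword in frame.lower()


criteria = [
    Criterion("toddler_present", "Is there a toddler?"),
    Criterion("near_stove", "Is the child near the stove?"),
]
results = evaluate_criteria("A toddler reaching toward the stove", criteria, stub_model)
```

Because each criterion produces its own answer, a user debugging the sensor could see which specific criterion failed on a given frame rather than only the final alert decision.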

large language model, machine learning, natural language

Country:
- North America
  - United States
    - Wisconsin (0.04)
    - Michigan (0.04)
    - Virginia (0.04)
    - Colorado > Denver County
      - Denver (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California > Santa Clara County
      - Mountain View (0.04)
    - Massachusetts > Middlesex County
      - Cambridge (0.04)
    - Washington > King County
      - Seattle (0.04)
    - New York > New York County
      - New York City (0.06)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.14)
  - Canada > Ontario
    - Toronto (0.04)
- Europe
  - Germany > Hamburg (0.04)
  - Slovenia > Drava
    - Municipality of Benedikt > Benedikt (0.04)
  - Italy > Sardinia
    - Cagliari (0.05)
- Asia
  - Middle East > Jordan (0.04)
  - South Korea > Seoul
    - Seoul (0.04)

Genre:
- Questionnaire & Opinion Survey (1.00)
- Research Report
  - New Finding (1.00)
  - Experimental Study (0.66)

Industry:
- Health & Medicine > Consumer Health (0.68)
- Information Technology > Security & Privacy (0.45)

Technology:
- Information Technology
  - Human Computer Interaction > Interfaces (1.00)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language > Large Language Model (1.00)
    - Cognitive Science (1.00)
    - Machine Learning > Neural Networks
      - Deep Learning (0.94)
