Gensors: Authoring Personalized Visual Sensors with Multimodal Foundation Models and Reasoning
Michael Xieyang Liu, Savvas Petridis, Vivian Tsai, Alexander J. Fiannaca, Alex Olwal, Michael Terry, Carrie J. Cai
arXiv.org Artificial Intelligence
Multimodal large language models (MLLMs), with their expansive world knowledge and reasoning capabilities, present a unique opportunity for end-users to create personalized AI sensors capable of reasoning about complex situations. A user could describe a desired sensing task in natural language (e.g., "alert if my toddler is getting into mischief"), with the MLLM analyzing the camera feed and responding within seconds. In a formative study, we found that users saw substantial value in defining their own sensors, yet struggled to articulate their unique personal requirements and debug the sensors through prompting alone. To address these challenges, we developed Gensors, a system that empowers users to define customized sensors supported by the reasoning capabilities of MLLMs. Gensors 1) assists users in eliciting requirements through both automatically generated and manually created sensor criteria, 2) facilitates debugging by allowing users to isolate and test individual criteria in parallel, 3) suggests additional criteria based on user-provided images, and 4) proposes test cases to help users "stress test" sensors on potentially unforeseen scenarios. In a user study, participants reported a significantly greater sense of control, understanding, and ease of communication when defining sensors using Gensors. Beyond addressing model limitations, Gensors supported users in debugging, eliciting requirements, and expressing unique personal requirements to the sensor through criteria-based reasoning; it also helped uncover users' "blind spots" by exposing overlooked criteria and revealing unanticipated failure modes. Finally, we discuss how unique characteristics of MLLMs, such as hallucinations and inconsistent responses, can impact the sensor-creation process. These findings contribute to the design of future intelligent sensing systems that are intuitive and customizable by everyday users.
Jan-26-2025
- Country:
  - North America
    - United States
      - Wisconsin (0.04)
      - Michigan (0.04)
      - Virginia (0.04)
      - Colorado > Denver County
        - Denver (0.04)
      - Louisiana > Orleans Parish
        - New Orleans (0.04)
      - California > Santa Clara County
        - Mountain View (0.04)
      - Massachusetts > Middlesex County
        - Cambridge (0.04)
      - Washington > King County
        - Seattle (0.04)
      - New York > New York County
        - New York City (0.06)
      - Pennsylvania > Allegheny County
        - Pittsburgh (0.14)
    - Canada > Ontario
      - Toronto (0.04)
  - Europe
  - Asia
    - Middle East > Jordan (0.04)
    - South Korea > Seoul
      - Seoul (0.04)
- Genre:
  - Questionnaire & Opinion Survey (1.00)
  - Research Report
    - New Finding (1.00)
    - Experimental Study (0.66)
- Industry:
  - Health & Medicine > Consumer Health (0.68)
  - Information Technology > Security & Privacy (0.45)
- Technology: