PUO-Bench: APanel Understanding and Operation Benchmark with APrivacy-Preserving Framework
–Neural Information Processing Systems
Recent advancements in Vision-Language Models (VLMs) have enabled GUI agents to leverage visual features for interface understanding and operation in the digital world. However, limited research has addressed the interpretation and interaction with control panels in real-world settings. To bridge this gap, we propose the Panel Understanding and Operation (PUO) benchmark, comprising annotated panel images from appliances and associated vision-language instruction pairs. Experimental results on the benchmark demonstrate significant performance disparities between zero-shot and fine-tuned VLMs, revealing the lack of PUOspecific capabilities in existing language models. Furthermore, we introduce a Privacy-Preserving Framework (PPF) to address privacy concerns in cloud-based panel parsing and reasoning. PPF employs a dual-stage architecture, performing panel understanding on edge devices while delegating complex reasoning to cloudbased LLMs. Although this design introduces a performance trade-off due to edge model limitations, it eliminates the transmission of raw visual data, thereby mitigating privacy risks. Overall, this work provides foundational resources and methodologies for advancing interactive human-machine systems and robotic field in panel-centric applications.
Neural Information Processing Systems
Jun-16-2026, 03:23:24 GMT
- Country:
- Asia (0.46)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.67)
- Research Report
- Industry:
- Information Technology
- Security & Privacy (1.00)
- Services (0.68)
- Information Technology
- Technology:
- Information Technology > Artificial Intelligence
- Vision (1.00)
- Natural Language > Large Language Model (1.00)
- Machine Learning > Neural Networks (0.93)
- Information Technology > Artificial Intelligence