Peng, Yi-Hao
AutoPresent: Designing Structured Visuals from Scratch
Ge, Jiaxin, Wang, Zora Zhiruo, Zhou, Xuhui, Peng, Yi-Hao, Subramanian, Sanjay, Tan, Qinyue, Sap, Maarten, Suhr, Alane, Fried, Daniel, Neubig, Graham, Darrell, Trevor
Designing structured visuals such as presentation slides is essential for communicative needs, necessitating both content creation and visual planning skills. In this work, we tackle the challenge of automated slide generation, where models produce slide presentations from natural language (NL) instructions. We first introduce SlidesBench, the first benchmark for slide generation, with 7k training and 585 testing examples derived from 310 slide decks across 10 domains. SlidesBench supports evaluations that are (i) reference-based, measuring similarity to a target slide, and (ii) reference-free, measuring the design quality of generated slides alone. We benchmark end-to-end image generation and program generation methods with a variety of models, and find that programmatic methods produce higher-quality slides in user-interactable formats. Building on the success of program generation, we create AutoPresent, an 8B Llama-based model trained on 7k instructions paired with slide-generation code, which achieves results comparable to the closed-source model GPT-4o. We further explore iterative design refinement, where the model is tasked with self-refining its own output, and find that this process improves slide quality. We hope that our work will provide a basis for future work on generating structured visuals.
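The program-generation approach described here amounts to having a model emit slide-construction code that is then executed to render the deck. As a rough illustration (not AutoPresent's released code or prompt format), the sketch below builds a single slide with python-pptx; the function name and slide content are invented for the example.

```python
# Illustrative sketch of program-based slide generation: code like this is what
# a model might emit from an NL instruction, and executing it yields the slide.
# Names and content are hypothetical, not AutoPresent's actual output.
from pptx import Presentation
from pptx.util import Inches, Pt

def render_slide(output_path: str) -> None:
    """Build a single title-and-bullets slide, as generated code might."""
    prs = Presentation()
    slide = prs.slides.add_slide(prs.slide_layouts[6])  # blank layout

    # Title text box
    title = slide.shapes.add_textbox(Inches(0.5), Inches(0.3), Inches(9), Inches(1))
    title.text_frame.text = "Quarterly Results"
    title.text_frame.paragraphs[0].font.size = Pt(40)

    # Bullet content
    body = slide.shapes.add_textbox(Inches(0.7), Inches(1.6), Inches(8.5), Inches(4))
    tf = body.text_frame
    bullets = ["Revenue up 12%", "Two new product launches", "Hiring plan for Q3"]
    tf.text = bullets[0]
    for line in bullets[1:]:
        tf.add_paragraph().text = line
    for p in tf.paragraphs:
        p.font.size = Pt(20)

    prs.save(output_path)

if __name__ == "__main__":
    render_slide("generated_slide.pptx")
```

Because the output is an editable .pptx file rather than a rendered image, this kind of programmatic generation keeps the slide in a user-interactable format, which is the property the benchmark results favor.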
DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation
Peng, Yi-Hao, Huq, Faria, Jiang, Yue, Wu, Jason, Li, Amanda Xin Yue, Bigham, Jeffrey, Pavel, Amy
Enabling machines to understand structured visuals like slides and user interfaces is essential for making them accessible to people with disabilities. However, achieving such understanding computationally has required manual data collection and annotation, which is time-consuming and labor-intensive. To overcome this challenge, we present a method to generate synthetic, structured visuals with target labels using code generation. Our method allows people to create datasets with built-in labels and train models with a small number of human-annotated examples. We demonstrate performance improvements in three tasks for understanding slides and UIs: recognizing visual elements, describing visual content, and classifying visual content types.
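Since a programmatically constructed visual exposes every element's type, text, and geometry at creation time, labels can be written out alongside the rendered file. The sketch below illustrates this idea with python-pptx and a JSON side file; the element types, file names, and label schema are assumptions for illustration, not DreamStruct's actual pipeline.

```python
# Hedged sketch of "built-in labels via code generation": because the slide is
# built by code, each element's type and bounding box are known by construction
# and can be emitted as annotations alongside the visual. All names here are
# illustrative, not DreamStruct's API.
import json
from pptx import Presentation
from pptx.util import Inches

def make_labeled_slide(pptx_path: str, label_path: str) -> None:
    prs = Presentation()
    slide = prs.slides.add_slide(prs.slide_layouts[6])  # blank layout
    labels = []

    # Add a title element and record its label "for free".
    title = slide.shapes.add_textbox(Inches(0.5), Inches(0.4), Inches(9), Inches(1))
    title.text_frame.text = "Synthetic Slide Title"
    labels.append({"type": "title",
                   "text": title.text_frame.text,
                   "bbox_emu": [int(title.left), int(title.top),
                                int(title.width), int(title.height)]})

    # Add a body text element with its own label.
    body = slide.shapes.add_textbox(Inches(0.7), Inches(1.8), Inches(8.5), Inches(3))
    body.text_frame.text = "Synthetic body text used for training."
    labels.append({"type": "body_text",
                   "text": body.text_frame.text,
                   "bbox_emu": [int(body.left), int(body.top),
                                int(body.width), int(body.height)]})

    prs.save(pptx_path)
    with open(label_path, "w") as f:
        json.dump(labels, f, indent=2)

if __name__ == "__main__":
    make_labeled_slide("synthetic_slide.pptx", "labels.json")
```

Pairing the rendered visual with the emitted label file yields a training example without manual annotation, which is the property the method exploits.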
Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions
Shen, Hua, Knearem, Tiffany, Ghosh, Reshmi, Alkiek, Kenan, Krishna, Kundan, Liu, Yachuan, Ma, Ziqiao, Petridis, Savvas, Peng, Yi-Hao, Qiwei, Li, Rakshit, Sushrita, Si, Chenglei, Xie, Yutong, Bigham, Jeffrey P., Bentley, Frank, Chai, Joyce, Lipton, Zachary, Mei, Qiaozhu, Mihalcea, Rada, Terry, Michael, Yang, Diyi, Morris, Meredith Ringel, Resnick, Paul, Jurgens, David
Recent advancements in general-purpose AI have highlighted the importance of guiding AI systems towards the intended goals, ethical principles, and values of individuals and groups, a concept broadly recognized as alignment. However, the lack of clarified definitions and scopes of human-AI alignment poses a significant obstacle, hampering collaborative efforts across research domains to achieve this alignment. In particular, ML- and philosophy-oriented alignment research often views AI alignment as a static, unidirectional process (i.e., aiming to ensure that AI systems' objectives match humans') rather than an ongoing, mutual alignment problem [429]. This perspective largely neglects the long-term interaction and dynamic changes of alignment. To understand these gaps, we introduce a systematic review of over 400 papers published between 2019 and January 2024, spanning multiple domains such as Human-Computer Interaction (HCI), Natural Language Processing (NLP), Machine Learning (ML), and others. We characterize, define, and scope human-AI alignment. From this, we present a conceptual framework of "Bidirectional Human-AI Alignment" to organize the literature from a human-centered perspective. This framework encompasses both 1) conventional studies of aligning AI to humans, which ensure that AI produces the intended outcomes determined by humans, and 2) a proposed concept of aligning humans to AI, which aims to help individuals and society adjust to AI advancements both cognitively and behaviorally. Additionally, we articulate the key findings derived from our literature analysis, including discussions about human values, interaction techniques, and evaluations. To pave the way for future studies, we envision three key challenges and propose examples of potential future solutions.
"This really lets us see the entire world:" Designing a conversational telepresence robot for homebound older adults
Hu, Yaxin, Stegner, Laura, Kotturi, Yasmine, Zhang, Caroline, Peng, Yi-Hao, Huq, Faria, Zhao, Yuhang, Bigham, Jeffrey P., Mutlu, Bilge
In this paper, we explore the design and use of conversational telepresence robots to help homebound older adults interact with the external world. An initial needfinding study (N=8) using video vignettes revealed older adults' experiential needs for robot-mediated remote experiences such as exploration, reminiscence, and social participation. We then designed a prototype system to support these goals and conducted a technology probe study (N=11) to garner a deeper understanding of user preferences for remote experiences. The study revealed users' interaction patterns in each desired experience, highlighting the need for robot guidance and for social engagement with both the robot and remote bystanders. Our work identifies a novel design space in which conversational telepresence robots can be used to foster meaningful interactions in remote physical environments. We offer design insights into the robot's proactive role in providing guidance and using dialogue to create personalized, contextualized, and meaningful experiences.
UIClip: A Data-driven Model for Assessing User Interface Design
Wu, Jason, Peng, Yi-Hao, Li, Amanda, Swearngin, Amanda, Bigham, Jeffrey P., Nichols, Jeffrey
User interface (UI) design is a difficult yet important task for ensuring the usability, accessibility, and aesthetic qualities of applications. In this paper, we develop a machine-learned model, UIClip, for assessing the design quality and visual relevance of a UI given its screenshot and natural language description. To train UIClip, we used a combination of automated crawling, synthetic augmentation, and human ratings to construct a large-scale dataset of UIs, collated by description and ranked by design quality. By training on this dataset, UIClip implicitly learns properties of good and bad designs by i) assigning a numerical score that represents a UI design's relevance and quality and ii) providing design suggestions. In an evaluation that compared the outputs of UIClip and other baselines to UIs rated by 12 human designers, we found that UIClip achieved the highest agreement with ground-truth rankings. Finally, we present three example applications that demonstrate how UIClip can facilitate downstream applications that rely on instantaneous assessment of UI design quality: i) UI code generation, ii) UI design tips generation, and iii) quality-aware UI example search.
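The core scoring idea, rating how well a UI screenshot matches a natural-language description, can be illustrated with a generic CLIP model. The sketch below uses HuggingFace transformers with an off-the-shelf OpenAI CLIP checkpoint rather than UIClip's own weights, prompt format, or calibrated quality score; the file name is hypothetical.

```python
# Hedged illustration of description-conditioned UI scoring with a generic CLIP
# model; UIClip's released checkpoint and score calibration are not reproduced.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def score_ui(screenshot_path: str, description: str) -> float:
    """Return an image-text similarity score for a UI screenshot and its description."""
    image = Image.open(screenshot_path).convert("RGB")
    inputs = processor(text=[description], images=image,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
    # logits_per_image has shape (num_images, num_texts); higher means more relevant.
    return outputs.logits_per_image[0, 0].item()

if __name__ == "__main__":
    # Hypothetical screenshot file, used only to show the call pattern.
    print(score_ui("login_screen.png", "a clean, well-aligned mobile login screen"))
```

A single relevance score like this is the kind of instantaneous signal the downstream applications (UI code generation, design tips, example search) would consume.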