AITopics | Cheng, Chin-Yi

Collaborating Authors

Cheng, Chin-Yi

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Revision Matters: Generative Design Guided by Revision Edits

Li, Tao, Cheng, Chin-Yi, Xie, Amber, Li, Gang, Li, Yang

arXiv.org Artificial IntelligenceMay-27-2024

Layout design, such as user interface or graphical layout in general, is fundamentally an iterative revision process. Through revising a design repeatedly, the designer converges on an ideal layout. In this paper, we investigate how revision edits from human designer can benefit a multimodal generative model. To do so, we curate an expert dataset that traces how human designers iteratively edit and improve a layout generation with a prompted language goal. Based on such data, we explore various supervised fine-tuning task setups on top of a Gemini multimodal backbone, a large multimodal model. Our results show that human revision plays a critical role in iterative layout refinement. While being noisy, expert revision edits lead our model to a surprisingly strong design FID score ~10 which is close to human performance (~6). In contrast, self-revisions that fully rely on model's own judgement, lead to an echo chamber that prevents iterative improvement, and sometimes leads to generative degradation. Fortunately, we found that providing human guidance plays at early stage plays a critical role in final generation. In such human-in-the-loop scenario, our work paves the way for iterative design revision based on pre-trained large multimodal models.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2406.18559

Country:

North America > United States > California > Santa Clara County (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

CoLay: Controllable Layout Generation through Multi-conditional Latent Diffusion

Cheng, Chin-Yi, Gao, Ruiqi, Huang, Forrest, Li, Yang

arXiv.org Artificial IntelligenceMay-18-2024

Layout design generation has recently gained significant attention due to its potential applications in various fields, including UI, graphic, and floor plan design. However, existing models face two main challenges that limits their adoption in practice. Firstly, the limited expressiveness of individual condition types used in previous works restricts designers' ability to convey complex design intentions and constraints. Secondly, most existing models focus on generating labels and coordinates, while real layouts contain a range of style properties. To address these limitations, we propose a novel framework, CoLay, that integrates multiple condition types and generates complex layouts with diverse style properties. Our approach outperforms prior works in terms of generation quality and condition satisfaction while empowering users to express their design intents using a flexible combination of modalities, including natural language prompts, layout guidelines, element types, and partially completed designs.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2405.13045

Country:

North America > United States (0.28)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Add feedback

Leveraging Human Revisions for Improving Text-to-Layout Models

Xie, Amber, Cheng, Chin-Yi, Huang, Forrest, Li, Yang

arXiv.org Artificial IntelligenceMay-15-2024

Learning from human feedback has shown success in aligning large, pretrained models with human values. Prior works have mostly focused on learning from high-level labels, such as preferences between pairs of model outputs. On the other hand, many domains could benefit from more involved, detailed feedback, such as revisions, explanations, and reasoning of human users. Our work proposes using nuanced feedback through the form of human revisions for stronger alignment. In this paper, we ask expert designers to fix layouts generated from a generative layout model that is pretrained on a large-scale dataset of mobile screens. Then, we train a reward model based on how human designers revise these generated layouts. With the learned reward model, we optimize our model with reinforcement learning from human feedback (RLHF). Our method, Revision-Aware Reward Models ($\method$), allows a generative text-to-layout model to produce more modern, designer-aligned layouts, showing the potential for utilizing human revisions and stronger forms of feedback in improving generative models.

artificial intelligence, layout, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2405.13026

Country:

North America > United States (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)

Add feedback

Representation Learning for Sequential Volumetric Design Tasks

Alam, Md Ferdous, Wang, Yi, Tran, Linh, Cheng, Chin-Yi, Luo, Jieliang

arXiv.org Artificial IntelligenceSep-5-2023

Volumetric design, also called massing design, is the first and critical step in professional building design which is sequential in nature. As the volumetric design process is complex, the underlying sequential design process encodes valuable information for designers. Many efforts have been made to automatically generate reasonable volumetric designs, but the quality of the generated design solutions varies, and evaluating a design solution requires either a prohibitively comprehensive set of metrics or expensive human expertise. While previous approaches focused on learning only the final design instead of sequential design tasks, we propose to encode the design knowledge from a collection of expert or high-performing design sequences and extract useful representations using transformer-based models. Later we propose to utilize the learned representations for crucial downstream applications such as design preference evaluation and procedural design generation. We develop the preference model by estimating the density of the learned representations whereas we train an autoregressive transformer model for sequential design generation. We demonstrate our ideas by leveraging a novel dataset of thousands of sequential volumetric designs. Our preference model can compare two arbitrarily given design sequences and is almost 90% accurate in evaluation against random design sequences. Our autoregressive model is also capable of autocompleting a volumetric design sequence from a partial design sequence.

artificial intelligence, representation learning, sequential volumetric design task

arXiv.org Artificial Intelligence

2309.02583

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

PLay: Parametrically Conditioned Layout Generation using Latent Diffusion

Cheng, Chin-Yi, Huang, Forrest, Li, Gang, Li, Yang

arXiv.org Artificial IntelligenceJun-21-2023

Layout design is an important task in various design fields, including user interface, document, and graphic design. As this task requires tedious manual effort by designers, prior works have attempted to automate this process using generative models, but commonly fell short of providing intuitive user controls and achieving design objectives. In this paper, we build a conditional latent diffusion model, PLay, that generates parametrically conditioned layouts in vector graphic space from user-specified guidelines, which are commonly used by designers for representing their design intents in current practices. Our method outperforms prior works across three datasets on metrics including FID and FD-VG, and in user study. Moreover, it brings a novel and interactive experience to professional layout design processes.

artificial intelligence, human computer interaction, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2301.11529

Country:

North America > United States > Hawaii (0.14)
Asia > Middle East > Israel (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

IKEA-Manual: Seeing Shape Assembly Step by Step

Wang, Ruocheng, Zhang, Yunzhi, Mao, Jiayuan, Zhang, Ran, Cheng, Chin-Yi, Wu, Jiajun

arXiv.org Artificial IntelligenceFeb-3-2023

Human-designed visual manuals are crucial components in shape assembly activities. They provide step-by-step guidance on how we should move and connect different parts in a convenient and physically-realizable way. While there has been an ongoing effort in building agents that perform assembly tasks, the information in human-design manuals has been largely overlooked. We identify that this is due to 1) a lack of realistic 3D assembly objects that have paired manuals and 2) the difficulty of extracting structured information from purely image-based manuals. Motivated by this observation, we present IKEA-Manual, a dataset consisting of 102 IKEA objects paired with assembly manuals. We provide fine-grained annotations on the IKEA objects and assembly manuals, including decomposed assembly parts, assembly plans, manual segmentation, and 2D-3D correspondence between 3D parts and visual manuals. We illustrate the broad application of our dataset on four tasks related to shape assembly: assembly plan generation, part segmentation, pose estimation, and 3D part assembly.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2302.01881

Country:

Asia (0.16)
North America > United States (0.14)

Genre:

Research Report (0.50)
Workflow (0.46)

Industry: Retail (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.35)

Add feedback

Translating a Visual LEGO Manual to a Machine-Executable Plan

Wang, Ruocheng, Zhang, Yunzhi, Mao, Jiayuan, Cheng, Chin-Yi, Wu, Jiajun

arXiv.org Artificial IntelligenceJul-25-2022

We study the problem of translating an image-based, step-by-step assembly manual created by human designers into machine-interpretable instructions. We formulate this problem as a sequential prediction task: at each step, our model reads the manual, locates the components to be added to the current shape, and infers their 3D poses. This task poses the challenge of establishing a 2D-3D correspondence between the manual image and the real 3D object, and 3D pose estimation for unseen 3D objects, since a new component to be added in a step can be an object built from previous steps. To address these two challenges, we present a novel learning-based framework, the Manual-to-Executable-Plan Network (MEPNet), which reconstructs the assembly steps from a sequence of manual images. The key idea is to integrate neural 2D keypoint detection modules and 2D-3D projection algorithms for high-precision prediction and strong generalization to unseen components. The MEPNet outperforms existing methods on three newly collected LEGO manual datasets and a Minecraft house dataset.

artificial intelligence, dataset, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2207.12572

Genre:

Workflow (1.00)
Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
Information Technology > Sensing and Signal Processing > Image Processing (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation

Sanghi, Aditya, Chu, Hang, Lambourne, Joseph G., Wang, Ye, Cheng, Chin-Yi, Fumero, Marco

arXiv.org Artificial IntelligenceOct-6-2021

While recent progress has been made in text-to-image generation, text-to-shape generation remains a challenging problem due to the unavailability of paired text and shape data at a large scale. We present a simple yet effective method for zeroshot text-to-shape generation based on a two-stage training process, which only depends on an unlabelled shape dataset and a pre-trained image-text network such as CLIP. Our method not only demonstrates promising zero-shot generalization, but also avoids expensive inference time optimization and can generate multiple shapes for a given text. "a cuboid sofa" "a round sofa" "an airplane" "a space shuttle" "an suv" "a pickup truck" Figure 1: CLIP-Forge generates meaningful shapes without using any shape-text pairing labels.

artificial intelligence, machine learning, north america government, (17 more...)

arXiv.org Artificial Intelligence

2110.02624

Country: North America > United States (0.34)

Genre: Research Report (0.40)

Industry:

Transportation > Air (0.34)
Automobiles & Trucks > Manufacturer (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.91)
Information Technology > Sensing and Signal Processing > Image Processing (0.91)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.63)

Add feedback