AITopics | Chau, Polo

Interactive Visual Learning for Stable Diffusion

Lee, Seongmin, Hoover, Benjamin, Strobelt, Hendrik, Wang, Zijie J., Peng, ShengYun, Wright, Austin, Li, Kevin, Park, Haekyu, Yang, Haoyang, Chau, Polo

arXiv.org Artificial Intelligence, Apr-22-2024

Diffusion-based generative models' impressive ability to create convincing images has garnered global attention. However, their complex internal structures and operations often pose challenges for non-experts to grasp. We introduce Diffusion Explainer, the first interactive visualization tool designed to elucidate how Stable Diffusion transforms text prompts into images. It tightly integrates a visual overview of Stable Diffusion's complex components with detailed explanations of their underlying operations. This integration enables users to fluidly transition between multiple levels of abstraction through animations and interactive elements. Offering real-time hands-on experience, Diffusion Explainer allows users to adjust Stable Diffusion's hyperparameters and prompts without the need for installation or specialized hardware. Accessible via users' web browsers, Diffusion Explainer is making significant strides in democratizing AI education, fostering broader public access. More than 7,200 users spanning 113 countries have used our open-sourced tool at https://poloclub.github.io/diffusion-explainer/. A video demo is available at https://youtu.be/MbkIADZjPnA.

artificial intelligence, machine learning, stable diffusion

arXiv ID: 2404.16069

Country: Europe > Germany (0.14)

Genre: Research Report (0.40)

Industry:

Media (0.47)
Government (0.32)
Law (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing

Helbling, Alec, Lee, Seongmin, Chau, Polo

arXiv.org Artificial Intelligence, Apr-5-2024

Recently, researchers have proposed powerful systems for generating and manipulating images using natural language instructions. However, it is difficult to precisely specify many common classes of image transformations with text alone. For example, a user may wish to change the location and breed of a particular dog in an image with several similar dogs. This task is quite difficult with natural language alone, and would require a user to write a laboriously complex prompt that both disambiguates the target dog and describes the destination. We propose ClickDiffusion, a system for precise image manipulation and generation that combines natural language instructions with visual feedback provided by the user through a direct manipulation interface. We demonstrate that by serializing both an image and a multi-modal instruction into a textual representation it is possible to leverage LLMs to perform precise transformations of the layout and appearance of an image. Code available at https://github.com/poloclub/ClickDiffusion.
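The serialization idea in the abstract above can be illustrated with a minimal sketch. This is not the ClickDiffusion implementation: the `SceneObject` record, the text layout, and the `build_prompt` helper are all hypothetical names chosen for illustration, and the sketch assumes each object in the image is already available as a labeled bounding box. It shows only how a click can disambiguate one of several similar objects before the scene and the instruction are flattened into text for an LLM.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SceneObject:
    """Hypothetical record for one object in the image (not from the paper)."""
    obj_id: int
    label: str
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def point_in_box(point: Tuple[int, int], box: Tuple[int, int, int, int]) -> bool:
    """Return True if the clicked point lies inside the bounding box."""
    px, py = point
    x, y, w, h = box
    return x <= px <= x + w and y <= py <= y + h


def serialize_scene(objects: List[SceneObject]) -> str:
    """Flatten the image layout into one text line per object."""
    return "\n".join(
        f"object {o.obj_id}: {o.label} at x={o.box[0]} y={o.box[1]} "
        f"w={o.box[2]} h={o.box[3]}"
        for o in objects
    )


def build_prompt(objects: List[SceneObject], instruction: str,
                 source_click: Tuple[int, int],
                 target_click: Tuple[int, int]) -> str:
    """Combine layout, the clicked object, a target point, and the text
    instruction into a single textual prompt (assumed format)."""
    selected = next(
        (o for o in objects if point_in_box(source_click, o.box)), None
    )
    if selected is None:
        raise ValueError("click does not fall on any known object")
    return "\n".join([
        "Scene:",
        serialize_scene(objects),
        f"Selected: object {selected.obj_id}",
        f"Target location: x={target_click[0]} y={target_click[1]}",
        f"Instruction: {instruction}",
    ])
```

With two dogs in the scene, clicking inside the second dog's box yields a prompt that names `object 1` explicitly, so the text instruction can simply say "this dog" without describing which one.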

artificial intelligence, large language model, natural language

arXiv ID: 2404.04376

Country: North America > United States (0.15)

Genre: Research Report (0.50)

Industry: Media > Photography (0.44)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Point and Instruct: Enabling Precise Image Editing by Unifying Direct Manipulation and Text Instructions

Helbling, Alec, Lee, Seongmin, Chau, Polo

arXiv.org Artificial Intelligence, Feb-5-2024

Machine learning has enabled the development of powerful systems capable of editing images from natural language instructions. However, in many common scenarios it is difficult for users to specify precise image transformations with text alone. For example, in an image with several dogs, it is difficult to select a particular dog and move it to a precise location. Doing this with text alone would require a complex prompt that disambiguates the target dog and describes the destination. However, direct manipulation is well suited to visual tasks like selecting objects and specifying locations. We introduce Point and Instruct, a system for seamlessly combining familiar direct manipulation and textual instructions to enable precise image manipulation. With our system, a user can visually mark objects and locations, and reference them in textual instructions. This allows users to benefit from both the visual descriptiveness of natural language and the spatial precision of direct manipulation.

large language model, machine learning, natural language

arXiv ID: 2402.07925

Country: North America > United States (0.94)

Genre: Research Report (0.40)

Industry: Media > Photography (0.44)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.51)
