AITopics | Chen, Liang-Chieh

Collaborating Authors

Chen, Liang-Chieh

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

1.58-bit FLUX

Yang, Chenglin, Liu, Celong, Deng, Xueqing, Kim, Dongwon, Mei, Xing, Shen, Xiaohui, Chen, Liang-Chieh

arXiv.org Artificial IntelligenceDec-24-2024

We present 1.58-bit FLUX, the first successful approach to quantizing the state-of-the-art text-to-image generation model, FLUX.1-dev, using 1.58-bit weights (i.e., values in {-1, 0, +1}) while maintaining comparable performance for generating 1024 x 1024 images. Notably, our quantization method operates without access to image data, relying solely on self-supervision from the FLUX.1-dev model. Additionally, we develop a custom kernel optimized for 1.58-bit operations, achieving a 7.7x reduction in model storage, a 5.1x reduction in inference memory, and improved inference latency. Extensive evaluations on the GenEval and T2I Compbench benchmarks demonstrate the effectiveness of 1.58-bit FLUX in maintaining generation quality while significantly enhancing computational efficiency.

large language model, machine learning, quantization, (19 more...)

arXiv.org Artificial Intelligence

2412.18653

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

MaskBit: Embedding-free Image Generation via Bit Tokens

Weber, Mark, Yu, Lijun, Yu, Qihang, Deng, Xueqing, Shen, Xiaohui, Cremers, Daniel, Chen, Liang-Chieh

arXiv.org Artificial IntelligenceDec-8-2024

Masked transformer models for class-conditional image generation have become a compelling alternative to diffusion models. Typically comprising two stages - an initial VQGAN model for transitioning between latent space and image space, and a subsequent Transformer model for image generation within latent space - these frameworks offer promising avenues for image synthesis. In this study, we present two primary contributions: Firstly, an empirical and systematic examination of VQGANs, leading to a modernized VQGAN. Secondly, a novel embedding-free generation network operating directly on bit tokens - a binary quantized representation of tokens with rich semantics. The first contribution furnishes a transparent, reproducible, and high-performing VQGAN model, enhancing accessibility and matching the performance of current state-of-the-art methods while revealing previously undisclosed details. The second contribution demonstrates that embedding-free image generation using bit tokens achieves a new state-of-the-art FID of 1.52 on the ImageNet 256x256 benchmark, with a compact generator model of mere 305M parameters. The code for this project is available on https://github.com/markweberdev/maskbit.

artificial intelligence, bit token, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2409.16211

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

MaskConver: Revisiting Pure Convolution Model for Panoptic Segmentation

Rashwan, Abdullah, Zhang, Jiageng, Taalimi, Ali, Yang, Fan, Zhou, Xingyi, Yan, Chaochao, Chen, Liang-Chieh, Li, Yeqing

arXiv.org Artificial IntelligenceDec-10-2023

In recent years, transformer-based models have dominated panoptic segmentation, thanks to their strong modeling capabilities and their unified representation for both semantic and instance classes as global binary masks. In this paper, we revisit pure convolution model and propose a novel panoptic architecture named MaskConver. MaskConver proposes to fully unify things and stuff representation by predicting their centers. To that extent, it creates a lightweight class embedding module that can break the ties when multiple centers co-exist in the same location. Furthermore, our study shows that the decoder design is critical in ensuring that the model has sufficient context for accurate detection and segmentation. We introduce a powerful ConvNeXt-UNet decoder that closes the performance gap between convolution- and transformerbased models. With ResNet50 backbone, our MaskConver achieves 53.6% PQ on the COCO panoptic val set, outperforming the modern convolution-based model, Panoptic FCN, by 9.3% as well as transformer-based models such as Mask2Former (+1.7% PQ) and kMaX-DeepLab (+0.6% PQ). Additionally, MaskConver with a MobileNet backbone reaches 37.2% PQ, improving over Panoptic-DeepLab by +6.4% under the same FLOPs/latency constraints. A further optimized version of MaskConver achieves 29.7% PQ, while running in real-time on mobile devices. The code and model weights will be publicly available

machine learning, natural language, segmentation, (19 more...)

arXiv.org Artificial Intelligence

2312.06052

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model

Gu, Xiuye, Cui, Yin, Huang, Jonathan, Rashwan, Abdullah, Yang, Xuan, Zhou, Xingyi, Ghiasi, Golnaz, Kuo, Weicheng, Chen, Huizhong, Chen, Liang-Chieh, Ross, David A

arXiv.org Artificial IntelligenceJun-2-2023

Observing the close relationship among panoptic, semantic and instance segmentation tasks, we propose to train a universal multi-dataset multi-task segmentation model: DaTaSeg.We use a shared representation (mask proposals with class predictions) for all tasks. To tackle task discrepancy, we adopt different merge operations and post-processing for different tasks. We also leverage weak-supervision, allowing our segmentation model to benefit from cheaper bounding box annotations. To share knowledge across datasets, we use text embeddings from the same semantic embedding space as classifiers and share all network parameters among datasets. We train DaTaSeg on ADE semantic, COCO panoptic, and Objects365 detection datasets. DaTaSeg improves performance on all datasets, especially small-scale datasets, achieving 54.0 mIoU on ADE semantic and 53.5 PQ on COCO panoptic. DaTaSeg also enables weakly-supervised knowledge transfer on ADE panoptic and Objects365 instance segmentation. Experiments show DaTaSeg scales with the number of training datasets and enables open-vocabulary segmentation through direct transfer. In addition, we annotate an Objects365 instance segmentation set of 1,000 images and will release it as a public benchmark.

artificial intelligence, machine learning, segmentation, (17 more...)

arXiv.org Artificial Intelligence

2306.01736

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.93)
(2 more...)

Add feedback

A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision

Beyer, Lucas, Wan, Bo, Madan, Gagan, Pavetic, Filip, Steiner, Andreas, Kolesnikov, Alexander, Pinto, André Susano, Bugliarello, Emanuele, Wang, Xiao, Yu, Qihang, Chen, Liang-Chieh, Zhai, Xiaohua

arXiv.org Artificial IntelligenceMar-30-2023

There has been a recent explosion of computer vision models which perform many tasks and are composed of an image encoder (usually a ViT) and an autoregressive decoder (usually a Transformer). However, most of this work simply presents one system and its results, leaving many questions regarding design decisions and trade-offs of such systems unanswered. In this work, we aim to provide such answers. We take a close look at autoregressive decoders for multi-task learning in multimodal computer vision, including classification, captioning, visual question answering, and optical character recognition. Through extensive systematic experiments, we study the effects of task and data mixture, training and regularization hyperparameters, conditioning type and specificity, modality combination, and more. Importantly, we compare these to well-tuned single-task baselines to highlight the cost incurred by multi-tasking. A key finding is that a small decoder learned on top of a frozen pretrained encoder works surprisingly well. We call this setup locked-image tuning with decoder (LiT-decoder). It can be seen as teaching a decoder to interact with a pretrained vision model via natural language.

decoder, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2303.17376

Country:

Oceania > Australia (0.14)
Europe (0.14)
Asia > Middle East > UAE (0.14)
Africa > Ethiopia (0.14)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Panoptic-DeepLab

Cheng, Bowen, Collins, Maxwell D., Zhu, Yukun, Liu, Ting, Huang, Thomas S., Adam, Hartwig, Chen, Liang-Chieh

arXiv.org Machine LearningOct-23-2019

Our Panoptic-DeepLab is conceptually simple and delivers state-of-the-art results. In particular, we adopt the dual-ASPP and dual-decoder structures specific to semantic, and instance segmentation, respectively. The semantic segmentation branch is the same as the typical design of any semantic segmentation model ( e.g., DeepLab), while the instance segmentation branch is class-agnostic, involving a simple instance center regression. Our single Panoptic-DeepLab sets the new state-of-art at all three Cityscapes benchmarks, reaching 84.2% mIoU, 39.0% AP, and 65.5% PQ on test set, and advances results on the other challenging Mapillary Vistas. 1. Introduction Our bottom-up Panoptic-DeepLab is conceptually simple and delivers state-of-the-art panoptic segmentation results [7]. We adopt dual-ASPP and dual-decoder modules, specific to semantic segmentation and instance segmentation, respectively. The semantic segmentation branch follows the typical design of any semantic segmentation model (e.g., DeepLab [2]), while the instance segmentation prediction involves a simple instance center regression [1, 5], where the model learns to predict instance centers as well as the offset from each pixel to its corresponding center.

artificial intelligence, machine learning, segmentation, (16 more...)

arXiv.org Machine Learning

1910.04751

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Searching for Efficient Multi-Scale Architectures for Dense Image Prediction

Chen, Liang-Chieh, Collins, Maxwell, Zhu, Yukun, Papandreou, George, Zoph, Barret, Schroff, Florian, Adam, Hartwig, Shlens, Jon

Neural Information Processing SystemsDec-31-2018

The design of neural network architectures is an important component for achieving state-of-the-art performance with machine learning systems across a broad array of tasks. Much work has endeavored to design and build architectures automatically through clever construction of a search space paired with simple learning algorithms. Recent progress has demonstrated that such meta-learning methods may exceed scalable human-invented architectures on image classification tasks. An open question is the degree to which such methods may generalize to new domains. In this work we explore the construction of meta-learning techniques for dense image prediction focused on the tasks of scene parsing, person-part segmentation, and semantic image segmentation. Constructing viable search spaces in this domain is challenging because of the multi-scale representation of visual information and the necessity to operate on high resolution imagery. Based on a survey of techniques in dense image prediction, we construct a recursive search space and demonstrate that even with efficient random search, we can identify architectures that outperform human-invented architectures and achieve state-of-the-art performance on three dense prediction tasks including 82.7% on Cityscapes (street scene parsing), 71.3% on PASCAL-Person-Part (person-part segmentation), and 87.9% on PASCAL VOC 2012 (semantic image segmentation). Additionally, the resulting architecture is more computationally efficient, requiring half the parameters and half the computational cost as previous state of the art systems.

artificial intelligence, machine learning, survey article, (18 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Genre:

Overview (0.54)
Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Searching for Efficient Multi-Scale Architectures for Dense Image Prediction

Chen, Liang-Chieh, Collins, Maxwell, Zhu, Yukun, Papandreou, George, Zoph, Barret, Schroff, Florian, Adam, Hartwig, Shlens, Jon

Neural Information Processing SystemsDec-31-2018

architecture, deep learning, neural network, (18 more...)

Neural Information Processing Systems

Country: North America > Canada (0.14)

Genre:

Overview (0.54)
Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

Searching for Efficient Multi-Scale Architectures for Dense Image Prediction

Chen, Liang-Chieh, Collins, Maxwell D., Zhu, Yukun, Papandreou, George, Zoph, Barret, Schroff, Florian, Adam, Hartwig, Shlens, Jonathon

arXiv.org Machine LearningSep-11-2018

The design of neural network architectures is an important component for achieving state-of-the-art performance with machine learning systems across a broad array of tasks. Much work has endeavored to design and build architectures automatically through clever construction of a search space paired with simple learning algorithms. Recent progress has demonstrated that such meta-learning methods may exceed scalable human-invented architectures on image classification tasks. An open question is the degree to which such methods may generalize to new domains. In this work we explore the construction of meta-learning techniques for dense image prediction focused on the tasks of scene parsing, person-part segmentation, and semantic image segmentation. Constructing viable search spaces in this domain is challenging because of the multi-scale representation of visual information and the necessity to operate on high resolution imagery. Based on a survey of techniques in dense image prediction, we construct a recursive search space and demonstrate that even with efficient random search, we can identify architectures that outperform human-invented architectures and achieve state-of-the-art performance on three dense prediction tasks including 82.7\% on Cityscapes (street scene parsing), 71.3\% on PASCAL-Person-Part (person-part segmentation), and 87.9\% on PASCAL VOC 2012 (semantic image segmentation). Additionally, the resulting architecture is more computationally efficient, requiring half the parameters and half the computational cost as previous state of the art systems.

architecture, deep learning, neural network, (18 more...)

arXiv.org Machine Learning

1809.04184

Country: North America > Canada (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback