Luo, Simian
Large Trajectory Models are Scalable Motion Predictors and Planners
Sun, Qiao, Zhang, Shiduo, Ma, Danjiao, Shi, Jingzhe, Li, Derun, Luo, Simian, Wang, Yu, Xu, Ningyi, Cao, Guangzhi, Zhao, Hang
Motion prediction and planning are vital tasks in autonomous driving, and recent efforts have shifted toward machine learning-based approaches. The challenges include understanding diverse road topologies, reasoning about traffic dynamics over a long time horizon, interpreting heterogeneous behaviors, and generating policies in a large continuous state space. Inspired by the success of large language models in addressing similar complexities through model scaling, we introduce a scalable trajectory model called State Transformer (STR). Our approach unifies trajectory generation with other sequence modeling problems, enabling rapid iteration on breakthroughs from neighboring domains such as language modeling. Remarkably, experimental results reveal that large trajectory models (LTMs), such as STR, adhere to scaling laws, exhibiting outstanding adaptability and learning efficiency. Qualitative results further demonstrate that LTMs are capable of making plausible predictions in scenarios that diverge significantly from the training data distribution. LTMs also learn to perform complex reasoning for long-term planning, without explicit loss designs or costly high-level annotations.

Motion planning and prediction in autonomous driving rely on the ability to semantically understand complex driving environments and interactions between various road users. Learning-based methods are pivotal to overcoming this complexity, as rule-based and scenario-specific strategies often prove inadequate to cover all possible situations and unexpected events that may occur during operation. Such learning problems can be regarded as conditional sequence-to-sequence tasks, where models leverage past trajectories to generate future ones, conditioned on the observations. Notably, these problems share structural similarities with other sequence modeling problems, such as language generation. Recent studies (Mirchandani et al., 2023; Zeng et al., 2023) have demonstrated that LLMs excel not only at natural language generation but also at tackling a wide range of sequence modeling and time series forecasting challenges. Building on these insights, prior research (Chen et al., 2021; Janner et al., 2021; Sun et al., 2023) has effectively utilized conditional causal transformers to address motion planning as a large sequence modeling problem, with both behavior cloning and reinforcement learning. Furthermore, Brohan et al. (2023) replace the transformer backbone with language models, demonstrating the potential to merge motion planning with other modalities within one large sequence for LLMs.
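To make the sequence modeling framing concrete, here is a minimal sketch of a conditional causal transformer for trajectory prediction. It is not the STR architecture from the paper; every dimension, layer count, and name below is an illustrative assumption.

```python
# A minimal sketch of the framing above: trajectory prediction as
# conditional causal sequence modeling. This is NOT the STR architecture;
# all dimensions, layer counts, and names are illustrative assumptions.
import torch
import torch.nn as nn

class CausalTrajectoryModel(nn.Module):
    def __init__(self, state_dim=4, ctx_dim=256, d_model=256,
                 n_layers=6, n_heads=8, max_len=1024):
        super().__init__()
        self.state_embed = nn.Linear(state_dim, d_model)  # past states -> tokens
        self.ctx_embed = nn.Linear(ctx_dim, d_model)      # map/observation features -> tokens
        self.pos_embed = nn.Parameter(torch.zeros(1, max_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, state_dim)         # next-state regression

    def forward(self, ctx_tokens, past_states):
        # One sequence: observation tokens first, then the past trajectory.
        x = torch.cat([self.ctx_embed(ctx_tokens),
                       self.state_embed(past_states)], dim=1)
        x = x + self.pos_embed[:, :x.size(1)]
        # Causal mask so each position attends only to earlier positions.
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1)).to(x.device)
        h = self.backbone(x, mask=mask)
        # Read predicted future states off the trajectory positions.
        return self.head(h[:, ctx_tokens.size(1):])
```

Under this framing, training reduces to next-state regression over logged trajectories (behavior cloning), and inference rolls out autoregressively by feeding each predicted state back into the sequence.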
LCM-LoRA: A Universal Stable-Diffusion Acceleration Module
Luo, Simian, Tan, Yiqin, Patil, Suraj, Gu, Daniel, von Platen, Patrick, Passos, Apolinário, Huang, Longbo, Li, Jian, Zhao, Hang
Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks, producing high-quality images with minimal inference steps. LCMs are distilled from pre-trained latent diffusion models (LDMs), requiring only ~32 A100 GPU training hours. This report further extends LCMs' potential in two aspects: First, by applying LoRA distillation to Stable-Diffusion models including SD-V1.5, SSD-1B, and SDXL, we expand LCM's scope to larger models with significantly less memory consumption, achieving superior image generation quality. Second, we identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA. LCM-LoRA can be directly plugged into various Stable-Diffusion fine-tuned models or LoRAs without training, thus representing a universally applicable accelerator for diverse image generation tasks. Compared with previous numerical PF-ODE solvers such as DDIM and DPM-Solver, LCM-LoRA can be viewed as a plug-in neural PF-ODE solver with strong generalization abilities. Project page: https://github.com/luosiallen/latent-consistency-model.
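As a usage illustration, the sketch below plugs a distilled LCM-LoRA into a base Stable-Diffusion model for few-step sampling via the diffusers library. The repository IDs are the ones published on the Hugging Face Hub; treat the exact names and step/guidance settings as assumptions that may change.

```python
# A minimal usage sketch of plugging LCM-LoRA into a base model with the
# diffusers library. Repository IDs below are the published ones; exact
# names and settings may change over time.
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Swap in the LCM scheduler and load the distilled LCM-LoRA weights:
# no further training is needed.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")

# 4 inference steps instead of the usual 25-50; LCM-LoRA typically uses a
# low guidance scale because guidance is distilled into the weights.
image = pipe(
    "a photo of an astronaut riding a horse on the moon",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("astronaut.png")
```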
Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference
Luo, Simian, Tan, Yiqin, Huang, Longbo, Li, Jian, Zhao, Hang
Latent diffusion models (LDMs) have achieved remarkable results in synthesizing high-resolution images. However, the iterative sampling process is computationally intensive and leads to slow generation. Inspired by Consistency Models (Song et al.), we propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDM, including Stable Diffusion (Rombach et al.). Viewing the guided reverse diffusion process as solving an augmented probability flow ODE (PF-ODE), LCMs are designed to directly predict the solution of such an ODE in latent space, mitigating the need for numerous iterations and allowing rapid, high-fidelity sampling. Efficiently distilled from pre-trained classifier-free guided diffusion models, a high-quality 768×768, 2- to 4-step LCM takes only 32 A100 GPU hours to train. Furthermore, we introduce Latent Consistency Fine-tuning (LCF), a novel method tailored for fine-tuning LCMs on customized image datasets. Evaluation on the LAION-5B-Aesthetics dataset demonstrates that LCMs achieve state-of-the-art text-to-image generation performance with few-step inference. Project Page: https://latent-consistency-models.github.io/
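The core object named in the abstract can be written compactly. The sketch below follows the general notation of consistency models and is a paraphrase, not the paper's exact parameterization: z_t is a latent on a PF-ODE trajectory augmented with the text condition c and CFG scale ω.

```latex
% Hedged sketch of the consistency function in latent space (notation
% approximate). The model directly predicts the PF-ODE solution:
\[
  f_\theta(z_t, \omega, c, t) \approx z_0, \qquad t \in (0, T],
\]
% subject to the self-consistency property that any two points on the
% same augmented PF-ODE trajectory map to the same solution:
\[
  f_\theta(z_t, \omega, c, t) = f_\theta(z_{t'}, \omega, c, t')
  \quad \text{for all } t, t' \in (0, T].
\]
```

Because f_θ jumps directly to the trajectory's endpoint, sampling needs only a handful of evaluations rather than the tens of steps required by DDIM-style numerical integration.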
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Luo, Simian, Yan, Chuanhao, Hu, Chenxu, Zhao, Hang
Video-to-Audio (V2A) synthesis has recently gained attention for its practical applications, such as generating audio directly for silent videos in video/film production. However, previous V2A methods suffer from limited generation quality in terms of temporal synchronization and audio-visual relevance. We present Diff-Foley, a synchronized Video-to-Audio synthesis method based on a latent diffusion model (LDM) that generates high-quality audio with improved synchronization and audio-visual relevance. We adopt contrastive audio-visual pretraining (CAVP) to learn more temporally and semantically aligned features, then train an LDM conditioned on CAVP-aligned visual features over the spectrogram latent space. The CAVP-aligned features enable the LDM to capture subtler audio-visual correlations via a cross-attention module. We further significantly improve sample quality with "double guidance". Diff-Foley achieves state-of-the-art V2A performance on a current large-scale V2A dataset. Furthermore, we demonstrate Diff-Foley's practical applicability and generalization capabilities via downstream fine-tuning. Project Page: https://diff-foley.github.io/
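The abstract names "double guidance" without detail; the sketch below shows one plausible reading, combining classifier-free guidance with an alignment-classifier gradient at each denoising step. The function names, the binary classifier head, and the guidance weights are all assumptions for illustration, not the paper's implementation.

```python
# A hedged sketch of one plausible reading of "double guidance": combining
# classifier-free guidance with an alignment-classifier gradient at each
# denoising step. Names, the binary classifier head, and the weights are
# illustrative assumptions, not the paper's implementation.
import torch

def double_guided_eps(unet, align_classifier, z_t, t, visual_feat,
                      w_cfg=4.5, w_cls=2.0):
    # Classifier-free guidance: extrapolate from the unconditional toward
    # the visually conditioned noise prediction.
    eps_cond = unet(z_t, t, cond=visual_feat)
    eps_uncond = unet(z_t, t, cond=None)
    eps = eps_uncond + w_cfg * (eps_cond - eps_uncond)

    # Classifier guidance: nudge the latent toward higher predicted
    # audio-visual alignment (sigma_t scaling omitted for brevity).
    with torch.enable_grad():
        z = z_t.detach().requires_grad_(True)
        logits = align_classifier(z, t, visual_feat)   # (B, 2) aligned / not
        log_p = logits.log_softmax(dim=-1)[:, 1].sum()
        grad = torch.autograd.grad(log_p, z)[0]
    return eps - w_cls * grad
```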
ChatDB: Augmenting LLMs with Databases as Their Symbolic Memory
Hu, Chenxu, Fu, Jie, Du, Chenzhuang, Luo, Simian, Zhao, Junbo, Zhao, Hang
Large language models (LLMs) with memory are computationally universal. However, mainstream LLMs do not take full advantage of memory, and their designs are heavily influenced by biological brains. Due to their approximate nature and proneness to error accumulation, conventional neural memory mechanisms cannot support LLMs in simulating complex reasoning. In this paper, we seek inspiration from modern computer architectures to augment LLMs with symbolic memory for complex multi-hop reasoning. Such a symbolic memory framework is instantiated as an LLM plus a set of SQL databases, where the LLM generates SQL instructions to manipulate the databases. We validate the effectiveness of the proposed memory framework on a synthetic dataset requiring complex reasoning. The project website is available at https://chatdatabase.github.io/.
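A minimal sketch of the loop described above, assuming SQLite as the symbolic store; `ask_llm` is a hypothetical stand-in for any chat-completion API, and the schema is invented for illustration.

```python
# A minimal sketch of the symbolic-memory loop: the LLM emits SQL, which is
# executed against a real database, and the result is fed back. `ask_llm`
# is a hypothetical stand-in for any chat-completion API; the schema is
# invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, "
             "customer TEXT, total REAL)")

def memory_step(user_request: str) -> str:
    # 1. The LLM translates the request into an explicit SQL memory operation.
    sql = ask_llm(
        "You control a SQLite database with table orders(id, customer, total). "
        "Reply with exactly one SQL statement.\n"
        f"Request: {user_request}"
    )
    # 2. Execute it. State lives in the database, not the model's weights,
    #    so intermediate results are exact and errors do not accumulate.
    cur = conn.execute(sql)
    conn.commit()
    rows = cur.fetchall()
    # 3. Feed the grounded result back for the next reasoning hop.
    return f"Executed {sql!r} -> {rows}"
```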