AITopics | Luo, Tiange

Luo, Tiange

Probing Visual Language Priors in VLMs

Luo, Tiange, Cao, Ang, Lee, Gunhee, Johnson, Justin, Lee, Honglak

arXiv.org Artificial Intelligence, Dec-31-2024

Despite recent advances in Vision-Language Models (VLMs), many still over-rely on visual language priors present in their training data rather than true visual reasoning. To examine the situation, we introduce ViLP, a visual question answering (VQA) benchmark that pairs each question with three potential answers and three corresponding images: one image whose answer can be inferred from text alone, and two images that demand visual reasoning. By leveraging image generative models, we ensure significant variation in texture, shape, conceptual combinations, hallucinated elements, and proverb-based contexts, making our benchmark images distinctly out-of-distribution. While humans achieve near-perfect accuracy, modern VLMs falter; for instance, GPT-4 achieves only 66.17% on ViLP. To alleviate this, we propose a self-improving framework in which models generate new VQA pairs and images, then apply pixel-level and semantic corruptions to form "good-bad" image pairs for self-training. Our training objectives compel VLMs to focus more on actual visual inputs and have demonstrated their effectiveness in enhancing the performance of open-source VLMs, including LLaVA-v1.5 and Cambrian.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2501.00569

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (1.00)

Industry:

Transportation (1.00)
Leisure & Entertainment > Sports (1.00)
Health & Medicine (0.93)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Fine-grained Text Style Transfer with Diffusion-Based Language Models

Lyu, Yiwei, Luo, Tiange, Shi, Jiacheng, Hollon, Todd C., Lee, Honglak

arXiv.org Artificial Intelligence, Jun-11-2023

Diffusion probabilistic models have shown great success in generating high-quality images controllably, and researchers have tried to bring this controllability to the text generation domain. Previous works on diffusion-based language models have shown that they can be trained without external knowledge (such as pre-trained weights) and still achieve stable performance and controllability. In this paper, we trained a diffusion-based model on the StylePTB dataset, the standard benchmark for fine-grained text style transfer. The tasks in StylePTB require much more refined control over the output text than the tasks evaluated in previous works, and our model achieved state-of-the-art performance on StylePTB on both individual and compositional transfers. Moreover, our model, trained on limited data from StylePTB without external knowledge, outperforms previous works that utilized pretrained weights, embeddings, and external grammar parsers, which may indicate that diffusion-based language models have great potential in low-resource settings.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2305.19512

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Neural Shape Compiler: A Unified Framework for Transforming between Text, Point Cloud, and Program

Luo, Tiange, Lee, Honglak, Johnson, Justin

arXiv.org Artificial Intelligence, Apr-6-2023

This paper presents a unified framework to translate between pairs of shape abstractions: Text ⟺ Point Cloud ⟺ Program. We propose Neural Shape Compiler to model the abstraction transformation as a conditional generation process. It converts 3D shapes of three abstract types into discrete shape code, transforms each shape code into code of other abstract types through the proposed ShapeCode Transformer, and decodes them to output the target shape abstraction. Point Cloud code is obtained in a class-agnostic way by the proposed PointVQVAE. On Text2Shape, ShapeGlot, ABO, Genre, and Program Synthetic datasets, Neural Shape Compiler shows strengths in Text ⟹ Point Cloud, Point Cloud ⟹ Text, Point Cloud ⟹ Program, and Point Cloud Completion tasks.

cloud computing, machine learning, natural language, (12 more...)

arXiv.org Artificial Intelligence

2212.12952

Genre: Research Report (1.00)

Industry: Energy (0.34)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Multimodal Subtask Graph Generation from Instructional Videos

Jang, Yunseok, Sohn, Sungryull, Logeswaran, Lajanugen, Luo, Tiange, Lee, Moontae, Lee, Honglak

arXiv.org Artificial Intelligence, Feb-16-2023

Real-world tasks consist of multiple inter-dependent subtasks (e.g., a dirty pan needs to be washed before it can be used for cooking). In this work, we aim to model the causal dependencies between such subtasks from instructional videos describing the task. This is a challenging problem since complete information about the world is often inaccessible from videos, which demands robust learning mechanisms to understand the causal structure of events. We present Multimodal Subtask Graph Generation (MSG2), an approach that constructs a Subtask Graph defining the dependencies between a task's subtasks from noisy web videos. Graphs generated by our multimodal approach are closer to human-annotated graphs than those of prior approaches. MSG2 further performs the downstream task of next subtask prediction 85% and 30% more accurately than recent video transformer models on the ProceL and CrossTask datasets, respectively.

logic & formal reasoning, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2302.08672

Genre:

Research Report (0.81)
Instructional Material > Course Syllabus & Notes (0.61)

Industry:

Education > Educational Technology > Media (0.61)
Education > Educational Technology > Audio & Video (0.61)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.47)