Collaborating Authors: Xuan, Hong


LLaVA-Zip: Adaptive Visual Token Compression with Intrinsic Image Information

arXiv.org Artificial Intelligence

Multi-modal large language models (MLLMs) trained on instruction-following data, such as LLaVA, have made great progress in industry. A major limitation of these models is that visual tokens consume a substantial portion of the maximum token limit of the underlying large language model (LLM), leading to increased computational demands and degraded performance when prompts include multiple images or videos. Industry solutions often mitigate this issue by adding computational power, but that approach is less feasible in academic environments with limited resources. In this study, we propose Dynamic Feature Map Reduction (DFMR), built on LLaVA-1.5, to address the challenge of visual token overload. DFMR dynamically compresses the visual tokens, freeing up token capacity. Our experiments demonstrate that integrating DFMR into LLaVA-1.5 significantly improves performance across varied visual token lengths, offering a promising way to extend LLaVA to multi-image and video scenarios in resource-constrained academic environments. DFMR can also be applied in industry settings as a form of data augmentation, helping to mitigate the scarcity of open-domain image-text pair datasets in the continued pretraining stage.
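
The abstract does not detail DFMR's mechanism, so the following is only a minimal sketch of the general idea of compressing a grid of visual tokens before they reach the LLM. The 24x24 CLIP-style patch grid, the candidate compression levels, and the use of pixel standard deviation as the "intrinsic image information" signal are all assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch (assumptions noted above): pool a visual token grid to a smaller
# grid, with the target size chosen per image from a simple image statistic.
import torch
import torch.nn.functional as F

def compress_visual_tokens(tokens: torch.Tensor, target_side: int) -> torch.Tensor:
    """tokens: (batch, 576, dim) from a 24x24 patch grid -> (batch, target_side**2, dim)."""
    b, n, d = tokens.shape
    side = int(n ** 0.5)                          # 24 for a 576-token grid
    grid = tokens.transpose(1, 2).reshape(b, d, side, side)
    pooled = F.adaptive_avg_pool2d(grid, target_side)
    return pooled.flatten(2).transpose(1, 2)      # back to (batch, tokens, dim)

def choose_target_side(image: torch.Tensor, sides=(24, 16, 12, 8)) -> int:
    """Pick a compression level from an illustrative heuristic: more texture -> more tokens."""
    detail = image.float().std().item()
    thresholds = (0.25, 0.18, 0.10)               # hypothetical cut points
    for side, t in zip(sides[:-1], thresholds):
        if detail >= t:
            return side
    return sides[-1]

# Usage: a low-detail image gets a smaller token budget.
tokens = torch.randn(1, 576, 1024)
image = torch.rand(3, 336, 336) * 0.3
print(compress_visual_tokens(tokens, choose_target_side(image)).shape)
```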


Dissecting the impact of different loss functions with gradient surgery

arXiv.org Artificial Intelligence

Pair-wise loss is an approach to metric learning that learns a semantic embedding by optimizing a loss function that encourages images from the same semantic class to be mapped closer together than images from different classes. The literature reports a large and growing set of variations on pair-wise loss strategies. Here we decompose the gradient of these loss functions into components that relate to how they push the relative feature positions of the anchor-positive and anchor-negative pairs. This decomposition allows the unification of a large collection of current pair-wise loss functions. Additionally, explicitly constructing pair-wise gradient updates that separate out these effects gives insight into which have the biggest impact, and leads to a simple algorithm that beats the state of the art for image retrieval on the CAR, CUB, and Stanford Online Products datasets.
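
As one concrete instance of the kind of decomposition the abstract describes (not the paper's unified treatment), the anchor gradient of a standard triplet loss with squared Euclidean distances splits into an anchor-positive "pull" and an anchor-negative "push". The short check below verifies this with autograd; the margin value and the hard-triplet construction are chosen purely for the demo.

```python
# Illustrative sketch, not the paper's code: decompose the anchor gradient of a
# standard triplet loss into a pull-toward-positive and a push-from-negative term.
import torch

torch.manual_seed(0)
a = torch.randn(8, requires_grad=True)          # anchor embedding
p = torch.randn(8)                              # positive (same class)
n = a.detach() + 0.1 * torch.randn(8)           # hard negative close to the anchor

margin = 0.2
d_ap = (a - p).pow(2).sum()                     # squared anchor-positive distance
d_an = (a - n).pow(2).sum()                     # squared anchor-negative distance
loss = torch.relu(d_ap - d_an + margin)         # hinge is active for this hard triplet
loss.backward()

# Analytic components of the anchor gradient when the hinge is active:
pull = 2 * (a.detach() - p)                     # moves the anchor toward the positive
push = -2 * (a.detach() - n)                    # moves the anchor away from the negative
print(torch.allclose(a.grad, pull + push))      # True: autograd matches the decomposition
```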


Visualizing How Embeddings Generalize

arXiv.org Machine Learning

Deep metric learning is often used to learn an embedding function that captures the semantic differences within a dataset. A key factor in many problem domains is how well this embedding generalizes to new classes of data. In examining many triplet selection strategies for metric learning, we find that the best performance consistently arises from approaches that focus on a few, well-selected triplets. We introduce visualization tools to illustrate how an embedding generalizes beyond measuring accuracy on validation data, and we illustrate the behavior of a range of triplet selection strategies.
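
The abstract does not name the selection strategies it compares; as a hedged example of focusing on "a few, well-selected triplets", the sketch below implements batch-hard mining, a common strategy in which each anchor is paired with its hardest in-batch positive and negative. The toy batch is an assumption for the demo.

```python
# One common triplet selection strategy (batch-hard mining), shown as an example only.
import torch

def batch_hard_triplets(embeddings: torch.Tensor, labels: torch.Tensor):
    """For each anchor, pick the hardest positive (farthest same-class point)
    and the hardest negative (closest different-class point) in the batch."""
    dists = torch.cdist(embeddings, embeddings)             # pairwise Euclidean distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool)

    pos_d = dists.masked_fill(~same | eye, float("-inf"))   # only other same-class points
    neg_d = dists.masked_fill(same, float("inf"))           # only different-class points
    return pos_d.argmax(dim=1), neg_d.argmin(dim=1)

# Usage on a toy batch of 2-D embeddings with two classes.
emb = torch.tensor([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.2, 1.0]])
labels = torch.tensor([0, 0, 1, 1])
print(batch_hard_triplets(emb, labels))
```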


Hotels-50K: A Global Hotel Recognition Dataset

arXiv.org Machine Learning

Recognizing a hotel from an image of a hotel room is important for human trafficking investigations. Images directly link victims to places and can help verify where victims have been trafficked, and where their traffickers might move them or others in the future. Recognizing the hotel from images is challenging because of low image quality, uncommon camera perspectives, large occlusions (often the victim), and the similarity of objects (e.g., furniture, art, bedding) across different hotel rooms. To support efforts towards this hotel recognition task, we have curated a dataset of over 1 million annotated hotel room images from 50,000 hotels. These images include professionally captured photographs from travel websites and crowd-sourced images from a mobile application, which are more similar to the types of images analyzed in real-world investigations. We present a baseline approach based on a standard network architecture and a collection of data-augmentation approaches tuned to this problem domain.
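
The abstract does not specify which augmentations the baseline uses; as a hedged stand-in for handling the large occlusions it mentions, the sketch below pastes a random gray rectangle over a training image before the usual resize/crop/normalize pipeline. The occluder shape, size range, and fill color are assumptions, not the released baseline.

```python
# Hedged sketch of a domain-motivated augmentation: simulate a large foreground occluder.
import random
from PIL import Image, ImageDraw

def random_occlusion(img: Image.Image, max_frac: float = 0.5) -> Image.Image:
    """Paste a gray rectangle of random size and position onto a copy of the image."""
    img = img.copy()
    w, h = img.size
    ow = int(w * random.uniform(0.2, max_frac))
    oh = int(h * random.uniform(0.2, max_frac))
    x0 = random.randint(0, w - ow)
    y0 = random.randint(0, h - oh)
    ImageDraw.Draw(img).rectangle([x0, y0, x0 + ow, y0 + oh], fill=(128, 128, 128))
    return img

# Usage: augment one placeholder "hotel room" image and save the result.
example = Image.new("RGB", (640, 480), (200, 180, 160))
random_occlusion(example).save("occluded_example.jpg")
```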