Lee, Chankyu
MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs
Lin, Sheng-Chieh, Lee, Chankyu, Shoeybi, Mohammad, Lin, Jimmy, Catanzaro, Bryan, Ping, Wei
State-of-the-art retrieval models typically address a straightforward search scenario, in which retrieval tasks are fixed (e.g., finding a passage to answer a specific question) and only a single modality is supported for both queries and retrieved results. This paper introduces techniques for advancing information retrieval with multimodal large language models (MLLMs), enabling a broader search scenario, termed universal multimodal retrieval, in which multiple modalities and diverse retrieval tasks are accommodated. To this end, we first study fine-tuning an MLLM as a bi-encoder retriever on 10 datasets with 16 retrieval tasks. Our empirical results show that the fine-tuned MLLM retriever is capable of understanding challenging queries composed of both text and image, but underperforms a smaller CLIP retriever on cross-modal retrieval tasks due to the modality bias of MLLMs. To address this issue, we propose modality-aware hard negative mining to mitigate the modality bias exhibited by MLLM retrievers. Second, we propose continually fine-tuning the universal multimodal retriever to enhance its text retrieval capability while maintaining its multimodal retrieval capability. As a result, our model, MM-Embed, achieves state-of-the-art performance on the multimodal retrieval benchmark M-BEIR, which spans multiple domains and tasks, while also surpassing the state-of-the-art text retrieval model, NV-Embed-v1, on the MTEB retrieval benchmark. Finally, we explore prompting off-the-shelf MLLMs as zero-shot rerankers to refine the ranking of candidates returned by the multimodal retriever. We find that, through prompting and reranking, MLLMs can further improve multimodal retrieval when user queries (e.g., text-image composed queries) are more complex and challenging to understand. These findings also pave the way for advancing universal multimodal retrieval in the future.
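The following is a minimal, hypothetical sketch (not the authors' released code) of how modality-aware hard negative mining for a bi-encoder retriever could look. It assumes hard negatives are drawn from the highest-scoring candidates whose modality differs from the task's target modality, and that they are then used alongside standard in-batch positives in an InfoNCE-style contrastive loss; the exact mining rule in the paper may differ.

import torch
import torch.nn.functional as F

def mine_modality_aware_negatives(q_emb, cand_emb, cand_modality,
                                  target_modality, k=4):
    """Return indices of the top-k scoring candidates in the *wrong* modality.

    Assumption: cand_modality is a list of strings (e.g., "text", "image")
    and target_modality is the modality the task is supposed to retrieve.
    """
    scores = q_emb @ cand_emb.T                       # (num_queries, num_cands)
    wrong = torch.tensor([m != target_modality for m in cand_modality])
    scores = scores.masked_fill(~wrong, float("-inf"))  # keep only wrong-modality candidates
    return scores.topk(k, dim=-1).indices             # (num_queries, k)

def contrastive_loss(q_emb, pos_emb, neg_emb, temperature=0.05):
    """InfoNCE with explicit hard negatives; all embeddings are L2-normalized."""
    q, p = F.normalize(q_emb, dim=-1), F.normalize(pos_emb, dim=-1)
    n = F.normalize(neg_emb, dim=-1)                   # (B, k, d) mined negatives
    pos = (q * p).sum(-1, keepdim=True)                # (B, 1)
    neg = torch.einsum("bd,bkd->bk", q, n)             # (B, k)
    logits = torch.cat([pos, neg], dim=-1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)  # positive is always index 0
    return F.cross_entropy(logits, labels)

The intent of conditioning the mining step on candidate modality is to explicitly penalize candidates that score highly despite being in the wrong modality, which is one plausible reading of the modality bias the abstract describes.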
NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models
Lee, Chankyu, Roy, Rajarshi, Xu, Mengyao, Raiman, Jonathan, Shoeybi, Mohammad, Catanzaro, Bryan, Ping, Wei
Decoder-only large language model (LLM)-based embedding models are beginning to outperform BERT- or T5-based embedding models on general-purpose text embedding tasks, including dense vector-based retrieval. In this work, we introduce the NV-Embed model, with a variety of architectural designs and training procedures that significantly enhance the performance of the LLM as a versatile embedding model, while maintaining its simplicity and reproducibility. For the model architecture, we propose a latent attention layer to obtain pooled embeddings, which consistently improves retrieval and downstream task accuracy compared to mean pooling or using the last <EOS> token embedding from the LLM.
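As a rough illustration of the latent attention idea described above, the sketch below is an assumption-laden reconstruction, not NV-Embed's released implementation: the LLM's last-layer token states attend to a trainable latent array via single-head cross-attention, the result goes through an MLP, and the outputs are mean-pooled over non-padding tokens.

import torch
import torch.nn as nn

class LatentAttentionPooling(nn.Module):
    def __init__(self, hidden_dim, num_latents=512):
        super().__init__()
        # Trainable latent array acting as the attention "dictionary".
        self.latents = nn.Parameter(torch.randn(num_latents, hidden_dim))
        self.mlp = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim),
            nn.GELU(),
            nn.Linear(4 * hidden_dim, hidden_dim),
        )

    def forward(self, token_states, attention_mask):
        # token_states: (B, L, d) last-layer hidden states from the LLM
        scale = token_states.size(-1) ** 0.5
        attn = torch.softmax(token_states @ self.latents.T / scale, dim=-1)
        mixed = self.mlp(attn @ self.latents)          # (B, L, d)
        mask = attention_mask.unsqueeze(-1).float()    # ignore padding tokens
        return (mixed * mask).sum(1) / mask.sum(1)     # (B, d) pooled embedding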
ChatQA: Building GPT-4 Level Conversational QA Models
Liu, Zihan, Ping, Wei, Roy, Rajarshi, Xu, Peng, Lee, Chankyu, Shoeybi, Mohammad, Catanzaro, Bryan
In this work, we introduce ChatQA, a family of conversational question answering (QA) models that obtain GPT-4 level accuracies. Specifically, we propose a two-stage instruction tuning method that can significantly improve the zero-shot conversational QA results of large language models (LLMs). To handle retrieval-augmented generation in conversational QA, we fine-tune a dense retriever on a multi-turn QA dataset, which yields results comparable to using the state-of-the-art query rewriting model while greatly reducing deployment cost. Notably, our ChatQA-70B can outperform GPT-4 in terms of average score on 10 conversational QA datasets (54.14 vs. 53.90), without relying on any synthetic data from OpenAI GPT models.
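A small illustrative sketch of the retrieval step described above, with hypothetical encode_query / encode_passages helpers standing in for any fine-tuned bi-encoder: the multi-turn dialogue history is fed directly to the dense retriever as the query, so no separate query-rewriting model is needed at inference time.

import numpy as np

def retrieve_for_dialogue(turns, passages, encode_query, encode_passages, top_k=5):
    """turns: list of utterance strings; passages: list of candidate contexts."""
    query = " ".join(turns)                  # multi-turn history used as the query
    q = encode_query(query)                  # (d,) query embedding
    p = encode_passages(passages)            # (N, d) passage embeddings
    scores = p @ q                           # dot-product relevance scores
    best = np.argsort(-scores)[:top_k]
    return [passages[i] for i in best]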
Towards Understanding the Effect of Leak in Spiking Neural Networks
Chowdhury, Sayeed Shafayet, Lee, Chankyu, Roy, Kaushik
Over the past few years, advances in deep artificial neural networks (ANNs) have led to remarkable success in various cognitive tasks (e.g., vision, language, and behavior). In some cases, neural networks have outperformed conventional algorithms and achieved human-level performance [1, 2]. However, recent ANNs are becoming extremely compute-intensive and often do not generalize well to data unseen during training. On the other hand, the human brain can reliably learn and compute intricate cognitive tasks within a power budget of only a few watts. Recently, Spiking Neural Networks (SNNs) have been explored toward realizing robust and energy-efficient machine intelligence, guided by cues from neuroscience experiments [3]. SNNs are categorized as a new generation of neural networks [4] based on their neuronal functionalities. A variety of spiking neuron models closely resemble biological neuronal mechanisms, transmitting information through discrete spatiotemporal events (spikes). These spiking neuron models are characterized by an internal state called the membrane potential. A spiking neuron integrates its inputs over time and fires an output spike whenever the membrane potential exceeds a threshold.
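For concreteness, the snippet below sketches the standard discrete-time leaky integrate-and-fire (LIF) dynamics this description refers to (a textbook formulation, not code from the paper); the leak factor is the multiplicative decay applied to the membrane potential at each timestep, and it is this quantity whose effect the work examines.

import numpy as np

def lif_neuron(input_spikes, weights, leak=0.9, threshold=1.0):
    """input_spikes: (T, N) binary spike trains; weights: (N,) synaptic weights."""
    v = 0.0
    output = np.zeros(input_spikes.shape[0])
    for t, x in enumerate(input_spikes):
        v = leak * v + weights @ x        # leaky integration of weighted inputs
        if v >= threshold:                # fire when the threshold is crossed
            output[t] = 1.0
            v = 0.0                       # hard reset after a spike
    return output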