AITopics | Phan, Hung

Collaborating Authors

Phan, Hung

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

RAG vs. Long Context: Examining Frontier Large Language Models for Environmental Review Document Comprehension

Phan, Hung, Acharya, Anurag, Chaturvedi, Sarthak, Sharma, Shivam, Parker, Mike, Nally, Dan, Jannesari, Ali, Pazdernik, Karl, Halappanavar, Mahantesh, Munikoti, Sai, Horawalavithana, Sameera

arXiv.org Artificial IntelligenceJul-9-2024

Large Language Models (LLMs) have been applied to many research problems across various domains. One of the applications of LLMs is providing question-answering systems that cater to users from different fields. The effectiveness of LLM-based question-answering systems has already been established at an acceptable level for users posing questions in popular and public domains such as trivia and literature. However, it has not often been established in niche domains that traditionally require specialized expertise. To this end, we construct the NEPAQuAD1.0 benchmark to evaluate the performance of three frontier LLMs -- Claude Sonnet, Gemini, and GPT-4 -- when answering questions originating from Environmental Impact Statements prepared by U.S. federal government agencies in accordance with the National Environmental Environmental Act (NEPA). We specifically measure the ability of LLMs to understand the nuances of legal, technical, and compliance-related information present in NEPA documents in different contextual scenarios. For example, we test the LLMs' internal prior NEPA knowledge by providing questions without any context, as well as assess how LLMs synthesize the contextual information present in long NEPA documents to facilitate the question/answering task. We compare the performance of the long context LLMs and RAG powered models in handling different types of questions (e.g., problem-solving, divergent). Our results suggest that RAG powered models significantly outperform the long context models in the answer accuracy regardless of the choice of the frontier LLM. Our further analysis reveals that many models perform better answering closed questions than divergent and problem-solving questions.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2407.07321

Country:

North America > United States > Washington > Benton County > Richland (0.14)
North America > United States > Nevada (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Law > Environmental Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)
Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

The Landscape and Challenges of HPC Research and LLMs

Chen, Le, Ahmed, Nesreen K., Dutta, Akash, Bhattacharjee, Arijit, Yu, Sixing, Mahmud, Quazi Ishtiaque, Abebe, Waqwoya, Phan, Hung, Sarkar, Aishwarya, Butler, Branden, Hasabnis, Niranjan, Oren, Gal, Vo, Vy A., Munoz, Juan Pablo, Willke, Theodore L., Mattson, Tim, Jannesari, Ali

arXiv.org Artificial IntelligenceFeb-6-2024

Recently, language models (LMs), especially large language models (LLMs), have revolutionized the field of deep learning. Both encoder-decoder models and prompt-based techniques have shown immense potential for natural language processing and code-based tasks. Over the past several years, many research labs and institutions have invested heavily in high-performance computing, approaching or breaching exascale performance levels. In this paper, we posit that adapting and utilizing such language model-based techniques for tasks in high-performance computing (HPC) would be very beneficial. This study presents our reasoning behind the aforementioned position and highlights how existing ideas can be improved and adapted for HPC tasks.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.02018

Country: North America > United States > New York (0.14)

Genre: Research Report > Promising Solution (0.46)

Industry:

Information Technology > Services (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation

Chen, Le, Mahmud, Quazi Ishtiaque, Phan, Hung, Ahmed, Nesreen K., Jannesari, Ali

arXiv.org Artificial IntelligenceMay-9-2023

Detecting parallelizable code regions is a challenging task, even for experienced developers. Numerous recent studies have explored the use of machine learning for code analysis and program synthesis, including parallelization, in light of the success of machine learning in natural language processing. However, applying machine learning techniques to parallelism detection presents several challenges, such as the lack of an adequate dataset for training, an effective code representation with rich information, and a suitable machine learning model to learn the latent features of code for diverse analyses. To address these challenges, we propose a novel graph-based learning approach called Graph2Par that utilizes a heterogeneous augmented abstract syntax tree (Augmented-AST) representation for code. The proposed approach primarily focused on loop-level parallelization with OpenMP. Moreover, we create an OMP\_Serial dataset with 18598 parallelizable and 13972 non-parallelizable loops to train the machine learning models. Our results show that our proposed approach achieves the accuracy of parallelizable code region detection with 85\% accuracy and outperforms the state-of-the-art token-based machine learning approach. These results indicate that our approach is competitive with state-of-the-art tools and capable of handling loops with complex structures that other tools may overlook.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2305.05779

Country: North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.89)

Add feedback