Collaborating Authors: Trinh, Ha


Principal Graph Encoder Embedding and Principal Community Detection

arXiv.org Machine Learning

In this paper, we introduce the concept of principal communities and propose a principal graph encoder embedding method that concurrently detects these communities and achieves vertex embedding. Given a graph adjacency matrix with vertex labels, the method computes a sample community score for each community, ranking them to measure community importance and estimate a set of principal communities. The method then produces a vertex embedding by retaining only the dimensions corresponding to these principal communities. Theoretically, we define the population version of the encoder embedding and the community score based on a random Bernoulli graph distribution. We prove that the population principal graph encoder embedding preserves the conditional density of the vertex labels and that the population community score successfully distinguishes the principal communities. We conduct a variety of simulations to demonstrate the finite-sample accuracy in detecting ground-truth principal communities, as well as the advantages in embedding visualization and subsequent vertex classification. The method is further applied to a set of real-world graphs, showcasing its numerical advantages, including robustness to label noise and computational scalability.
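As an illustration of the encoder embedding step described above, the following minimal Python sketch computes the one-hot graph encoder embedding Z = A * W, where each column of W indicates one community scaled by its size, and then ranks the embedding dimensions by a simple per-dimension score. The ranking score shown here (the variance of each dimension) is only a placeholder for exposition; the paper defines its own sample community score, and all function and parameter names below are assumptions rather than the reference implementation.

import numpy as np

def graph_encoder_embedding(A, y):
    # One-hot graph encoder embedding: Z = A @ W, where column k of W
    # indicates community k, scaled by 1 / (size of community k).
    n = A.shape[0]
    labels = np.unique(y)
    W = np.zeros((n, labels.size))
    for k, lab in enumerate(labels):
        idx = (y == lab)
        W[idx, k] = 1.0 / idx.sum()
    return A @ W

def principal_communities(Z, top=None, threshold=0.0):
    # Illustrative ranking only: score each embedding dimension by its
    # variance and keep the top ones. The paper's sample community score
    # is defined differently.
    score = Z.var(axis=0)
    order = np.argsort(score)[::-1]
    keep = order[:top] if top is not None else order[score[order] > threshold]
    return keep, score

# Usage sketch: retain only the principal dimensions of the embedding.
# Z = graph_encoder_embedding(A, y)
# keep, score = principal_communities(Z, top=5)
# Z_principal = Z[:, keep]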


Refined Graph Encoder Embedding via Self-Training and Latent Community Recovery

arXiv.org Machine Learning

This paper introduces a refined graph encoder embedding method, enhancing the original graph encoder embedding using linear transformation, self-training, and hidden community recovery within observed communities. We provide the theoretical rationale for the refinement procedure, demonstrating how and why our proposed method can effectively identify useful hidden communities via stochastic block models, and how the refinement method leads to improved vertex embedding and better decision boundaries for subsequent vertex classification. The efficacy of our approach is validated through a collection of simulated and real-world graph data.
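A rough sketch of the latent community recovery idea, under stated assumptions: within each observed community, cluster the encoder embedding rows to propose sub-communities and relabel accordingly. This omits the linear transformation and the self-training criterion the paper uses to accept or reject a split; graph_encoder_embedding is the helper sketched in the previous entry, integer vertex labels are assumed, and all names are illustrative.

import numpy as np
from sklearn.cluster import KMeans

def refine_labels(A, y, max_sub=2, random_state=0):
    # Split each observed community into candidate sub-communities by
    # clustering its embedding rows; small communities are left intact.
    Z = graph_encoder_embedding(A, y)
    y_new = y.copy()
    next_label = y.max() + 1
    for lab in np.unique(y):
        idx = np.where(y == lab)[0]
        if idx.size < 2 * max_sub:
            continue
        sub = KMeans(n_clusters=max_sub, n_init=10,
                     random_state=random_state).fit_predict(Z[idx])
        for s in range(1, max_sub):
            y_new[idx[sub == s]] = next_label
            next_label += 1
    return y_new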


From Local to Global: A Graph RAG Approach to Query-Focused Summarization

arXiv.org Artificial Intelligence

The use of retrieval-augmented generation (RAG) to retrieve relevant information from an external knowledge source enables large language models (LLMs) to answer questions over private and/or previously unseen document collections. However, RAG fails on global questions directed at an entire text corpus, such as "What are the main themes in the dataset?", since this is inherently a query-focused summarization (QFS) task, rather than an explicit retrieval task. Prior QFS methods, meanwhile, fail to scale to the quantities of text indexed by typical RAG systems. To combine the strengths of these contrasting methods, we propose a Graph RAG approach to question answering over private text corpora that scales with both the generality of user questions and the quantity of source text to be indexed. Our approach uses an LLM to build a graph-based text index in two stages: first to derive an entity knowledge graph from the source documents, then to pregenerate community summaries for all groups of closely-related entities. Given a question, each community summary is used to generate a partial response, before all partial responses are again summarized in a final response to the user. For a class of global sensemaking questions over datasets in the 1 million token range, we show that Graph RAG leads to substantial improvements over a naïve RAG baseline for both the comprehensiveness and diversity of generated answers. An open-source, Python-based implementation of both global and local Graph RAG approaches is forthcoming at https://aka.ms/graphrag.
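The query-time map-reduce over community summaries can be pictured in a few lines. In the sketch below, llm stands in for any chat-completion function, community_summaries is a list of pre-generated summary strings, and the prompts are simplified placeholders rather than the prompts used by the actual GraphRAG implementation.

def graph_rag_answer(question, community_summaries, llm):
    # Map step: each community summary yields a partial answer.
    partial_answers = []
    for summary in community_summaries:
        prompt = (
            f"Community summary:\n{summary}\n\n"
            f"Question: {question}\n"
            "Answer using only this summary; reply 'not relevant' "
            "if the summary does not help."
        )
        partial_answers.append(llm(prompt))
    # Reduce step: combine the partial answers into one final response.
    combined = "\n\n".join(partial_answers)
    final_prompt = (
        f"Question: {question}\n\n"
        f"Partial answers from different communities:\n{combined}\n\n"
        "Synthesize a single comprehensive answer."
    )
    return llm(final_prompt)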


Discovering Communication Pattern Shifts in Large-Scale Labeled Networks using Encoder Embedding and Vertex Dynamics

arXiv.org Machine Learning

Analyzing large-scale time-series network data, such as social media and email communications, poses a significant challenge in understanding social dynamics, detecting anomalies, and predicting trends. In particular, the scalability of graph analysis is a critical hurdle impeding progress in large-scale downstream inference. To address this challenge, we introduce a temporal encoder embedding method. This approach leverages ground-truth or estimated vertex labels, enabling an efficient embedding of large-scale graph data and the processing of billions of edges within minutes. Furthermore, this embedding unveils a temporal dynamic statistic capable of detecting communication pattern shifts across all levels, ranging from individual vertices to vertex communities and the overall graph structure. We provide theoretical support to confirm its soundness under random graph models, and demonstrate its numerical advantages in capturing evolving communities and identifying outliers. Finally, we showcase the practical application of our approach by analyzing an anonymized time-series communication network from a large organization spanning 2019-2020, enabling us to assess the impact of Covid-19 on workplace communication patterns.
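One way to picture the temporal statistic: embed each time slice with the same label encoder and measure how much each vertex moves between consecutive slices, then aggregate to the community and graph levels. The Euclidean change used below is an illustrative stand-in for the paper's dynamic statistic; graph_encoder_embedding is the helper sketched earlier, and all names are assumptions.

import numpy as np

def temporal_dynamics(adjacency_by_time, y):
    # Embed every time slice with the shared labels y.
    Z = [graph_encoder_embedding(A_t, y) for A_t in adjacency_by_time]
    # shift[t, i] = how far vertex i moves from time t to t+1.
    shift = np.stack([np.linalg.norm(Z[t + 1] - Z[t], axis=1)
                      for t in range(len(Z) - 1)])
    vertex_shift = shift
    # Average per community and over the whole graph.
    community_shift = np.stack([
        np.array([s[y == lab].mean() for lab in np.unique(y)]) for s in shift
    ])
    graph_shift = shift.mean(axis=1)
    return vertex_shift, community_shift, graph_shift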


Synergistic Graph Fusion via Encoder Embedding

arXiv.org Machine Learning

The connectivity of a graph with n vertices can be captured by an n × n adjacency matrix A, where A(i, j) = 1 indicates the existence of an edge between vertices i and j, and A(i, j) = 0 indicates the absence of an edge. Traditionally, the adjacency matrix is binary and primarily used to represent unweighted graphs, but it extends naturally to weighted graphs, where the entries of A hold the corresponding edge weights. Moreover, graph representation can be used for any data via proper transformations. For instance, when vertex attributes are available, they can be transformed into a pairwise distance or kernel matrix, thus creating a general graph structure. This adaptability enables the incorporation of additional information from vertex attributes, making graph representation a powerful tool for data analysis that goes beyond traditional binary or weighted graphs. Graph embedding is a fundamental and versatile approach for analyzing and exploring graph data, encompassing a wide range of techniques such as spectral embedding [7, 8], graph convolutional neural networks [9, 10], and node2vec [11, 12], among others. By projecting the vertices into a low-dimensional space while preserving the structural information of the graph, graph embedding yields vertex representations in Euclidean space that facilitate various downstream inference tasks, such as community detection [13, 14], vertex classification [15, 9], and outlier detection [16, 17].
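As a concrete example of the attribute-to-graph transformation mentioned above, the sketch below turns a vertex attribute matrix into a dense weighted graph via a pairwise RBF kernel; any kernel or distance matrix could be substituted, and the function name and parameters are illustrative assumptions.

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

def attributes_to_graph(X, gamma=None):
    # X: (n, d) vertex attributes. The RBF kernel gives a dense weighted
    # adjacency matrix; zeroing the diagonal removes self-loops.
    K = rbf_kernel(X, gamma=gamma)
    np.fill_diagonal(K, 0.0)
    return K

# The resulting matrix can then be embedded like any weighted adjacency,
# e.g. Z = graph_encoder_embedding(attributes_to_graph(X), y).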