
 graph db


Aligning Large Language Models to a Domain-specific Graph Database

Liang, Yuanyuan, Tan, Keren, Xie, Tingyu, Tao, Wenbiao, Wang, Siyuan, Lan, Yunshi, Qian, Weining

arXiv.org Artificial Intelligence

Graph Databases (Graph DB) are widely applied in various fields, including finance, social networks, and medicine. However, translating Natural Language (NL) into the Graph Query Language (GQL), commonly known as NL2GQL, proves to be challenging due to its inherent complexity and specialized nature. Some approaches have sought to utilize Large Language Models (LLMs) to address analogous tasks like text2SQL. Nevertheless, when it comes to NL2GQL tasks in a particular domain, the absence of domain-specific NL-GQL data pairs makes it difficult to establish alignment between LLMs and the graph DB. To address this challenge, we propose a well-defined pipeline. Specifically, we utilize ChatGPT to create NL-GQL data pairs based on the given graph DB with self-instruct. Then, we use the created data to fine-tune LLMs, thereby achieving alignment between LLMs and the graph DB. Additionally, during inference, we propose a method that extracts the schema relevant to the queried NL as the input context to guide LLMs in generating accurate GQLs. We evaluate our method on two constructed datasets derived from graph DBs in the finance and medicine domains, namely FinGQL and MediGQL. Experimental results demonstrate that our method significantly outperforms a set of baseline methods, with improvements of 5.90 and 6.36 absolute points on EM, and 6.00 and 7.09 absolute points on EX, respectively.
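The abstract does not say how the relevant schema is extracted, so the following is only a minimal sketch of the general idea, assuming a simple word-overlap heuristic rather than the authors' method. The toy schema, the function names (`norm`, `extract_relevant_schema`, `build_prompt`), and the prompt wording are all hypothetical.

```python
import re

# Hypothetical toy schema (not from the paper): node labels with their
# properties, and edge types with their (source, target) node labels.
SCHEMA = {
    "nodes": {
        "Stock": ["name", "code"],
        "Fund": ["name", "manager"],
        "Drug": ["name", "dosage"],
        "Disease": ["name"],
    },
    "edges": {
        "HOLDS": ("Fund", "Stock"),
        "TREATS": ("Drug", "Disease"),
    },
}


def norm(word):
    """Lowercase a word and strip trailing 's' (a deliberately naive stemmer)."""
    return word.lower().rstrip("s")


def extract_relevant_schema(question, schema):
    """Keep only the schema elements whose names appear in the question.

    Assumption: plain word overlap stands in for whatever relevance
    measure a real system would use.
    """
    words = {norm(w) for w in re.findall(r"[A-Za-z]+", question)}
    nodes = {
        label: props
        for label, props in schema["nodes"].items()
        if norm(label) in words
    }
    edges = {
        etype: ends
        for etype, ends in schema["edges"].items()
        # Keep an edge if it is named directly, or if both endpoints were kept.
        if norm(etype) in words or (ends[0] in nodes and ends[1] in nodes)
    }
    return {"nodes": nodes, "edges": edges}


def build_prompt(question, sub_schema):
    """Format the reduced schema and the question as model input context."""
    lines = ["Schema:"]
    for label, props in sub_schema["nodes"].items():
        lines.append(f"  node {label}({', '.join(props)})")
    for etype, (src, dst) in sub_schema["edges"].items():
        lines.append(f"  edge ({src})-[{etype}]->({dst})")
    lines.append(f"Question: {question}")
    return "\n".join(lines)


question = "Which funds hold the stock Acme?"
sub_schema = extract_relevant_schema(question, SCHEMA)
prompt = build_prompt(question, sub_schema)
print(prompt)
```

For this question the medical part of the schema (`Drug`, `Disease`, `TREATS`) is dropped, so the model only sees the elements it is likely to need.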


Neural Graph Databases. A new milestone in graph data…

#artificialintelligence

Vanilla graph databases are pretty much everywhere thanks to the ever-growing graphs in production, flexible graph data models, and expressive query languages. Query engines assume that graphs in classical graph DBs are complete. Under the completeness assumption, we can build indexes, store the graphs in a variety of read/write-optimized formats, and expect the DB to return what is there. But this assumption often does not hold in practice (we'd say it fails far too often). Consider some prominent knowledge graphs (KGs): in Freebase, 93.8% of people have no place of birth, 78.5% have no nationality, and about 68% have no profession, while in Wikidata, about 50% of artists have no date of birth, and only 0.4% of known buildings have information about height.
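To make the completeness assumption concrete, here is a small sketch with an in-memory set of triples. The people, predicates, and values are invented for illustration and are not taken from Freebase or Wikidata; the helper names `lookup` and `missing_rate` are hypothetical.

```python
# Hypothetical toy knowledge graph stored as (subject, predicate, object) triples.
triples = {
    ("alice", "born_in", "Paris"),
    ("alice", "profession", "engineer"),
    ("bob", "profession", "painter"),
    ("carol", "nationality", "Canadian"),
}
people = {"alice", "bob", "carol"}


def lookup(subject, predicate):
    """Return the stored objects for a subject/predicate pair.

    A classical query engine answers exactly like this: it returns what is
    stored and treats an absent fact as if it did not exist.
    """
    return {o for s, p, o in triples if s == subject and p == predicate}


def missing_rate(predicate):
    """Fraction of people with no stored value for the given predicate."""
    missing = [person for person in people if not lookup(person, predicate)]
    return len(missing) / len(people)


# Bob certainly has a place of birth; the graph simply does not record it.
print(lookup("bob", "born_in"))
print(f"{missing_rate('born_in'):.0%} of people have no stored place of birth")
```

The empty result for `bob` is the point: the query is answered correctly with respect to the stored graph, but not with respect to the world the graph describes.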


Graph Analytics: Part 1

#artificialintelligence

In my past 3 years as a Data Science professional, I have worked extensively with both RDBMS (Postgres) & NoSQL (Cassandra) but didn't get a chance to explore Graph databases. So, it's time to jump into graph databases & how they can be integrated into different data science solutions. Consider this: look at Google Maps for any city. A graph is basically a collection of nodes (the landmarks) & edges (the roads). Nodes are connected to each other by edges (or may not be connected at all). Neo4j is the most popular database for analyzing graph data.
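The landmarks-and-roads picture can be sketched in a few lines of plain Python, without any graph database. The landmark names and the helpers `build_adjacency` and `reachable` are made up for illustration.

```python
from collections import deque

# Hypothetical city map: landmarks are nodes, roads are undirected edges.
landmarks = {"Station", "Museum", "Park", "Library", "Lighthouse"}
roads = [("Station", "Museum"), ("Museum", "Park"), ("Park", "Library")]


def build_adjacency(nodes, edges):
    """Build an adjacency map: each node points to the set of its neighbours."""
    adjacency = {node: set() for node in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)  # roads run both ways
    return adjacency


def reachable(adjacency, start):
    """Breadth-first search: every landmark reachable from `start` by road."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


adjacency = build_adjacency(landmarks, roads)
print(sorted(reachable(adjacency, "Station")))
# The Lighthouse has no roads, so it is a node with no edges at all.
print(adjacency["Lighthouse"])
```

`Lighthouse` illustrates the "may not be connected at all" case: it is a valid node in the graph even though no edge touches it.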