Goto

Collaborating Authors

 Semantic Networks


Investigating Idiomaticity in Word Representations

arXiv.org Artificial Intelligence

Idiomatic expressions are an integral part of human languages, often used to express complex ideas in compressed or conventional ways (e.g. eager beaver as a keen and enthusiastic person). However, their interpretations may not be straightforwardly linked to the meanings of their individual components in isolation and this may have an impact for compositional approaches. In this paper, we investigate to what extent word representation models are able to go beyond compositional word combinations and capture multiword expression idiomaticity and some of the expected properties related to idiomatic meanings. We focus on noun compounds of varying levels of idiomaticity in two languages (English and Portuguese), presenting a dataset of minimal pairs containing human idiomaticity judgments for each noun compound at both type and token levels, their paraphrases and their occurrences in naturalistic and sense-neutral contexts, totalling 32,200 sentences. We propose this set of minimal pairs for evaluating how well a model captures idiomatic meanings, and define a set of fine-grained metrics of Affinity and Scaled Similarity, to determine how sensitive the models are to perturbations that may lead to changes in idiomaticity. The results obtained with a variety of representative and widely used models indicate that, despite superficial indications to the contrary in the form of high similarities, idiomaticity is not yet accurately represented in current models. Moreover, the performance of models with different levels of contextualisation suggests that their ability to capture context is not yet able to go beyond more superficial lexical clues provided by the words and to actually incorporate the relevant semantic clues needed for idiomaticity.


JPEC: A Novel Graph Neural Network for Competitor Retrieval in Financial Knowledge Graphs

arXiv.org Artificial Intelligence

Knowledge graphs have gained popularity for their ability to organize and analyze complex data effectively. When combined with graph embedding techniques, such as graph neural networks (GNNs), knowledge graphs become a potent tool in providing valuable insights. This study explores the application of graph embedding in identifying competitors from a financial knowledge graph. Existing state-of-the-art(SOTA) models face challenges due to the unique attributes of our knowledge graph, including directed and undirected relationships, attributed nodes, and minimal annotated competitor connections. To address these challenges, we propose a novel graph embedding model, JPEC(JPMorgan Proximity Embedding for Competitor Detection), which utilizes graph neural network to learn from both first-order and second-order node proximity together with vital features for competitor retrieval. JPEC had outperformed most existing models in extensive experiments, showcasing its effectiveness in competitor retrieval.


Attr-Int: A Simple and Effective Entity Alignment Framework for Heterogeneous Knowledge Graphs

arXiv.org Artificial Intelligence

Entity alignment (EA) refers to the task of linking entities in different knowledge graphs (KGs). Existing EA methods rely heavily on structural isomorphism. However, in real-world KGs, aligned entities usually have non-isomorphic neighborhood structures, which paralyses the application of these structure-dependent methods. In this paper, we investigate and tackle the problem of entity alignment between heterogeneous KGs. First, we propose two new benchmarks to closely simulate real-world EA scenarios of heterogeneity. Then we conduct extensive experiments to evaluate the performance of representative EA methods on the new benchmarks. Finally, we propose a simple and effective entity alignment framework called Attr-Int, in which innovative attribute information interaction methods can be seamlessly integrated with any embedding encoder for entity alignment, improving the performance of existing entity alignment techniques. Experiments demonstrate that our framework outperforms the state-of-the-art approaches on two new benchmarks.


DPCL-Diff: The Temporal Knowledge Graph Reasoning based on Graph Node Diffusion Model with Dual-Domain Periodic Contrastive Learning

arXiv.org Artificial Intelligence

Temporal knowledge graph (TKG) reasoning that infers future missing facts is an essential and challenging task. Predicting future events typically relies on closely related historical facts, yielding more accurate results for repetitive or periodic events. However, for future events with sparse historical interactions, the effectiveness of this method, which focuses on leveraging high-frequency historical information, diminishes. Recently, the capabilities of diffusion models in image generation have opened new opportunities for TKG reasoning. Therefore, we propose a graph node diffusion model with dual-domain periodic contrastive learning (DPCL-Diff). Graph node diffusion model (GNDiff) introduces noise into sparsely related events to simulate new events, generating high-quality data that better conforms to the actual distribution. This generative mechanism significantly enhances the model's ability to reason about new events. Additionally, the dual-domain periodic contrastive learning (DPCL) maps periodic and non-periodic event entities to Poincar\'e and Euclidean spaces, leveraging their characteristics to distinguish similar periodic events effectively. Experimental results on four public datasets demonstrate that DPCL-Diff significantly outperforms state-of-the-art TKG models in event prediction, demonstrating our approach's effectiveness. This study also investigates the combined effectiveness of GNDiff and DPCL in TKG tasks.


Capturing and Anticipating User Intents in Data Analytics via Knowledge Graphs

arXiv.org Artificial Intelligence

In today's data-driven world, the ability to extract meaningful information from data is becoming essential for businesses, organizations and researchers alike. For that purpose, a wide range of tools and systems exist addressing data-related tasks, from data integration, preprocessing and modeling, to the interpretation and evaluation of the results. As data continues to grow in volume, variety, and complexity, there is an increasing need for advanced but user-friendly tools, such as intelligent discovery assistants (IDAs) or automated machine learning (AutoML) systems, that facilitate the user's interaction with the data. This enables non-expert users, such as citizen data scientists, to leverage powerful data analytics techniques effectively. The assistance offered by IDAs or AutoML tools should not be guided only by the analytical problem's data but should also be tailored to each individual user. To this end, this work explores the usage of Knowledge Graphs (KG) as a basic framework for capturing in a human-centered manner complex analytics workflows, by storing information not only about the workflow's components, datasets and algorithms but also about the users, their intents and their feedback, among others. The data stored in the generated KG can then be exploited to provide assistance (e.g., recommendations) to the users interacting with these systems. To accomplish this objective, two methods are explored in this work. Initially, the usage of query templates to extract relevant information from the KG is studied. However, upon identifying its main limitations, the usage of link prediction with knowledge graph embeddings is explored, which enhances flexibility and allows leveraging the entire structure and components of the graph. The experiments show that the proposed method is able to capture the graph's structure and to produce sensible suggestions.


Detecting text level intellectual influence with knowledge graph embeddings

arXiv.org Artificial Intelligence

Introduction Tracing the spread of ideas and the presence of influence is a question of special importance across a wide range of disciplines, ranging from intellectual history to cultural analytics, computational social science, and the science of science. Method We collect a corpus of open source journal articles, generate Knowledge Graph representations using the Gemini LLM, and attempt to predict the existence of citations between sampled pairs of articles using previously published methods and a novel Graph Neural Network based embedding model. Results We demonstrate that our knowledge graph embedding method is superior at distinguishing pairs of articles with and without citation. Once trained, it runs efficiently and can be fine - tuned on specific corpora to suit individual researcher needs. Conclusion ( s) This experiment demonstrates that the relationships encoded in a knowledge graph, especially the types of concepts brought together by specific relations can encode information capable of revealing intellectual influence. This suggests that further work in analyzing document level knowledge graphs to understand latent structures c ould provide valuable insights.


Semantic Enrichment of the Quantum Cascade Laser Properties in Text- A Knowledge Graph Generation Approach

arXiv.org Artificial Intelligence

A well structured collection of the various Quantum Cascade Laser (QCL) design and working properties data provides a platform to analyze and understand the relationships between these properties. By analyzing these relationships, we can gain insights into how different design features impact laser performance properties such as the working temperature. Most of these QCL properties are captured in scientific text. There is therefore need for efficient methodologies that can be utilized to extract QCL properties from text and generate a semantically enriched and interlinked platform where the properties can be analyzed to uncover hidden relations. There is also the need to maintain provenance and reference information on which these properties are based. Semantic Web technologies such as Ontologies and Knowledge Graphs have proven capability in providing interlinked data platforms for knowledge representation in various domains. In this paper, we propose an approach for generating a QCL properties Knowledge Graph (KG) from text for semantic enrichment of the properties. The approach is based on the QCL ontology and a Retrieval Augmented Generation (RAG) enabled information extraction pipeline based on GPT 4-Turbo language model. The properties of interest include: working temperature, laser design type, lasing frequency, laser optical power and the heterostructure. The experimental results demonstrate the feasibility and effectiveness of this approach for efficiently extracting QCL properties from unstructured text and generating a QCL properties Knowledge Graph, which has potential applications in semantic enrichment and analysis of QCL data.


Resilience in Knowledge Graph Embeddings

arXiv.org Machine Learning

In recent years, knowledge graphs have gained interest and witnessed widespread applications in various domains, such as information retrieval, question-answering, recommendation systems, amongst others. Large-scale knowledge graphs to this end have demonstrated their utility in effectively representing structured knowledge. To further facilitate the application of machine learning techniques, knowledge graph embedding (KGE) models have been developed. Such models can transform entities and relationships within knowledge graphs into vectors. However, these embedding models often face challenges related to noise, missing information, distribution shift, adversarial attacks, etc. This can lead to sub-optimal embeddings and incorrect inferences, thereby negatively impacting downstream applications. While the existing literature has focused so far on adversarial attacks on KGE models, the challenges related to the other critical aspects remain unexplored. In this paper, we, first of all, give a unified definition of resilience, encompassing several factors such as generalisation, performance consistency, distribution adaption, and robustness. After formalizing these concepts for machine learning in general, we define them in the context of knowledge graphs. To find the gap in the existing works on resilience in the context of knowledge graphs, we perform a systematic survey, taking into account all these aspects mentioned previously. Our survey results show that most of the existing works focus on a specific aspect of resilience, namely robustness. After categorizing such works based on their respective aspects of resilience, we discuss the challenges and future research directions.


Hierarchical Knowledge Graph Construction from Images for Scalable E-Commerce

arXiv.org Artificial Intelligence

Knowledge Graph (KG) is playing an increasingly important role in various AI systems. For e-commerce, an efficient and low-cost automated knowledge graph construction method is the foundation of enabling various successful downstream applications. In this paper, we propose a novel method for constructing structured product knowledge graphs from raw product images. The method cooperatively leverages recent advances in the vision-language model (VLM) and large language model (LLM), fully automating the process and allowing timely graph updates. We also present a human-annotated e-commerce product dataset for benchmarking product property extraction in knowledge graph construction. Our method outperforms our baseline in all metrics and evaluated properties, demonstrating its effectiveness and bright usage potential.


Knowledge Graph Enhanced Language Agents for Recommendation

arXiv.org Artificial Intelligence

Language agents have recently been used to simulate human behavior and user-item interactions for recommendation systems. However, current language agent simulations do not understand the relationships between users and items, leading to inaccurate user profiles and ineffective recommendations. In this work, we explore the utility of Knowledge Graphs (KGs), which contain extensive and reliable relationships between users and items, for recommendation. Our key insight is that the paths in a KG can capture complex relationships between users and items, eliciting the underlying reasons for user preferences and enriching user profiles. Leveraging this insight, we propose Knowledge Graph Enhanced Language Agents(KGLA), a framework that unifies language agents and KG for recommendation systems. In the simulated recommendation scenario, we position the user and item within the KG and integrate KG paths as natural language descriptions into the simulation. This allows language agents to interact with each other and discover sufficient rationale behind their interactions, making the simulation more accurate and aligned with real-world cases, thus improving recommendation performance. Our experimental results show that KGLA significantly improves recommendation performance (with a 33%-95% boost in NDCG@1 among three widely used benchmarks) compared to the previous best baseline method.