aliase
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (4 more...)
- North America > United States > Virginia (0.04)
- North America > United States > New York (0.04)
- North America > United States > Minnesota (0.04)
- Europe > Ukraine (0.04)
- Media (1.00)
- Leisure & Entertainment > Sports (1.00)
- Banking & Finance (0.96)
- (3 more...)
- Information Technology > Communications > Social Media (0.97)
- Information Technology > Communications > Mobile (0.95)
- Information Technology > Artificial Intelligence (0.95)
LINK-KG: LLM-Driven Coreference-Resolved Knowledge Graphs for Human Smuggling Networks
Meher, Dipak, Domeniconi, Carlotta, Correa-Cabrera, Guadalupe
Abstract--Human smuggling networks are complex and constantly evolving, making them difficult to analyze comprehensively. Legal case documents offer rich factual and procedural insights into these networks but are often long, unstructured, and filled with ambiguous or shifting references, posing significant challenges for automated knowledge graph (KG) construction. Existing methods either overlook coreference resolution or fail to scale beyond short text spans, leading to fragmented graphs and inconsistent entity linking. We propose LINK-KG, a modular framework that integrates a three-stage, LLM-guided coreference resolution pipeline with downstream KG extraction. At the core of our approach is a type-specific Prompt Cache, which consistently tracks and resolves references across document chunks, enabling clean and disambiguated narratives for structured knowledge graph construction from both short and long legal texts. LINK-KG reduces average node duplication by 45.21% and noisy nodes by 32.22% compared to baseline methods, resulting in cleaner and more coherent graph structures. Human smuggling networks represent highly adaptive and organized systems involving a web of actors, routes, vehicles, and intermediaries, often operating under the radar of restrictive immigration policies [1]. These networks exploit legal loopholes, adjust swiftly to enforcement changes, and frequently intersect with transnational criminal organizations. Effectively analyzing their structure and behavior is critical for informing policy, enhancing security, and preventing exploitation. However, much of the actionable insight remains embedded in lengthy, unstructured legal documents, such as court rulings, field reports, and case transcripts, making automated analysis both essential and challenging.
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- Law (1.00)
- Government > Immigration & Customs (1.00)
- Government > Regional Government > North America Government > United States Government (0.68)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.86)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (4 more...)
KG-TRICK: Unifying Textual and Relational Information Completion of Knowledge for Multilingual Knowledge Graphs
Zhou, Zelin, Conia, Simone, Lee, Daniel, Li, Min, Huang, Shenglei, Minhas, Umar Farooq, Potdar, Saloni, Xiao, Henry, Li, Yunyao
Multilingual knowledge graphs (KGs) provide high-quality relational and textual information for various NLP applications, but they are often incomplete, especially in non-English languages. Previous research has shown that combining information from KGs in different languages aids either Knowledge Graph Completion (KGC), the task of predicting missing relations between entities, or Knowledge Graph Enhancement (KGE), the task of predicting missing textual information for entities. Although previous efforts have considered KGC and KGE as independent tasks, we hypothesize that they are interdependent and mutually beneficial. To this end, we introduce KG-TRICK, a novel sequence-to-sequence framework that unifies the tasks of textual and relational information completion for multilingual KGs. KG-TRICK demonstrates that: i) it is possible to unify the tasks of KGC and KGE into a single framework, and ii) combining textual information from multiple languages is beneficial to improve the completeness of a KG. As part of our contributions, we also introduce WikiKGE10++, the largest manually-curated benchmark for textual information completion of KGs, which features over 25,000 entities across 10 diverse languages.
- Europe > Portugal > Lisbon > Lisbon (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- (7 more...)
Sims: An Interactive Tool for Geospatial Matching and Clustering
Zaytar, Akram, Tadesse, Girmaw Abebe, Robinson, Caleb, Bendito, Eduardo G., Devare, Medha, Chernet, Meklit, Hacheme, Gilles Q., Dodhia, Rahul, Ferres, Juan M. Lavista
Acquiring, processing, and visualizing geospatial data requires significant computing resources, especially for large spatio-temporal domains. This challenge hinders the rapid discovery of predictive features, which is essential for advancing geospatial modeling. To address this, we developed Similarity Search (Sims), a no-code web tool that allows users to perform clustering and similarity search over defined regions of interest using Google Earth Engine as a backend. Sims is designed to complement existing modeling tools by focusing on feature exploration rather than model creation. We demonstrate the utility of Sims through a case study analyzing simulated maize yield data in Rwanda, where we evaluate how different combinations of soil, weather, and agronomic features affect the clustering of yield response zones. Sims is open source and available at https://github.com/microsoft/Sims
The Role of Natural Language Processing Tasks in Automatic Literary Character Network Construction
Amalvy, Arthur, Labatut, Vincent, Dufour, Richard
The automatic extraction of character networks from literary texts is generally carried out using natural language processing (NLP) cascading pipelines. While this approach is widespread, no study exists on the impact of low-level NLP tasks on their performance. In this article, we conduct such a study on a literary dataset, focusing on the role of named entity recognition (NER) and coreference resolution when extracting co-occurrence networks. To highlight the impact of these tasks' performance, we start with gold-standard annotations, progressively add uniformly distributed errors, and observe their impact in terms of character network quality. We demonstrate that NER performance depends on the tested novel and strongly affects character detection. We also show that NER-detected mentions alone miss a lot of character co-occurrences, and that coreference resolution is needed to prevent this. Finally, we present comparison points with 2 methods based on large language models (LLMs), including a fully end-to-end one, and show that these models are outperformed by traditional NLP pipelines in terms of recall.
Data-driven Coreference-based Ontology Building
Ashury-Tahan, Shir, Cohen, Amir David Nissan, Cohen, Nadav, Louzoun, Yoram, Goldberg, Yoav
While coreference resolution is traditionally used as a component in individual document understanding, in this work we take a more global view and explore what can we learn about a domain from the set of all document-level coreference relations that are present in a large corpus. We derive coreference chains from a corpus of 30 million biomedical abstracts and construct a graph based on the string phrases within these chains, establishing connections between phrases if they co-occur within the same coreference chain. We then use the graph structure and the betweeness centrality measure to distinguish between edges denoting hierarchy, identity and noise, assign directionality to edges denoting hierarchy, and split nodes (strings) that correspond to multiple distinct concepts. The result is a rich, data-driven ontology over concepts in the biomedical domain, parts of which overlaps significantly with human-authored ontologies. We release the coreference chains and resulting ontology under a creative-commons license, along with the code.
- North America > United States > Illinois (0.04)
- Europe > Monaco (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- (3 more...)
KnowPhish: Large Language Models Meet Multimodal Knowledge Graphs for Enhancing Reference-Based Phishing Detection
Li, Yuexin, Huang, Chengyu, Deng, Shumin, Lock, Mei Lin, Cao, Tri, Oo, Nay, Lim, Hoon Wei, Hooi, Bryan
Phishing attacks have inflicted substantial losses on individuals and businesses alike, necessitating the development of robust and efficient automated phishing detection approaches. Reference-based phishing detectors (RBPDs), which compare the logos on a target webpage to a known set of logos, have emerged as the state-of-the-art approach. However, a major limitation of existing RBPDs is that they rely on a manually constructed brand knowledge base, making it infeasible to scale to a large number of brands, which results in false negative errors due to the insufficient brand coverage of the knowledge base. To address this issue, we propose an automated knowledge collection pipeline, using which we collect a large-scale multimodal brand knowledge base, KnowPhish, containing 20k brands with rich information about each brand. KnowPhish can be used to boost the performance of existing RBPDs in a plug-and-play manner. A second limitation of existing RBPDs is that they solely rely on the image modality, ignoring useful textual information present in the webpage HTML. To utilize this textual information, we propose a Large Language Model (LLM)-based approach to extract brand information of webpages from text. Our resulting multimodal phishing detection approach, KnowPhish Detector (KPD), can detect phishing webpages with or without logos. We evaluate KnowPhish and KPD on a manually validated dataset, and a field study under Singapore's local context, showing substantial improvements in effectiveness and efficiency compared to state-of-the-art baselines.
- Asia > Singapore (0.25)
- Oceania > Australia (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- (10 more...)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Multilingual Entity Linking Using Dense Retrieval
Entity linking (EL) is the computational process of connecting textual mentions to corresponding entities. Like many areas of natural language processing, the EL field has greatly benefited from deep learning, leading to significant performance improvements. However, present-day approaches are expensive to train and rely on diverse data sources, complicating their reproducibility. In this thesis, we develop multiple systems that are fast to train, demonstrating that competitive entity linking can be achieved without a large GPU cluster. Moreover, we train on a publicly available dataset, ensuring reproducibility and accessibility. Our models are evaluated for 9 languages giving an accurate overview of their strengths. Furthermore, we offer a~detailed analysis of bi-encoder training hyperparameters, a popular approach in EL, to guide their informed selection. Overall, our work shows that building competitive neural network based EL systems that operate in multiple languages is possible even with limited resources, thus making EL more approachable.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
- (21 more...)