AITopics | Ahmed, Shafiuddin Rehan

Ahmed, Shafiuddin Rehan

Enhancing Retrieval for ESGLLM via ESG-CID -- A Disclosure Content Index Finetuning Dataset for Mapping GRI and ESRS

Ahmed, Shafiuddin Rehan, Shah, Ankit Parag, Tran, Quan Hung, Khetan, Vivek, Kang, Sukryool, Mehta, Ankit, Bao, Yujia, Wei, Wei

arXiv.org Artificial Intelligence, Mar-10-2025

Climate change has intensified the need for transparency and accountability in organizational practices, making Environmental, Social, and Governance (ESG) reporting increasingly crucial. Frameworks like the Global Reporting Initiative (GRI) and the new European Sustainability Reporting Standards (ESRS) aim to standardize ESG reporting, yet generating comprehensive reports remains challenging due to the considerable length of ESG documents and variability in company reporting styles. To facilitate ESG report automation, Retrieval-Augmented Generation (RAG) systems can be employed, but their development is hindered by a lack of labeled data suitable for training retrieval models. In this paper, we leverage an underutilized source of weak supervision -- the disclosure content index found in past ESG reports -- to create a comprehensive dataset, ESG-CID, for both GRI and ESRS standards. By extracting mappings between specific disclosure requirements and corresponding report sections, and refining them using a Large Language Model as a judge, we generate a robust training and evaluation set. We benchmark popular embedding models on this dataset and show that fine-tuning BERT-based models can outperform commercial embeddings and leading public models, even under temporal data splits for cross-report style transfer from GRI to ESRS.

community relations, large language model, machine learning, (21 more...)

2503.10674

Country:

Europe (0.67)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre:

Research Report (0.64)
Public Relations > Community Relations (0.35)

Industry:

Law (1.00)
Energy > Renewable (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Generating Harder Cross-document Event Coreference Resolution Datasets using Metaphoric Paraphrasing

Ahmed, Shafiuddin Rehan, Wang, Zhiyong Eric, Baker, George Arthur, Stowe, Kevin, Martin, James H.

arXiv.org Artificial Intelligence, Jun-5-2024

The most popular Cross-Document Event Coreference Resolution (CDEC) datasets fail to convey the true difficulty of the task, due to the lack of lexical diversity between coreferring event triggers (words or phrases that refer to an event). Furthermore, there is a dearth of event datasets for figurative language, limiting a crucial avenue of research in event comprehension. We address these two issues by introducing ECB+META, a lexically rich variant of Event Coref Bank Plus (ECB+) for CDEC on symbolic and metaphoric language. We use ChatGPT as a tool for the metaphoric transformation of sentences in the documents of ECB+, then tag the original event triggers in the transformed sentences in a semi-automated manner. In this way, we avoid the re-annotation of expensive coreference links. We present results that show existing methods that work well on ECB+ struggle with ECB+META, thereby paving the way for CDEC research on a much more challenging dataset. Code/data: https://github.com/ahmeshaf/llms_coref

computational linguistic, large language model, machine learning, (19 more...)

2407.11988

Country:

Europe (1.00)
North America > United States > Colorado (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Sports (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles

Nath, Abhijnan, Jamil, Huma, Ahmed, Shafiuddin Rehan, Baker, George, Ghosh, Rahul, Martin, James H., Blanchard, Nathaniel, Krishnaswamy, Nikhil

arXiv.org Artificial Intelligence, Apr-13-2024

Event coreference resolution (ECR) is the task of determining whether distinct mentions of events within a multi-document corpus are actually linked to the same underlying occurrence. Images of the events can help facilitate resolution when language is ambiguous. Here, we propose a multimodal cross-document event coreference resolution method that integrates visual and textual cues with a simple linear map between vision and language models. As existing ECR benchmark datasets rarely provide images for all event mentions, we augment the popular ECB+ dataset with event-centric images scraped from the internet and generated using image diffusion models. We establish three methods that incorporate images and text for coreference: 1) a standard fused model with finetuning, 2) a novel linear mapping method without finetuning and 3) an ensembling approach based on splitting mention pairs by semantic and discourse-level difficulty. We evaluate on 2 datasets: the augmented ECB+, and AIDA Phase 1. Our ensemble systems using cross-modal linear mapping establish an upper limit (91.9 CoNLL F1) on ECB+ ECR performance given the preprocessing assumptions used, and establish a novel baseline on AIDA Phase 1. Our results demonstrate the utility of multimodal information in ECR for certain challenging coreference problems, and highlight a need for more multimodal resources in the coreference resolution space.

artificial intelligence, machine learning, natural language, (19 more...)

2404.08949

Country:

Asia > Middle East (0.68)
North America > United States > Indiana > Tippecanoe County (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.14)

Genre: Research Report > New Finding (0.54)

Industry:

Government > Regional Government (1.00)
Law (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
(2 more...)

From Algebraic Word Problem to Program: A Formalized Approach

Wiemerslage, Adam, Ahmed, Shafiuddin Rehan

arXiv.org Artificial Intelligence, Apr-6-2024

In this paper, we propose a pipeline to convert grade-school-level algebraic word problems into programs in a formal language, A-IMP. Using natural language processing tools, we break the problem into sentence fragments which can then be reduced to functions. The functions are categorized by the head verb of the sentence and its structure, as defined by Hosseini et al. (2014). We define the function signature and extract its arguments from the text using dependency parsing. We have a working implementation of the entire pipeline, which can be found on our GitHub repository.

algebraic word problem, artificial intelligence, natural language, (1 more...)

2003.11517

Genre: Research Report (0.66)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Linear Cross-document Event Coreference Resolution with X-AMR

Ahmed, Shafiuddin Rehan, Baker, George Arthur, Judge, Evi, Regan, Michael, Wright-Bettner, Kristin, Palmer, Martha, Martin, James H.

arXiv.org Artificial Intelligence, Mar-24-2024

Event Coreference Resolution (ECR) as a pairwise mention classification task is expensive both for automated systems and manual annotations. The task's quadratic difficulty is exacerbated when using Large Language Models (LLMs), making prompt engineering for ECR prohibitively costly. In this work, we propose a graphical representation of events, X-AMR, anchored around individual mentions using a cross-document version of Abstract Meaning Representation. We then linearize the ECR with a novel multi-hop coreference algorithm over the event graphs. The event graphs simplify ECR, making it a) LLM cost-effective, b) compositional and interpretable, and c) easily annotated. For a fair assessment, we first enrich an existing ECR benchmark dataset with these event graphs using an annotator-friendly tool we introduce. Then, we employ GPT-4, the newest LLM by OpenAI, for these annotations. Finally, using the ECR algorithm, we assess GPT-4 against humans and analyze its limitations. Through this research, we aim to advance the state-of-the-art for efficient ECR and shed light on the potential shortcomings of current LLMs at this task. Code and annotations: https://github.com/ahmeshaf/gpt_coref

computational linguistic, large language model, machine learning, (21 more...)

2404.08656

Country:

Europe (1.00)
Asia (0.93)
North America > United States > Colorado (0.46)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

X-AMR Annotation Tool

Ahmed, Shafiuddin Rehan, Cai, Jon Z., Palmer, Martha, Martin, James H.

arXiv.org Artificial Intelligence, Feb-29-2024

Semantic representations of events play a pivotal role in natural language processing (NLP) tasks, facilitating the understanding and extraction of meaningful information from text. Among the various approaches to represent events, Semantic Role Labeling (SRL; Palmer et al. (2005)) and Abstract Meaning Representation (AMR; Banarescu et al. (2013)) have gained significant attention. In this paper, we delve into the realm of semantic event representations, with a particular focus on a method ...

To illustrate the challenge of coreference across documents, consider the following example: Two news articles discuss a corporate acquisition. In one article, the event is described as "Company A's purchase of Company B on July 1st, 2008" while in another article, it is referred to as "In 7/08 Company B was acquired by Company A." Establishing the coreference relationship between these two descriptions is non-trivial, yet crucial for creating a comprehensive representation of the acquisition event.

large language model, machine learning, natural language, (19 more...)

2403.15407

Country:

Europe (1.00)
North America > United States > Colorado (0.14)
North America > United States > New Mexico (0.14)

Genre: Research Report (1.00)

Industry: Law > Business Law (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

CAMRA: Copilot for AMR Annotation

Cai, Jon Z., Ahmed, Shafiuddin Rehan, Bonn, Julia, Wright-Bettner, Kristin, Palmer, Martha, Martin, James H.

arXiv.org Artificial Intelligence, Nov-17-2023

In this paper, we introduce CAMRA (Copilot for AMR Annotations), a cutting-edge web-based tool designed for constructing Abstract Meaning Representation (AMR) from natural language text. CAMRA offers a novel approach to deep lexical semantics annotation such as AMR, treating AMR annotation akin to coding in programming languages. Leveraging the familiarity of programming paradigms, CAMRA encompasses all essential features of existing AMR editors, including example lookup, while going a step further by integrating Propbank roleset lookup as an autocomplete feature within the tool. Notably, CAMRA incorporates AMR parser models as coding co-pilots, greatly enhancing the efficiency and accuracy of AMR annotators. To demonstrate the tool's capabilities, we provide a live demo accessible at: https://camra.colorado.edu

annotation, artificial intelligence, natural language, (19 more...)

2311.10928

Country: North America > United States > Colorado (0.34)

Genre:

Research Report > Promising Solution (0.34)
Overview > Innovation (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)

How Good is the Model in Model-in-the-loop Event Coreference Resolution Annotation?

Ahmed, Shafiuddin Rehan, Nath, Abhijnan, Regan, Michael, Pollins, Adam, Krishnaswamy, Nikhil, Martin, James H.

arXiv.org Artificial Intelligence, Jun-6-2023

Annotating cross-document event coreference links is a time-consuming and cognitively demanding task that can compromise annotation quality and efficiency. To address this, we propose a model-in-the-loop annotation approach for event coreference resolution, where a machine learning model suggests only the likely coreferring event pairs. We evaluate the effectiveness of this approach by first simulating the annotation process and then, using a novel annotator-centric Recall-Annotation effort trade-off metric, we compare the results of various underlying models and datasets. We finally present a method for obtaining 97% recall while substantially reducing the workload required by a fully manual annotation process. Code and data can be found at https://github.com/ahmeshaf/model_in_coref
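
The recall versus annotation-effort trade-off described in this abstract can be illustrated with a small simulation. This is only a sketch under simplifying assumptions, not the metric defined in the paper: effort is counted as the number of suggested pairs an annotator reviews in descending model-score order, and recall is the share of gold coreferent pairs seen so far. The function name `recall_effort_curve` and the toy data are hypothetical.

```python
def recall_effort_curve(scored_pairs, gold_pairs):
    """Return (effort, recall) points as pairs are reviewed by descending score.

    scored_pairs: list of ((mention_a, mention_b), model_score) tuples.
    gold_pairs:   non-empty set of truly coreferent (mention_a, mention_b) tuples.
    """
    # The simulated annotator looks at the model's most confident suggestions first.
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)
    found = 0
    curve = []
    for effort, (pair, _score) in enumerate(ranked, start=1):
        if pair in gold_pairs:
            found += 1
        curve.append((effort, found / len(gold_pairs)))
    return curve


# Toy example (hypothetical mentions and scores).
scored = [
    (("a", "b"), 0.9),
    (("a", "c"), 0.2),
    (("b", "c"), 0.8),
    (("c", "d"), 0.6),
]
gold = {("a", "b"), ("c", "d")}
curve = recall_effort_curve(scored, gold)
for effort, recall in curve:
    print(f"reviewed {effort} pairs -> recall {recall:.2f}")
```

In this toy run, full recall is reached after reviewing three of the four candidate pairs, which is the kind of saving over exhaustive review that such a curve is meant to expose.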

computational linguistic, machine learning, natural language, (17 more...)

2306.05434

Country:

Europe (1.00)
North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Colorado > Boulder County > Boulder (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)

$2 * n$ is better than $n^2$: Decomposing Event Coreference Resolution into Two Tractable Problems

Ahmed, Shafiuddin Rehan, Nath, Abhijnan, Martin, James H., Krishnaswamy, Nikhil

arXiv.org Artificial Intelligence, May-9-2023

Event Coreference Resolution (ECR) is the task of linking mentions of the same event either within or across documents. Most mention pairs are not coreferent, yet many that are coreferent can be identified through simple techniques such as lemma matching of the event triggers or the sentences in which they appear. Existing methods for training coreference systems sample from a largely skewed distribution, making it difficult for the algorithm to learn coreference beyond surface matching. Additionally, these methods are intractable because of the quadratic operations needed. To address these challenges, we break the problem of ECR into two parts: a) a heuristic to efficiently filter out a large number of non-coreferent pairs, and b) a training approach on a balanced set of coreferent and non-coreferent mention pairs. By following this approach, we show that we get comparable results to the state of the art on two popular ECR datasets while significantly reducing compute requirements. We also analyze the mention pairs that are "hard" to accurately classify as coreferent or non-coreferent. Code at https://github.com/ahmeshaf/lemma_ce_coref
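
The filtering step (a) in this abstract can be sketched as follows. This is a deliberately reduced illustration, assuming that each mention already carries a trigger lemma and that only exact, case-insensitive lemma matches are kept; the abstract also mentions matching on the surrounding sentences, which is not modeled here. The name `candidate_pairs` and the mention records are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(mentions):
    """Keep only mention pairs whose trigger lemmas match (case-insensitive).

    Mentions are grouped by lemma first, so pairs are enumerated only
    inside each group instead of over the whole mention set.
    """
    groups = defaultdict(list)
    for mention in mentions:
        groups[mention["lemma"].lower()].append(mention["id"])

    pairs = []
    for ids in groups.values():
        pairs.extend(combinations(ids, 2))
    return pairs


# Toy example (hypothetical mentions).
mentions = [
    {"id": "m1", "lemma": "acquire"},
    {"id": "m2", "lemma": "acquire"},
    {"id": "m3", "lemma": "purchase"},
    {"id": "m4", "lemma": "Acquire"},
]
pairs = candidate_pairs(mentions)
print(pairs)
```

Four mentions give six possible pairs; the filter keeps the three that share the lemma "acquire", and only those would be passed on to the more expensive pairwise classifier in step (b).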

artificial intelligence, machine learning, natural language, (16 more...)

2305.05672

Country:

Europe (1.00)
North America > United States > Colorado (0.46)
North America > United States > Louisiana (0.28)

Genre: Research Report (1.00)

Industry: Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Within-Document Event Coreference with BERT-Based Contextualized Representations

Ahmed, Shafiuddin Rehan, Martin, James H.

arXiv.org Artificial Intelligence, Feb-15-2021

Event coreference continues to be a challenging problem in information extraction. With the absence of any external knowledge bases for events, coreference becomes a clustering task that relies on effective representations of the context in which event mentions appear. Recent advances in contextualized language representations have proven successful in many tasks; however, their use in event linking has been limited. Here we present a three-part approach that (1) uses representations derived from a pretrained BERT model to (2) train a neural classifier to (3) drive a simple clustering algorithm to create coreference chains. We achieve state-of-the-art results with this model on two standard datasets for the within-document event coreference task and establish a new standard on a third, newer dataset.
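
Step (3), turning pairwise classifier scores into coreference chains, might look like the sketch below. The abstract does not say which clustering algorithm is used, so this example assumes the simplest option: link any pair scoring at or above a threshold and take connected components with a union-find structure. The function name `cluster`, the threshold of 0.5, and the scores are all hypothetical.

```python
def cluster(mention_ids, pair_scores, threshold=0.5):
    """Group mentions into chains by linking pairs with score >= threshold.

    pair_scores: dict mapping (mention_a, mention_b) -> classifier score.
    Returns a list of chains, each a list of mention ids in input order.
    """
    parent = {m: m for m in mention_ids}

    def find(x):
        # Walk to the root, compressing the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), score in pair_scores.items():
        if score >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    chains = {}
    for m in mention_ids:
        chains.setdefault(find(m), []).append(m)
    return list(chains.values())


# Toy example (hypothetical scores from a pairwise classifier).
scores = {
    ("e1", "e2"): 0.9,
    ("e2", "e3"): 0.7,
    ("e1", "e4"): 0.1,
    ("e3", "e4"): 0.2,
}
chains = cluster(["e1", "e2", "e3", "e4"], scores)
print(chains)
```

Here e1, e2 and e3 end up in one chain through transitive links, while e4 stays a singleton. Connected components are easy to reason about but can over-merge when a single spurious link crosses the threshold, so a real system may prefer a stricter clustering rule.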

coreference, neural network, survey article, (17 more...)

2102.096

Country:

Europe (1.00)
North America > United States > Colorado (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.34)
