AITopics | citation recommendation



Hallucinate or Memorize? The Two Sides of Probabilistic Learning in Large Language Models

Niimi, Junichiro

arXiv.org Artificial Intelligence, Nov-13-2025

Large language models (LLMs) have been increasingly applied to a wide range of tasks, from natural language understanding to code generation. While they have also been used to assist in citation recommendation, the hallucination of non-existent papers remains a major issue. Building on prior studies, this study hypothesizes that an LLM's ability to correctly produce bibliographic records depends on whether the underlying knowledge is generated or memorized, with highly cited papers (i.e., papers that appear more frequently in the pretraining corpus) showing lower hallucination rates. We therefore use citation count as a proxy for training data redundancy (i.e., the frequency with which a given bibliographic record appears in the pretraining corpus) and investigate how citation frequency affects hallucinated references in LLM outputs. Using GPT-4.1, we generated and manually verified 100 citations across twenty computer-science domains, and measured factual consistency via cosine similarity between generated and authentic metadata. The results revealed that (i) citation count is strongly correlated with factual accuracy, (ii) bibliographic information becomes almost verbatim memorized beyond roughly 1,000 citations, and (iii) memory interference occurs when multiple highly cited papers share similar content. These findings indicate a threshold at which generalization shifts into memorization, with highly cited papers being retained nearly verbatim in the model.
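The abstract measures factual consistency as the cosine similarity between generated and authentic bibliographic metadata. The sketch below illustrates that comparison under stated assumptions: it uses simple term-frequency vectors rather than whatever text representation the study actually used, and the record fields (`authors`, `title`, `venue`, `year`) and the example records are hypothetical placeholders, not data from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical metadata fields; the study's exact field set is not given in the abstract.
FIELDS = ("authors", "title", "venue", "year")


def tokenize(text: str) -> list[str]:
    """Lowercase a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def format_record(record: dict[str, str]) -> str:
    """Flatten a bibliographic record into a single string of its fields."""
    return " ".join(record.get(field, "") for field in FIELDS)


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two strings."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty record shares nothing with any other record
    return dot / (norm_a * norm_b)


# Hypothetical records: the generated citation gets only the venue wrong.
generated = {"authors": "Doe, J.", "title": "A Study of Graphs", "venue": "ICML", "year": "2020"}
authentic = {"authors": "Doe, J.", "title": "A Study of Graphs", "venue": "NeurIPS", "year": "2020"}

score = cosine_similarity(format_record(generated), format_record(authentic))
print(f"{score:.3f}")  # 7 of 8 unique tokens shared -> 0.875
```

A verbatim-memorized record scores 1.0. Token overlap is a deliberately crude choice; an embedding model would also reward paraphrased titles, which this sketch does not.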

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2511.08877

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Talk2Ref: A Dataset for Reference Prediction from Scientific Talks

Broy, Frederik, Züfle, Maike, Niehues, Jan

arXiv.org Artificial Intelligence, Oct-29-2025

Scientific talks are a growing medium for disseminating research, and automatically identifying relevant literature that grounds or enriches a talk would be highly valuable for researchers and students alike. We introduce Reference Prediction from Talks (RPT), a new task that maps long, unstructured scientific presentations to relevant papers. To support research on RPT, we present Talk2Ref, the first large-scale dataset of its kind, containing 6,279 talks and 43,429 cited papers (26 per talk on average), where relevance is approximated by the papers cited in the talk's corresponding source publication. We establish strong baselines by evaluating state-of-the-art text embedding models in zero-shot retrieval scenarios, and propose a dual-encoder architecture trained on Talk2Ref. We further explore strategies for handling long transcripts, as well as training for domain adaptation. Our results show that fine-tuning on Talk2Ref significantly improves citation prediction performance, demonstrating both the challenges of the task and the effectiveness of our dataset for learning semantic representations from spoken scientific content. The dataset and trained models are released under an open license to foster future research on integrating spoken scientific communication into citation recommendation systems.
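The zero-shot retrieval setup above can be sketched as a dual encoder: talks and papers are embedded independently, and papers are ranked by cosine similarity to the talk. This is a minimal sketch, not the authors' architecture: `toy_encode` is a hypothetical hashed bag-of-words stand-in for a trained text-embedding model, and mean-pooling fixed-size chunks is only one assumed way of handling long transcripts.

```python
import hashlib
import math

DIM = 256  # size of the toy embedding space (arbitrary)


def normalise(vec: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def toy_encode(text: str) -> list[float]:
    """Hypothetical stand-in for a trained text encoder: hashed bag of words."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode("utf-8")).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    return normalise(vec)


def encode_transcript(transcript: str, chunk_words: int = 200) -> list[float]:
    """Embed a long transcript by mean-pooling the embeddings of fixed-size word chunks."""
    words = transcript.split()
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    if not chunks:
        return [0.0] * DIM
    chunk_vecs = [toy_encode(chunk) for chunk in chunks]
    pooled = [sum(column) / len(chunk_vecs) for column in zip(*chunk_vecs)]
    return normalise(pooled)


def rank_papers(transcript: str, papers: list[str], k: int = 5) -> list[int]:
    """Indices of the top-k papers by cosine similarity (dot product of unit vectors)."""
    talk = encode_transcript(transcript)
    scores = [sum(a * b for a, b in zip(talk, toy_encode(p))) for p in papers]
    order = sorted(range(len(papers)), key=lambda i: (-scores[i], i))
    return order[:k]


# Hypothetical candidate pool and talk snippet.
papers = [
    "speech recognition with transformers",
    "protein structure prediction",
    "graph neural networks for molecules",
]
print(rank_papers("in this talk we adapt transformers for speech recognition", papers, k=2))
```

Because the two towers never see each other's input, paper embeddings can be computed once and reused for every talk, which is the usual reason for choosing a dual encoder over a cross-encoder in retrieval.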

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

2510.24478

Country:

Europe (1.00)
North America > United States (0.93)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.46)


Citation Recommendation using Deep Canonical Correlation Analysis

McNamara, Conor, Ramlan, Effirul

arXiv.org Artificial Intelligence, Jul-24-2025

Recent advances in citation recommendation have improved accuracy by leveraging multi-view representation learning to integrate the various modalities present in scholarly documents. However, effectively combining multiple data views requires fusion techniques that can capture complementary information while preserving the unique characteristics of each modality. We propose a novel citation recommendation algorithm that improves upon linear Canonical Correlation Analysis (CCA) methods by applying Deep CCA (DCCA), a neural network extension capable of capturing complex, non-linear relationships between distributed textual and graph-based representations of scientific articles. Experiments on the large-scale DBLP (Digital Bibliography & Library Project) citation network dataset demonstrate that our approach outperforms state-of-the-art CCA-based methods, achieving relative improvements of over 11% in Mean Average Precision@10, 5% in Precision@10, and 7% in Recall@10. These gains reflect more relevant citation recommendations and enhanced ranking quality, suggesting that DCCA's non-linear transformations yield more expressive latent representations than CCA's linear projections.
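The reported metrics follow standard ranking definitions. As a sketch, assuming each query has a ranked list of recommended paper IDs and a set of ground-truth cited papers, they can be computed as below. AP@k normalisation conventions vary across papers, so dividing by min(|relevant|, k) here is an assumption, not necessarily the evaluation protocol of this study; the paper IDs are hypothetical.

```python
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k recommendations that are relevant."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant papers that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def average_precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Sum of precision@i at each rank i <= k holding a relevant paper, over min(|relevant|, k)."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def map_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """MAP@k: average of AP@k over all queries (citing documents)."""
    return sum(average_precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


def relative_improvement(baseline: float, new: float) -> float:
    """Relative gain of `new` over `baseline`; 0.11 means +11%."""
    return (new - baseline) / baseline


# Hypothetical ranking for one query: relevant papers sit at ranks 2 and 4.
ranked = ["p3", "p1", "p7", "p2"]
relevant = {"p1", "p2"}
print(precision_at_k(ranked, relevant, 4))          # 0.5
print(recall_at_k(ranked, relevant, 4))             # 1.0
print(average_precision_at_k(ranked, relevant, 4))  # (1/2 + 2/4) / 2 = 0.5
```

Note that the percentages in the abstract are relative gains, so an 11% improvement in MAP@10 means the new score divided by the baseline score is about 1.11, not that 11 points were added.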

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2507.17603

Country: Europe > Ireland (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)


SCIRGC: Multi-Granularity Citation Recommendation and Citation Sentence Preference Alignment

Li, Xiangyu, Chen, Jingqiang

arXiv.org Artificial Intelligence, May-28-2025

Citations are crucial in scientific research articles because they highlight the connection between the current study and prior work. However, citing prior work is often time-consuming for researchers. In this study, we propose the SciRGC framework, which aims to automatically recommend articles to cite and generate citation sentences for citation locations within articles. The framework addresses two key challenges in academic citation generation: 1) how to accurately identify the author's citation intent and find relevant papers to cite, and 2) how to generate high-quality citation sentences that align with human preferences. We enhance citation recommendation accuracy in the citation article recommendation module by incorporating citation networks and sentiment intent, and generate reasoning-based citation sentences in the citation sentence generation module by using the original article abstract, local context, citation intent, and recommended articles as inputs. Additionally, we propose a new evaluation metric to fairly assess the quality of generated citation sentences. Comparisons with baseline models and ablation experiments show that the SciRGC framework not only improves the accuracy and relevance of citation recommendations but also ensures that the generated citation sentences are appropriate in context, providing a valuable tool for interdisciplinary researchers.

apreprint-june13, machine learning, natural language

arXiv.org Artificial Intelligence

2505.20103

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)


Towards AI-assisted Academic Writing

Liebling, Daniel J., Kane, Malcolm, Grunde-Mclaughlin, Madeleine, Lang, Ian J., Venugopalan, Subhashini, Brenner, Michael P.

arXiv.org Artificial Intelligence, Mar-17-2025

We present components of an AI-assisted academic writing system including citation recommendation and introduction writing. The system recommends citations by considering the user's current document context to provide relevant suggestions. It generates introductions in a structured fashion, situating the contributions of the research relative to prior work. We demonstrate the effectiveness of the components through quantitative evaluations. Finally, the paper presents qualitative research exploring how researchers incorporate citations into their writing workflows. Our findings indicate that there is demand for precise AI-assisted writing systems and simple, effective methods for meeting those needs.

information retrieval, large language model, machine learning

arXiv.org Artificial Intelligence

2503.13771

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > New York > New York County > New York City (0.05)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)


Citation Recommendation based on Argumentative Zoning of User Queries

Ma, Shutian, Zhang, Chengzhi, Zhang, Heng, Gao, Zheng

arXiv.org Artificial Intelligence, Jan-30-2025

Due to the increasing volume of scientific publications, scientific information recommendation has become an urgent problem, since it can reduce retrieval costs. Many kinds of information can be recommended, such as papers (Mei et al., 2022), authors (Alhoori & Furuta, 2017), and journals (Gündoğan et al., 2023). Among them, citation recommendation, which aims to help people find appropriate and necessary work to cite based on given user queries, has attracted researchers' attention. This paper aims to improve citation recommendation by considering the argumentative zoning of the citing sentence. Authors normally follow a logical framework when writing scientific papers. For example, the International Committee of Medical Journal Editors (ICMJE) recommends the IMRaD (Introduction, Methods, Results and Discussion) structure in its writing and editing guidelines for biomedical publications (Editors & others, 2004). The structure of a research article is designed to present the research work clearly and concisely, and it also makes the research easier for readers to understand and evaluate.

data mining, machine learning, natural language

arXiv.org Artificial Intelligence

doi: 10.1016/j.joi.2024.101607

2501.18292

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
North America > United States > Indiana (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


CiteBART: Learning to Generate Citations for Local Citation Recommendation

Çelik, Ege Yiğit, Tekir, Selma

arXiv.org Artificial Intelligence, Dec-23-2024

Citations are essential building blocks in scientific writing, and the scientific community would welcome support in generating them. Citation generation involves two complementary subtasks: determining the citation worthiness of a context and, if the context is citation-worthy, proposing the best candidate papers for the citation placeholder. The latter subtask is called local citation recommendation (LCR). This paper proposes CiteBART, a custom BART pre-training scheme based on citation token masking that generates citations to perform LCR. In the base scheme, we mask the citation token in the local citation context to make the citation prediction. In the global scheme, we concatenate the citing paper's title and abstract to the local citation context to learn to reconstruct the citation token. CiteBART outperforms state-of-the-art approaches on the citation recommendation benchmarks except for the smallest, FullTextPeerRead. The effect is significant on the larger benchmarks, e.g., RefSeer and ArXiv. We present a qualitative analysis and an ablation study to provide insights into the workings of CiteBART. Our analyses confirm that its generative nature brings about a zero-shot capability.
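The base and global input schemes described above can be illustrated with a data-preparation sketch. It assumes author-year citation tokens such as "(Smith et al., 2017)" (the author name is hypothetical), BART's `<mask>` and `</s>` strings as mask and separator, and a title, abstract, context order for the global scheme; the abstract confirms none of these details, and the pre-training itself is not shown.

```python
import re

MASK = "<mask>"  # assumed mask string; BART tokenizers commonly use "<mask>"

# Hypothetical author-year citation format, e.g. "(Smith et al., 2017)".
CITATION = re.compile(r"\((\w[\w\-]*(?: et al\.)?), (\d{4})\)")


def make_base_example(context: str) -> tuple[str, str]:
    """Base scheme: mask the citation token in the local context; it becomes the target."""
    match = CITATION.search(context)
    if match is None:
        raise ValueError("no author-year citation found in context")
    source = context[: match.start()] + MASK + context[match.end():]
    return source, match.group(0)


def make_global_example(
    context: str, title: str, abstract: str, sep: str = " </s> "
) -> tuple[str, str]:
    """Global scheme: prepend the citing paper's title and abstract to the masked context."""
    source, target = make_base_example(context)
    return f"{title}{sep}{abstract}{sep}{source}", target


context = "Self-attention was popularised by (Smith et al., 2017) for translation."
print(make_base_example(context))
# ('Self-attention was popularised by <mask> for translation.', '(Smith et al., 2017)')
```

Framing LCR as reconstructing the masked token is what makes the approach generative: the model emits a citation string rather than scoring a fixed candidate list, which is consistent with the zero-shot capability the authors report.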

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2412.17534

Country:

Asia > Middle East > Republic of Türkiye (0.28)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Law (0.68)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Benchmark for Evaluation and Analysis of Citation Recommendation Models

Maharjan, Puja

arXiv.org Artificial Intelligence, Dec-10-2024

Citation recommendation systems have attracted much academic interest, resulting in many studies and implementations. These systems help authors automatically generate proper citations by suggesting relevant references based on the text they have written. However, the methods used in citation recommendation differ across various studies and implementations. Some approaches focus on the overall content of papers, while others consider the context of the citation text. Additionally, the datasets used in these studies include different aspects of papers, such as metadata, citation context, or even the full text of the paper in various formats and structures. The diversity in models, datasets, and evaluation metrics makes it challenging to assess and compare citation recommendation methods effectively. To address this issue, a standardized dataset and evaluation metrics are needed to evaluate these models consistently. Therefore, we propose developing a benchmark specifically designed to analyze and compare citation recommendation models. This benchmark will evaluate the performance of models on different features of the citation context and provide a comprehensive evaluation of the models across all these tasks, presenting the results in a standardized way. By creating a benchmark with standardized evaluation metrics, researchers and practitioners in the field of citation recommendation will have a common platform to assess and compare different models. This will enable meaningful comparisons and help identify promising approaches for further research and development in the field.

data mining, machine learning, natural language

arXiv.org Artificial Intelligence

2412.07713

Country:

Europe > Norway > Western Norway > Rogaland > Stavanger (0.04)
Europe > Germany (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Data Science > Data Mining (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)


CiteME: Can Language Models Accurately Cite Scientific Claims?

Press, Ori, Hochlehnert, Andreas, Prabhu, Ameya, Udandarao, Vishaal, Press, Ofir, Bethge, Matthias

arXiv.org Artificial Intelligence, Jul-10-2024

Thousands of new scientific papers are published each month. Such information overload complicates researchers' efforts to stay current with the state of the art as well as to verify and correctly attribute claims. We pose the following research question: given a text excerpt referencing a paper, could an LM act as a research assistant to correctly identify the referenced paper? We advance efforts to answer this question by building a benchmark that evaluates the abilities of LMs in citation attribution. Our benchmark, CiteME, consists of text excerpts from recent machine learning papers, each referencing a single other paper. Evaluation on CiteME reveals a large gap between frontier LMs and human performance, with LMs achieving only 4.2-18.5% accuracy and humans 69.7%. We narrow this gap by introducing CiteAgent, an autonomous system built on the GPT-4o LM that can also search and read papers, which achieves an accuracy of 35.3% on CiteME. Overall, CiteME serves as a challenging testbed for open-ended claim attribution, driving the research community towards a future where any claim made by an LM can be automatically verified and discarded if found to be incorrect.

arxiv preprint arxiv, citeagent, excerpt

arXiv.org Artificial Intelligence

2407.12861

Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)


SymTax: Symbiotic Relationship and Taxonomy Fusion for Effective Citation Recommendation

Goyal, Karan, Goel, Mayank, Goyal, Vikram, Mohania, Mukesh

arXiv.org Artificial Intelligence, May-26-2024

Citing pertinent literature is pivotal to writing and reviewing a scientific document. Existing techniques mainly focus on either the local or the global context for recommending citations but fail to consider actual human citation behaviour. We propose SymTax, a three-stage recommendation architecture that considers both the local and the global context and, additionally, the taxonomical representations of query-candidate tuples and the symbiosis prevailing amongst them. SymTax learns to embed the infused taxonomies in hyperbolic space and uses hyperbolic separation as a latent feature to compute query-candidate similarity. We build a novel, large dataset, ArSyTa, containing 8.27 million citation contexts, and describe its creation process in detail. We conduct extensive experiments and ablation studies to demonstrate the effectiveness and design choice of each module in our framework. Combinatorial analysis from our experiments also sheds light on the choice of language models (LMs) and fusion embedding, and on the inclusion of section headings as a signal. The proposed module that captures the symbiotic relationship alone yields performance gains of 26.66% and 39.25% in Recall@5 relative to SOTA on the ACL-200 and RefSeer datasets, respectively. The complete framework yields a gain of 22.56% in Recall@5 relative to SOTA on our proposed dataset. The code and dataset are available at https://github.com/goyalkaraniit/SymTax
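SymTax uses hyperbolic separation as a latent feature for query-candidate similarity. A minimal sketch of that idea follows, assuming embeddings in the Poincaré ball model (a common choice for embedding taxonomies, though the abstract does not name the model) and a hypothetical `exp(-distance)` mapping to a similarity score; it is not the SymTax implementation.

```python
import math


def poincare_distance(u: list[float], v: list[float]) -> float:
    """Geodesic distance between two points strictly inside the unit Poincare ball.

    d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    """
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    if sq_u >= 1.0 or sq_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def hyperbolic_feature(query_emb: list[float], candidate_emb: list[float]) -> float:
    """Hypothetical similarity feature: closer in hyperbolic space gives a larger value."""
    return math.exp(-poincare_distance(query_emb, candidate_emb))


# Distance from the origin is 2 * artanh(r), so it grows quickly near the boundary.
print(poincare_distance([0.0, 0.0], [0.5, 0.0]))  # about 1.0986, i.e. ln 3
print(poincare_distance([0.0, 0.0], [0.9, 0.0]))  # about 2.9444, i.e. ln 19
```

Because distances blow up near the boundary of the ball, general taxonomy nodes can sit near the origin while many specific nodes spread out near the edge, which is why hyperbolic space is often used to embed hierarchies.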

citation context, dataset, recommendation

arXiv.org Artificial Intelligence

2406.01606

Country:

Asia > India > NCT > Delhi (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
