AITopics | anchor text

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LDIR: Low-Dimensional Dense and Interpretable Text Embeddings with Relative Representations

Wang, Yile, Shen, Zhanyu, Huang, Hui

arXiv.org Artificial Intelligence, May-19-2025

Semantic text representation is a fundamental task in the field of natural language processing. Existing text embeddings (e.g., SimCSE and LLM2Vec) have demonstrated excellent performance, but the values of each dimension are difficult to trace and interpret. Bag-of-words representations, the classic sparse interpretable embeddings, suffer from poor performance. Recently, Benara et al. (2024) proposed interpretable text embeddings using large language models, which form "0/1" embeddings based on responses to a series of questions. These interpretable text embeddings are typically high-dimensional (larger than 10,000). In this work, we propose Low-dimensional (lower than 500) Dense and Interpretable text embeddings with Relative representations (LDIR). The numerical values of its dimensions indicate semantic relatedness to different anchor texts selected through farthest point sampling, offering both semantic representation and a certain level of traceability and interpretability. We validate LDIR on multiple semantic textual similarity, retrieval, and clustering tasks. Extensive experimental results show that LDIR performs close to the black-box baseline models and outperforms the interpretable embedding baselines with far fewer dimensions. Code is available at https://github.com/szu-tera/LDIR.
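As a rough illustration of the two ideas named in this abstract, the sketch below picks anchors by greedy farthest point sampling and then describes a text by its cosine similarity to each anchor. It is not the authors' implementation: the toy 2-D vectors stand in for real encoder outputs, and the function names `farthest_point_sampling` and `relative_representation` are hypothetical.

```python
import math

def cosine(u, v):
    # Cosine similarity between two non-zero vectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def farthest_point_sampling(vectors, k, start=0):
    """Greedily pick k indices; each new pick is farthest from all picks so far."""
    k = min(k, len(vectors))
    chosen = [start]
    # min_dist[i] = cosine distance from vector i to its nearest chosen anchor
    min_dist = [1.0 - cosine(v, vectors[start]) for v in vectors]
    while len(chosen) < k:
        nxt = max(range(len(vectors)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, v in enumerate(vectors):
            min_dist[i] = min(min_dist[i], 1.0 - cosine(v, vectors[nxt]))
    return chosen

def relative_representation(vector, anchors):
    """One dimension per anchor: how related the text is to that anchor."""
    return [cosine(vector, a) for a in anchors]

# Toy stand-ins for sentence embeddings (assumption: real ones come from an encoder).
corpus = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.7, 0.7]]
anchor_ids = farthest_point_sampling(corpus, k=2)
anchors = [corpus[i] for i in anchor_ids]
embedding = relative_representation([1.0, 0.0], anchors)
```

Each value in `embedding` can be traced back to one concrete anchor text, which is the sense in which such a representation is interpretable.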

computational linguistic, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2505.10354

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

SLiCK: Exploiting Subsequences for Length-Constrained Keyword Spotting

Nishu, Kumari, Cho, Minsik, Naik, Devang

arXiv.org Artificial Intelligence, Sep-5-2024

User-defined keyword spotting on a resource-constrained edge device is challenging. However, keywords are often bounded by a maximum keyword length, which has been largely under-leveraged in prior work. Our analysis of keyword-length distribution shows that user-defined keyword spotting can be treated as a length-constrained problem, eliminating the need for aggregation over variable text length. This leads to our proposed method for efficient keyword spotting, SLiCK (exploiting Subsequences for Length-Constrained Keyword spotting). We further introduce a subsequence-level matching scheme to learn audio-text relations at a finer granularity, thus distinguishing similar-sounding keywords more effectively through enhanced context. In SLiCK, the model is trained with a multi-task learning approach using two modules: Matcher (utterance-level matching task, novel subsequence-level matching task) and Encoder (phoneme recognition task). The proposed method improves the baseline results on the Libriphrase hard dataset, increasing AUC from 88.52 to 94.9 and reducing EER from 18.82 to 11.1.

anchor text, dataset, keyword, (13 more...)

arXiv.org Artificial Intelligence

2409.09067

Country:

North America > United States (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Magic Markup: Maintaining Document-External Markup with an LLM

Misback, Edward, Tatlock, Zachary, Tanimoto, Steven L.

arXiv.org Artificial Intelligence, Mar-6-2024

Text documents, including programs, typically have human-readable semantic structure. Historically, programmatic access to these semantics has required explicit in-document tagging. Especially in systems where the text has an execution semantics, this means it is an opt-in feature that is hard to support properly. Today, language models offer a new method: metadata can be bound to entities in changing text using a model's human-like understanding of semantics, with no requirements on the document structure. This method expands the applications of document annotation, a fundamental operation in program writing, debugging, maintenance, and presentation. We contribute a system that employs an intelligent agent to re-tag modified programs, enabling rich annotations to automatically follow code as it evolves. We also contribute a formal problem definition, an empirical synthetic benchmark suite, and our benchmark generator. Our system achieves an accuracy of 90% on our benchmarks and can replace a document's tags in parallel at a rate of 5 seconds per tag. While there remains significant room for improvement, we find performance reliable enough to justify further exploration of applications.

annotation, benchmark, language model, (15 more...)

arXiv.org Artificial Intelligence

2403.03481

Country:

Europe > Sweden > Skåne County > Lund (0.05)
North America > United States > Washington > King County > Seattle (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)

Unsupervised Dense Retrieval Training with Web Anchors

Xie, Yiqing, Liu, Xiao, Xiong, Chenyan

arXiv.org Artificial Intelligence, May-9-2023

In this work, we present an unsupervised retrieval method with contrastive learning on web anchors. The anchor text describes the content that is referenced from the linked page. This shows similarities to search queries that aim to retrieve pertinent information from relevant documents. Based on their commonalities, we train an unsupervised dense retriever, Anchor-DR, with a contrastive learning task that matches the anchor text and the linked document. To filter out uninformative anchors (such as "homepage" or other functional anchors), we present a novel filtering technique to only select anchors that contain similar types of information as search queries. Experiments show that Anchor-DR outperforms state-of-the-art methods on unsupervised dense retrieval by a large margin (e.g., by 5.3% NDCG@10 on MSMARCO). The gain of our method is especially significant for search and question answering tasks. Our analysis further reveals that the pattern of anchor-document pairs is similar to that of search query-document pairs. Code is available at https://github.com/Veronicium/AnchorDR.
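The abstract does not state which contrastive objective is used. As a minimal sketch, assuming an InfoNCE-style loss with in-batch negatives (a common choice for dense retrievers, not confirmed by this listing), matching each anchor text to its linked document could look like this. The function name `in_batch_contrastive_loss` and the toy vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def in_batch_contrastive_loss(anchor_vecs, doc_vecs, temperature=1.0):
    """Mean InfoNCE loss: anchor i should score document i above the
    other documents in the same batch (the in-batch negatives)."""
    total = 0.0
    for i, a in enumerate(anchor_vecs):
        scores = [dot(a, d) / temperature for d in doc_vecs]
        m = max(scores)  # subtract the max for a numerically stable log-sum-exp
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[i]
    return total / len(anchor_vecs)

# Toy embeddings (assumption: real ones come from a shared text encoder).
anchors = [[1.0, 0.0], [0.0, 1.0]]
docs = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_contrastive_loss(anchors, docs)
shuffled = in_batch_contrastive_loss(anchors, docs[::-1])
```

The loss is lower when each anchor embedding sits close to its own linked document, which is the signal the retriever would be trained on.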

information retrieval, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3539618.3592080

2305.05834

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)
Asia > Taiwan > Taiwan Province > Taipei (0.05)
(10 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.96)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)

LBMT team at VLSP2022-Abmusu: Hybrid method with text correlation and generative models for Vietnamese multi-document summarization

Nguyen, Tan-Minh, Nguyen, Thai-Binh, Nguyen, Hoang-Trung, Nguyen, Hai-Long, Thanh, Tam Doan, Nguyen, Ha-Thanh, Vuong, Thi-Hai-Yen

arXiv.org Artificial Intelligence, Apr-11-2023

Multi-document summarization is challenging because the summaries should not only describe the most important information from all documents but also provide a coherent interpretation of the documents. This paper proposes a method for multi-document summarization based on cluster similarity. For the extractive method, we use a hybrid model based on a modified version of the PageRank algorithm and a text correlation mechanism. After generating summaries by selecting the most important sentences from each cluster, we apply BARTpho and ViT5 to construct the abstractive models. Both extractive and abstractive approaches were considered in this study. The proposed method achieves competitive results in the VLSP 2022 competition.
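The abstract does not describe how PageRank is modified. For orientation only, the sketch below shows the plain, unmodified idea (often called TextRank): sentences are nodes, edge weights are a simple word-overlap score, and the highest-ranked sentences are candidates for an extractive summary. The `overlap` similarity and the example sentences are assumptions made for illustration.

```python
def overlap(a, b):
    # Jaccard word overlap between two non-empty sentences.
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def pagerank(weights, damping=0.85, iterations=50):
    """Power iteration over a weighted graph given as an adjacency matrix."""
    n = len(weights)
    scores = [1.0 / n] * n
    out_sum = [sum(row) for row in weights]
    for _ in range(iterations):
        new = []
        for j in range(n):
            incoming = sum(
                scores[i] * weights[i][j] / out_sum[i]
                for i in range(n)
                if out_sum[i] > 0
            )
            new.append((1 - damping) / n + damping * incoming)
        scores = new
    return scores

sentences = [
    "the storm hit the coast",
    "the storm damaged homes on the coast",
    "officials praised the new budget",
]
# No self-loops: a sentence does not vote for itself.
weights = [
    [0.0 if i == j else overlap(a, b) for j, b in enumerate(sentences)]
    for i, a in enumerate(sentences)
]
scores = pagerank(weights)
ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
```

Here the off-topic third sentence ends up last in `ranked`, so an extractive summary would draw from the two storm sentences first.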

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2304.05205

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Vietnam > Hanoi > Hanoi (0.05)
Asia > Vietnam > Thái Bình Province > Thái Bình (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.51)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)
(2 more...)

Pre-training for Information Retrieval: Are Hyperlinks Fully Explored?

Wu, Jiawen, Zhang, Xinyu, Zhu, Yutao, Liu, Zheng, Guo, Zikai, Fei, Zhaoye, Lai, Ruofei, Wu, Yongkang, Cao, Zhao, Dou, Zhicheng

arXiv.org Artificial Intelligence, Sep-14-2022

Recent years have witnessed great progress in applying pre-trained language models, e.g., BERT, to information retrieval (IR) tasks. Hyperlinks, which are commonly used in Web pages, have been leveraged for designing pre-training objectives. For example, anchor texts of the hyperlinks have been used for simulating queries, thus constructing a tremendous number of query-document pairs for pre-training. However, as a bridge across two web pages, the potential of hyperlinks has not been fully explored. In this work, we focus on modeling the relationship between two documents that are connected by hyperlinks and designing a new pre-training objective for ad-hoc retrieval. Specifically, we categorize the relationships between documents into four groups: no link, unidirectional link, symmetric link, and the most relevant symmetric link. By comparing two documents sampled from adjacent groups, the model can gradually improve its capability of capturing matching signals. We propose a progressive hyperlink prediction (PHP) framework to explore the utilization of hyperlinks in pre-training. Experimental results on two large-scale ad-hoc retrieval datasets and six question-answering datasets demonstrate its superiority over existing pre-training methods.
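To make the grouping concrete, here is a small sketch that assigns a document pair to one of the first three groups named in the abstract, using only a map of outgoing links. The fourth group ("most relevant symmetric link") needs a relevance criterion that the abstract does not specify, so it is deliberately left out. The `link_relation` helper and the toy link graph are hypothetical.

```python
def link_relation(src, dst, outlinks):
    """Classify a document pair by the hyperlinks between the two documents."""
    forward = dst in outlinks.get(src, set())
    backward = src in outlinks.get(dst, set())
    if forward and backward:
        return "symmetric link"
    if forward or backward:
        return "unidirectional link"
    return "no link"

# Toy link graph: page -> set of pages it links to.
outlinks = {"A": {"B", "C"}, "B": {"A"}, "C": set(), "D": set()}

relations = {
    ("A", "B"): link_relation("A", "B", outlinks),
    ("A", "C"): link_relation("A", "C", outlinks),
    ("A", "D"): link_relation("A", "D", outlinks),
}
```

Training pairs would then be drawn from adjacent groups, for example a "no link" pair against a "unidirectional link" pair, so the model learns progressively finer distinctions.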

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2209.06583

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(18 more...)

Genre:

Research Report (0.82)
Instructional Material (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Entity Linking Meets Deep Learning: Techniques and Solutions

Shen, Wei, Li, Yuhan, Liu, Yinan, Han, Jiawei, Wang, Jianyong, Yuan, Xiaojie

arXiv.org Artificial Intelligence, Sep-26-2021

Entity linking (EL) is the process of linking entity mentions appearing in web text with their corresponding entities in a knowledge base. EL plays an important role in the fields of knowledge engineering and data mining, underlying a variety of downstream applications such as knowledge base population, content analysis, relation extraction, and question answering. In recent years, deep learning (DL), which has achieved tremendous success in various domains, has also been leveraged in EL methods to surpass traditional machine-learning-based methods and yield state-of-the-art performance. In this survey, we present a comprehensive review and analysis of existing DL-based EL methods. First, we propose a new taxonomy, which organizes existing DL-based EL methods using three axes: embedding, feature, and algorithm. Then we systematically survey the representative EL methods along the three axes of the taxonomy. Later, we introduce ten commonly used EL data sets and give a quantitative performance analysis of DL-based EL methods over these data sets. Finally, we discuss the remaining limitations of existing methods and highlight some promising future directions.

candidate entity, entity mention, representation, (14 more...)

arXiv.org Artificial Intelligence

2109.1252

Country:

Asia > China > Beijing > Beijing (0.04)
North America > United States > New York (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
(7 more...)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment > Sports > Basketball (0.92)
Media (0.67)
Government > Regional Government > North America Government > United States Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Predicting Links on Wikipedia with Anchor Text Information

Brochier, Robin, Béchet, Frédéric

arXiv.org Artificial Intelligence, May-25-2021

Wikipedia, the largest open-collaborative online encyclopedia, is a corpus of documents bound together by internal hyperlinks. These links form the building blocks of a large network whose structure contains important information on the concepts covered in this encyclopedia. The presence of a link between two articles, materialised by an anchor text in the source page pointing to the target page, can increase readers' understanding of a topic. However, the process of linking follows specific editorial rules to avoid both under-linking and over-linking. In this paper, we study the transductive and the inductive tasks of link prediction on several subsets of the English Wikipedia and identify some key challenges behind automatic linking based on anchor text information. We propose an appropriate evaluation sampling methodology and compare several algorithms. Moreover, we propose baseline models that provide a good estimation of the overall difficulty of the tasks.

algorithm, prediction, wikipedia, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3404835.3462994

2105.11734

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.05)
North America > Canada (0.04)
Europe > United Kingdom (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.64)

Industry: Government > Regional Government > North America Government > United States Government (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Information Management > Search (0.90)
Information Technology > Data Science > Data Mining (0.89)
(3 more...)

Bridging Textual and Tabular Data for Cross-Domain Text-to-SQL Semantic Parsing

Lin, Xi Victoria, Socher, Richard, Xiong, Caiming

arXiv.org Artificial Intelligence, Dec-30-2020

We present BRIDGE, a powerful sequential architecture for modeling dependencies between natural language questions and relational databases in cross-DB semantic parsing. BRIDGE represents the question and DB schema in a tagged sequence where a subset of the fields are augmented with cell values mentioned in the question. The hybrid sequence is encoded by BERT with minimal subsequent layers and the text-DB contextualization is realized via the fine-tuned deep attention in BERT. Combined with a pointer-generator decoder with schema-consistency driven search space pruning, BRIDGE attained state-of-the-art performance on popular cross-DB text-to-SQL benchmarks, Spider (71.1% dev, 67.5% test with ensemble model) and WikiSQL (92.6% dev, 91.9% test). Our analysis shows that BRIDGE effectively captures the desired cross-modal dependencies and has the potential to generalize to more text-DB related tasks. Our implementation is available at https://github.com/salesforce/TabularSemanticParsing.
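The "tagged sequence" can be pictured with a short sketch: the question is followed by each table and column, and a column is extended with a cell value only when that value appears in the question. The marker tokens `[T]`, `[C]`, and `[V]`, the exact-substring matching, and the `serialize` helper are assumptions for illustration, not details confirmed by this abstract.

```python
def serialize(question, schema, cell_values):
    """Flatten a question and a DB schema into one tagged string.

    schema: {table_name: [column_name, ...]}
    cell_values: {(table_name, column_name): [value, ...]}
    """
    parts = ["[CLS]", question, "[SEP]"]
    lowered = question.lower()
    for table, columns in schema.items():
        parts += ["[T]", table]
        for column in columns:
            parts += ["[C]", column]
            # Attach only the cell values that the question actually mentions.
            for value in cell_values.get((table, column), []):
                if value.lower() in lowered:
                    parts += ["[V]", value]
    parts.append("[SEP]")
    return " ".join(parts)

schema = {"singer": ["name", "country"]}
cell_values = {("singer", "country"): ["France", "Japan"]}
sequence = serialize("How many singers are from France?", schema, cell_values)
```

For this input, `sequence` ties the question word "France" to the `country` column, the kind of text-to-schema link that an encoder such as BERT can then attend over.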

anchor text, computational linguistic, sql query, (15 more...)

arXiv.org Artificial Intelligence

2012.12627

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
(14 more...)

Genre: Research Report (0.81)

Industry: Information Technology > Software (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Best Anchor Text Optimization Practices [Infographic]

#artificialintelligence, Aug-5-2020, 07:46:19 GMT

Anchor texts are hyperlinked, clickable words in the content. Working on anchor text optimization for both internal and external content can help your site get a better ranking on search engines. In 2011 and earlier, companies gained good rankings by using keyword-rich links as anchor texts. In 2012, Google released the Penguin update, which caused businesses that had exact matching keywords to suffer major slides in Google rankings. Businesses started considering the use of anchor text optimization strategies to recover the lost ground.

artificial intelligence, information management, natural language, (9 more...)

#artificialintelligence

Country: Asia > India (0.07)

Technology:

Information Technology > Information Management > Search (0.64)
Information Technology > Artificial Intelligence > Natural Language (0.47)
