AITopics | universal text

Collaborating Authors

universal text

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Enhancing Negation Awareness in Universal Text Embeddings: A Data-efficient and Computational-efficient Approach

Cao, Hongliu

arXiv.org Artificial IntelligenceApr-1-2025

--Negation plays an important role in various natural language processing tasks such as Natural Language Inference and Sentiment Analysis tasks. Numerous prior studies have found that contextual text embedding models such as BERT, ELMO, RoBERT a or XLNet face challenges in accurately understanding negation. Recent advancements in universal text embeddings have demonstrated superior performance over contextual text embeddings in various tasks. However, due to the bias in popular evaluation benchmarks, the negation awareness capacity of these models remains unclear . T o bridge the gap in existing literature, an in-depth analysis is initiated in this work to study the negation awareness of cutting-edge universal text embedding models. Our findings reveal a significant lack of negation awareness in these models, often interpreting negated text pairs as semantically similar . T o efficiently deal with the conflict that different tasks need different trade-offs between topic and negation information among other semantic information, a data-efficient and computational-efficient embedding re-weighting method is proposed without modifying the parameters of text embedding models. The proposed solution is able to improve text embedding models' negation awareness significantly on both simple negation understanding task and complex negation understanding task. Furthermore, the proposed solution can also significantly improve the negation awareness of Large Language Model based task-specific high dimensional universal text embeddings.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2504.00584

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.04)
Asia > Middle East > UAE (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Writing Style Matters: An Examination of Bias and Fairness in Information Retrieval Systems

Cao, Hongliu

arXiv.org Artificial IntelligenceDec-12-2024

The rapid advancement of Language Model technologies has opened new opportunities, but also introduced new challenges related to bias and fairness. This paper explores the uncharted territory of potential biases in state-of-the-art universal text embedding models towards specific document and query writing styles within Information Retrieval (IR) systems. Our investigation reveals that different embedding models exhibit different preferences of document writing style, while more informal and emotive styles are less favored by most embedding models. In terms of query writing styles, many embedding models tend to match the style of the query with the style of the retrieved documents, but some show a consistent preference for specific styles. Text embedding models fine-tuned on synthetic data generated by LLMs display a consistent preference for certain style of generated data. These biases in text embedding based IR systems can inadvertently silence or marginalize certain communication styles, thereby posing a significant threat to fairness in information retrieval. Finally, we also compare the answer styles of Retrieval Augmented Generation (RAG) systems based on different LLMs and find out that most text embedding models are biased towards LLM's answer styles when used as evaluation metrics for answer correctness. This study sheds light on the critical issue of writing style based bias in IR systems, offering valuable insights for the development of more fair and robust models.

arxiv preprint arxiv, llm, query, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3701551.3703514

2411.13173

Country:

Europe > Germany > Lower Saxony > Hanover (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Recent advances in text embedding: A Comprehensive Review of Top-Performing Methods on the MTEB Benchmark

Cao, Hongliu

arXiv.org Artificial IntelligenceJun-19-2024

Text embedding methods have become increasingly popular in both industrial and academic fields due to their critical role in a variety of natural language processing tasks. The significance of universal text embeddings has been further highlighted with the rise of Large Language Models (LLMs) applications such as Retrieval-Augmented Systems (RAGs). While previous models have attempted to be general-purpose, they often struggle to generalize across tasks and domains. However, recent advancements in training data quantity, quality and diversity; synthetic data generation from LLMs as well as using LLMs as backbones encourage great improvements in pursuing universal text embeddings. In this paper, we provide an overview of the recent advances in universal text embedding models with a focus on the top performing text embeddings on Massive Text Embedding Benchmark (MTEB). Through detailed comparison and analysis, we highlight the key contributions and limitations in this area, and propose potentially inspiring future research directions.

arxiv preprint arxiv, dataset, universal text, (14 more...)

arXiv.org Artificial Intelligence

2406.01607

Country:

Asia > Middle East > UAE (0.05)
Asia > China > Beijing > Beijing (0.04)
North America > Canada > Quebec (0.04)
(3 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback