AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular pieces of information no longer meet the needs of many people. Computerized databases have made it easier to search traditional indexes of print publications, but the process usually still requires time-consuming serial searching of one database after another, followed by separate methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when the information being sought is stored in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for searching to succeed at human speed.

Numbers Matter! Bringing Quantity-awareness to Retrieval Systems

Almasian, Satya, Bruseva, Milena, Gertz, Michael

arXiv.org Artificial Intelligence, Jul-14-2024

Quantitative information plays a crucial role in understanding and interpreting the content of documents. Many user queries contain quantities and cannot be resolved without understanding their semantics, e.g., "car that costs less than $10k". Yet, modern search engines apply the same ranking mechanisms for both words and quantities, overlooking magnitude and unit information. In this paper, we introduce two quantity-aware ranking techniques designed to rank both the quantity and textual content either jointly or independently. These techniques incorporate quantity information in available retrieval systems and can address queries with the numerical conditions equal to, greater than, and less than. To evaluate the effectiveness of our proposed models, we introduce two novel quantity-aware benchmark datasets in the domains of finance and medicine and compare our method against various lexical and neural models. The code and data are available under https://github.com/satya77/QuantityAwareRankers.
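To make the problem concrete, here is a minimal sketch of handling a numerical condition separately from the text match. It is not the ranking technique from the paper, whose details the abstract does not give. The helper names (`parse_value`, `parse_condition`, `rank`), the suffix table, and the word-overlap score are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical multipliers for suffixes such as "10k" or "2m".
_SUFFIX = {"": 1.0, "k": 1e3, "m": 1e6}
_NUMBER = re.compile(r"\$?(\d+(?:\.\d+)?)([km]?)\b", re.IGNORECASE)
_WORD = re.compile(r"[a-z]+")


@dataclass
class Condition:
    op: str       # one of "<", ">", "="
    value: float


def parse_value(text: str) -> Optional[float]:
    """Return the first number found in the text, or None (toy parser)."""
    match = _NUMBER.search(text.replace(",", ""))
    if match is None:
        return None
    return float(match.group(1)) * _SUFFIX[match.group(2).lower()]


def parse_condition(query: str) -> Optional[Condition]:
    """Map phrases like 'less than $10k' to a comparison operator and value."""
    q = query.lower()
    value = parse_value(q)
    if value is None:
        return None
    if "less than" in q:
        return Condition("<", value)
    if "greater than" in q or "more than" in q:
        return Condition(">", value)
    return Condition("=", value)


def satisfies(cond: Condition, value: float) -> bool:
    if cond.op == "<":
        return value < cond.value
    if cond.op == ">":
        return value > cond.value
    return value == cond.value


def rank(query: str, docs: List[str]) -> List[str]:
    """Filter by the numerical condition, then order by word overlap."""
    cond = parse_condition(query)
    terms = set(_WORD.findall(query.lower()))
    scored = []
    for doc in docs:
        if cond is not None:
            value = parse_value(doc)
            if value is None or not satisfies(cond, value):
                continue
        overlap = len(terms & set(_WORD.findall(doc.lower())))
        if overlap > 0:
            scored.append((overlap, doc))
    scored.sort(key=lambda pair: -pair[0])
    return [doc for _, doc in scored]
```

A purely lexical ranker would treat "$12k" as just another token. Here a document priced at $12k is excluded for the query "car that costs less than $10k" before any text scoring happens. The sketch ignores units entirely, which a real system would need to handle.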

concept unit index, quantity, query, (15 more...)

arXiv.org Artificial Intelligence

2407.10283

Country:

Oceania > Australia (0.04)
North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Europe > Germany (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Banking & Finance (0.68)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Comparing Complex Concepts with Transformers: Matching Patent Claims Against Natural Language Text

Blume, Matthias, Heidari, Ghobad, Hewel, Christoph

arXiv.org Artificial Intelligence, Jul-14-2024

A key capability in managing patent applications or a patent portfolio is comparing claims to other text, e.g. a patent specification. Because the language of claims is different from language used elsewhere in the patent application or in non-patent text, this has been challenging for computer based natural language processing. We test two new LLM-based approaches and find that both provide substantially better performance than previously published values. The ability to match dense ...

An entity defending itself against infringement may attempt to invalidate a patent by finding novelty-destroying prior art to that patent. In all cases, the key task is to search through a set of documents and determine whether those documents cover all aspects of each claim of the subject patent application or granted patent. Thus, a claim of a subject patent (application) may be considered a query to an information retrieval system whose objective is to retrieve a document or set of documents that contain all aspects of that claim.

paragraph, patent application, similarity, (14 more...)

arXiv.org Artificial Intelligence

2407.10351

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > California > San Diego County > San Diego (0.05)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Law > Intellectual Property & Technology Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Position: Measure Dataset Diversity, Don't Just Claim It

Zhao, Dora, Andrews, Jerone T. A., Papakyriakopoulos, Orestis, Xiang, Alice

arXiv.org Artificial Intelligence, Jul-11-2024

Machine learning (ML) datasets, often perceived as neutral, inherently encapsulate abstract and disputed social constructs. Dataset curators frequently employ value-laden terms such as diversity, bias, and quality to characterize datasets. Despite their prevalence, these terms lack clear definitions and validation. Our research explores the implications of this issue by analyzing "diversity" across 135 image and text datasets. Drawing from social sciences, we apply principles from measurement theory to identify considerations and offer recommendations for conceptualizing, operationalizing, and evaluating diversity in datasets. Our findings have broader implications for ML research, advocating for a more nuanced and precise approach to handling value-laden properties in dataset construction.

computer vision, dataset, diversity, (12 more...)

arXiv.org Artificial Intelligence

2407.08188

Country:

Europe > Austria > Vienna (0.14)
Europe > Ukraine > Kyiv Oblast > Kyiv (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(9 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Government (0.92)
Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(2 more...)

Automatic Generation of Web Censorship Probe Lists

Tang, Jenny, Alvarez, Leo, Brar, Arjun, Hoang, Nguyen Phong, Christin, Nicolas

arXiv.org Artificial Intelligence, Jul-11-2024

Domain probe lists, used to determine which URLs to probe for Web censorship, play a critical role in Internet censorship measurement studies. Indeed, the size and accuracy of the domain probe list limit the set of censored pages that can be detected; inaccurate lists can lead to an incomplete view of the censorship landscape or biased results. Previous efforts to generate domain probe lists have been mostly manual or crowdsourced. This approach is time-consuming, prone to errors, and does not scale well to the ever-changing censorship landscape. In this paper, we explore methods for automatically generating probe lists that are both comprehensive and up-to-date for Web censorship measurement. We start from an initial set of 139,957 unique URLs from various existing test lists consisting of pages from a variety of languages to generate new candidate pages. By analyzing content from these URLs (i.e., performing topic and keyword extraction), expanding these topics, and using them as a feed to search engines, our method produces 119,255 new URLs across 35,147 domains. We then test the new candidate pages by attempting to access each URL from servers in eleven different global locations over a span of four months to check for their connectivity and potential signs of censorship. Our measurements reveal that our method discovered over 1,400 domains, not present in the original dataset, that we suspect to be blocked. In short, automatically updating probe lists is possible, and can help further automate censorship measurements at scale.

baseline, keyword, source list, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.56553/popets-2024-0106

2407.08185

Country:

Asia > China > Shanghai > Shanghai (0.07)
Asia > China > Beijing > Beijing (0.07)
Asia > China > Hong Kong (0.06)
(20 more...)

Genre: Research Report > New Finding (0.46)

Industry: Law > Civil Rights & Constitutional Law (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
(4 more...)

Instrumentation and Analysis of Native ML Pipelines via Logical Query Plans

Grafberger, Stefan

arXiv.org Artificial Intelligence, Jul-10-2024

Machine Learning (ML) is increasingly used to automate impactful decisions, which leads to concerns regarding their correctness, reliability, and fairness. We envision highly-automated software platforms to assist data scientists with developing, validating, monitoring, and analysing their ML pipelines. In contrast to existing work, our key idea is to extract "logical query plans" from ML pipeline code relying on popular libraries. Based on these plans, we automatically infer pipeline semantics and instrument and rewrite the ML pipelines to enable diverse use cases without requiring data scientists to manually annotate or rewrite their code. First, we developed such an abstract ML pipeline representation together with machinery to extract it from Python code. Next, we used this representation to efficiently instrument static ML pipelines and apply provenance tracking, which enables lightweight screening for common data preparation issues. Finally, we built machinery to automatically rewrite ML pipelines to perform more advanced what-if analyses and proposed using multi-query optimisation for the resulting workloads. In future work, we aim to interactively assist data scientists as they work on their ML pipelines.
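The idea of reading a plan-like operator sequence out of pipeline code can be illustrated with a toy static pass over the source using Python's standard `ast` module. This is only a sketch under stated assumptions: the abstract does not describe the author's extraction machinery, which may work quite differently. The `OPERATORS` mapping, the operator labels, and the `extract_plan` helper are hypothetical.

```python
import ast
from typing import List, Tuple

# Hypothetical mapping from library method names to coarse operator kinds.
OPERATORS = {
    "read_csv": "Source",
    "merge": "Join",
    "dropna": "Selection",
    "fit_transform": "Transformer",
    "fit": "Estimator",
}


def extract_plan(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, operator kind, method name) for known method calls."""
    plan = []
    for node in ast.walk(ast.parse(source)):
        # Only attribute-style calls such as `df.merge(...)` are considered.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            kind = OPERATORS.get(node.func.attr)
            if kind is not None:
                plan.append((node.lineno, kind, node.func.attr))
    # ast.walk does not visit nodes in source order, so sort by line number.
    return sorted(plan)


# The pipeline is only parsed, never executed, so the files need not exist.
PIPELINE = """
import pandas as pd
from sklearn.linear_model import LogisticRegression

patients = pd.read_csv("patients.csv")
histories = pd.read_csv("histories.csv")
data = patients.merge(histories, on="ssn")
data = data.dropna()
model = LogisticRegression()
model.fit(data[["age"]], data["label"])
"""

for line, kind, name in extract_plan(PIPELINE):
    print(line, kind, name)
```

Matching on method names alone is fragile, since any object with a `fit` or `merge` method would be picked up. A practical system needs to know which library each call belongs to.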

data scientist, ml pipeline, pipeline, (14 more...)

arXiv.org Artificial Intelligence

2407.0756

Country: Europe > Germany > Berlin (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.72)

Probabilistic Routing for Graph-Based Approximate Nearest Neighbor Search

Lu, Kejing, Xiao, Chuan, Ishikawa, Yoshiharu

arXiv.org Artificial Intelligence, Jul-10-2024

Approximate nearest neighbor search (ANNS) in high-dimensional spaces is a pivotal challenge in the field of machine learning. In recent years, graph-based methods have emerged as the superior approach to ANNS, establishing a new state of the art. Although various optimizations for graph-based ANNS have been introduced, they predominantly rely on heuristic methods that lack formal theoretical backing. This paper aims to enhance routing within graph-based ANNS by introducing a method that offers a probabilistic guarantee when exploring a node's neighbors in the graph. We formulate the problem as probabilistic routing and develop two baseline strategies by incorporating locality-sensitive techniques. Subsequently, we introduce PEOs, a novel approach that efficiently identifies which neighbors in the graph should be considered for exact distance calculation, thus significantly improving efficiency in practice. Our experiments demonstrate that equipping PEOs can increase throughput on commonly utilized graph indexes (HNSW and NSSG) by a factor of 1.6 to 2.5, and its efficiency consistently outperforms the leading-edge routing technique by 1.1 to 1.4 times.

dataset, peo, probabilistic routing, (9 more...)

arXiv.org Artificial Intelligence

2402.11354

Country:

Asia > Afghanistan > Parwan Province > Charikar (0.04)
Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Robust Neural Information Retrieval: An Adversarial and Out-of-distribution Perspective

Liu, Yu-An, Zhang, Ruqing, Guo, Jiafeng, de Rijke, Maarten, Fan, Yixing, Cheng, Xueqi

arXiv.org Artificial Intelligence, Jul-9-2024

According to the global overview report from Digital 2023, nearly 82% of Internet users between 18 and 64 have used a search engine or web portal in the past month. Specifically, IR is the process of finding and providing relevant information in response to the user query from a large collection of data. Recently, with advances in deep learning, neural IR models have witnessed significant progress [51, 53]. With the development of training methodologies such as pre-training [44, 100] and fine-tuning [73, 117, 162], neural IR models have demonstrated remarkable effectiveness in learning query-document relevance patterns. Why is robustness important in IR? In real-world deployment of neural IR models, an aspect equally essential as their effectiveness is their robustness. A good IR system must not only exhibit high effectiveness under normal conditions but also demonstrate robustness in the face of abnormal conditions. The natural openness of IR systems makes them vulnerable to intrusion, and the consequences can be severe. For example: (i) Search engines are vulnerable to black hat SEO attacks, necessitating significant efforts to curb these infringements.

information retrieval, retrieval, robustness, (15 more...)

arXiv.org Artificial Intelligence

2407.06992

Country:

Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Beijing > Beijing (0.04)
(5 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government (1.00)
Education (0.67)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries

Liang, Renjie, Li, Li, Zhang, Chongzhi, Wang, Jing, Zhu, Xizhou, Sun, Aixin

arXiv.org Artificial Intelligence, Jul-9-2024

In this paper, we propose the task of Ranked Video Moment Retrieval (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language. Although a few related tasks have been proposed and studied by CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we develop the TVR-Ranking dataset, based on the raw videos and existing moment annotations provided in the TVR dataset. Our key contribution is the manual annotation of relevance levels for 94,442 query-moment pairs. We then develop the NDCG@K, IoU ≥ μ evaluation metric for this new task and conduct experiments to evaluate three baseline models. Our experiments show that the new RVMR task brings new challenges to existing models and we believe this new dataset contributes to the research on multi-modality search. The dataset is available at https://github.com/Ranking-VMR/TVR-Ranking.
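The abstract names the metric but does not define it, so the following is one plausible reading rather than the authors' definition. A predicted moment earns the annotated relevance of a ground-truth moment only when their temporal IoU is at least μ, and the resulting gains are scored with standard NDCG@K. The function names, the tuple layouts, and the rule that each ground-truth moment can be matched only once are assumptions.

```python
import math
from typing import List, Tuple

Prediction = Tuple[str, float, float]          # (video_id, start, end)
Annotation = Tuple[str, float, float, float]   # (video_id, start, end, relevance)


def temporal_iou(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Intersection over union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain with the usual log2 position discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k_iou(predictions: List[Prediction],
                  ground_truth: List[Annotation],
                  k: int, mu: float) -> float:
    """NDCG@K where a prediction only scores if its IoU with a moment is >= mu."""
    used = set()   # assumed rule: each annotated moment is matched at most once
    gains = []
    for vid, start, end in predictions[:k]:
        best_gain, best_idx = 0.0, None
        for idx, (gvid, gstart, gend, rel) in enumerate(ground_truth):
            if idx in used or gvid != vid:
                continue
            if temporal_iou((start, end), (gstart, gend)) >= mu and rel > best_gain:
                best_gain, best_idx = rel, idx
        if best_idx is not None:
            used.add(best_idx)
        gains.append(best_gain)
    ideal = sorted((g[3] for g in ground_truth), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Under this reading, raising μ makes the metric stricter about localization, while K controls how deep into the ranked list the evaluation looks.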

annotation, dataset, query, (16 more...)

arXiv.org Artificial Intelligence

2407.06597

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Enhancing Mobile "How-to" Queries with Automated Search Results Verification and Reranking

Ding, Lei, Bheemanpally, Jeshwanth, Zhang, Yi

arXiv.org Artificial Intelligence, Jul-8-2024

Many people use search engines to find online guidance to solve computer or mobile device problems. Users frequently encounter challenges in identifying effective solutions from search results, often wasting time trying ineffective solutions that seem relevant yet fail to solve real problems. This paper introduces a novel approach to improving the accuracy and relevance of online technical support search results through automated search results verification and reranking. Taking "How-to" queries specific to on-device execution as a starting point, we developed the first solution that allows an AI agent to interpret and execute step-by-step instructions in the search results in a controlled Android environment. We further integrated the agent's findings into a reranking mechanism that orders search results based on the success indicators of the tested solutions. The paper details the architecture of our solution and a comprehensive evaluation of the system through a series of tests across various application domains. The results demonstrate a significant improvement in the quality and reliability of the top-ranked results. Our findings suggest a paradigm shift in how search engine ranking for online technical support help can be optimized, offering a scalable and automated solution to the pervasive challenge of finding effective and reliable online help.
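As a rough illustration of reranking on verification outcomes, the sketch below moves results whose instructions were verified to work ahead of untested ones, and pushes verified failures to the end, keeping the original order within each group. The abstract does not specify the paper's reranking mechanism. The `SearchResult` fields, the three-level priority, and the example URLs are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchResult:
    url: str
    original_rank: int
    # True = instructions executed successfully, False = they failed,
    # None = the result was not tested (hypothetical encoding).
    verified: Optional[bool]


# Assumed ordering: successes first, untested next, failures last.
_PRIORITY = {True: 0, None: 1, False: 2}


def rerank(results: List[SearchResult]) -> List[SearchResult]:
    """Order by verification outcome, breaking ties by the original rank."""
    return sorted(results, key=lambda r: (_PRIORITY[r.verified], r.original_rank))


results = [
    SearchResult("https://example.com/a", 1, False),
    SearchResult("https://example.com/b", 2, None),
    SearchResult("https://example.com/c", 3, True),
    SearchResult("https://example.com/d", 4, True),
]
for r in rerank(results):
    print(r.url, r.verified)
```

A graded success score could replace the three-level priority if the verification step reports partial success.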

how-to, instruction, query, (12 more...)

arXiv.org Artificial Intelligence

2404.0886

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > California > Santa Cruz County > Santa Cruz (0.04)
North America > United States > Michigan > Oakland County > Rochester (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Services (0.67)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(4 more...)

Artificial Intuition: Efficient Classification of Scientific Abstracts

Sakhrani, Harsh, Pervez, Naseela, Kumar, Anirudh Ravi, Morstatter, Fred, Reed, Alexandra Graddy, Belz, Andrea

arXiv.org Artificial Intelligence, Jul-8-2024

It is desirable to coarsely classify short scientific texts, such as grant or publication abstracts, for strategic insight or research portfolio management. These texts efficiently transmit dense information to experts possessing a rich body of knowledge to aid interpretation. Yet this task is remarkably difficult to automate because of brevity and the absence of context. To address this gap, we have developed a novel approach to generate and appropriately assign coarse domain-specific labels. We show that a Large Language Model (LLM) can provide metadata essential to the task, in a process akin to the augmentation of supplemental knowledge representing human intuition, and propose a workflow. As a pilot study, we use a corpus of award abstracts from the National Aeronautics and Space Administration (NASA). We develop new assessment tools in concert with established performance metrics.

classification, keyword, label space, (15 more...)

arXiv.org Artificial Intelligence

2407.06093

Country:

North America > United States > New York > New York County > New York City (0.05)
Asia > China > Hong Kong (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (0.69)

Industry:

Government > Space Agency (0.69)
Energy > Renewable (0.68)
Government > Regional Government > North America Government > United States Government (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)
