AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when the information being sought comes in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for searching to succeed at a human's slow speed.

Generative User-Experience Research for Developing Domain-specific Natural Language Processing Applications

Zhukova, Anastasia, von Sperl, Lukas, Matt, Christian E., Gipp, Bela

arXiv.org Artificial Intelligence, Jan-19-2024

Natural Language Processing (NLP) has recently been extensively incorporated into industrial and domain applications. For example, NLP is used to speed up processes (e.g., automatic classification of types of customer feedback or filtering out spam emails), for information extraction (e.g., named entity recognition to extract symptoms, diagnoses, and treatments from medical records), or for auto-completing input forms with language models. Despite this broad integration, domain-specific NLP applications may require more user-driven methodologies to address user needs. The data-driven approach often falls short in exploring the needs of domain users (Yang, 2018). On the one hand, domain users are often involved in development only at the late test phase, to evaluate the usability of machine learning (ML)/NLP applications (Carney, 2019). Unlike user-driven software development, the development of NLP applications depends mainly on data availability or on experimentation with ML/NLP trends, which thus become the major drivers of application development. On the other hand, the user-driven development of a domain-specific ML/NLP application in medicine showed that close collaboration with domain users in the earlier stages increases the effectiveness of the final product (Yang, 2017). Therefore, integrating user experience (UX) and human-computer interaction (HCI) research into ML/NLP research addresses users' needs, fuses their expertise, and increases intuitiveness, transparency, simplicity, and trust for the system users (Boukhelifa et al., 2018; Paleyes et al., 2022).

application, information, prototype

arXiv.org Artificial Intelligence

2306.16143

Country:

Europe > Germany > Lower Saxony > Gottingen (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)

Genre:

Questionnaire & Opinion Survey (0.97)
Workflow (0.93)
Research Report > New Finding (0.68)
Personal > Interview (0.67)

Industry:

Health & Medicine (0.66)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Generative Dense Retrieval: Memory Can Be a Burden

Yuan, Peiwen, Wang, Xinglin, Feng, Shaoxiong, Pan, Boyuan, Li, Yiwei, Wang, Heda, Miao, Xupeng, Li, Kan

arXiv.org Artificial Intelligence, Jan-18-2024

Generative Retrieval (GR), which autoregressively decodes relevant document identifiers given a query, has been shown to perform well on small-scale corpora. By memorizing the document corpus with model parameters, GR implicitly achieves deep interaction between query and document. However, such a memorizing mechanism faces three drawbacks: (1) poor memory accuracy for fine-grained features of documents; (2) memory confusion that worsens as the corpus size increases; (3) huge memory update costs for new documents. To alleviate these problems, we propose the Generative Dense Retrieval (GDR) paradigm. Specifically, GDR first uses the limited memory volume to achieve inter-cluster matching from the query to relevant document clusters. The memorizing-free matching mechanism of Dense Retrieval (DR) is then introduced to conduct fine-grained intra-cluster matching from clusters to relevant documents. This coarse-to-fine process maximizes the advantages of GR's deep interaction and DR's scalability. In addition, we design a cluster identifier construction strategy to facilitate corpus memory and a cluster-adaptive negative sampling strategy to enhance the intra-cluster mapping ability. Empirical results show that GDR obtains an average improvement of 3.0 in R@100 on the NQ dataset under multiple settings and has better scalability.
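
The coarse-to-fine idea can be illustrated with a minimal sketch, under loud assumptions: documents are already embedded as vectors and grouped into clusters; the first stage, which GDR performs with a generative model that decodes cluster identifiers, is stood in for here by a simple centroid-similarity ranking; and the second stage uses plain dot-product dense retrieval inside the chosen clusters. The function names, cluster names, and toy vectors are hypothetical, and the recall helper uses one common definition of R@k (fraction of relevant documents found in the top k), which may differ from the paper's exact metric.

```python
from typing import Dict, List, Set, Tuple

Vector = List[float]

def dot(a: Vector, b: Vector) -> float:
    """Inner product used as the dense-retrieval relevance score."""
    return sum(x * y for x, y in zip(a, b))

def coarse_to_fine_retrieve(
    query: Vector,
    centroids: Dict[str, Vector],                    # cluster id -> centroid (hypothetical)
    clusters: Dict[str, List[Tuple[str, Vector]]],   # cluster id -> [(doc id, embedding)]
    n_clusters: int = 1,
    k: int = 2,
) -> List[str]:
    # Stage 1 (inter-cluster): choose the most promising clusters.
    # GDR decodes cluster identifiers generatively; a centroid ranking stands in here.
    ranked = sorted(centroids, key=lambda cid: dot(query, centroids[cid]), reverse=True)
    chosen = ranked[:n_clusters]
    # Stage 2 (intra-cluster): memorization-free dense matching, only inside chosen clusters.
    candidates = [(doc_id, dot(query, emb)) for cid in chosen for doc_id, emb in clusters[cid]]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in candidates[:k]]

def recall_at_k(retrieved: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)

centroids = {"sports": [1.0, 0.0], "science": [0.0, 1.0]}
clusters = {
    "sports": [("d1", [0.9, 0.1]), ("d2", [0.8, 0.3])],
    "science": [("d3", [0.1, 0.9]), ("d4", [0.2, 0.7])],
}
top = coarse_to_fine_retrieve([0.1, 1.0], centroids, clusters, n_clusters=1, k=2)
print(top, recall_at_k(top, {"d3", "d4"}, k=2))  # ['d3', 'd4'] 1.0
```

Only the documents in the selected cluster are scored in the second stage, which is what lets the dense step stay cheap as the corpus grows; the sketch does not attempt GDR's identifier construction or cluster-adaptive negative sampling.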

gdr, identifier, retrieval

arXiv.org Artificial Intelligence

2401.10487

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Austria (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native

Lu, Yao, Bian, Song, Chen, Lequn, He, Yongjun, Hui, Yulong, Lentz, Matthew, Li, Beibin, Liu, Fei, Li, Jialin, Liu, Qi, Liu, Rui, Liu, Xiaoxuan, Ma, Lin, Rong, Kexin, Wang, Jianguo, Wu, Yingjun, Wu, Yongji, Zhang, Huanchen, Zhang, Minjia, Zhang, Qizhen, Zhou, Tianyi, Zhuo, Danyang

arXiv.org Artificial Intelligence, Jan-17-2024

In this paper, we investigate the intersection of large generative AI models and cloud-native computing architectures. Recent large models such as ChatGPT, while revolutionary in their capabilities, face challenges such as escalating costs and demand for high-end GPUs. Drawing analogies between large-model-as-a-service (LMaaS) and cloud database-as-a-service (DBaaS), we describe an AI-native computing paradigm that harnesses the power of both cloud-native technologies (e.g., multi-tenancy and serverless computing) and advanced machine learning runtimes (e.g., batched LoRA inference). These joint efforts aim to optimize the cost of goods sold (COGS) and improve resource accessibility. The merging of these two domains is just beginning, and we hope to stimulate future research and development in this area.
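
Batched LoRA inference is only named in passing above. The general idea can be illustrated with a minimal sketch, assuming the usual LoRA form h = xW + s(xA)B, where W is a frozen base weight, A and B are a tenant's low-rank adapter matrices, and s is a scaling factor: requests from different tenants share a single base-weight product, and each request then receives only its own adapter's low-rank update. This is a toy pure-Python illustration of the technique, not the serving system discussed in the paper; all names, tenants, and matrices are hypothetical.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]

def matmul(x: Matrix, w: Matrix) -> Matrix:
    """Plain matrix product x @ w (rows of x times columns of w)."""
    return [
        [sum(row[i] * w[i][j] for i in range(len(row))) for j in range(len(w[0]))]
        for row in x
    ]

def batched_lora_forward(
    inputs: Matrix,                               # one row per request
    adapter_ids: List[str],                       # which tenant adapter each row uses
    base_w: Matrix,                               # shared base weight W (d_in x d_out)
    adapters: Dict[str, Tuple[Matrix, Matrix]],   # id -> (A: d_in x r, B: r x d_out)
    scale: float = 1.0,
) -> Matrix:
    # One shared base-model product for the whole mixed-tenant batch.
    out = matmul(inputs, base_w)
    # Group request rows by adapter so each low-rank update touches only its own rows.
    groups: Dict[str, List[int]] = {}
    for row, adapter_id in enumerate(adapter_ids):
        groups.setdefault(adapter_id, []).append(row)
    for adapter_id, rows in groups.items():
        a, b = adapters[adapter_id]
        delta = matmul(matmul([inputs[r] for r in rows], a), b)  # (x A) B
        for r, d in zip(rows, delta):
            out[r] = [h + scale * v for h, v in zip(out[r], d)]
    return out

# Toy example: identity base weight and two rank-1 adapters for two tenants.
base = [[1.0, 0.0], [0.0, 1.0]]
adapters = {
    "tenant_a": ([[1.0], [0.0]], [[0.0, 1.0]]),
    "tenant_b": ([[0.0], [1.0]], [[1.0, 0.0]]),
}
print(batched_lora_forward([[1.0, 2.0], [3.0, 4.0]], ["tenant_a", "tenant_b"], base, adapters))
# [[1.0, 3.0], [7.0, 4.0]]
```

The design point is that the expensive base-model product is computed once for a batch that mixes tenants, while the per-tenant cost is only the small rank-r products; real runtimes do this with fused GPU kernels rather than Python loops.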

application, arxiv preprint arxiv, proceedings

arXiv.org Artificial Intelligence

2401.1223

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report (0.64)

Industry:

Information Technology > Services (1.00)
Energy (0.68)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)

BERTologyNavigator: Advanced Question Answering with BERT-based Semantics

Rajpal, Shreya, Usbeck, Ricardo

arXiv.org Artificial Intelligence, Jan-17-2024

The development and integration of knowledge graphs and language models is significant for artificial intelligence and natural language processing. In this study, we introduce BERTologyNavigator, a two-phase system that combines relation extraction techniques and BERT embeddings to navigate the relationships within the DBLP Knowledge Graph (KG). In the first phase, our approach focuses on extracting one-hop relations and labelled candidate pairs. This is followed in the second phase by employing BERT's [CLS] embeddings and additional heuristics for relation selection. Our system reaches an F1 score of 0.2175 on the DBLP QuAD Final test dataset for Scholarly QALD and an F1 score of 0.98 on the subset of the DBLP QuAD test dataset during the QA phase.
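
The second phase can be pictured as ranking candidate relations by embedding similarity to the question. A minimal sketch, assuming the question and each candidate relation label have already been encoded as vectors (in the paper, BERT [CLS] embeddings; here, hand-made toy vectors) and omitting the paper's additional heuristics. The relation names are hypothetical, and the set-based F1 helper is one common way to score predicted against gold answers; whether the benchmark computes F1 exactly this way is an assumption.

```python
import math
from typing import Dict, List, Set

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def select_relation(question_emb: List[float], relation_embs: Dict[str, List[float]]) -> str:
    """Pick the candidate one-hop relation whose label embedding best matches the question."""
    return max(relation_embs, key=lambda rel: cosine(question_emb, relation_embs[rel]))

def f1_score(predicted: Set[str], gold: Set[str]) -> float:
    """Set-based F1 between predicted and gold answers."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy vectors standing in for BERT [CLS] embeddings.
question = [0.9, 0.1, 0.0]  # e.g. "Who wrote paper X?"
relations = {
    "authoredBy": [1.0, 0.0, 0.0],
    "publishedIn": [0.0, 1.0, 0.0],
    "yearOfPublication": [0.0, 0.0, 1.0],
}
print(select_relation(question, relations))             # authoredBy
print(f1_score({"Alice", "Bob"}, {"Alice", "Carol"}))   # 0.5
```

Once a relation is selected, answering reduces to following that edge from the linked entity in the knowledge graph, which is why the quality of this ranking step largely determines the final F1.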

ceur workshop proceedings, dataset, proceedings

arXiv.org Artificial Intelligence

2401.09553

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.55)

QAnswer: Towards Question Answering Search over Websites

Guo, Kunpeng, Defretiere, Clement, Diefenbach, Dennis, Gravier, Christophe, Gourru, Antoine

arXiv.org Artificial Intelligence, Jan-17-2024

Question Answering (QA) is increasingly used by search engines to provide results to their end users, yet very few websites currently use QA technologies for their search functionality. To illustrate the potential of QA technologies for the website search practitioner, we demonstrate web searches that combine QA over knowledge graphs and QA over free text, which are usually tackled separately. We also discuss the benefits and drawbacks of both approaches for website search. Our case studies use websites hosted by the Wikimedia Foundation (namely Wikipedia and Wikidata). Unlike a general search engine (e.g., Google or Bing), the data are indexed integrally, i.e., we do not index only a subset, and exclusively, i.e., we index only data available on the corresponding website.

arxiv, paragraph, query

arXiv.org Artificial Intelligence

doi: 10.1145/3487553

2401.09175

Country:

Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.50)

Industry: Government (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.83)

Spatial Entity Resolution between Restaurant Locations and Transportation Destinations in Southeast Asia

Gao, Emily, Widdows, Dominic

arXiv.org Artificial Intelligence, Jan-16-2024

Location matters in many businesses and services today, particularly in transportation and delivery, where it is important to find the correct pickup and drop-off locations very quickly. User experience can be negatively affected if the location information is inaccurate or insufficient. Inaccuracies can originate from imprecise GPS data, manual errors during data entry, or a lack of effective data quality control. Insufficiencies can also take many forms, including lack of coverage and lack of detail. For example, we may know the latitude and longitude of a restaurant in a mall, but this might not include information about where passengers should be dropped off, or where a delivery courier should park to collect food for delivery. Or the location of a business may be known, but not its contact details or opening hours.

Solving this problem can improve precision by removing duplicates, and can enrich detail by (for example) merging a phone number from one record with the hours of operation from another, once these records are known to refer to the same thing. This problem is referred to as entity resolution (see Talburt, 2011), and it occurs with various datasets, including those representing people, products, works of literature, etc. For Grab, one entity resolution problem that arises for spatial data is the alignment of transportation destinations and restaurants. Currently, Grab maintains two separate tables for transportation and food delivery, because each use case requires some specific features; for example, food delivery needs information about the estimated delivery time, cuisine types, and opening hours, which are absent from the POI table. However, it is highly likely that some entities from both tables ...
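
The entry's keywords mention Levenshtein distance and similarity. One way to pair a restaurant record with a transportation POI can be illustrated with a minimal sketch, assuming each record carries a name and a latitude/longitude: candidates farther than a distance threshold are discarded, and the closest name match above a similarity threshold is accepted. The thresholds, field names, records, and decision rule are illustrative assumptions, not the method reported in the paper.

```python
import math
from typing import Dict, List, Optional

def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = cur
    return prev[-1]

def name_similarity(a: str, b: str) -> float:
    """Levenshtein distance normalised to a 0..1 similarity (1.0 = identical)."""
    a, b = a.lower().strip(), b.lower().strip()
    longest = max(len(a), len(b))
    return 1.0 - levenshtein(a, b) / longest if longest else 1.0

def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres, assuming a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(h))

def match_restaurant(restaurant: Dict, pois: List[Dict],
                     max_dist_m: float = 100.0, min_sim: float = 0.8) -> Optional[str]:
    """Return the id of the best-matching nearby POI, or None (hypothetical rule)."""
    best_id, best_sim = None, -1.0
    for poi in pois:
        dist = haversine_m(restaurant["lat"], restaurant["lon"], poi["lat"], poi["lon"])
        if dist > max_dist_m:
            continue  # too far away to be the same place
        sim = name_similarity(restaurant["name"], poi["name"])
        if sim >= min_sim and sim > best_sim:
            best_id, best_sim = poi["id"], sim
    return best_id

restaurant = {"name": "Kopi Corner", "lat": 1.3000, "lon": 103.8000}
pois = [
    {"id": "p1", "name": "Kopi Korner", "lat": 1.3002, "lon": 103.8001},   # typo, ~25 m away
    {"id": "p2", "name": "Kopi Corner", "lat": 1.3500, "lon": 103.8000},   # exact name, ~5.6 km away
    {"id": "p3", "name": "Noodle House", "lat": 1.3001, "lon": 103.8000},  # nearby, different name
]
print(match_restaurant(restaurant, pois))  # p1
```

Filtering on distance first keeps string comparisons local, and the normalised edit distance tolerates data-entry typos such as "Korner" for "Corner"; a production system would likely combine more signals than these two.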

levenshtein distance, restaurant, similarity

arXiv.org Artificial Intelligence

2401.08537

Country:

Asia > Southeast Asia (0.41)
Asia > Indonesia > Borneo > Kalimantan > Central Kalimantan > Palangka Raya (0.14)
Asia > Singapore (0.06)

Genre: Research Report (0.50)

Industry:

Transportation (1.00)
Consumer Products & Services > Restaurants (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (0.67)

Wikidata as a seed for Web Extraction

Guo, Kunpeng, Diefenbach, Dennis, Gourru, Antoine, Gravier, Christophe

arXiv.org Artificial Intelligence, Jan-15-2024

Wikidata has grown into a knowledge graph of impressive size. To date, it contains more than 17 billion triples collecting information about people, places, films, stars, publications, proteins, and much more. On the other hand, most of the information on the Web is not published in highly structured data repositories like Wikidata, but rather as unstructured and semi-structured content, more concretely in HTML pages containing text and tables. Finding, monitoring, and organizing this data in a knowledge graph requires considerable work from human editors. The volume and complexity of the data make this task difficult and time-consuming. In this work, we present a framework that is able to identify and extract new facts that are published under multiple Web domains so that they can be proposed for validation by Wikidata editors. The framework relies on question-answering technologies. We take inspiration from ideas used to extract facts from textual collections and adapt them to extract facts from Web pages. To achieve this, we demonstrate that language models can be adapted to extract facts not only from textual collections but also from Web pages. By exploiting the information already contained in Wikidata, the proposed framework can be trained without the need for any additional learning signals and can extract new facts for a wide range of properties and domains. Following this path, Wikidata can be used as a seed to extract facts on the Web. Our experiments show that we can achieve a mean F1-score of 84.07. Moreover, our estimates show that we can potentially extract millions of facts that can be proposed for human validation. The goal is to help editors in their daily tasks and contribute to the completion of the Wikidata knowledge graph.
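
One way to read "trained without the need for any additional learning signals" is distant supervision: facts already in Wikidata label the spans where their values appear on Web pages, yielding extractive question-answering examples. A minimal sketch under that assumption; the question templates, property labels, and output field names are hypothetical, values are assumed to be pre-rendered as plain strings (real Wikidata values such as dates are typed), and exact string matching stands in for whatever alignment the framework actually uses.

```python
from typing import Dict, List, Tuple

# Hypothetical question templates, keyed by property label.
TEMPLATES = {
    "date of birth": "What is the date of birth of {subject}?",
    "occupation": "What is the occupation of {subject}?",
}

def build_training_examples(subject: str, known_facts: List[Tuple[str, str]],
                            page_text: str) -> List[Dict]:
    """Turn facts already in the knowledge graph into extractive-QA examples for one page.

    A fact yields an example only if its value occurs verbatim in the page text,
    so the answer span is labelled by the knowledge graph itself (distant supervision).
    """
    examples = []
    for prop, value in known_facts:
        template = TEMPLATES.get(prop)
        if template is None:
            continue  # no question template for this property
        start = page_text.find(value)
        if start == -1:
            continue  # value not stated on this page: no training signal
        examples.append({
            "question": template.format(subject=subject),
            "context": page_text,
            "answer_start": start,
            "answer_text": value,
        })
    return examples

page = "Ada Lovelace (born 10 December 1815) was a mathematician and writer."
facts = [("date of birth", "10 December 1815"),
         ("occupation", "mathematician"),
         ("place of birth", "London")]
for ex in build_training_examples("Ada Lovelace", facts, page):
    print(ex["question"], "->", ex["answer_text"], "at", ex["answer_start"])
```

A QA model trained on such pairs can then be asked the same templated questions on pages about entities whose values are missing from the knowledge graph, and its answers become candidate facts for editors to validate.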

extraction, information, wikidata

arXiv.org Artificial Intelligence

doi: 10.1145/3543507

2401.07812

Country:

North America > United States > Texas > Travis County > Austin (0.05)
Europe > France (0.04)
Europe > United Kingdom > England > Oxfordshire (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Media (0.66)
Leisure & Entertainment (0.66)
Education (0.46)

Technology:

Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)

On Image Search in Histopathology

Tizhoosh, H. R., Pantanowitz, Liron

arXiv.org Artificial Intelligence, Jan-14-2024

Histopathology images can be acquired from camera-mounted microscopes or whole slide scanners. Using similarity calculations to match patients based on these images holds significant potential in research and clinical contexts. Recent advancements in search technologies allow for nuanced quantification of cellular structures across diverse tissue types, facilitating comparisons and enabling inferences about diagnosis, prognosis, and predictions for new patients when compared against a curated database of diagnosed and treated cases. In this paper, we comprehensively review the latest developments in image search technologies for histopathology, offering a concise overview tailored for computational pathology researchers seeking effective, fast, and efficient image search methods for their work.
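
Comparing a new case against a curated archive of diagnosed cases can be illustrated as nearest-neighbour search followed by a majority vote over the retrieved diagnoses. A minimal sketch, assuming each slide or region has already been reduced to a feature vector by some encoder; the vectors, case IDs, and diagnosis labels are hypothetical, and the sketch does not reproduce Yottixel (named in the keyword list) or any other specific system reviewed in the paper.

```python
import math
from collections import Counter
from typing import List, Tuple

Case = Tuple[str, List[float], str]  # (case id, feature vector, confirmed diagnosis)

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0

def search_and_vote(query: List[float], archive: List[Case], k: int = 3) -> Tuple[List[str], str]:
    """Retrieve the k most similar archived cases and take a majority vote on their diagnoses."""
    if not archive:
        raise ValueError("archive is empty")
    top = sorted(archive, key=lambda case: cosine(query, case[1]), reverse=True)[:k]
    votes = Counter(diagnosis for _, _, diagnosis in top)
    return [case_id for case_id, _, _ in top], votes.most_common(1)[0][0]

archive = [
    ("c1", [0.9, 0.1], "adenocarcinoma"),
    ("c2", [0.8, 0.2], "adenocarcinoma"),
    ("c3", [0.1, 0.9], "benign"),
    ("c4", [0.7, 0.4], "benign"),
]
print(search_and_vote([1.0, 0.0], archive, k=3))  # (['c1', 'c2', 'c4'], 'adenocarcinoma')
```

Returning the retrieved case IDs alongside the vote matters in this setting: a pathologist can inspect the matched cases rather than trusting a bare label.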

image retrieval, image search, yottixel

arXiv.org Artificial Intelligence

2401.08699

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Minnesota > Olmsted County > Rochester (0.04)
Europe > United Kingdom > England (0.04)

Genre:

Research Report (0.82)
Overview (0.68)

Industry:

Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Mapping Transformer Leveraged Embeddings for Cross-Lingual Document Representation

Tashu, Tsegaye Misikir, Kontos, Eduard-Raul, Sabatelli, Matthia, Valdenegro-Toro, Matias

arXiv.org Artificial Intelligence, Jan-12-2024

The rapid expansion of online information from diverse sources and the growing multilingual nature of the web underscore the increasing significance of information retrieval (IR) and recommender systems (RS). Today's web is no longer limited to a single language but is increasingly rich in multiple languages, mirroring the multilingual capacities of its global users Steichen et al. [2014], Tashu et al. [2023]. This diversity highlights the urgent need for cross-lingual recommender systems. Traditional recommender systems often prioritize content in a single language, sidelining a wealth of multilingual documents that may hold valuable insights. This gap has led to the emergence of cross-language information access, where recommender systems suggest items in different languages based on user queries Lops et al. [2010], Narducci et al. [2016], Salamon et al. [2021]. Machine Learning and Deep Learning, which have significantly impacted language representation and processing, are pivotal to enhancing information retrieval and recommender systems, especially in the realm of document recommendation ...

The results presented in this work are based on Eduard-Raul Kontos's bachelor project, carried out while he was at the University of Groningen.

computational linguistic, representation, transformer leveraged document representation

arXiv.org Artificial Intelligence

2401.06583

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

A Large-Scale Analysis of Persian Tweets Regarding Covid-19 Vaccination

ShabaniMirzaei, Taha, Chamani, Houmaan, Abaskohi, Amirhossein, Zadeh, Zhivar Sourati Hassan, Bahrak, Behnam

arXiv.org Artificial Intelligence, Jan-12-2024

The Covid-19 pandemic had an enormous effect on our lives, especially on people's interactions. With the introduction of Covid-19 vaccines, both positive and negative opinions were raised about whether or not to take them. In this paper, using data gathered from Twitter, including tweets and user profiles, we offer a comprehensive analysis of public opinion in Iran about the Coronavirus vaccines. For this purpose, we applied a search query technique combined with a topic modeling approach to extract vaccine-related tweets. We used transformer-based models to classify the content of the tweets and extract themes revolving around vaccination. We also conducted an emotion analysis to evaluate public happiness and anger around this topic. Our results demonstrate that Covid-19 vaccination has attracted considerable attention from different angles, such as governmental issues, safety or hesitancy, and side effects. Moreover, Coronavirus-related phenomena such as public vaccination and the rate of infection deeply affected public emotional status and users' interactions.

large-scale analysis, tweet, vaccination

arXiv.org Artificial Intelligence

doi: 10.1007/s13278-023-01154-0

2302.04511

Country:

South America > Colombia > Meta Department > Villavicencio (0.04)
Asia > Philippines (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
