AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when the information being sought comes in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for searching to proceed successfully at slow human speed.

Conceptualizing Machine Learning for Dynamic Information Retrieval of Electronic Health Record Notes

Jiang, Sharon, Shen, Shannon, Agrawal, Monica, Lam, Barbara, Kurtzman, Nicholas, Horng, Steven, Karger, David, Sontag, David

arXiv.org Artificial Intelligence, Aug-9-2023

The large amount of time clinicians spend sifting through patient notes and documenting in electronic health records (EHRs) is a leading cause of clinician burnout. By proactively and dynamically retrieving relevant notes during the documentation process, we can reduce the effort required to find relevant patient history. In this work, we conceptualize the use of EHR audit logs for machine learning as a source of supervision of note relevance in a specific clinical context, at a particular point in time. Our evaluation focuses on the dynamic retrieval in the emergency department, a high acuity setting with unique patterns of information retrieval and note writing. We show that our methods can achieve an AUC of 0.963 for predicting which notes will be read in an individual note writing session. We additionally conduct a user study with several clinicians and find that our framework can help clinicians retrieve relevant information more efficiently. Demonstrating that our framework and methods can perform well in this demanding setting is a promising proof of concept that they will translate to other clinical settings and data modalities (e.g., labs, medications, imaging).
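As a rough illustration of the evaluation metric only (not the authors' model), the AUC reported above can be read as the probability that a note actually opened during a writing session is scored above a note that was not. The sketch below computes it by pairwise comparison; the labels and scores are hypothetical stand-ins for audit-log supervision and model outputs.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (read, unread) note pairs where the read note
    gets the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one read and one unread note")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical session: 1 = note opened during the session (per audit log).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.3, 0.6, 0.7, 0.1]
print(roc_auc(labels, scores))  # 5 of 6 pairs ordered correctly
```

The quadratic pairwise loop is chosen for readability; a rank-based formula gives the same value in O(n log n) for large sessions.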

information retrieval, machine learning, source document

2308.08494

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Middle East > Israel (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Health Care Providers & Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.74)

A Universal Question-Answering Platform for Knowledge Graphs

Omar, Reham, Dhall, Ishika, Kalnis, Panos, Mansour, Essam

arXiv.org Artificial Intelligence, Aug-8-2023

Knowledge from diverse application domains is organized as knowledge graphs (KGs) that are stored in RDF engines accessible on the web via SPARQL endpoints. Expressing a well-formed SPARQL query requires information about the graph structure and the exact URIs of its components, which is impractical for the average user. Question answering (QA) systems assist by translating natural language questions to SPARQL. Existing QA systems are typically based on application-specific, human-curated rules, or require prior information, expensive pre-processing, and model adaptation for each targeted KG. Therefore, they are hard to generalize to a broad set of applications and KGs. In this paper, we propose KGQAn, a universal QA system that does not need to be tailored to each target KG. Instead of curated rules, KGQAn introduces a novel formalization of question understanding as a text generation problem, converting a question into an intermediate abstract representation via a neural sequence-to-sequence model. We also develop a just-in-time linker that, at query time, maps the abstract representation to a SPARQL query for a specific KG, using only the publicly accessible APIs and the existing indices of the RDF store, without requiring any pre-processing. Our experiments with several real KGs demonstrate that KGQAn is easily deployed and outperforms the state of the art by a large margin in terms of answer quality and processing time, especially for arbitrary KGs unseen during training.
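A toy sketch of the just-in-time linking step, under heavy assumptions: the abstract representation is modeled here as (subject, relation, object) phrase triples, and plain in-memory dictionaries stand in for the RDF store's label indices that KGQAn queries at run time. The function names and example.org URIs are hypothetical, not KGQAn's code.

```python
from typing import Dict, List, Tuple

# Hypothetical abstract representation: (subject, relation, object) phrases;
# terms starting with "?" are unknowns the query should return.
Triple = Tuple[str, str, str]


def link_term(term: str, index: Dict[str, str]) -> str:
    """Map a phrase to a URI via a label index; variables pass through."""
    if term.startswith("?"):
        return term
    uri = index.get(term.lower())
    if uri is None:
        raise KeyError(f"no match for {term!r}")
    return f"<{uri}>"


def to_sparql(triples: List[Triple], entities: Dict[str, str],
              relations: Dict[str, str]) -> str:
    """Build a SELECT query from linked triple patterns."""
    variables = sorted({t for triple in triples for t in triple if t.startswith("?")})
    patterns = [
        f"{link_term(s, entities)} {link_term(p, relations)} {link_term(o, entities)} ."
        for s, p, o in triples
    ]
    return f"SELECT {' '.join(variables)} WHERE {{ {' '.join(patterns)} }}"


# In-memory stand-ins for the store's label indices (hypothetical URIs).
entities = {"germany": "http://example.org/Germany"}
relations = {"capital": "http://example.org/capital"}
print(to_sparql([("Germany", "capital", "?answer")], entities, relations))
# SELECT ?answer WHERE { <http://example.org/Germany> <http://example.org/capital> ?answer . }
```

A real linker would rank several candidate URIs per phrase rather than require an exact label match.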

machine learning, natural language, question answering

2303.00595

Country:

Europe > Russia > Northwestern Federal District > Kaliningrad Oblast > Kaliningrad (0.06)
Europe > Denmark (0.05)
Atlantic Ocean > North Atlantic Ocean > Baltic Sea (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

Semantic Equivalence of e-Commerce Queries

Mandal, Aritra, Tunkelang, Daniel, Wu, Zhe

arXiv.org Artificial Intelligence, Aug-7-2023

Search query variation poses a challenge in e-commerce search, as equivalent search intents can be expressed through different queries with surface-level differences. This paper introduces a framework to recognize and leverage query equivalence to enhance searcher and business outcomes. The proposed approach addresses three key problems: mapping queries to vector representations of search intent, identifying nearest neighbor queries expressing equivalent or similar intent, and optimizing for user or business objectives. The framework utilizes both surface similarity and behavioral similarity to determine query equivalence. Surface similarity involves canonicalizing queries based on word inflection, word order, compounding, and noise words. Behavioral similarity leverages historical search behavior to generate vector representations of query intent. An offline process is used to train a sentence similarity model, while an online nearest neighbor approach supports processing of unseen queries. Experimental evaluations demonstrate the effectiveness of the proposed approach, outperforming popular sentence transformer models and achieving a Pearson correlation of 0.85 for query similarity. The results highlight the potential of leveraging historical behavior data and training models to recognize and utilize query equivalence in e-commerce search, leading to improved user experiences and business outcomes. Further advancements and benchmark datasets are encouraged to facilitate the development of solutions for this critical problem in the e-commerce domain.
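The surface-similarity step can be sketched as a canonicalization function, followed by a nearest-neighbor lookup over intent vectors. This is a minimal illustration, not the paper's pipeline: the noise-word list, the crude plural stripping (standing in for real inflection handling), hyphen removal (standing in for compounding), and the toy vectors are all assumptions.

```python
import math
from typing import Dict, List

# Assumed noise-word list; a production system would curate this from data.
NOISE_WORDS = {"a", "an", "the", "for", "with", "of"}


def singularize(word: str) -> str:
    """Crude plural stripping, standing in for real inflection handling."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def canonicalize(query: str) -> str:
    """Normalize case, compounding (hyphens), noise words, inflection, word order."""
    tokens = query.lower().replace("-", "").split()
    kept = [singularize(t) for t in tokens if t not in NOISE_WORDS]
    return " ".join(sorted(kept))


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_query(vec: List[float], index: Dict[str, List[float]]) -> str:
    """Return the indexed query whose intent vector is most similar to vec."""
    return max(index, key=lambda q: cosine(vec, index[q]))


print(canonicalize("Shoes for Men") == canonicalize("men shoe"))  # True
print(canonicalize("T-Shirt"), canonicalize("tshirts"))  # tshirt tshirt
```

Queries that collapse to the same canonical string can share results directly; the vector lookup handles the remaining, unseen queries.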

artificial intelligence, machine learning, natural language

2308.03869

Country: North America > United States > California > Santa Clara County > San Jose (0.04)

Genre: Research Report (0.40)

Industry: Information Technology > Services > e-Commerce Services (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.90)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.32)

Search Engine and Recommendation System for the Music Industry built with JinaAI

Gopalakrishnan, Ishita, R, Sanjjushri Varshini, V, Ponshriharini

arXiv.org Artificial Intelligence, Aug-7-2023

The development of search engines and recommendation systems for the music industry is one of the most intriguing and debated tasks in this area. Studies have shown a drastic decline in the search engine field due to concerns such as speed, accuracy, and the format of the data given for querying. People often find it difficult to search for a song based solely on its title, so a solution is proposed in which a single query input is matched against the lyrics of the songs in the database. Incorporating cutting-edge tools is therefore essential for developing a user-friendly search engine. Jina AI, an MLOps framework for building neural search engines, is used so that users obtain accurate results, and it helps maintain and enhance the quality of the search engine's performance for a given query. The result is an effective search engine and recommendation system for the music industry, built with Jina AI.
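The lyrics-matching idea can be illustrated without Jina AI. The sketch below swaps in a plain TF-IDF ranker with cosine similarity, a sparse technique, rather than the neural search the abstract describes; the song titles and lyrics are placeholders.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    words = (w.strip(".,!?;:'\"").lower() for w in text.split())
    return [w for w in words if w]


def build_index(lyrics: Dict[str, str]) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Return smoothed IDF weights and a TF-IDF vector per song title."""
    counts = {title: Counter(tokenize(text)) for title, text in lyrics.items()}
    n = len(counts)
    df = Counter(term for c in counts.values() for term in c)
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {title: {t: tf * idf[t] for t, tf in c.items()} for title, c in counts.items()}
    return idf, vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def search(query: str, idf: Dict[str, float],
           vectors: Dict[str, Dict[str, float]], k: int = 3) -> List[str]:
    """Rank song titles by cosine similarity between query and lyrics."""
    q = Counter(t for t in tokenize(query) if t in idf)
    qv = {t: tf * idf[t] for t, tf in q.items()}
    ranked = sorted(vectors, key=lambda title: cosine(qv, vectors[title]), reverse=True)
    return ranked[:k]


# Placeholder catalog; real lyrics would come from the song database.
songs = {
    "Song A": "hello darkness my old friend",
    "Song B": "here comes the sun, little darling",
}
idf, vectors = build_index(songs)
print(search("little darling sun", idf, vectors, k=1))  # ['Song B']
```

A neural encoder would replace the TF-IDF vectors with dense embeddings, letting a paraphrased lyric match even without shared words.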

information retrieval, machine learning, natural language

2308.03842

Country:

North America > United States (0.04)
Europe > Slovakia > Bratislava > Bratislava (0.04)
Europe > Poland (0.04)

Genre:

Research Report (0.50)
Overview > Innovation (0.34)

Industry: Media > Music (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

CrossTalk: Intelligent Substrates for Language-Oriented Interaction in Video-Based Communication and Collaboration

Xia, Haijun, Wang, Tony, Gunturu, Aditya, Jiang, Peiling, Duan, William, Yao, Xiaoshuo

arXiv.org Artificial Intelligence, Aug-7-2023

Despite the advances and ubiquity of digital communication media such as videoconferencing and virtual reality, they remain oblivious to the rich intentions expressed by users. Beyond transmitting audio, videos, and messages, we envision digital communication media as proactive facilitators that can provide unobtrusive assistance to enhance communication and collaboration. Informed by the results of a formative study, we propose three key design concepts to explore the systematic integration of intelligence into communication and collaboration, including the panel substrate, language-based intent recognition, and lightweight interaction techniques. We developed CrossTalk, a videoconferencing system that instantiates these concepts, which was found to enable a more fluid and flexible communication and collaboration experience.

information retrieval, machine learning, natural language

doi: 10.1145/3586183.3606773

2308.03311

Country:

North America > United States > California > San Francisco County > San Francisco (0.16)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine (0.69)
Education > Educational Setting (0.67)
Information Technology (0.67)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Collaboration (1.00)

Improving Domain-Specific Retrieval by NLI Fine-Tuning

Dušek, Roman, Wawer, Aleksander, Galias, Christopher, Wojciechowska, Lidia

arXiv.org Artificial Intelligence, Aug-6-2023

The aim of this article is to investigate the fine-tuning potential of natural language inference (NLI) data to improve information retrieval and ranking. We demonstrate this for both English and Polish languages, using data from one of the largest Polish e-commerce sites and selected open-domain datasets. We employ both monolingual and multilingual sentence encoders fine-tuned by a supervised method utilizing contrastive loss and NLI data. Our results point to the fact that NLI fine-tuning increases the performance of the models in both tasks and both languages, with the potential to improve mono- and multilingual models. Finally, we investigate uniformity and alignment of the embeddings to explain the effect of NLI-based fine-tuning for an out-of-domain use-case.
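Alignment and uniformity are commonly measured with the definitions of Wang and Isola (2020): alignment is the mean squared distance between normalized embeddings of positive pairs, and uniformity is the log of the mean Gaussian potential exp(-t * ||u - v||^2) over all pairs. Assuming those standard definitions (the abstract does not give formulas), a dependency-free sketch:

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def sq_dist(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def alignment(pairs: List[Tuple[Vector, Vector]]) -> float:
    """Mean squared distance between normalized positive pairs (lower = better aligned)."""
    return sum(sq_dist(normalize(x), normalize(y)) for x, y in pairs) / len(pairs)


def uniformity(points: List[Vector], t: float = 2.0) -> float:
    """Log mean Gaussian potential over distinct pairs (lower = more uniform)."""
    z = [normalize(p) for p in points]
    potentials = [
        math.exp(-t * sq_dist(z[i], z[j]))
        for i in range(len(z))
        for j in range(i + 1, len(z))
    ]
    return math.log(sum(potentials) / len(potentials))


print(alignment([([1.0, 0.0], [2.0, 0.0])]))  # 0.0: same direction after normalizing
print(uniformity([[1.0, 0.0], [-1.0, 0.0]]))  # approximately -8 for antipodal points
```

In practice the embeddings would come from the sentence encoder before and after NLI fine-tuning, so the two metrics can be compared across checkpoints.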

benchmark, information retrieval, machine learning

2308.03103

Country:

Europe > Poland > Greater Poland Province > Poznań (0.05)
North America > United States > New York > New York County > New York City (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.52)

Towards Consistency Filtering-Free Unsupervised Learning for Dense Retrieval

Shi, Haoxiang, Fujita, Sumio, Sakai, Tetsuya

arXiv.org Artificial Intelligence, Aug-5-2023

Domain transfer is a prevalent challenge in modern neural Information Retrieval (IR). To overcome this problem, previous research has used domain-specific manual annotations and synthetic data produced by consistency filtering to fine-tune a general ranker into a domain-specific ranker. However, training such consistency filters is computationally expensive, which significantly reduces model efficiency. In addition, consistency filtering often struggles to identify retrieval intentions and to recognize query and corpus distributions in a target domain. In this study, we evaluate a more efficient solution: replacing the consistency filter with direct pseudo-labeling, pseudo-relevance feedback, or unsupervised keyword generation methods to achieve consistency filtering-free unsupervised dense retrieval. Our extensive experimental evaluations demonstrate that, on average, TextRank-based pseudo-relevance feedback outperforms the other methods. Furthermore, we analyze the training and inference efficiency of the proposed paradigm. The results indicate that filtering-free unsupervised learning can continuously improve training and inference efficiency while maintaining retrieval performance, and on some datasets it even improves retrieval performance.
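To make the TextRank idea concrete, here is a minimal keyword extractor that could turn a passage into a short pseudo query. It is a sketch under stated assumptions: the stopword list, co-occurrence window, damping factor, and iteration count are illustrative choices, and how the paper feeds the keywords into dense-retriever training is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Assumed stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "to", "in", "is", "on", "with"}


def textrank_keywords(text: str, k: int = 5, window: int = 2,
                      damping: float = 0.85, iterations: int = 50) -> List[str]:
    """Rank words by PageRank over a co-occurrence graph and return the top k."""
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOPWORDS]
    # Undirected edge between words that share a sliding window of `window` tokens.
    graph: Dict[str, Set[str]] = defaultdict(set)
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window, len(words))):
            if words[j] != w:
                graph[w].add(words[j])
                graph[words[j]].add(w)
    # Synchronous PageRank updates: each word shares its score among its neighbors.
    scores = {w: 1.0 for w in graph}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(scores[n] / len(graph[n]) for n in graph[w])
            for w in graph
        }
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


passage = "dense retrieval dense vectors dense models"
print(textrank_keywords(passage, k=1))  # ['dense']
```

In a pseudo-relevance-feedback setup, the extracted keywords from a passage would serve as a synthetic query paired with that passage as its positive.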

information retrieval, machine learning, natural language

2308.02926

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
North America > Canada > Ontario > Toronto (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.61)

Brave's privacy-focused search engine can now find images and videos

Engadget, Aug-3-2023, 21:15:48 GMT

Brave's search engine no longer requires that you jump to Bing or Google just to find photos or videos. The company has introduced image and video queries to Brave Search, helping you find media while maintaining the same levels of privacy and freedom of access. You won't have to worry about being profiled through your picture hunts, or risk missing politically sensitive content (if unintentionally) pulled from another engine's index. You'll still have the option of continuing searches through competitors, at least for a while. The choice helps you get the results you're looking for, so long as you don't mind using a major engine.

find image and video, privacy-focused search engine

Technology:

Information Technology > Information Management > Search (0.97)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)

Brave's search engine is now totally independent from Google and Bing

PCWorld, Aug-3-2023, 19:31:31 GMT

Brave said Thursday that it has now completely separated its own search capabilities from Google and Microsoft Bing, allowing any search query within the Brave browser to be handled entirely by Brave itself. Given the company's earlier claim that "Brave Search is 100 percent private and anonymous," the change would mean that Brave Search is now completely private, whatever you search for. Brave is one of a handful of niche browsers, along with Opera, Vivaldi, Firefox, and others, that address a small slice of a browser market dominated by Google Chrome and, after it, Microsoft Edge. Until now, Brave's search engine had crawled the Internet by itself, building its own database for search queries, but its image and video search had relied on Bing and Google.

artificial intelligence, information retrieval, natural language

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

Self-Supervised Contrastive BERT Fine-tuning for Fusion-based Reviewed-Item Retrieval

Pour, Mohammad Mahdi Abdollah, Farinneya, Parsa, Toroghi, Armin, Korikov, Anton, Pesaranghader, Ali, Sajed, Touqir, Bharadwaj, Manasa, Mavrin, Borislav, Sanner, Scott

arXiv.org Artificial Intelligence, Aug-1-2023

As natural language interfaces enable users to express increasingly complex natural language queries, there is a parallel explosion of user review content that can allow users to better find items such as restaurants, books, or movies that match these expressive queries. While Neural Information Retrieval (IR) methods have provided state-of-the-art results for matching queries to documents, they have not been extended to the task of Reviewed-Item Retrieval (RIR), where query-review scores must be aggregated (or fused) into item-level scores for ranking. In the absence of labeled RIR datasets, we extend Neural IR methodology to RIR by leveraging self-supervised methods for contrastive learning of BERT embeddings for both queries and reviews. Specifically, contrastive learning requires a choice of positive and negative samples, where the unique two-level structure of our item-review data combined with metadata affords us a rich structure for the selection of these samples. For contrastive learning in a Late Fusion scenario (where we aggregate query-review scores into item-level scores), we investigate the use of positive review samples from the same item and/or with the same rating, selection of hard positive samples by choosing the least similar reviews from the same anchor item, and selection of hard negative samples by choosing the most similar reviews from different items. We also explore anchor sub-sampling and augmenting with metadata. For a more end-to-end Early Fusion approach, we introduce contrastive item embedding learning to fuse reviews into single item embeddings. Experimental results show that Late Fusion contrastive learning for Neural RIR outperforms all other contrastive IR configurations, Neural IR, and sparse retrieval baselines, thus demonstrating the power of exploiting the two-level structure in Neural RIR approaches as well as the importance of preserving the nuance of individual review content via Late Fusion methods.
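Late Fusion, as described above, scores each review against the query and then aggregates those scores per item. The abstract does not say which aggregation is used; the sketch below assumes the mean of each item's top-k review scores, with hypothetical item IDs and precomputed similarity scores standing in for BERT query-review similarities.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def late_fusion_rank(review_scores: List[Tuple[str, float]],
                     top_k: int = 2) -> List[Tuple[str, float]]:
    """Fuse (item_id, query-review score) pairs into a ranked item list.

    Assumed aggregation: mean of each item's top_k review scores.
    """
    per_item: Dict[str, List[float]] = defaultdict(list)
    for item, score in review_scores:
        per_item[item].append(score)
    fused = {}
    for item, scores in per_item.items():
        best = sorted(scores, reverse=True)[:top_k]
        fused[item] = sum(best) / len(best)
    # Highest fused score first; ties broken alphabetically by item ID.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical similarities between one query and five restaurant reviews.
reviews = [("cafe_a", 0.9), ("cafe_a", 0.1), ("cafe_a", 0.8),
           ("bistro_b", 0.7), ("bistro_b", 0.6)]
print([item for item, _ in late_fusion_rank(reviews)])  # ['cafe_a', 'bistro_b']
```

Taking the top-k rather than averaging over all reviews is one way to keep a few strongly matching reviews from being diluted by many unrelated ones; other aggregators (max, full mean) are equally plausible choices.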

information retrieval, machine learning, natural language

doi: 10.1007/978-3-031-28244-7_1

2308.00762

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Thailand > Bangkok > Bangkok (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)
