AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

Enhanced vectors for top-k document retrieval in Question Answering

Hammad, Mohammed

arXiv.org Artificial IntelligenceOct-8-2022

Modern day applications, especially information retrieval webapps that involve "search" as their use cases are gradually moving towards "answering" modules. Conversational chatbots which have been proved to be more engaging to users, use Question Answering as their core. Since, precise answering is computationally expensive, several approaches have been developed to prefetch the most relevant documents/passages from the database that contain the answer. We propose a different approach that retrieves the evidence documents efficiently and accurately, making sure that the relevant document for a given user query is not missed. We do so by assigning each document (or passage in our case), a unique identifier and using them to create dense vectors which can be efficiently indexed. More precisely, we use the identifier to predict randomly sampled context window words of the relevant question corresponding to the passage along with the words of passage itself. This naturally embeds the passage identifier into the vector space in such a way that the embedding is closer to the question without compromising he information content. This approach enables efficient creation of real-time query vectors in ~4 milliseconds.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2210.10584

Country: North America > United States (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

How to tackle an emerging topic? Combining strong and weak labels for Covid news NER

Ficek, Aleksander, Liu, Fangyu, Collier, Nigel

arXiv.org Artificial IntelligenceOct-8-2022

Being able to train Named Entity Recognition (NER) models for emerging topics is crucial for many real-world applications especially in the medical domain where new topics are continuously evolving out of the scope of existing models and datasets. For a realistic evaluation setup, we introduce a novel COVID-19 news NER dataset (COVIDNEWS-NER) and release 3000 entries of hand annotated strongly labelled sentences and 13000 auto-generated weakly labelled sentences. Besides the dataset, we propose CONTROSTER, a recipe to strategically combine weak and strong labels in improving NER in an emerging topic through transfer learning. We show the effectiveness of CONTROSTER on COVIDNEWS-NER while providing analysis on combining weak and strong labels for training. Our key findings are: (1) Using weak data to formulate an initial backbone before tuning on strong data outperforms methods trained on only strong or weak data. (2) A combination of out-of-domain and in-domain weak label training is crucial and can overcome saturation when being training on weak labels from a single source.

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2209.15108

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.71)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.47)

Add feedback

Short Text Pre-training with Extended Token Classification for E-commerce Query Understanding

Jiang, Haoming, Cao, Tianyu, Li, Zheng, Luo, Chen, Tang, Xianfeng, Yin, Qingyu, Zhang, Danqing, Goutam, Rahul, Yin, Bing

arXiv.org Artificial IntelligenceOct-8-2022

E-commerce query understanding is the process of inferring the shopping intent of customers by extracting semantic meaning from their search queries. The recent progress of pre-trained masked language models (MLM) in natural language processing is extremely attractive for developing effective query understanding models. Specifically, MLM learns contextual text embedding via recovering the masked tokens in the sentences. Such a pre-training process relies on the sufficient contextual information. It is, however, less effective for search queries, which are usually short text. When applying masking to short search queries, most contextual information is lost and the intent of the search queries may be changed. To mitigate the above issues for MLM pre-training on search queries, we propose a novel pre-training task specifically designed for short text, called Extended Token Classification (ETC). Instead of masking the input text, our approach extends the input by inserting tokens via a generator network, and trains a discriminator to identify which tokens are inserted in the extended input. We conduct experiments in an E-commerce store to demonstrate the effectiveness of ETC.

information retrieval, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2210.03915

Genre: Research Report (1.00)

Industry: Information Technology > Services > e-Commerce Services (0.82)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)

Add feedback

Quantifying Political Bias in News Articles

Gezici, Gizem

arXiv.org Artificial IntelligenceOct-7-2022

Search bias analysis is getting more attention in recent years since search results could affect In this work, we aim to establish an automated model for evaluating ideological bias in online news articles. The dataset is composed of news articles in search results as well as the newspaper articles. The current automated model results show that model capability is not sufficient to be exploited for annotating the documents automatically, thereby computing bias in search results.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2210.03404

Country:

North America > United States (0.94)
North America > Cuba (0.04)
Europe > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)
Asia > Middle East > Republic of Türkiye > Istanbul Province > Istanbul (0.04)

Genre:

Research Report > New Finding (0.89)
Research Report > Experimental Study (0.69)

Industry:

Media > News (0.89)
Government > Regional Government > North America Government > United States Government (0.69)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.37)

Add feedback

HealthE: Classifying Entities in Online Textual Health Advice

Gatto, Joseph, Seegmiller, Parker, Johnston, Garrett, Preum, Sarah M.

arXiv.org Artificial IntelligenceOct-6-2022

The processing of entities in natural language is essential to many medical NLP systems. Unfortunately, existing datasets vastly under-represent the entities required to model public health relevant texts such as health advice often found on sites like WebMD. People rely on such information for personal health management and clinically relevant decision making. In this work, we release a new annotated dataset, HealthE, consisting of 6,756 health advice. HealthE has a more granular label space compared to existing medical NER corpora and contains annotation for diverse health phrases. Additionally, we introduce a new health entity classification model, EP S-BERT, which leverages textual context patterns in the classification of entity classes. EP S-BERT provides a 4-point increase in F1 score over the nearest baseline and a 34-point increase in F1 when compared to off-the-shelf medical NER tools trained to extract disease and medication mentions from clinical texts. All code and data are publicly available on Github.

information retrieval, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.03246

Country: North America > United States > Oklahoma > Payne County > Cushing (0.04)

Genre: Research Report (0.51)

Industry:

Health & Medicine > Consumer Health (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.95)
Education > Health & Safety > School Nutrition (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

SynKB: Semantic Search for Synthetic Procedures

Bai, Fan, Ritter, Alan, Madrid, Peter, Freitag, Dayne, Niekrasz, John

arXiv.org Artificial IntelligenceOct-6-2022

In this paper we present SynKB, an open-source, automatically extracted knowledge base of chemical synthesis protocols. Similar to proprietary chemistry databases such as Reaxsys, SynKB allows chemists to retrieve structured knowledge about synthetic procedures. By taking advantage of recent advances in natural language processing for procedural texts, SynKB supports more flexible queries about reaction conditions, and thus has the potential to help chemists search the literature for conditions used in relevant reactions as they design new synthetic routes. Using customized Transformer models to automatically extract information from 6 million synthesis procedures described in U.S. and EU patents, we show that for many queries, SynKB has higher recall than Reaxsys, while maintaining high precision. We plan to make SynKB available as an open-source tool; in contrast, proprietary chemistry databases require costly subscriptions.

artificial intelligence, information retrieval, natural language, (20 more...)

arXiv.org Artificial Intelligence

2208.074

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Spain > Galicia > Madrid (0.05)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre: Research Report (0.40)

Industry:

Government > Regional Government > North America Government > United States Government (0.71)
Materials > Chemicals > Commodity Chemicals (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Add feedback

Ex-Google ad boss builds tracker-free search engine

BBC NewsOct-5-2022, 16:00:55 GMT

Creator Sridhar Ramaswamy, who worked at Google for 16 years and ran its ad business, told BBC News the technology sector had become "exploitative" of people's data, something he no longer wanted to be a part of.

boss build tracker-free search engine, ex-google ad boss build

BBC News

Industry: Information Technology (1.00)

Technology:

Information Technology > Information Management > Search (0.40)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Add feedback

A new feature has been added to Google's review pages: Pros and Cons structured data

#artificialintelligenceOct-5-2022, 14:20:23 GMT

Google now supports the pros and cons of structured data for review pages in the search results as per editorial reviews using the new validated markup such as structured data markup. Google also prioritized the supply of structured data given by your content which extracts the data for pros and cons usually shown in Google search engines. So it is likely to create value for your blog post to tell Google what data types you want to show in Google Search over Google testing to assume. Meanwhile, you markup the pros and cons of your blog post, and Google might show the rich results markup for your featured snippets. To implement the pros and cons of structured data, you must be aware of categorized structures by Google: structured data, unstructured data, and semi-structured data markup, and how they work precisely.

data markup, google, markup, (15 more...)

#artificialintelligence

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.45)

Add feedback

Medical Image Retrieval via Nearest Neighbor Search on Pre-trained Image Features

Gupta, Deepak, Loane, Russell, Gayen, Soumya, Demner-Fushman, Dina

arXiv.org Artificial IntelligenceOct-5-2022

Nearest neighbor search (NNS) aims to locate the points in high-dimensional space that is closest to the query point. The brute-force approach for finding the nearest neighbor becomes computationally infeasible when the number of points is large. The NNS has multiple applications in medicine, such as searching large medical imaging databases, disease classification, diagnosis, etc. With a focus on medical imaging, this paper proposes DenseLinkSearch an effective and efficient algorithm that searches and retrieves the relevant images from heterogeneous sources of medical images. Towards this, given a medical database, the proposed algorithm builds the index that consists of pre-computed links of each point in the database. The search algorithm utilizes the index to efficiently traverse the database in search of the nearest neighbor. We extensively tested the proposed NNS approach and compared the performance with state-of-the-art NNS approaches on benchmark datasets and our created medical image datasets. The proposed approach outperformed the existing approach in terms of retrieving accurate neighbors and retrieval speed. We also explore the role of medical image feature representation in content-based medical image retrieval tasks. We propose a Transformer-based feature representation technique that outperformed the existing pre-trained Transformer approach on CLEF 2011 medical image retrieval task. The source code of our experiments are available at https://github.com/deepaknlp/DLS.

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.02401

Country:

North America > United States > Maryland > Montgomery County > Bethesda (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(3 more...)

Add feedback

Zeroth-Order Negative Curvature Finding: Escaping Saddle Points without Gradients

Zhang, Hualin, Xiong, Huan, Gu, Bin

arXiv.org Artificial IntelligenceOct-4-2022

We consider escaping saddle points of nonconvex problems where only the function evaluations can be accessed. Although a variety of works have been proposed, the majority of them require either second or first-order information, and only a few of them have exploited zeroth-order methods, particularly the technique of negative curvature finding with zeroth-order methods which has been proven to be the most efficient method for escaping saddle points. To fill this gap, in this paper, we propose two zeroth-order negative curvature finding frameworks that can replace Hessian-vector product computations without increasing the iteration complexity. We apply the proposed frameworks to ZO-GD, ZO-SGD, ZO-SCSG, ZO-SPIDER and prove that these ZO algorithms can converge to $(\epsilon,\delta)$-approximate second-order stationary points with less query complexity compared with prior zeroth-order works for finding local minima.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2210.01496

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)

Add feedback