AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular bits of information no longer meet the needs of many people. Computerized databases have made searching traditional indexes of print publications easier, but the process still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when the information being sought comes in widely different formats, and the universe of knowledge is simply too big, and growing too rapidly, for searching to succeed at a human's slow pace.

An Embedding-Based Grocery Search Model at Instacart

Xie, Yuqing, Na, Taesik, Xiao, Xiao, Manchanda, Saurav, Rao, Young, Xu, Zhihong, Shu, Guanghua, Vasiete, Esther, Tenneti, Tejaswi, Wang, Haixun

arXiv.org Artificial Intelligence, Sep-12-2022

The key to e-commerce search is how best to utilize large yet noisy log data. In this paper, we present our embedding-based model for grocery search at Instacart. The system learns query and product representations with a two-tower transformer-based encoder architecture. To tackle the cold-start problem, we focus on content-based features. To train the model efficiently on noisy data, we propose a self-adversarial learning method and a cascade training method. On an offline human evaluation dataset, we achieve a 10% relative improvement in Recall@20, and in online A/B testing, we achieve improvements of 4.1% in cart-adds per search (CAPS) and 1.5% in gross merchandise value (GMV). We describe how we train and deploy the embedding-based search model and give a detailed analysis of the effectiveness of our method.
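
The retrieval step of a two-tower model, and the Recall@K metric used to evaluate it, can be sketched as follows. This is a minimal illustration under loud assumptions: `tower` is a hypothetical bag-of-words stand-in for the paper's transformer encoders, the `name` and `brand` product fields are invented examples of content-based features, and the self-adversarial and cascade training methods are not shown. Queries and products are embedded independently and scored by dot product.

```python
import math
from collections import Counter


def tower(text: str) -> dict[str, float]:
    """Hypothetical stand-in for a transformer encoder tower:
    an L2-normalised bag-of-words vector stored sparsely as {token: weight}."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}


def query_tower(query: str) -> dict[str, float]:
    return tower(query)


def product_tower(product: dict) -> dict[str, float]:
    # Content-based features only (hypothetical name + brand fields), so a
    # brand-new product can be embedded without any engagement history.
    return tower(f"{product['name']} {product['brand']}")


def dot(q: dict[str, float], p: dict[str, float]) -> float:
    return sum(w * p.get(tok, 0.0) for tok, w in q.items())


def retrieve(query: str, products: list[dict], k: int) -> list[str]:
    """Ids of the k products whose embeddings score highest against the query."""
    q = query_tower(query)
    ranked = sorted(products, key=lambda p: dot(q, product_tower(p)), reverse=True)
    return [p["id"] for p in ranked[:k]]


def recall_at_k(retrieved: list[str], relevant: list[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & set(relevant)) / len(relevant)


catalog = [
    {"id": "p1", "name": "organic whole milk", "brand": "farmco"},
    {"id": "p2", "name": "sourdough bread", "brand": "bakeco"},
    {"id": "p3", "name": "whole wheat bread", "brand": "bakeco"},
]
top = retrieve("whole milk", catalog, k=2)
print(top, recall_at_k(top, ["p1"], k=2))  # ['p1', 'p3'] 1.0
```

Because the two towers never see each other's input, product vectors can typically be computed offline and served from a nearest-neighbour index, so only the query tower has to run at search time.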

information retrieval, machine learning, natural language

2209.05555

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Spain > Community of Madrid > Madrid (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Representing Social Networks as Dynamic Heterogeneous Graphs

Maleki, Negar, Padamanabhan, Balaji, Dutta, Kaushik

arXiv.org Artificial Intelligence, Sep-12-2022

Past graph representations of real-world social networks have missed two important elements: the multiplexity of connections and the representation of time. To address this, in this paper, we present a new dynamic heterogeneous graph representation for social networks that includes time in every component of the graph, i.e., nodes and edges, each of which comes in different types to capture heterogeneity. We illustrate the power of this representation by presenting four time-dependent queries and deep learning problems that cannot easily be handled by the conventional homogeneous graph representations commonly used. As a proof of concept, we present a detailed representation of a new social media platform (Steemit), which we use to illustrate both the dynamic querying capability and prediction tasks using graph neural networks (GNNs). The results illustrate the power of the dynamic heterogeneous graph representation for modeling social networks. Given that this is a relatively understudied area, we also illustrate opportunities for future work in query optimization as well as new dynamic prediction tasks on heterogeneous graph structures.
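
A representation that attaches a type and a timestamp to every node and edge can be sketched in a few lines. This is a minimal illustration, not the authors' schema: the node types (`user`, `post`), edge types (`authored`, `voted_on`), and the `DynamicHeteroGraph` class are hypothetical, and the GNN prediction tasks are not shown. It supports two kinds of time-dependent query: reconstructing the graph as of a given time, and finding typed neighbours within a time window.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    node_id: str
    node_type: str   # hypothetical types, e.g. "user", "post"
    created_at: int  # time at which the node appears in the network


@dataclass(frozen=True)
class Edge:
    src: str
    dst: str
    edge_type: str   # hypothetical types, e.g. "authored", "voted_on"
    at: int          # time at which the interaction happened


@dataclass
class DynamicHeteroGraph:
    nodes: dict[str, Node] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, edge: Edge) -> None:
        self.edges.append(edge)

    def snapshot(self, t: int) -> "DynamicHeteroGraph":
        """The graph as it existed at time t: nodes and edges with timestamps <= t."""
        g = DynamicHeteroGraph()
        for n in self.nodes.values():
            if n.created_at <= t:
                g.add_node(n)
        for e in self.edges:
            if e.at <= t and e.src in g.nodes and e.dst in g.nodes:
                g.add_edge(e)
        return g

    def neighbors(self, node_id: str, edge_type: str, start: int, end: int) -> list[str]:
        """Targets reached from node_id via edge_type edges with start <= time < end."""
        return sorted({e.dst for e in self.edges
                       if e.src == node_id and e.edge_type == edge_type
                       and start <= e.at < end})


g = DynamicHeteroGraph()
for node in [Node("alice", "user", 0), Node("bob", "user", 1),
             Node("post1", "post", 2), Node("post2", "post", 5)]:
    g.add_node(node)
for edge in [Edge("alice", "post1", "authored", 2), Edge("bob", "post1", "voted_on", 3),
             Edge("bob", "post2", "authored", 5), Edge("alice", "post2", "voted_on", 6)]:
    g.add_edge(edge)

snap = g.snapshot(4)
print(len(snap.nodes), len(snap.edges))         # 3 2
print(g.neighbors("alice", "voted_on", 0, 10))  # ['post2']
```

A homogeneous, timeless graph would collapse `authored` and `voted_on` into one edge kind and lose the timestamps, so neither query above could be answered from it.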

artificial intelligence, machine learning, natural language

doi: 10.1109/ICDMW58026.2022.00098

2209.03144

Country:

North America > United States > Florida > Hillsborough County > University (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.66)

Large-scale Evaluation of Transformer-based Article Encoders on the Task of Citation Recommendation

Medić, Zoran, Šnajder, Jan

arXiv.org Artificial Intelligence, Sep-12-2022

Recently introduced transformer-based article encoders (TAEs), designed to produce similar vector representations for mutually related scientific articles, have demonstrated strong performance on benchmark datasets for scientific article recommendation. However, existing benchmark datasets predominantly focus on single domains and, in some cases, contain easy negatives in small candidate pools. Evaluating representations on such benchmarks might obscure the realistic performance of TAEs in setups with thousands of articles in the candidate pool. In this work, we evaluate TAEs on large benchmarks with more challenging candidate pools. We compare the performance of TAEs with a lexical retrieval baseline, BM25, on the task of citation recommendation, where the model produces a list of recommended articles to cite in a given input article. We find that BM25 is still very competitive with state-of-the-art neural retrievers, which is surprising given the strong performance of TAEs on small benchmarks. To remedy the limitations of existing benchmarks, we propose a new benchmark dataset for evaluating scientific article representations, the Multi-Domain Citation Recommendation dataset (MDCR), which covers different scientific fields and contains challenging candidate pools.
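
For reference, the BM25 baseline scores each candidate in the pool by term overlap with the query, weighted by inverse document frequency and normalised by document length. A minimal sketch, assuming whitespace tokenisation, the common defaults k1 = 1.2 and b = 0.75, and a Lucene-style non-negative IDF; the paper's exact BM25 configuration and preprocessing are not given here. For citation recommendation, the query would be the text of the citing article and `docs` the candidate pool.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document in the candidate pool against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    df = Counter()  # document frequency of each term
    for d in tokenized:
        df.update(set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Non-negative IDF variant (as in Lucene's BM25 similarity).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / norm
        scores.append(s)
    return scores


def rank(query: str, docs: list[str]) -> list[int]:
    """Candidate indices ordered from most to least relevant."""
    scores = bm25_scores(query, docs)
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


candidates = [
    "dense retrieval with transformers",
    "bm25 lexical retrieval baseline",
    "protein folding",
]
print(rank("lexical retrieval", candidates))  # [1, 0, 2]
```

This brute-force version scores every candidate, so its cost grows with the pool size; real systems usually go through an inverted index so that only documents sharing a query term are touched.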

information retrieval, large language model, machine learning

2209.05452

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Croatia > Zagreb County > Zagreb (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)

How AI Writing Tools are Revolutionizing Content Creation (2022)

#artificialintelligence, Sep-11-2022, 04:16:57 GMT

Search engines are constantly evolving, and as a result, the way we create and consume content is also changing. In particular, the rise of artificial intelligence (AI) writing tools is revolutionizing the content creation process. Bloggers and businesses now use AI writing software to create high-quality content quickly and easily. This software can analyze data and find trends to help you write about what's popular right now. It can also help you come up with catchy headlines and create drafts that are ready for publishing. AI writing tools have improved a great deal over the past few years, and they can now help write articles, digital ad copy, blog post ideas, YouTube video descriptions, and Google Ads, quickly and in multiple languages. In this article, we'll discuss how AI writing software is changing the way bloggers and businesses create content and answer some frequently asked questions about this technology. AI writing tools are computer programs that can generate written content, such as blog articles, website content, or even sales letters. Most AI tools use natural language processing (NLP) to understand the topic and then generate relevant content. AI writing tools can save you a lot of time by quickly generating high-quality content: just enter a few keywords, and the AI tool will do the rest.

ai tool, blog post, software

Industry: Marketing (0.69)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)

Code Compliance Assessment as a Learning Problem

Sawant, Neela, Sengamedu, Srinivasan H.

arXiv.org Artificial Intelligence, Sep-10-2022

Manual code reviews and static code analyzers are the traditional mechanisms for verifying whether source code complies with coding policies. However, these mechanisms are hard to scale. We formulate code compliance assessment as a machine learning (ML) problem that takes as input a natural language policy and code, and predicts whether the code is compliant with, non-compliant with, or irrelevant to the policy. This can help scale compliance classification and search for policies not covered by traditional mechanisms. We explore key research questions on ML model formulation, training data, and evaluation setup. The core idea is to obtain a joint code-text embedding space that preserves compliance relationships via the vector distance between code and policy embeddings. As there is no task-specific data, we re-interpret and filter commonly available software datasets, with additional pre-training and pre-finetuning tasks that reduce the semantic gap. We benchmarked our approach on two listings of coding policies (CWE and CBP). This is a zero-shot evaluation, as none of the policies occur in the training set. On CWE and CBP respectively, our tool Policy2Code achieves classification accuracies of 59% and 71% and search MRR of 0.05 and 0.21, compared to CodeBERT's classification accuracies of 37% and 54% and MRR of 0.02 and 0.02. In a user study, 24% of Policy2Code detections were accepted, compared to 7% for CodeBERT.
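
The search side of this evaluation can be sketched as ranking code snippets by the cosine similarity of their embeddings to a policy embedding, then scoring the rankings with mean reciprocal rank (MRR). A minimal sketch, assuming the embeddings are already computed: the vectors below are hypothetical toy values, not outputs of Policy2Code or CodeBERT, and the compliant / non-compliant / irrelevant classifier is not shown.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def search(policy_vec: list[float], code_vecs: dict[str, list[float]]) -> list[str]:
    """Rank code snippet ids by similarity to the policy embedding, closest first."""
    return sorted(code_vecs, key=lambda cid: cosine(policy_vec, code_vecs[cid]), reverse=True)


def mean_reciprocal_rank(rankings: list[list[str]], relevant: list[set[str]]) -> float:
    """Average over queries of 1/rank of the first relevant hit (0 if none is retrieved)."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranking, rel in zip(rankings, relevant):
        for rank, cid in enumerate(ranking, start=1):
            if cid in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Hypothetical toy embeddings for three code snippets.
code_vecs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
ranking = search([1.0, 0.1], code_vecs)
print(ranking)  # ['a', 'c', 'b']
print(mean_reciprocal_rank([ranking, ["b", "a", "c"]], [{"c"}, {"b"}]))  # 0.75
```

MRR rewards only the position of the first correct hit, which is why values such as 0.02 in the abstract indicate that relevant code was usually ranked far down the list.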

information retrieval, large language model, machine learning

2209.04602

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Nebraska > Lancaster County > Lincoln (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)

Genre: Research Report (0.91)

Industry: Education > Focused Education > Special Education (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)

Apple could lose $15B if DOJ forces Google to stop paying to be iPhone's default search engine

Daily Mail - Science & tech, Sep-9-2022, 20:46:52 GMT

Apple stands to lose up to $15 billion a year if the Justice Department forces Google to stop paying the company to be the default search engine on all iPhones, as regulators question the legality of the longtime arrangement. Whenever iPhone users open a web browser to enter a search query, it defaults to Google. Even though anyone can change this setting, almost no one does, resulting in a huge amount of traffic (and ad revenue) for Google from over a billion iPhone users worldwide. Analysts from Bernstein estimated that Google's payment to Apple would increase to $15 billion in 2021 and to as much as $18-$20 billion this year, 9to5Mac reports. The contracts are the basis of the DOJ's antitrust case against Google, which began in the closing days of the Trump administration and won't head to trial until sometime in 2023. Last year, Apple's total gross profit was over $152 billion, so losing the Google payments would shave roughly 10% off.

apple, default search engine, google

Country: North America > United States > California (0.27)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)

Consensus - A search engine that finds you answers in scientific papers

#artificialintelligence, Sep-9-2022, 03:33:10 GMT

Ever wonder what the research actually says? Just ask a question and Consensus will instantly read millions of research papers and deliver you answers. From nutrition, to exercise, to economic policy, Consensus makes you an expert on the research in seconds.

consensus, scientific paper, search engine

Technology:

Information Technology > Information Management > Search (0.40)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

MICO: Selective Search with Mutual Information Co-training

Wang, Zhanyu, Zhang, Xiao, Yun, Hyokun, Teo, Choon Hui, Chilimbi, Trishul

arXiv.org Artificial Intelligence, Sep-9-2022

In contrast to traditional exhaustive search, selective search first clusters documents into several groups; then, instead of searching all documents exhaustively, it limits each query's search to one group or only a few groups. Selective search is designed to reduce latency and computation in modern large-scale search systems. In this study, we propose MICO, a Mutual Information CO-training framework for selective search with minimal supervision using search logs. After training, MICO not only clusters the documents but also routes unseen queries to the relevant clusters for efficient retrieval. In our empirical experiments, MICO significantly improves performance on multiple selective search metrics and outperforms a number of competitive existing baselines.
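
The selective-search pipeline (cluster offline, route each query to a few clusters, search only inside them) can be sketched as follows. To be explicit, this is not MICO: MICO learns both the document clustering and the query routing by mutual-information co-training on search logs, whereas this sketch substitutes plain k-means over hypothetical 2-D document vectors and routes queries to the nearest centroid. Function names and parameters are illustrative.

```python
def dist2(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points, init_centroids, iters=10):
    """Plain Lloyd's k-means, a stand-in for MICO's learned clustering."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            groups[nearest].append(p)
        for i, g in enumerate(groups):
            if g:  # keep the old centroid if a cluster empties out
                centroids[i] = [sum(xs) / len(g) for xs in zip(*g)]
    return centroids


def build_shards(docs, centroids):
    """Assign each document id to the cluster of its nearest centroid."""
    shards = {i: [] for i in range(len(centroids))}
    for doc_id, vec in docs.items():
        nearest = min(range(len(centroids)), key=lambda i: dist2(vec, centroids[i]))
        shards[nearest].append(doc_id)
    return shards


def selective_search(query, docs, centroids, shards, n_clusters=1, k=2):
    """Route the query to its n_clusters nearest clusters and search only those."""
    routed = sorted(range(len(centroids)), key=lambda i: dist2(query, centroids[i]))[:n_clusters]
    candidates = [d for i in routed for d in shards[i]]
    ranked = sorted(candidates, key=lambda d: dist2(query, docs[d]))
    return ranked[:k], len(candidates)  # results, number of documents scored


docs = {"d1": (0.0, 0.0), "d2": (0.0, 1.0), "d3": (10.0, 10.0), "d4": (10.0, 11.0)}
centroids = kmeans(list(docs.values()), init_centroids=[(0.0, 0.0), (10.0, 10.0)])
shards = build_shards(docs, centroids)
print(selective_search((0.0, 0.2), docs, centroids, shards))  # (['d1', 'd2'], 2)
```

The second return value shows the saving: only 2 of the 4 documents are scored. Raising `n_clusters` trades that saving back for recall, which is the latency/quality knob selective search exposes.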

information retrieval, machine learning, natural language

2209.04378

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Share the Tensor Tea: How Databases can Leverage the Machine Learning Ecosystem

Asada, Yuki, Fu, Victor, Gandhi, Apurva, Gemawat, Advitya, Zhang, Lihao, He, Dong, Gupta, Vivek, Nosakhare, Ehi, Banda, Dalitso, Sen, Rathijit, Interlandi, Matteo

arXiv.org Artificial Intelligence, Sep-9-2022

We demonstrate Tensor Query Processor (TQP): a query processor that automatically compiles relational operators into tensor programs. By leveraging tensor runtimes such as PyTorch, TQP is able to (1) integrate with ML tools (e.g., Pandas for data ingestion, TensorBoard for visualization); (2) target different hardware (e.g., CPU, GPU) and software (e.g., browser) backends; and (3) accelerate, end to end, queries containing both relational and ML operators. TQP is generic enough to support the TPC-H benchmark, and it provides performance that is comparable to, and often better than, that of specialized CPU and GPU query processors.
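
The core idea of compiling relational operators into tensor programs can be illustrated on a filtered GROUP BY: the WHERE clause becomes a 0/1 mask vector, and the per-group SUM becomes a matrix-vector product with a one-hot group-membership matrix. This is a minimal sketch using plain Python lists to stay dependency-free; TQP itself targets tensor runtimes such as PyTorch, and its actual operator implementations may differ. The function names are hypothetical.

```python
def where_gt(column: list[float], threshold: float) -> list[float]:
    """WHERE col > threshold, expressed as a 0/1 mask vector instead of a per-row branch."""
    return [1.0 if x > threshold else 0.0 for x in column]


def one_hot(keys: list[str], groups: list[str]) -> list[list[float]]:
    """Matrix of shape (len(groups), len(keys)); entry [g][r] is 1.0 if row r is in group g."""
    return [[1.0 if k == g else 0.0 for k in keys] for g in groups]


def matvec(matrix: list[list[float]], vector: list[float]) -> list[float]:
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def group_sum_where(keys: list[str], values: list[float], threshold: float) -> dict[str, float]:
    """SELECT key, SUM(value) FROM t WHERE value > threshold GROUP BY key, as tensor ops."""
    groups = sorted(set(keys))
    mask = where_gt(values, threshold)
    masked = [v * m for v, m in zip(values, mask)]
    membership = one_hot(keys, groups)
    sums = matvec(membership, masked)  # GROUP BY SUM as a matrix-vector product
    counts = matvec(membership, mask)  # number of surviving rows per group
    # SQL omits groups with no surviving rows, so drop those.
    return {g: s for g, s, c in zip(groups, sums, counts) if c > 0}


print(group_sum_where(["a", "b", "a", "c"], [5.0, 20.0, 30.0, 8.0], threshold=10.0))
# {'a': 30.0, 'b': 20.0}
```

The dense one-hot matrix is the simplest formulation but grows with rows × groups, so it is wasteful when there are many groups; the point here is only that each relational step maps onto a branch-free tensor operation that hardware such as a GPU can execute in bulk.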

artificial intelligence, machine learning, natural language

doi: 10.14778/3554821.3554853

2209.04579

Genre: Research Report (0.50)

Industry: Education (0.41)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.37)

Extracting a Knowledge Base of COVID-19 Events from Social Media

Zong, Shi, Baheti, Ashutosh, Xu, Wei, Ritter, Alan

arXiv.org Artificial Intelligence, Sep-9-2022

In this paper, we present a manually annotated corpus of 10,000 tweets containing public reports of five COVID-19 events: positive and negative tests, deaths, denied access to testing, and claimed cures and preventions. We designed slot-filling questions for each event type and annotated a total of 31 fine-grained slots, such as the location of events, recent travel, and close contacts. We show that our corpus can support fine-tuning BERT-based classifiers to automatically extract publicly reported events and help track the spread of a new disease. We also demonstrate that, by aggregating events extracted from millions of tweets, we achieve surprisingly high precision when answering complex queries, such as "Which organizations have employees that tested positive in Philadelphia?" We will release our corpus (with user information removed), automatic extraction models, and the corresponding knowledge base to the research community.
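
The knowledge-base query quoted above can be sketched as a filter-and-count over extracted events. A minimal sketch under stated assumptions: the event dicts, the slot names (`event_type`, `location`, `employer`), and the `min_mentions` support threshold are hypothetical, standing in for the output of the paper's BERT-based extractors; the paper's actual aggregation procedure may differ.

```python
from collections import Counter


def orgs_with_positive_employees(events: list[dict], city: str, min_mentions: int = 2) -> list[str]:
    """Answer "Which organizations have employees that tested positive in <city>?"
    by counting how many extracted events support each organization."""
    support = Counter(
        e["employer"]
        for e in events
        if e["event_type"] == "tested_positive"
        and e.get("location") == city
        and e.get("employer")
    )
    # Requiring several independent mentions is one simple way to favour precision.
    return sorted(org for org, n in support.items() if n >= min_mentions)


events = [  # hypothetical extraction output, one dict per tweet
    {"event_type": "tested_positive", "location": "Philadelphia", "employer": "City Hospital"},
    {"event_type": "tested_positive", "location": "Philadelphia", "employer": "City Hospital"},
    {"event_type": "tested_positive", "location": "Philadelphia", "employer": "Corner Cafe"},
    {"event_type": "tested_negative", "location": "Philadelphia", "employer": "Corner Cafe"},
    {"event_type": "tested_positive", "location": "Boston", "employer": "Harbor Inc"},
]
print(orgs_with_positive_employees(events, "Philadelphia"))  # ['City Hospital']
```

Counting how many tweets support each answer lets redundancy across a large tweet stream filter out one-off extraction errors; lowering `min_mentions` to 1 would also return `Corner Cafe`.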

knowledge base, proceedings, tweet

2006.02567

Country:

Europe > United Kingdom (0.15)
North America > United States > California > San Francisco County > San Francisco (0.04)
North America > Jamaica (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
