AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Apple Is Quietly Working On Its Own Search Engine To Take On Google

International Business Times, Oct-29-2020, 22:15:02 GMT

Apple may be stealthily developing its own search engine, as Google faces a lawsuit from U.S. antitrust authorities over the search giant's agreements with companies to be the default search tool. In iOS 14, the newest operating system update for the iPhone, Apple has started showing its own search results and direct links to websites when users search from their home screen. Unlike earlier versions, iOS 14 no longer uses Google for many of its search functions. The search window that appears when iPhone users swipe right now compiles Apple-generated search suggestions rather than Google results. Earlier this week, the U.S. Department of Justice said in a landmark lawsuit that Google is monopolizing the search space by entering into multibillion-dollar deals with mobile companies such as Apple and Motorola, and with network carriers such as AT&T and Verizon, to be the default search engine on devices.

artificial intelligence, information retrieval, natural language

Country: North America > United States (1.00)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Introduction to Machine Learning

#artificialintelligence, Oct-28-2020, 14:35:49 GMT

Most readers will be familiar with the concept of web page ranking: the process of submitting a query to a search engine, which then finds web pages relevant to the query and returns them in order of relevance. See, e.g., the figure below for an example of the query results for "Machine Learning". That is, the search engine returns a sorted list of web pages given a query. To achieve this goal, a search engine needs to 'know' which pages are relevant and which pages match the query.
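
As a rough illustration of a search engine returning "a sorted list of web pages given a query", the following sketch scores three made-up pages against a query with a simple TF-IDF weighting and sorts them by score. The page texts, the `rank_pages` helper, and the smoothed IDF formula are assumptions for illustration; production search engines combine many more relevance signals.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on whitespace; real engines also strip punctuation, stem, etc.
    return text.lower().split()


def rank_pages(query: str, pages: dict[str, str]) -> list[tuple[str, float]]:
    """Return (page_id, score) pairs sorted from most to least relevant."""
    docs = {pid: Counter(tokenize(text)) for pid, text in pages.items()}
    n_docs = len(docs)
    # Document frequency: how many pages contain each term.
    df = Counter(term for counts in docs.values() for term in counts)

    scores: dict[str, float] = {}
    for pid, counts in docs.items():
        length = sum(counts.values())
        score = 0.0
        for term in tokenize(query):
            if term in counts:
                tf = counts[term] / length              # term frequency in this page
                idf = math.log(1 + n_docs / df[term])   # smoothed inverse document frequency
                score += tf * idf
        scores[pid] = score
    # Highest score first: the "sorted list of web pages given a query".
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


pages = {
    "ml-intro": "machine learning is the study of algorithms that learn from data",
    "cooking": "a recipe for learning to bake bread",
    "ml-course": "a course on machine learning and statistical learning",
}
print([pid for pid, _ in rank_pages("machine learning", pages)])
# -> ['ml-course', 'ml-intro', 'cooking']
```

The smoothing term `1 +` inside the logarithm keeps a word that appears on every page from contributing a zero or negative weight.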

information retrieval, machine learning, natural language

Country: Asia > India > Karnataka > Bengaluru (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.99)

Active Classification with Uncertainty Comparison Queries

Cui, Zhenghang, Sato, Issei

arXiv.org Machine Learning, Oct-28-2020

Noisy pairwise comparison feedback has been incorporated to improve the overall query complexity of interactively learning binary classifiers. The positivity comparison oracle provides feedback on which of a pair of data points is more likely to be positive. Because it is impossible to infer accurate labels using this oracle alone without knowing the classification threshold, existing methods still rely on the traditional explicit labeling oracle, which directly returns the label of a given data point. Existing methods sort all data points and use the explicit labeling oracle to find the classification threshold. These methods, however, have two drawbacks: (1) they require unnecessary sorting for label inference; (2) quicksort is naively adapted to noisy feedback, which negatively affects practical performance. To avoid this inefficiency and acquire information about the classification threshold, we propose a new pairwise comparison oracle concerning uncertainties. This oracle receives two data points as input and answers which one has higher uncertainty. We then propose an efficient adaptive labeling algorithm using the proposed oracle and the positivity comparison oracle. We also address the situation where the labeling budget is insufficient compared to the dataset size, which can be handled by plugging the proposed algorithm into an active learning algorithm. Finally, we confirm the feasibility of the proposed oracle and the performance of the proposed algorithm both theoretically and empirically.
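
The abstract describes the existing baseline only in outline: sort all points with the positivity comparison oracle, then use the explicit labeling oracle to locate the classification threshold. A minimal sketch of that baseline, assuming both oracles are noiseless and simulated from hidden scores (the names `infer_labels`, `more_positive`, and `label` are hypothetical), might look like the following. It is not the authors' uncertainty-comparison algorithm; it only shows why a sorted order plus a binary search needs just O(log n) explicit labels.

```python
from functools import cmp_to_key
from typing import Callable, Sequence


def infer_labels(
    points: Sequence[int],
    more_positive: Callable[[int, int], bool],  # hypothetical positivity comparison oracle
    label: Callable[[int], int],                # hypothetical explicit labeling oracle (0/1)
) -> tuple[dict[int, int], int]:
    """Sort with pairwise comparisons, then binary-search the threshold with explicit labels.

    Assumes both oracles are noiseless, so every point after the threshold in the
    sorted order is positive. Returns the inferred labels and the number of
    explicit label queries spent.
    """
    def compare(a: int, b: int) -> int:
        if more_positive(a, b):
            return 1
        if more_positive(b, a):
            return -1
        return 0

    # Ascending order of "likelihood of being positive", using comparisons only.
    ordered = sorted(points, key=cmp_to_key(compare))

    # Binary search for the first position whose explicit label is positive.
    lo, hi, queries = 0, len(ordered), 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if label(ordered[mid]) == 1:
            hi = mid
        else:
            lo = mid + 1

    labels = {p: int(i >= lo) for i, p in enumerate(ordered)}
    return labels, queries


# Simulated oracles backed by hidden scores (a hypothetical setup for illustration).
scores = {0: 0.9, 1: 0.1, 2: 0.6, 3: 0.3, 4: 0.8, 5: 0.45, 6: 0.05, 7: 0.7}
labels, queries = infer_labels(
    list(scores),
    more_positive=lambda a, b: scores[a] > scores[b],
    label=lambda p: int(scores[p] >= 0.5),
)
print(labels, queries)  # all 8 labels recovered from 3 explicit label queries
```

Python's `sorted` (Timsort) stands in here for the quicksort used by the methods the abstract criticizes; with noisy comparisons either one can produce a wrong order, which in turn misplaces the threshold.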

algorithm, oracle, probability

2008.00645

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.35)

A Clarifying Question Selection System from NTES_ALONG in Convai3 Challenge

Ou, Wenjie, Lin, Yue

arXiv.org Artificial Intelligence, Oct-28-2020

This paper presents the participation of the NTES_ALONG team in the ClariQ challenge at the Search-oriented Conversational AI (SCAI) EMNLP workshop in 2020. The challenge asks for a complete conversational information retrieval system that can understand and generate clarifying questions. We propose a clarifying question selection system that consists of response understanding, candidate question recalling, and clarifying question ranking. We fine-tune a RoBERTa model to understand users' responses and use an enhanced BM25 model to recall candidate questions. In the clarifying question ranking stage, we reconstruct the training dataset and propose two models based on ELECTRA. Finally, we ensemble the models by summing their output probabilities and choose the question with the highest probability as the clarifying question. Experiments show that our ensemble ranking model outperforms competing systems in the document relevance task and achieves the best recall@[20,30] metrics in the question relevance task.
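
Two steps in this pipeline are concrete enough to sketch: recalling candidate questions with BM25 and ensembling rankers by summing their output probabilities. The sketch below uses textbook Okapi BM25 with the common defaults k1 = 1.5 and b = 0.75 and a Lucene-style, always-positive IDF; it is not the team's "enhanced" variant, whose details the abstract does not give. The candidate questions, the helper names, and the two rankers' probabilities are made up; in the described system the probabilities would come from the ELECTRA-based models.

```python
import math
from collections import Counter


def bm25_scores(query: list[str], docs: list[list[str]], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each tokenized document against a tokenized query."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(term for d in docs for term in set(d))  # documents containing each term
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)  # always positive
            s += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores


def recall_top_k(query: str, candidates: list[str], k: int) -> list[str]:
    """Candidate-recall step: keep the k questions BM25 ranks highest."""
    scores = bm25_scores(query.lower().split(), [c.lower().split() for c in candidates])
    order = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return [candidates[i] for i in order[:k]]


def ensemble_pick(candidates: list[str], model_probs: list[list[float]]) -> str:
    """Sum each model's probability per candidate and return the top candidate."""
    totals = [sum(probs[i] for probs in model_probs) for i in range(len(candidates))]
    return candidates[max(range(len(candidates)), key=totals.__getitem__)]


candidates = [
    "are you looking for dinosaur toys",
    "do you want to know which dinosaurs lived in the jurassic period",
    "would you like recipes for dinner",
]
shortlist = recall_top_k("tell me about dinosaurs", candidates, k=2)
# Hypothetical probabilities from two rankers over the shortlist.
best = ensemble_pick(shortlist, [[0.7, 0.2], [0.6, 0.3]])
print(best)
```

Summing probabilities treats every ranker as equally trustworthy; a weighted sum would be a natural variation, though the abstract only mentions plain summation.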

information retrieval, machine learning, natural language

2010.14202

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

QBSUM: a Large-Scale Query-Based Document Summarization Dataset from Real-world Applications

Zhao, Mingjun, Yan, Shengli, Liu, Bang, Zhong, Xinwang, Hao, Qian, Chen, Haolan, Niu, Di, Long, Bowei, Guo, Weidong

arXiv.org Artificial Intelligence, Oct-28-2020

Query-based document summarization aims to extract or generate a summary of a document that directly answers, or is relevant to, the search query. It is an important technique that can benefit a variety of applications, such as search engines, document-level machine reading comprehension, and chatbots. Currently, few datasets are designed for query-based summarization, and the existing ones are limited in both scale and quality. Moreover, to the best of our knowledge, there is no publicly available dataset for Chinese query-based document summarization. In this paper, we present QBSUM, a high-quality, large-scale dataset consisting of more than 49,000 data samples for the task of Chinese query-based document summarization. We also propose multiple unsupervised and supervised solutions to the task and demonstrate their fast inference and superior performance via both offline experiments and online A/B tests. The QBSUM dataset is released to facilitate future advances in this research field.
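
To make the task concrete, here is a simple unsupervised extractive baseline, assumed for illustration and not one of the paper's proposed solutions: split the document into sentences and keep the k sentences that share the most terms with the query. The function names and the term-overlap scoring are assumptions; for Chinese text, as in QBSUM, the `\w+` tokenizer would have to be replaced by a proper word segmenter.

```python
import re


def tokens(text: str) -> set[str]:
    # Word characters only; Chinese text would first need a word segmenter.
    return set(re.findall(r"\w+", text.lower()))


def query_based_summary(query: str, document: str, k: int = 2) -> list[str]:
    """Extract the k sentences sharing the most terms with the query, in document order."""
    # Naive sentence split after ., !, ? (and the Chinese sentence-ending marks 。！？).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?。！？])\s*", document) if s.strip()]
    query_terms = tokens(query)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: len(query_terms & tokens(sentences[i])),
        reverse=True,  # stable sort: ties keep document order
    )
    return [sentences[i] for i in sorted(ranked[:k])]  # restore original order


doc = ("Search engines rank pages. Summaries help users scan results. "
       "Chatbots answer questions directly.")
print(query_based_summary("do summaries help users", doc, k=1))
# -> ['Summaries help users scan results.']
```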

information retrieval, machine learning, natural language

doi: 10.1016/j.csl.2020.101166

2010.14108

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.34)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)

Query Complexity of k-NN based Mode Estimation

Singhal, Anirudh, Pirojiwala, Subham, Karamchandani, Nikhil

arXiv.org Machine Learning, Oct-26-2020

Motivated by the mode estimation problem of an unknown multivariate probability density function, we study the problem of identifying the point with the minimum k-th nearest neighbor distance for a given dataset of n points. We study the case where the pairwise distances are a priori unknown, but we have access to an oracle that we can query to get noisy information about the distance between any pair of points. For two natural oracle models, we design a sequential learning algorithm, based on the idea of confidence intervals, that adaptively decides which queries to send to the oracle and correctly solves the problem with high probability. We derive instance-dependent upper bounds on the query complexity of our proposed scheme and also demonstrate significant improvement over other baselines via extensive numerical evaluations.
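
For intuition about the target quantity, the sketch below computes it exhaustively when every pairwise distance is known exactly: for each point, take the distance to its k-th nearest neighbor and return the point where that distance is smallest. This is the brute-force reference, which needs all n(n-1)/2 pairwise distances, that the paper's adaptive, confidence-interval algorithm is designed to avoid under a noisy oracle; the function name and data are hypothetical.

```python
import math
from typing import Sequence


def knn_mode_index(points: Sequence[Sequence[float]], k: int) -> int:
    """Index of the point with the smallest distance to its k-th nearest neighbour.

    Exhaustive baseline: it uses every pairwise distance exactly, i.e. the
    answer the paper's algorithm identifies from fewer, noisy oracle queries.
    """
    def kth_nn_distance(i: int) -> float:
        dists = sorted(math.dist(points[i], points[j]) for j in range(len(points)) if j != i)
        return dists[k - 1]  # k-th nearest neighbour, excluding the point itself

    return min(range(len(points)), key=kth_nn_distance)


# A tight cluster around the origin plus two outliers (made-up data).
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (-0.1, 0.0), (5.0, 5.0), (6.0, -4.0)]
print(knn_mode_index(pts, k=2))  # -> 0, the centre of the cluster
```

A small k-th nearest neighbor distance means many points lie close by, which is why this point serves as an estimate of the density's mode.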

artificial intelligence, machine learning, natural language

2010.13491

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.61)

A Survey of Embedding Space Alignment Methods for Language and Knowledge Graphs

Kalinowski, Alexander, An, Yuan

arXiv.org Artificial Intelligence, Oct-26-2020

The purpose of this survey is to explore the core techniques and categorizations of methods for aligning low-dimensional embedding spaces. Projecting sparse, high-dimensional data sets into compact, lower-dimensional spaces not only allows a significant reduction in storage space but also yields dense representations with many applications. These embedding spaces have become a staple of representation learning ever since their heralded application to natural language in a technique called word2vec, and they have replaced traditional machine learning features as easy-to-build, high-quality representations of the source objects. There has been a wealth of research on techniques for embedding objects such as images, natural language, and knowledge graphs, and many research agendas have focused on mapping one embedding space to another, whether to align and unify them in a common space, to apply them to joint downstream tasks, or to ease transfer learning. To fully leverage these dense representations and translate them across domains and problem spaces, techniques for establishing alignments between them must be developed and understood.
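
One widely used alignment technique, given pairs of vectors known to correspond (a seed dictionary), is orthogonal Procrustes: find the orthogonal map that best sends one space onto the other in the least-squares sense. The sketch below restricts it to 2-D rotations so that the optimum has a closed form, theta = atan2(b, a); high-dimensional embeddings would instead use a singular value decomposition. It is an illustrative example chosen here, not a summary of any specific method from the survey, and the toy vectors and helper names are made up.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def best_rotation_angle(source: Sequence[Point], target: Sequence[Point]) -> float:
    """Angle theta minimising sum_i ||R(theta) x_i - y_i||^2 over 2-D rotations.

    Rotations preserve length, so this equals maximising
    sum_i y_i . R(theta) x_i = a*cos(theta) + b*sin(theta), maximised at atan2(b, a).
    """
    pairs = list(zip(source, target))
    a = sum(x1 * y1 + x2 * y2 for (x1, x2), (y1, y2) in pairs)
    b = sum(x1 * y2 - x2 * y1 for (x1, x2), (y1, y2) in pairs)
    return math.atan2(b, a)


def rotate(p: Point, theta: float) -> Point:
    """Apply the 2-D rotation matrix R(theta) to point p."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])


# Toy "embeddings" for four paired items; the target space is the source rotated by 0.7 rad.
source = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (0.3, -0.7)]
target = [rotate(p, 0.7) for p in source]
theta = best_rotation_angle(source, target)
aligned = [rotate(p, theta) for p in source]  # source mapped into the target space
print(round(theta, 6))  # -> 0.7
```

Restricting the map to be orthogonal preserves distances within the source space, so neighborhoods learned there survive the alignment.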

information retrieval, machine learning, natural language

2010.13688

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Chile's New Interdisciplinary Institute for Foundational Research on Data

Communications of the ACM, Oct-24-2020, 05:51:15 GMT

The Millennium Institute for Foundational Research on Data (IMFD) started its operations in June 2018, funded by the Millennium Science Initiative of the Chilean National Agency of Research and Development. IMFD is a joint initiative led by Universidad de Chile and Universidad Católica de Chile, with the participation of five other Chilean universities: Universidad de Concepción, Universidad de Talca, Universidad Técnica Federico Santa María, Universidad Diego Portales, and Universidad Adolfo Ibáñez. IMFD aims to be a reference center in Latin America for state-of-the-art research on foundational problems with data, as well as on its applications to diverse issues ranging from scientific challenges to complex social problems. As tasks of this kind are interdisciplinary by nature, IMFD gathers a large number of researchers from several areas, including traditional computer science areas such as data management, Web science, algorithms and data structures, privacy and verification, information retrieval, data mining, machine learning, and knowledge representation, as well as areas from other fields, including statistics, political science, and communication studies. IMFD currently hosts 36 researchers, seven postdoctoral fellows, and more than 100 students.

information, information retrieval, natural language

AI-Alerts: 2020 > 2020-10 > AAAI AI-Alert for Oct 27, 2020 (1.00)

Country:

North America > Central America (0.25)
South America > Chile > Maule Region > Talca Province > Talca (0.24)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)

Genre: Research Report (0.47)

Industry:

Media > News (0.97)
Energy (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Keyphrase Extraction with Dynamic Graph Convolutional Networks and Diversified Inference

Zhang, Haoyu, Long, Dingkun, Xu, Guangwei, Xie, Pengjun, Huang, Fei, Wang, Ji

arXiv.org Artificial Intelligence, Oct-24-2020

Keyphrase extraction (KE) aims to produce a set of phrases that accurately express a concept or topic covered in a given document. Recently, Sequence-to-Sequence (Seq2Seq) generative frameworks have been widely used for the KE task and have obtained competitive performance on various benchmarks. The main challenges of Seq2Seq methods lie in acquiring informative latent document representations and better modeling the compositionality of the target keyphrase set, both of which directly affect the quality of the generated keyphrases. In this paper, we propose to adopt Dynamic Graph Convolutional Networks (DGCN) to solve these two problems simultaneously. Concretely, we explore integrating dependency trees with GCN for latent representation learning. Moreover, the graph structure in our model is dynamically modified during the learning process according to the generated keyphrases. In this way, our approach can explicitly learn the relations within the keyphrase collection and guarantee information interchange between the encoder and decoder in both directions. Extensive experiments on various KE benchmark datasets demonstrate the effectiveness of our approach.
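
The abstract combines dependency trees with graph convolutional networks. As a hedged illustration of that building block only, the sketch below applies one standard GCN layer, ReLU(D^-1/2 (A + I) D^-1/2 H W) in the symmetric-normalization form popularized by Kipf and Welling, to the dependency tree of a three-word sentence. The adjacency, features, weights, and helper names are made up, and the paper's dynamic graph updates and Seq2Seq decoder are not modeled.

```python
import math

Matrix = list[list[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_layer(adjacency: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(adjacency)
    # Add self-loops so each word keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)] for i in range(n)]
    propagated = matmul(norm, matmul(features, weights))
    return [[max(0.0, v) for v in row] for row in propagated]  # ReLU


# Toy dependency tree for "models extract keyphrases": extract -> models, extract -> keyphrases.
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up 2-d word features
weights = [[1.0, -1.0], [0.5, 1.0]]               # made-up layer weights
out = gcn_layer(adjacency, features, weights)
print(out)
```

Each word's new representation mixes its own features with those of its syntactic neighbors, so "extract" ends up informed by both of its dependents after a single layer.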

information retrieval, machine learning, natural language

2010.12828

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Google Paid Apple Billions To Dominate Search On iPhones, Justice Department Says

NPR Technology, Oct-22-2020, 12:55:26 GMT

The Justice Department says Google CEO Sundar Pichai (left) met privately with Apple chief Tim Cook in 2018 to discuss how their two companies could collaborate. Buried on page 36 of the Justice Department lawsuit accusing Google of abusing its monopoly power is this remarkable figure: $8 billion to $12 billion. That's the hefty sum Google allegedly paid Apple for one of the most prized pieces of real estate in the world of online search: default status on iPhones and all other Apple devices. Justice Department investigators say Apple, which does not have its own search engine, hammered out a multiyear deal making Google the default search engine on all iPhones and other Apple products.

artificial intelligence, information retrieval, natural language

Country:

Europe (0.06)
North America > United States > California (0.05)

Industry:

Law (1.00)
Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.62)