AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular pieces of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where the information being sought is held in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

What Do You Need to Know to Use a Search Engine? Why We Still Need to Teach Research Skills

AI Magazine, Oct-6-2020, 15:39:58 GMT

For the vast majority of queries (for example, navigation, simple fact lookup, and others), search engines do extremely well. Their ability to quickly provide answers to queries is a remarkable testament to the power of many of the fundamental methods of AI. They also highlight many of the issues that are common to sophisticated AI question-answering systems. It has become clear that people think of search programs in ways that are very different from traditional information sources. Rapid and ready-at-hand access, depth of processing, and the way they enable people to offload some ordinary memory tasks suggest that search engines have become more of a cognitive amplifier than a simple repository or front-end to the Internet.

information retrieval, question answering, teach research skill, (6 more...)

AI Magazine

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.92)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.62)

SCOBO: Sparsity-Aware Comparison Oracle Based Optimization

Cai, HanQin, Mckenzie, Daniel, Yin, Wotao, Zhang, Zhenliang

arXiv.org Artificial Intelligence, Oct-6-2020

We study derivative-free optimization for convex functions where we further assume that function evaluations are unavailable. Instead, one only has access to a comparison oracle, which, given two points $x$ and $y$, returns a single bit of information indicating which point has the larger function value, $f(x)$ or $f(y)$, with some probability of being incorrect. This probability may be constant or it may depend on $|f(x)-f(y)|$. Previous algorithms for this problem have been hampered by a query complexity which is polynomially dependent on the problem dimension, $d$. We propose a novel algorithm that breaks this dependence: it has query complexity only logarithmically dependent on $d$ if the function in addition has low dimensional structure that can be exploited. Numerical experiments on synthetic data and the MuJoCo dataset show that our algorithm outperforms state-of-the-art methods for comparison based optimization, and is even competitive with other derivative-free algorithms that require explicit function evaluations.
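The comparison oracle described in the abstract can be illustrated with a minimal sketch. This is not the SCOBO algorithm itself, only the query interface it assumes; the names `make_comparison_oracle`, `error_prob`, and `sphere` are hypothetical, and the sketch covers only the constant-error-probability case.

```python
import random

def make_comparison_oracle(f, error_prob=0.1, rng=None):
    """Wrap f in a noisy oracle: oracle(x, y) returns +1 if it reports
    f(x) > f(y) and -1 otherwise. Function values are never exposed."""
    rng = rng or random.Random()

    def oracle(x, y):
        truth = 1 if f(x) > f(y) else -1
        # Assumption: the answer is flipped with a constant probability.
        return -truth if rng.random() < error_prob else truth

    return oracle

def sphere(x):
    """A simple convex test function (hypothetical example)."""
    return sum(v * v for v in x)

# With error_prob=0.0 the oracle is exact.
exact_oracle = make_comparison_oracle(sphere, error_prob=0.0)
answer = exact_oracle([1.0, 1.0], [0.0, 0.0])
```

An optimizer built on this interface only ever sees the single returned bit, which is why query complexity is the natural cost measure.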

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2010.02479

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
(2 more...)

Query complexity of adversarial attacks

Głuch, Grzegorz, Urbanke, Rüdiger

arXiv.org Machine Learning, Oct-2-2020

The decision boundary of a learning algorithm applied to a given task can be viewed as the outcome of a random process: (i) generate a training set, and (ii) apply to it the (potentially randomized) learning algorithm. Recall (see Definitions 4 and 5) that a query-bounded adversary knows neither the sample on which the model was trained nor the randomness used by the learner. This means that if the decision boundary has high entropy, then the adversary needs to ask many questions to recover the boundary to a high degree of precision. This suggests that high-entropy decision boundaries are robust against query-bounded adversaries, since intuitively it is clear that approximate knowledge of the decision boundary is a prerequisite for a successful attack. Following this reasoning, we present two instances where high entropy of the decision boundary leads to security.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2010.01039

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Sweden > Stockholm > Stockholm (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry:

Information Technology > Security & Privacy (0.50)
Government > Military (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.41)

ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention

Gomez-Perez, Jose Manuel, Ortega, Raul

arXiv.org Artificial Intelligence, Oct-1-2020

Textbook Question Answering is a complex task at the intersection of Machine Comprehension and Visual Question Answering that requires reasoning with multimodal information from text and diagrams. For the first time, this paper taps into the potential of transformer language models and bottom-up and top-down attention to tackle the language and visual understanding challenges this task entails. Rather than training a language-visual transformer from scratch, we rely on pre-trained transformers, fine-tuning and ensembling. We add bottom-up and top-down attention to identify regions of interest corresponding to diagram constituents and their relationships, improving the selection of relevant visual information for each question and its answer options. Our system ISAAQ reports unprecedented success in all TQA question types, with accuracies of 81.36%, 71.11% and 55.12% on true/false, text-only and diagram multiple choice questions. ISAAQ also demonstrates its broad applicability, obtaining state-of-the-art results on other demanding datasets.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2010.00562

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
Europe > Spain > Galicia > Madrid (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry: Education > Educational Setting > K-12 Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Extracting Concepts for Precision Oncology from the Biomedical Literature

Greenspan, Nicholas, Si, Yuqi, Roberts, Kirk

arXiv.org Artificial Intelligence, Sep-30-2020

This paper describes an initial dataset and automatic natural language processing (NLP) method for extracting concepts related to precision oncology from biomedical research articles. We extract five concept types: Cancer, Mutation, Population, Treatment, Outcome. A corpus of 250 biomedical abstracts was annotated with these concepts following standard double-annotation procedures. We then experiment with BERT-based models for concept extraction. The best-performing model achieved a precision of 63.8%, a recall of 71.9%, and an F1 of 67.1. Finally, we propose additional directions for research for improving extraction performance and utilizing the NLP system in downstream precision oncology applications.
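For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below applies the standard formula to the precision and recall quoted in the abstract; the function name `f1_score` is a hypothetical helper, not code from the paper. The result, about 67.6, is slightly higher than the reported 67.1. A gap of this size can arise when F1 is averaged per concept type or per evaluation fold rather than computed from pooled precision and recall, but the abstract does not say which was done, so that explanation is an assumption.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (inputs and output share units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Values quoted in the abstract, in percent.
pooled_f1 = f1_score(63.8, 71.9)
print(round(pooled_f1, 1))
```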

information retrieval, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2010.00074

Country:

North America > United States > Texas > Harris County > Houston (0.04)
North America > United States > Colorado (0.04)
Asia > China > Hunan Province (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.88)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Towards a Multi-modal, Multi-task Learning based Pre-training Framework for Document Representation Learning

Pramanik, Subhojeet, Mujumdar, Shashank, Patel, Hima

arXiv.org Artificial Intelligence, Sep-30-2020

In this paper, we propose a multi-task learning-based framework that utilizes a combination of self-supervised and supervised pre-training tasks to learn a generic document representation. We design the network architecture and the pre-training tasks to incorporate the multi-modal document information across text, layout, and image dimensions and allow the network to work with multi-page documents. We showcase the applicability of our pre-training framework on a variety of different real-world document tasks such as document classification, document information extraction, and document retrieval. We conduct exhaustive experiments to compare performance against different ablations of our framework and state-of-the-art baselines. We discuss the current limitations and next steps for our work.

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2009.14457

Country:

Asia > Middle East > Jordan (0.04)
Asia > India (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Gender prediction using limited Twitter Data

Burghoorn, Maaike, de Boer, Maaike H. T., Raaijmakers, Stephan

arXiv.org Artificial Intelligence, Sep-29-2020

Transformer models have shown impressive performance on a variety of NLP tasks. Off-the-shelf, pre-trained models can be fine-tuned for specific NLP classification tasks, reducing the need for large amounts of additional training data. However, little research has addressed how much data is required to accurately fine-tune such pre-trained transformer models, and how much data is needed for accurate prediction. This paper explores the usability of BERT (a Transformer model for word embedding) for gender prediction on social media. Forensic applications include detecting gender obfuscation, e.g., males posing as females in chat rooms. A Dutch BERT model is fine-tuned on different samples of a Dutch Twitter dataset labeled for gender, varying in the number of tweets used per person. The results show that fine-tuning BERT contributes to good gender classification performance (80% F1) when fine-tuned on only 200 tweets per person. When using just 20 tweets per person, the performance of our classifier deteriorates only gradually (to 70% F1). These results show that even with relatively small amounts of data, BERT can be fine-tuned to accurately help predict the gender of Twitter users, and, consequently, that it is possible to determine gender on the basis of just a low volume of tweets. This opens up an operational perspective on the swift detection of gender.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2010.02005

Country: Europe > Netherlands > South Holland > The Hague (0.04)

Genre: Research Report > New Finding (0.69)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.32)

Towards a Natural Language Query Processing System

Montgomery, Chantal, Isah, Haruna, Zulkernine, Farhana

arXiv.org Artificial Intelligence, Sep-25-2020

Tackling the information retrieval gap between non-technical database end-users and those with knowledge of formal query languages has been an interesting area of data management and analytics research. The use of natural language interfaces to query information from databases offers the opportunity to bridge the communication gap between end-users and systems that use formal query languages. Previous research efforts mainly focused on developing structured query interfaces to relational databases. However, the evolution of unstructured big data such as text, images, and video has exposed the limitations of traditional structured query interfaces. While the existing web search tools prove the popularity and usability of natural language queries, they return complete documents and web pages instead of focused query responses and are not applicable to database systems. This paper reports our study on the design and development of a natural language query interface to a backend relational database. The novelty in the study lies in defining a graph database as a middle layer to store the metadata needed to transform a natural language query into a structured query language (SQL) statement that can be executed on backend databases. We implemented and evaluated our approach using a restaurant dataset. The translation results for some sample queries yielded a 90% accuracy rate.
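The idea of a metadata layer sitting between the natural language question and the backend database can be sketched in a few lines. This is a toy illustration under strong assumptions, not the authors' system: a plain dictionary (`KEYWORD_TO_COLUMN`) stands in for the graph database, the `restaurants` table and its columns are invented for the example, and `translate` only recognizes "keyword value" pairs.

```python
import sqlite3

# Hypothetical metadata layer: question keywords mapped to columns of one table.
# A plain dict stands in for the graph database described in the abstract.
KEYWORD_TO_COLUMN = {"cuisine": "cuisine", "city": "city"}

def translate(question):
    """Build a parameterized SQL query from 'keyword value' pairs in the question."""
    tokens = question.lower().rstrip("?").split()
    conditions, params = [], []
    for keyword, value in zip(tokens, tokens[1:]):
        if keyword in KEYWORD_TO_COLUMN:
            conditions.append(f"{KEYWORD_TO_COLUMN[keyword]} = ?")
            params.append(value)
    sql = "SELECT name FROM restaurants"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql, params

# Invented sample data in an in-memory relational backend.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (name TEXT, cuisine TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?)",
    [("Luigi's", "italian", "kingston"), ("Sakura", "japanese", "toronto")],
)

sql, params = translate("Which restaurants have cuisine italian in city kingston?")
rows = conn.execute(sql, params).fetchall()
```

Keeping the keyword-to-schema mapping outside the translation function is the point of the middle layer: the mapping can grow or change without touching the code that assembles the SQL.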

artificial intelligence, information retrieval, natural language, (15 more...)

arXiv.org Artificial Intelligence

2009.12414

Country:

North America > Canada > Ontario > Kingston (0.14)
Asia > India > Maharashtra > Mumbai (0.05)
North America > Canada > Ontario > Toronto (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Consumer Products & Services > Restaurants (0.88)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Using Computer Programs and Search Problems for Teaching Theory of Computation

Communications of the ACM, Sep-24-2020, 04:10:51 GMT

The theory of computation is one of the crown jewels of the computer science curriculum. It stretches from the discovery of mathematical problems, such as the halting problem, that cannot be solved by computers, to the most celebrated open problem in computer science today: the P vs. NP question. Since the founding of our discipline by Church and Turing in the 1930s, the theory of computation has addressed some of the most fundamental questions about computers: What does it mean to compute the solution to a problem? Which problems can be solved by computers? Which problems can be solved efficiently, in theory and in practice?

artificial intelligence, information retrieval, natural language, (15 more...)

Communications of the ACM

Country:

North America > United States > Pennsylvania > Cumberland County > Carlisle (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Instructional Material > Course Syllabus & Notes (0.35)

Industry: Education > Curriculum (0.51)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Ranking for Individual and Group Fairness Simultaneously

Gorantla, Sruthi, Deshpande, Amit, Louis, Anand

arXiv.org Machine Learning, Sep-24-2020

Search and recommendation systems, such as search engines, recruiting tools, online marketplaces, news, and social media, output ranked lists of content, products, and sometimes, people. Credit ratings, standardized tests, and risk assessments output only a score, but are also used implicitly for ranking. Bias in such ranking systems, especially among the top ranks, can worsen social and economic inequalities, polarize opinions, and reinforce stereotypes. On the other hand, a bias correction for minority groups can cause more harm if perceived as favoring group-fair outcomes over meritocracy. In this paper, we study a trade-off between individual fairness and group fairness in ranking. We define individual fairness based on how close the predicted rank of each item is to its true rank, and prove a lower bound on the trade-off achievable for simultaneous individual and group fairness in ranking. We give a fair ranking algorithm that takes any given ranking and outputs another ranking with simultaneous individual and group fairness guarantees comparable to the lower bound we prove. Our algorithm can be used both to pre-process training data and to post-process the output of existing ranking algorithms. Our experimental results show that our algorithm performs better than the state-of-the-art fair learning to rank and fair post-processing baselines.
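The tension between the two notions of fairness can be made concrete with a small sketch. The two measures below are illustrative stand-ins, not the paper's formal definitions: `max_rank_displacement` captures how far any item moves from its true rank, and `group_counts_in_top_k` captures group representation among the top positions. All names and the sample data are hypothetical.

```python
def max_rank_displacement(true_ranking, new_ranking):
    """Largest distance any item moves between the two rankings
    (an illustrative proxy for individual fairness)."""
    position = {item: i for i, item in enumerate(new_ranking)}
    return max(abs(i - position[item]) for i, item in enumerate(true_ranking))

def group_counts_in_top_k(ranking, group_of, k):
    """Number of items from each group among the first k positions
    (an illustrative proxy for group fairness)."""
    counts = {}
    for item in ranking[:k]:
        group = group_of[item]
        counts[group] = counts.get(group, 0) + 1
    return counts

# Invented example: four items in two groups.
true_ranking = ["a", "b", "c", "d"]
group_of = {"a": "g1", "b": "g1", "c": "g2", "d": "g2"}
reranked = ["a", "c", "b", "d"]

displacement = max_rank_displacement(true_ranking, reranked)
top2_before = group_counts_in_top_k(true_ranking, group_of, 2)
top2_after = group_counts_in_top_k(reranked, group_of, 2)
```

In this example, swapping two adjacent items balances the groups in the top two positions at the cost of moving each swapped item one rank from its true position, which is the kind of trade-off the abstract describes.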

information retrieval, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2010.06986

Country:

North America > United States > New York > New York County > New York City (0.05)
Asia > India > Karnataka > Bengaluru (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(5 more...)

Genre: Research Report > New Finding (0.48)

Industry:

Law (1.00)
Banking & Finance > Credit (0.49)
Education > Assessment & Standards > Student Performance (0.34)
Media > News (0.34)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.34)
