AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

A Comparison of Approaches for Imbalanced Classification Problems in the Context of Retrieving Relevant Documents for an Analysis

Wankmüller, Sandra

arXiv.org Machine Learning May-3-2022

One of the first steps in many text-based social science studies is to retrieve documents that are relevant for the analysis from large corpora of otherwise irrelevant documents. The conventional approach in social science to address this retrieval task is to apply a set of keywords and to consider those documents to be relevant that contain at least one of the keywords. But the application of incomplete keyword lists risks drawing biased inferences. More complex and costly methods such as query expansion techniques, topic model-based classification rules, and active as well as passive supervised learning could have the potential to more accurately separate relevant from irrelevant documents and thereby reduce the potential size of bias. Yet, whether applying these more expensive approaches increases retrieval performance compared to keyword lists at all, and if so, by how much, is unclear as a comparison of these approaches is lacking. This study closes this gap by comparing these methods across three retrieval tasks associated with a data set of German tweets (Linder, 2017), the Social Bias Inference Corpus (SBIC) (Sap et al., 2020), and the Reuters-21578 corpus (Lewis, 1997). Results show that query expansion techniques and topic model-based classification rules in most studied settings tend to decrease rather than increase retrieval performance. Active supervised learning, however, if applied on a not too small set of labeled training instances (e.g.
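The keyword rule described in this abstract (a document counts as relevant if it contains at least one keyword) can be sketched in a few lines. This is a minimal illustration only; the function name `keyword_retrieve`, the whole-word matching, and the case-insensitive comparison are assumptions made for the example, not details taken from the study.

```python
import re
from typing import List


def keyword_retrieve(documents: List[str], keywords: List[str]) -> List[str]:
    """Return the documents that contain at least one keyword.

    Matching is whole-word and case-insensitive (an assumption of this sketch).
    """
    patterns = [
        re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        for keyword in keywords
    ]
    # A document is kept as soon as any single keyword pattern matches it.
    return [doc for doc in documents if any(p.search(doc) for p in patterns)]
```

With the keywords `["election", "vote"]`, the document "Voters went to the polls." is not retrieved, because "vote" does not appear in it as a whole word. This is the kind of miss an incomplete keyword list produces.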

information retrieval, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2205.016

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East (0.92)

Genre: Research Report > New Finding (1.00)

Industry:

Government > Voting & Elections (1.00)
Government > Immigration & Customs (1.00)
Energy > Oil & Gas (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
(2 more...)

How Forte Transforms the Building of NLP Solution with PyTorch into Assembly Lines

#artificialintelligence May-1-2022, 00:56:08 GMT

Forte introduces "DataPack", a standardized data structure for unstructured data, distilling good software engineering practices such as reusability, extensibility, and flexibility into PyTorch-based ML solutions. Machine Learning (ML) technologies are now widely used in many day-to-day applications. For example, the systems behind personal assistants like Siri or Alexa are grounded in complex ML technologies, such as Natural Language Processing, Computer Vision, and many more. While the consumer interface of Machine Learning systems may appear simple, the systems behind the scenes can be much more complex than they first appear. For example, building an intelligent medical information retrieval system requires one to stitch together a diverse set of techniques.

datapack, pipeline, workflow, (15 more...)

#artificialintelligence

Industry: Health & Medicine (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.63)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.56)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.56)

Counterfactual Learning To Rank for Utility-Maximizing Query Autocompletion

Block, Adam, Kidambi, Rahul, Hill, Daniel N., Joachims, Thorsten, Dhillon, Inderjit S.

arXiv.org Machine Learning Apr-22-2022

Conventional methods for query autocompletion aim to predict which completed query a user will select from a list. A shortcoming of this approach is that users often do not know which query will provide the best retrieval performance on the current information retrieval system, meaning that any query autocompletion methods trained to mimic user behavior can lead to suboptimal query suggestions. To overcome this limitation, we propose a new approach that explicitly optimizes the query suggestions for downstream retrieval performance. We formulate this as a problem of ranking a set of rankings, where each query suggestion is represented by the downstream item ranking it produces. We then present a learning method that ranks query suggestions by the quality of their item rankings. The algorithm is based on a counterfactual learning approach that is able to leverage feedback on the items (e.g., clicks, purchases) to evaluate query suggestions through an unbiased estimator, thus avoiding the assumption that users write or select optimal queries. We establish theoretical support for the proposed approach and provide learning-theoretic guarantees. We also present empirical results on publicly available datasets, and demonstrate real-world applicability using data from an online shopping store.
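The idea of evaluating query suggestions from logged item feedback through an unbiased estimator can be illustrated with a generic inverse-propensity-scored (IPS) estimate. IPS is a standard counterfactual-evaluation technique, used here as a stand-in; it is not necessarily the estimator the paper proposes. The log layout `(prefix, shown_suggestion, reward, logging_probability)` and the name `ips_estimate` are hypothetical.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical log record: the typed prefix, the suggestion that was shown,
# the downstream item reward (e.g. 1.0 for a click or purchase), and the
# probability with which the logging system showed that suggestion.
LogEntry = Tuple[str, str, float, float]


def ips_estimate(logs: Iterable[LogEntry], policy: Callable[[str], str]) -> float:
    """Estimate the average reward a new suggestion policy would have earned.

    Logged rewards count only when the new policy agrees with the logged
    suggestion, and are reweighted by 1 / logging_probability.
    """
    total = 0.0
    count = 0
    for prefix, shown, reward, logging_probability in logs:
        if logging_probability <= 0.0:
            raise ValueError("logging probability must be positive")
        count += 1
        if policy(prefix) == shown:
            total += reward / logging_probability
    if count == 0:
        raise ValueError("no logged interactions")
    return total / count
```

The reweighting is what lets the estimate score a policy that differs from the one that produced the logs, as long as every suggestion the new policy makes had a nonzero chance of being shown.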

information retrieval, machine learning, natural language, (16 more...)

arXiv.org Machine Learning

doi: 10.1145/3477495.3531958

2204.10936

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
(9 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Retail > Online (0.48)
Information Technology > Services (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

WriterZen Review - Keyword Research & AI Copywriting Tool

#artificialintelligence Apr-17-2022, 03:10:21 GMT

Are you overwhelmed by all the things you need to accomplish to rank in search engines? WriterZen allows you to plan a strategy from topic discovery to keyword research, all the way to writing the content and checking for plagiarism. In this WriterZen review, you'll see what WriterZen is, how it works, and its features, and by the end of this article, you should know if WriterZen is right for you. WriterZen is a complete SEO package that can help you map out a strategy for your SEO. Its set of tools was designed to help you write articles that rank on any search engine, be it Google, Yahoo, Bing, or YouTube.

keyword, plagiarism, writerzen, (14 more...)

#artificialintelligence

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.58)

Distributed Reconstruction of Noisy Pooled Data

Hahn-Klimroth, Max, Kaaser, Dominik

arXiv.org Machine Learning Apr-14-2022

In the pooled data problem we are given a set of $n$ agents, each of which holds a hidden state bit, either $0$ or $1$. A querying procedure returns for a query set the sum of the states of the queried agents. The goal is to reconstruct the states using as few queries as possible. In this paper we consider two noise models for the pooled data problem. In the noisy channel model, the result for each agent flips with a certain probability. In the noisy query model, each query result is subject to random Gaussian noise. Our results are twofold. First, we present and analyze for both error models a simple and efficient distributed algorithm that reconstructs the initial states in a greedy fashion. Our novel analysis pins down the range of error probabilities and distributions for which our algorithm reconstructs the exact initial states with high probability. Secondly, we present simulation results of our algorithm and compare its performance with approximate message passing (AMP) algorithms that are conjectured to be optimal in a number of related problems.
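The query model in this abstract can be made concrete with a small sketch. `pooled_query` returns the sum of the hidden bits in a query set, optionally with Gaussian noise as in the noisy query model. `greedy_reconstruct` is an illustrative score-and-threshold decoder that assumes the number of ones `k` is known; it is a simplification written for this example, not the distributed algorithm analyzed in the paper, and all names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def pooled_query(states: Sequence[int], query_set: Sequence[int],
                 noise_std: float = 0.0,
                 rng: Optional[random.Random] = None) -> float:
    """Return the sum of the hidden bits of the queried agents.

    With noise_std > 0, Gaussian noise is added to the result (noisy query model).
    """
    total = float(sum(states[i] for i in query_set))
    if noise_std > 0.0:
        rng = rng or random.Random()
        total += rng.gauss(0.0, noise_std)
    return total


def greedy_reconstruct(n: int, k: int, queries: Sequence[Sequence[int]],
                       results: Sequence[float]) -> List[int]:
    """Label the k agents with the highest scores as 1 (k is assumed known).

    An agent's score is the sum of the mean-centered results of the queries
    it took part in.
    """
    mean_result = sum(results) / len(results)
    scores = [0.0] * n
    for query_set, result in zip(queries, results):
        for agent in query_set:
            scores[agent] += result - mean_result
    top = sorted(range(n), key=lambda agent: scores[agent], reverse=True)[:k]
    estimate = [0] * n
    for agent in top:
        estimate[agent] = 1
    return estimate
```

For example, with hidden states `[1, 0, 0, 0]` and the noiseless queries `[[0, 1], [2, 3], [0, 2], [1, 3]]`, the results are `[1, 0, 1, 0]`, agent 0 receives the highest score, and the decoder recovers the states exactly.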

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Machine Learning

2204.07491

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > North Rhine-Westphalia > Arnsberg Region > Dortmund (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
Health & Medicine > Therapeutic Area > Immunology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.90)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.35)

How to Increase Your Google Page Speed Score

#artificialintelligence Apr-10-2022, 23:35:06 GMT

How many times has your website taken a while to load? How many times have you said, "Meh. Your Google page speed score and your core web vitals are more important than ever. Even if you're making sales right now, it's only a matter of time before your competition decides it's better to be the hare and not the tortoise. All of the great content, social media promotion, and keyword research in the world won't matter if your website is a slug on a rainy day.

plugin, speed score, website, (12 more...)

#artificialintelligence

Country:

Oceania > New Zealand (0.05)
Oceania > Australia (0.05)
North America > United States (0.05)

Technology:

Information Technology > Information Management > Search (0.86)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.41)

Search Engines are Missing Infected Sites, Putting Businesses At Risk

#artificialintelligence Apr-6-2022, 22:06:26 GMT

We've all come across warnings when visiting suspicious websites. Your browser or search engine might even block you from entering, displaying a message that this site may harm your device. But what if the site you're trying to visit is not flagged as malicious? According to SiteLock's 2022 Security Report, 92% of infected websites are not blacklisted by search engines. This means that businesses and individuals are vulnerable to attack when they visit these sites.

dataset, malware, missing infected site, (6 more...)

#artificialintelligence

Industry:

Information Technology > Security & Privacy (0.46)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.38)

Technology:

Information Technology > Information Management > Search (0.85)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.85)
Information Technology > Artificial Intelligence > Machine Learning (0.77)

Breaking Down and Interpreting Human Language -- NLP

#artificialintelligence Apr-6-2022, 00:10:09 GMT

From translation software, chatbots, spam filters, and search engines, to grammar correction software, voice assistants, and social media monitoring tools, NLP is at the core of tools in our everyday life. NLP (Natural Language Processing) tries to make machines that can think and act like humans (don't worry, they won't be human the way humans are). It is used to understand human behavior by feeding the machine syntax, language, accents, and many other forms of sensory data that humans capture. Algorithms then convert, or rather transform, this data into a language that the machine understands, so that the machine learns rules for performing actions and solving problems. So How Does NLP Work?

interpreting human language, nlp, training data, (2 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.44)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.42)

8 Best SQL Courses on Coursera

#artificialintelligence Apr-5-2022, 04:51:53 GMT

If you want to gain the skills necessary to query big data with modern distributed SQL engines, then this specialization is for you. The best part of this course is that it will teach you a newer breed of SQL engine: the distributed query engines Hive and Impala. Hive and Impala are open-source SQL engines capable of querying enormous datasets. Another advantage of this specialization is that it provides excellent preparation for the Cloudera Certified Associate (CCA) Data Analyst certification exam. This specialization consists of 3 courses.

best sql course, coursera, specialization program, (11 more...)

#artificialintelligence

Country: North America > United States > Michigan (0.05)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.61)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.75)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.61)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.35)

The Download: Chatbots could one day replace search engines. Here's why that's a terrible idea.

MIT Technology Review Mar-30-2022, 12:46:52 GMT

The world's oceans are amazing carbon sponges, capturing a quarter of human-produced carbon dioxide when surface waters react with the greenhouse gas in the air or marine organisms gobble it up as they grow. Some research groups and start-ups want to help accelerate this natural process by adding certain minerals to the oceans that could help them lock up even more carbon and slow climate change. The idea has attracted a lot of excitement and investment. However, a number of recent studies suggest that some of these approaches may not be as effective as scientists had hoped. That's disappointing news, because the world may need to suck up an additional 10 billion tons of carbon annually by midcentury to limit warming to 2 °C, according to a recent report.

day replace search engine, download, terrible idea, (3 more...)

MIT Technology Review

Genre: Research Report (1.00)

Industry: Materials > Chemicals > Industrial Gases (0.95)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.40)
