AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where the information being sought is spread across widely different formats, and the universe of knowledge is simply too big and growing too rapidly for searching to proceed at a human's slow speed.

The Search Engine Showdown is Far from Over

#artificialintelligence | Feb-12-2023, 09:36:08 GMT

Back in the 1990s, search engines were a hot space. Yahoo, Netscape, AOL, Ask Jeeves, AltaVista, Google Search, MSN, and others were vying for the dominant position. With time, most of them fizzled out. After 2000 came the era of Google Search, the undisputed winner of the space until quite recently. Now the tide is turning, and Google Search's crown is under threat.

bing, microsoft, search engine

#artificialintelligence

Industry: Information Technology > Services (0.94)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.52)

Training a Named Entity Recognition Model Without Data

#artificialintelligence | Feb-12-2023, 01:05:18 GMT

Named Entity Recognition (NER) is the task of recognizing entity names, such as person names, locations, and organizations, within a text. This task serves as a fundamental module for various NLP applications, including chatbots, search engines, and translation systems. NER datasets for generic entities are easy to find, but obtaining data for specific domains can be challenging. Labeling NER data is harder than labeling for simple text classification, because annotators must mark every entity span and its type rather than assign one label per text, which makes large-scale domain-specific NER datasets costly to create. In this post, I will demonstrate how to train an NER model without any labeled data.
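This summary does not say which technique the post uses. One common way to obtain NER training data without manual annotation is distant supervision: match a domain dictionary (gazetteer) against raw text and emit BIO tags, then train an ordinary tagger on these noisy "silver" labels. The sketch below covers only the labeling step. The function name `weak_label_bio` and the gazetteer entries are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical gazetteer: lowercase token tuples -> entity type.
Gazetteer = Dict[Tuple[str, ...], str]


def weak_label_bio(tokens: List[str], gazetteer: Gazetteer) -> List[str]:
    """Assign BIO tags by greedy longest match against a gazetteer."""
    tags = ["O"] * len(tokens)
    max_len = max((len(k) for k in gazetteer), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so "New York City" beats "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(t.lower() for t in tokens[i:i + n])
            label = gazetteer.get(span)
            if label is not None:
                tags[i] = f"B-{label}"
                for j in range(i + 1, i + n):
                    tags[j] = f"I-{label}"
                i += n
                matched = True
                break
        if not matched:
            i += 1  # no entity starts here; leave the token as "O"
    return tags


gazetteer: Gazetteer = {
    ("ada", "park"): "PER",
    ("new", "york"): "LOC",
    ("new", "york", "city"): "LOC",
    ("contoso",): "ORG",
}
tokens = "Ada Park visited New York City for Contoso".split()
print(list(zip(tokens, weak_label_bio(tokens, gazetteer))))
```

Greedy longest match is a simple design choice. It favors the more specific entry when dictionary entries overlap, but any entity missing from the gazetteer is silently labeled "O", which is the main source of noise in this kind of weak supervision.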

dataset, entity name, ner dataset

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)

Will A.I. Kill the Internet?

Slate | Feb-11-2023, 10:00:00 GMT

This week, Felix Salmon, Emily Peck, and Elizabeth Spiers discuss Microsoft's attempt to break into AI-assisted search with a revamp of its Bing search engine. They also talk about record-high profits for oil companies and Bed Bath & Beyond's financial shenanigans.

internet

Slate

Technology:

Information Technology > Information Management > Search (0.78)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.78)
Information Technology > Communications > Mobile (0.58)

MatKB: Semantic Search for Polycrystalline Materials Synthesis Procedures

Yang, Xianjun, Wilson, Stephen, Petzold, Linda

arXiv.org Artificial Intelligence | Feb-10-2023

In this paper, we present a novel approach to knowledge extraction and retrieval for materials science using Natural Language Processing (NLP) techniques. Our goal is to automatically mine structured knowledge from millions of research articles on polycrystalline materials and make it easily accessible to the broader community. The proposed method leverages NLP techniques such as entity recognition and document classification to extract relevant information and build an extensive knowledge base from a collection of 9.5 million publications. The resulting knowledge base is integrated into a search engine that lets users search for information about specific materials, properties, and experiments with greater precision than traditional search engines like Google. We hope our results can help materials scientists quickly locate desired experimental procedures, compare their differences, and even design new experiments.
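The abstract does not describe MatKB's index structure. As a minimal sketch, assuming the entity-recognition step has already produced (entity type, entity text) pairs for each paper, an inverted index from entities to document IDs lets a query require specific materials, methods, or properties instead of matching free-text keywords. All function names, entity types, and example entities here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Entity = Tuple[str, str]  # (entity_type, entity_text), e.g. ("MATERIAL", "Al2O3")


def build_entity_index(docs: Dict[str, Set[Entity]]) -> Dict[Entity, Set[str]]:
    """Map each normalized extracted entity to the set of documents mentioning it."""
    index: Dict[Entity, Set[str]] = defaultdict(set)
    for doc_id, entities in docs.items():
        for etype, text in entities:
            index[(etype, text.lower())].add(doc_id)
    return index


def search(index: Dict[Entity, Set[str]], required: Iterable[Entity]) -> List[str]:
    """Return IDs of documents that mention every required entity (AND query)."""
    result = None
    for etype, text in required:
        postings = index.get((etype, text.lower()), set())
        result = set(postings) if result is None else result & postings
    return sorted(result or [])


docs = {
    "d1": {("MATERIAL", "Al2O3"), ("PROPERTY", "grain size"), ("METHOD", "sintering")},
    "d2": {("MATERIAL", "Al2O3"), ("METHOD", "hot pressing")},
    "d3": {("MATERIAL", "ZrO2"), ("METHOD", "sintering")},
}
index = build_entity_index(docs)
print(search(index, [("MATERIAL", "al2o3"), ("METHOD", "Sintering")]))
```

Because matching happens on typed entities, a query for the method "sintering" will not match a paper that merely mentions sintering temperature as a property, which is one way such a system can be more precise than keyword search.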

artificial intelligence, information retrieval, natural language

arXiv.org Artificial Intelligence

2302.05597

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
Europe > Spain > Galicia > Madrid (0.04)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Fast Gumbel-Max Sketch and its Applications

Zhang, Yuanming, Wang, Pinghui, Qi, Yiyan, Cheng, Kuankuan, Zhao, Junzhou, Tian, Guangjian, Guan, Xiaohong

arXiv.org Artificial Intelligence | Feb-10-2023

The well-known Gumbel-Max Trick for sampling elements from a categorical distribution (or, more generally, a non-negative vector) and its variants have been widely used in areas such as machine learning and information retrieval. To sample a random element $i$ in proportion to its positive weight $v_i$, the Gumbel-Max Trick first computes a Gumbel random variable $g_i$ for each element $i$ with positive weight, and then selects the element with the largest value of $g_i+\ln v_i$. Recently, applications including similarity estimation and weighted cardinality estimation have required generating $k$ independent Gumbel-Max variables from high-dimensional vectors. However, the traditional Gumbel-Max Trick is computationally expensive for large $k$ (e.g., hundreds or even thousands), because each of the $k$ variables needs one Gumbel draw per positive element. To solve this problem, we propose a novel algorithm, FastGM, which reduces the time complexity from $O(kn^+)$ to $O(k \ln k + n^+)$, where $n^+$ is the number of positive elements in the vector of interest. FastGM stops computing Gumbel random variables early for many elements, especially those with small weights. We perform experiments on a variety of real-world datasets, and the results demonstrate that FastGM is orders of magnitude faster than state-of-the-art methods without sacrificing accuracy or incurring additional expenses.
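The traditional trick can be sketched in plain Python. This is the baseline that FastGM accelerates, not FastGM itself: each of the $k$ samples draws a fresh Gumbel variable $g_i = -\ln(-\ln U_i)$, with $U_i \sim \mathrm{Uniform}(0,1)$, for every positive element, which is where the $O(kn^+)$ cost comes from. The function names are illustrative.

```python
import math
import random
from typing import List, Sequence


def gumbel(rng: random.Random) -> float:
    """Standard Gumbel variate via the inverse CDF: g = -ln(-ln U), U ~ Uniform(0, 1)."""
    u = rng.random()
    while u == 0.0:  # random() returns values in [0, 1); exclude 0 so both logs are finite
        u = rng.random()
    return -math.log(-math.log(u))


def gumbel_max_sample(weights: Sequence[float], rng: random.Random) -> int:
    """Return index i with probability v_i / sum(v), considering only v_i > 0.

    Returns -1 if no weight is positive.
    """
    best_i, best_score = -1, -math.inf
    for i, v in enumerate(weights):
        if v <= 0:
            continue  # the trick ranges only over positive elements
        score = gumbel(rng) + math.log(v)
        if score > best_score:
            best_i, best_score = i, score
    return best_i


def k_gumbel_max(weights: Sequence[float], k: int, rng: random.Random) -> List[int]:
    """Naive baseline: k independent samples, costing k * n+ Gumbel draws."""
    return [gumbel_max_sample(weights, rng) for _ in range(k)]


rng = random.Random(0)
samples = k_gumbel_max([1.0, 0.0, 2.0, 7.0], 20000, rng)
print({i: samples.count(i) / len(samples) for i in range(4)})
```

With weights 1, 0, 2, and 7, the empirical frequencies should approach 0.1, 0, 0.2, and 0.7. The zero-weight element is never selected.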

data mining, information retrieval, machine learning

arXiv.org Artificial Intelligence

2302.05176

Country:

Asia > China > Shaanxi Province > Xi'an (0.05)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.48)

Industry: Information Technology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Massively Multilingual Language Models for Cross Lingual Fact Extraction from Low Resource Indian Languages

Singh, Bhavyajeet, Kandru, Pavan, Sharma, Anubhav, Varma, Vasudeva

arXiv.org Artificial Intelligence | Feb-9-2023

Massive knowledge graphs (KGs) like Wikidata attempt to capture world knowledge about many entities. Recent approaches concentrate on automatically enriching these KGs from text. However, much of the information present as natural text in low-resource languages is often missed. Cross Lingual Information Extraction aims to extract factual information in the form of English triples from low-resource Indian language text. Despite its massive potential, progress on this task lags behind Monolingual Information Extraction. In this paper, we propose the task of Cross Lingual Fact Extraction (CLFE) from text and devise an end-to-end generative approach for it, which achieves an overall F1 score of 77.46.
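The abstract reports an F1 score but not the matching criterion behind it. A minimal sketch, assuming exact matching of (subject, relation, object) English triples, computes precision, recall, and F1 over sets of triples. The paper may use a different or softer matching scheme, and the example triples are placeholders.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def triple_f1(predicted: Iterable[Triple], gold: Iterable[Triple]) -> float:
    """Exact-match F1 between predicted and gold triple sets (duplicates ignored)."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)  # triples that match gold exactly
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


gold = {("EntityA", "occupation", "cricketer"), ("EntityA", "place of birth", "CityX")}
pred = [("EntityA", "occupation", "cricketer"), ("EntityA", "award received", "AwardY")]
print(triple_f1(pred, gold))
```

Here one of two predicted triples is correct and one of two gold triples is recovered, so precision and recall are both 0.5.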

extraction, information retrieval, natural language

arXiv.org Artificial Intelligence

2302.0479

Country:

Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
North America > United States > New York (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.55)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)

Neural Approaches to Multilingual Information Retrieval

Lawrie, Dawn, Yang, Eugene, Oard, Douglas W., Mayfield, James

arXiv.org Artificial Intelligence | Feb-9-2023

Providing access to information across languages has been a goal of Information Retrieval (IR) for decades. While progress has been made on Cross Language IR (CLIR), where queries are expressed in one language and documents in another, the multilingual IR (MLIR) task of creating a single ranked list of documents across many languages is considerably more challenging. This paper investigates whether advances in neural document translation and pretrained multilingual neural language models enable improvements over earlier state-of-the-art MLIR techniques. The results show that although combining neural document translation with neural ranking yields the best Mean Average Precision (MAP), 98% of that MAP score can be achieved with an 84% reduction in indexing time by using a pretrained XLM-R multilingual language model to index documents in their native languages, and that the 2% difference in effectiveness is not statistically significant. Key to achieving these results for MLIR is fine-tuning XLM-R using mixed-language batches from neural translations of MS MARCO passages.
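The comparison above is stated in terms of MAP, so a short sketch of the standard metric may help. Average precision for one query averages the precision at each rank where a relevant document appears, dividing by the total number of relevant documents. MAP then averages this value over queries. In MLIR the single ranked list simply mixes documents from several languages. The language-prefixed document IDs below are hypothetical.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """AP for one query; relevant documents never retrieved contribute 0."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP over all judged queries; a query with no run gets AP = 0."""
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0


# One merged ranked list per query, mixing Persian (fa), Russian (ru), and Chinese (zh) documents.
runs = {"q1": ["fa:3", "ru:1", "zh:7", "zh:2"], "q2": ["zh:5", "fa:9"]}
qrels = {"q1": {"ru:1", "zh:2"}, "q2": {"zh:5"}}
print(mean_average_precision(runs, qrels))
```

For q1 the relevant documents appear at ranks 2 and 4, giving AP = (1/2 + 2/4) / 2 = 0.5. For q2 the only relevant document is ranked first, giving AP = 1.0. MAP is therefore 0.75.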

information retrieval, machine learning, natural language

arXiv.org Artificial Intelligence

2209.01335

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > Dominican Republic (0.04)

Genre:

Research Report > Experimental Study (0.88)
Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.72)

Query Processing on Tensor Computation Runtimes

He, Dong, Nakandala, Supun, Banda, Dalitso, Sen, Rathijit, Saur, Karla, Park, Kwanghyun, Curino, Carlo, Camacho-Rodríguez, Jesús, Karanasos, Konstantinos, Interlandi, Matteo

arXiv.org Artificial Intelligence | Feb-9-2023

The huge demand for computation in artificial intelligence (AI) is driving unparalleled investments in hardware and software systems for AI. This leads to an explosion in the number of specialized hardware devices, which are now offered by major cloud vendors. By hiding the low-level complexity through a tensor-based interface, tensor computation runtimes (TCRs) such as PyTorch allow data scientists to efficiently exploit the exciting capabilities offered by the new hardware. In this paper, we explore how database management systems can ride the wave of innovation happening in the AI space. We design, build, and evaluate Tensor Query Processor (TQP): TQP transforms SQL queries into tensor programs and executes them on TCRs. TQP is able to run the full TPC-H benchmark by implementing novel algorithms for relational operators on the tensor routines. At the same time, TQP can support various hardware while only requiring a fraction of the usual development effort. Experiments show that TQP can improve query execution time by up to 10$\times$ over specialized CPU- and GPU-only systems. Finally, TQP can accelerate queries mixing ML predictions and SQL end-to-end, and deliver up to 9$\times$ speedup over CPU baselines.
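TQP's actual operator algorithms are given in the paper. The following is only a minimal sketch, assuming NumPy in place of a tensor runtime such as PyTorch and hypothetical dictionary-encoded columns. It shows how a query of the form `SELECT flag, SUM(quantity) FROM t WHERE discount > 0.05 GROUP BY flag` can be expressed entirely as array operations: a boolean mask for the selection, `np.unique` to assign group IDs, and a weighted `np.bincount` to scatter-add the aggregate.

```python
import numpy as np

# Hypothetical columnar table: each column is a 1-D array ("tensor").
flag = np.array([0, 1, 0, 2, 1, 0])  # dictionary-encoded category codes
quantity = np.array([10.0, 5.0, 7.0, 3.0, 8.0, 2.0])
discount = np.array([0.06, 0.02, 0.08, 0.07, 0.09, 0.01])


def filtered_group_sum(keys: np.ndarray, values: np.ndarray, mask: np.ndarray):
    """SELECT key, SUM(value) ... WHERE mask GROUP BY key, using only array ops."""
    # Selection: keep rows where the elementwise predicate holds.
    k = keys[mask]
    v = values[mask]
    # Group-by: map each surviving key to a dense group id 0..G-1.
    groups, group_ids = np.unique(k, return_inverse=True)
    # Aggregation: scatter-add each value into its group's slot.
    sums = np.bincount(group_ids, weights=v, minlength=len(groups))
    return groups, sums


groups, sums = filtered_group_sum(flag, quantity, discount > 0.05)
print(groups, sums)
```

Because every step is a bulk array operation with no per-row Python control flow, the same formulation can run on whatever hardware the tensor library targets. That portability is the property the abstract attributes to running SQL on TCRs.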

artificial intelligence, machine learning, natural language

arXiv.org Artificial Intelligence

doi: 10.14778/3551793.3551833

2203.01877

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Oklahoma > Payne County > Cushing (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

From Traditional Adaptive Data Caching to Adaptive Context Caching: A Survey

Weerasinghe, Shakthi, Zaslavsky, Arkady, Loke, Seng W., Hassani, Alireza, Abken, Amin, Medvedev, Alexey

arXiv.org Artificial Intelligence | Feb-9-2023

Context information is in demand more than ever with the rapid increase in the number of context-aware Internet of Things (IoT) applications developed worldwide. Research in context and context-awareness is being conducted to broaden its applicability in light of many practical and technical challenges. One of these challenges is improving performance when responding to a large number of context queries. Context Management Platforms, which infer and deliver context to applications, measure this problem using Quality of Service (QoS) parameters. Although caching is a proven way to improve QoS, the transiency of context and features such as the variability and heterogeneity of context queries pose an additional real-time cost management problem. This paper presents a critical survey of the state of the art in adaptive data caching, with the objective of developing a body of knowledge on cost- and performance-efficient adaptive caching strategies. We comprehensively survey a large number of research publications and evaluate, compare, and contrast different techniques, policies, approaches, and schemes in adaptive caching. Our critical analysis is motivated by a focus on adaptively caching context as a core research problem. A formal definition of adaptive context caching is then proposed, followed by the identified features and requirements of a well-designed, objective-optimal adaptive context caching strategy.
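The strategies the survey covers are adaptive. As a non-adaptive baseline for comparison, here is a minimal sketch of a fixed-capacity LRU cache in which every entry carries its own time-to-live. The class name `TTLLRUCache` and the context keys are hypothetical. Per-entry lifetimes are one simple way to reflect the transiency of context: a vehicle's speed goes stale faster than a room's temperature. The hit and miss counters stand in for the QoS measurements a context management platform would track.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional, Tuple


class TTLLRUCache:
    """LRU cache whose entries also expire after a per-entry time-to-live."""

    def __init__(self, capacity: int, default_ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.default_ttl = default_ttl
        self.clock = clock  # injectable so tests can control time
        self._data: "OrderedDict[Hashable, Tuple[float, Any]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            self.misses += 1
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # stale context is dropped instead of served
            self.misses += 1
            return None
        self._data.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return value

    def put(self, key: Hashable, value: Any, ttl: Optional[float] = None) -> None:
        lifetime = self.default_ttl if ttl is None else ttl
        self._data[key] = (self.clock() + lifetime, value)
        self._data.move_to_end(key)
        while len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry


cache = TTLLRUCache(capacity=2, default_ttl=10.0)
cache.put("room42/temperature", 21.5)
cache.put("car7/speed", 88.0, ttl=1.0)  # fast-changing context gets a short lifetime
print(cache.get("room42/temperature"), cache.hits, cache.misses)
```

An adaptive strategy of the kind the survey discusses would go further, for example by choosing per-entry lifetimes, or whether to cache an item at all, from observed query patterns and costs rather than fixing them by hand.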

data mining, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

2211.11259

Country:

Oceania > Australia > New South Wales > Sydney (0.14)
North America > United States > Missouri > Jackson County > Kansas City (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Telecommunications (0.92)
Information Technology > Services (0.92)
Transportation > Ground > Road (0.67)

Technology:

Information Technology > Internet of Things (1.00)
Information Technology > Information Management (1.00)
Information Technology > Databases (1.00)

Microsoft's Bing search engine and Edge browser to use AI in challenge to Google

The Japan Times | Feb-8-2023, 03:04:42 GMT

REDMOND, Washington – Microsoft is revamping its Bing search engine and Edge browser with artificial intelligence, the company said Tuesday, signaling its ambition to retake the lead in consumer technology markets where it has fallen behind. The maker of the Windows operating system is staking its future on AI through billions of dollars of investment as it directly challenges Alphabet's Google, which for years has outpaced Microsoft in search and browser technology. Now, Microsoft is rolling out an intelligent chatbot to live alongside Bing's search results, putting into more consumers' hands AI that can summarize web pages, synthesize disparate sources, and even compose emails and translate them. Microsoft expects every percentage point of share it gains to bring in another $2 billion in search advertising revenue.

bing search engine, microsoft, search engine and edge browser

The Japan Times

Country: North America > United States > Washington > King County > Redmond (0.29)

Industry:

Information Technology (1.00)
Marketing (0.64)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.65)