AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Shape Fragments

Delva, Thomas, Dimou, Anastasia, Jakubowski, Maxime, Bussche, Jan Van den

arXiv.org Artificial Intelligence, Dec-22-2021

In constraint languages for RDF graphs, such as ShEx and SHACL, constraints on nodes and their properties in RDF graphs are known as "shapes". Schemas in these languages list the various shapes that certain targeted nodes must satisfy for the graph to conform to the schema. Using SHACL, we propose in this paper a novel use of shapes, by which a set of shapes is used to extract a subgraph from an RDF graph, the so-called shape fragment. Our proposed mechanism fits in the framework of Linked Data Fragments. In this paper, (i) we define our extraction mechanism formally, building on recently proposed SHACL formalizations; (ii) we establish correctness properties, which relate shape fragments to notions of provenance for database queries; (iii) we compare shape fragments with SPARQL queries; (iv) we discuss implementation options; and (v) we present initial experiments demonstrating that shape fragments are a feasible new idea.

graph, neighborhood, shape fragment, (16 more...)

arXiv.org Artificial Intelligence

2112.11796

Country:

Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
Europe > Switzerland > Geneva > Geneva (0.04)
Europe > Belgium > Flanders > East Flanders > Ghent (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Communications > Web > Semantic Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.67)

Weaviate is an open-source search engine powered by ML, vectors, graphs, and GraphQL

#artificialintelligence, Dec-21-2021, 22:35:12 GMT

Bob van Luijt's career in technology started at age 15, building websites to help people sell toothbrushes online. Not many 15-year-olds do that. Apparently, this gave van Luijt enough of a head start to arrive at the confluence of technology trends today. Van Luijt went on to study arts but ended up working full time in technology anyway. In 2015, when Google introduced its RankBrain algorithm, the quality of search results jumped up.

luijt, vector, weaviate, (12 more...)

#artificialintelligence

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.42)

What is Search Engine Optimization (SEO)? - Discover How to Make

#artificialintelligence, Dec-18-2021, 23:10:21 GMT

An entrepreneur or freelancer has two main strategies to tap into when marketing online: Search Engine Optimization (SEO), which attempts to rank your website on search engines "organically", and Search Engine Marketing (SEM), which ranks your website in search results in exchange for money. Both strategies can be used to build a business successfully, but which one is right for you? A great way to market your business is to use search engines to help your customers find you online. You will need a sales-focused website (e.g., one aimed at creating contact rather than one aimed at assuring customers that you are who you say you are) if you use this strategy; otherwise, your efforts will likely be wasted. You have two ways to use search engines to help people find your website: search engine optimization (SEO) and search engine marketing (SEM).

keyphrase, seo, website, (13 more...)

#artificialintelligence

Country:

North America > United States > Georgia > Fulton County > Atlanta (0.05)
Asia > China (0.05)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

The Web Is Your Oyster -- Knowledge-Intensive NLP against a Very Large Web Corpus

Piktus, Aleksandra, Petroni, Fabio, Karpukhin, Vladimir, Okhonko, Dmytro, Broscheit, Samuel, Izacard, Gautier, Lewis, Patrick, Oğuz, Barlas, Grave, Edouard, Yih, Wen-tau, Riedel, Sebastian

arXiv.org Artificial Intelligence, Dec-18-2021

In order to address the increasing demands of real-world applications, the research for knowledge-intensive NLP (KI-NLP) should advance by capturing the challenges of a truly open-domain environment: web scale knowledge, lack of structure, inconsistent quality, and noise. To this end, we propose a new setup for evaluating existing KI-NLP tasks in which we generalize the background corpus to a universal web snapshot. We repurpose KILT, a standard KI-NLP benchmark initially developed for Wikipedia, and ask systems to use a subset of CCNet - the Sphere corpus - as a knowledge source. In contrast to Wikipedia, Sphere is orders of magnitude larger and better reflects the full diversity of knowledge on the Internet. We find that despite potential gaps of coverage, challenges of scale, lack of structure and lower quality, retrieval from Sphere enables a state-of-the-art retrieve-and-read system to match and even outperform Wikipedia-based models on several KILT tasks - even if we aggressively filter content that looks like Wikipedia. We also observe that while a single dense passage index over Wikipedia can outperform a sparse BM25 version, on Sphere this is not yet possible. To facilitate further research into this area, and minimise the community's reliance on proprietary black box search engines, we will share our indices, evaluation metrics and infrastructure.

knowledge source, sphere, wikipedia, (15 more...)

arXiv.org Artificial Intelligence

2112.09924

Country:

Oceania > Australia (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Oceania > Solomon Islands > Isabel Province > Santa Isabel Island > Buala (0.04)
(7 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Free SEO Tools & Search Engine Optimization Software Application - Discover How to Make

#artificialintelligence, Dec-17-2021, 05:40:44 GMT

Tools to help you develop and market your site, including Firefox extensions and web tools. If you need feedback or have any burning questions, please ask in the neighborhood online forum so we can get them sorted out. Includes site map, glossary, and flying start checklist. Tips on how to purchase traffic from search engines. Discover how to track your success with natural SEO and pay per click ads.

free seo tool, search engine optimization software application, website, (2 more...)

#artificialintelligence

Genre: Instructional Material (0.39)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.64)

Best of Both Worlds: A Hybrid Approach for Multi-Hop Explanation with Declarative Facts

Storks, Shane, Gao, Qiaozi, Reganti, Aishwarya, Thattai, Govind

arXiv.org Artificial Intelligence, Dec-17-2021

Language-enabled AI systems can answer complex, multi-hop questions with high accuracy, but supporting answers with evidence is a more challenging task, and one that is important for transparency and trustworthiness to users. Prior work in this area typically makes a trade-off between efficiency and accuracy; state-of-the-art deep neural network systems are too cumbersome to be useful in large-scale applications, while the fastest systems lack reliability. In this work, we integrate fast syntactic methods with powerful semantic methods for multi-hop explanation generation based on declarative facts. Our best system, which learns a lightweight operation to simulate multi-hop reasoning over pieces of evidence and fine-tunes language models to re-rank generated explanation chains, outperforms a purely syntactic baseline from prior work by up to 7% in gold explanation retrieval rate.

computational linguistic, explanation, explanation chain, (14 more...)

arXiv.org Artificial Intelligence

2201.0274

Country:

Europe > Italy > Tuscany > Florence (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > United States > Michigan (0.04)
(3 more...)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.49)
(2 more...)

Towards Unsupervised Dense Information Retrieval with Contrastive Learning

Izacard, Gautier, Caron, Mathilde, Hosseini, Lucas, Riedel, Sebastian, Bojanowski, Piotr, Joulin, Armand, Grave, Edouard

arXiv.org Artificial Intelligence, Dec-16-2021

Information retrieval is an important component in natural language processing, for knowledge-intensive tasks such as question answering and fact checking. Recently, information retrieval has seen the emergence of dense retrievers, based on neural networks, as an alternative to classical sparse methods based on term frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new domains or applications with no training data, and are often outperformed by term-frequency methods such as BM25, which are not supervised. Thus, a natural question is whether it is possible to train dense retrievers without supervision. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers, and show that it leads to strong retrieval performance. More precisely, we show on the BEIR benchmark that our model outperforms BM25 on 11 out of 15 datasets. Furthermore, when a few thousand examples are available, we show that fine-tuning our model on these leads to strong improvements compared to BM25. Finally, when used as pre-training before fine-tuning on the MS-MARCO dataset, our technique obtains state-of-the-art results on the BEIR benchmark.
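The contrastive training idea mentioned in the abstract can be illustrated with a small sketch. The code below computes one common form of contrastive loss (often called InfoNCE) for a single query embedding, one positive passage embedding, and a list of negatives. The abstract does not spell out the exact objective, so the function name `info_nce_loss`, the dot-product similarity, and the temperature value are assumptions made for illustration only.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(query, positive, negatives, temperature=0.05):
    """Contrastive (InfoNCE-style) loss for one query embedding.

    The loss is low when the query is more similar to the positive
    passage than to any negative. The temperature is an assumed
    hyperparameter, not a value taken from the paper.
    """
    # Similarity logits: positive first, then every negative.
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_denominator - logits[0]
```

In practice the embeddings would come from a neural encoder and the loss would be averaged over a batch and minimized by gradient descent; this sketch only shows the quantity being minimized.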

arxiv preprint arxiv, retrieval, retriever, (13 more...)

arXiv.org Artificial Intelligence

2112.09118

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking

Zerveas, George, Rekabsaz, Navid, Cohen, Daniel, Eickhoff, Carsten

arXiv.org Artificial Intelligence, Dec-16-2021

We present a framework for improving the performance of a wide class of retrieval models at minimal computational cost. It utilizes precomputed document representations extracted by a base dense retrieval method and involves training a model to jointly score a large set of retrieved candidate documents for each query, while potentially transforming on the fly the representation of each document in the context of the other candidates as well as the query itself. When scoring a document representation based on its similarity to a query, the model is thus aware of the representation of its "peer" documents. We show that our approach leads to substantial improvement in retrieval performance over the base method and over scoring candidate documents in isolation from one another, as in a pair-wise training setting. Crucially, unlike term-interaction rerankers based on BERT-like encoders, it incurs a negligible computational overhead on top of any first-stage method at run time, allowing it to be easily combined with any state-of-the-art dense retrieval method. Finally, concurrently considering a set of candidate documents for a given query enables additional valuable capabilities in retrieval, such as score calibration and mitigating societal biases in ranking.

query, representation, retrieval, (16 more...)

arXiv.org Artificial Intelligence

2112.08766

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Rhode Island > Providence County > Providence (0.04)
North America > Canada (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Text Mining Through Label Induction Grouping Algorithm Based Method

Saleem, Gulshan, Ahmed, Nisar, Qamar, Usman

arXiv.org Artificial Intelligence, Dec-15-2021

The main aim of information retrieval methods is to provide accurate and efficient results that are also cost-effective. LINGO (Label Induction Grouping Algorithm) is a clustering algorithm that aims to present search results as high-quality clusters, but it has a few limitations. In this paper, we focus on producing more meaningful results and improving the overall performance of the algorithm. LINGO works in two main steps: cluster label induction using Latent Semantic Indexing (LSI), and cluster content discovery using the Vector Space Model (VSM). Because LINGO uses VSM for cluster content discovery, our first task is to replace VSM with LSI in that step and to analyze the feasibility of using LSI with Okapi BM25. The next task is to compare the results of the modified method with those of the original LINGO method. The research is applied to five different text-based data sets to obtain more reliable results for every method. The results show that LINGO produces 40-50% better results when using LSI for content discovery. Theoretical evidence suggests that using Okapi BM25 as the scoring method within LSI (LSI+Okapi BM25) for cluster content discovery, instead of VSM, also yields better cluster generation in terms of scalability and performance compared with both VSM and LSI alone.
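For readers unfamiliar with Okapi BM25, the scoring function named in the abstract, the following is a minimal sketch of the standard formula. It does not reproduce the paper's LSI+Okapi BM25 combination. The parameter values `k1=1.5` and `b=0.75` are conventional defaults assumed here, the IDF uses one common non-negative variant, and the sample documents are made up for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.5, b=0.75):
    """Score each tokenized document against the query with Okapi BM25."""
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        # Longer-than-average documents are penalized, controlled by b.
        length_norm = 1 - b + b * len(doc) / avg_len
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            # Non-negative IDF variant (an assumption; other variants exist).
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "search results clustering with label induction".split(),
    "vector space model for text retrieval".split(),
    "planting trees with search profits".split(),
]
scores = bm25_scores(["clustering", "search"], docs)
```

Here the first document matches both query terms, the third matches only "search", and the second matches neither, so the scores rank them in that order.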

algorithm, lingo, lsi, (11 more...)

arXiv.org Artificial Intelligence

2112.08486

Country:

Asia > Pakistan > Punjab > Lahore Division > Lahore (0.06)
North America > United States > Hawaii (0.04)
Asia > Taiwan (0.04)
Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(2 more...)

Tree-planting search engine Ecosia launches Shopping feature for refurbished and sustainable products

The Independent - Tech, Dec-14-2021, 14:06:48 GMT

Ecosia, the search engine that uses its profits to plant trees, is launching a new shopping feature. The company, which was founded in 2009, donates its expendable funds to tree-planting organizations. It claims to have planted 130 million trees across 30 countries around the world. Ecosia Shopping recommends products on Amazon, Kelkoo and Idealo, and other shopping partners that have been sustainably made, are reused, or have been refurbished. The feature is available now in the UK, Germany and France.

amazon, ecosia, panel event dogecoin price surge, (11 more...)

The Independent - Tech

Country:

Europe > United Kingdom (0.26)
Europe > Germany (0.25)
Europe > France (0.25)

Industry:

Banking & Finance > Trading (0.43)
Energy > Renewable (0.40)

Technology:

Information Technology > Information Management > Search (0.73)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.62)
