Information Retrieval
Shape Fragments
Delva, Thomas, Dimou, Anastasia, Jakubowski, Maxime, Bussche, Jan Van den
In constraint languages for RDF graphs, such as ShEx and SHACL, constraints on nodes and their properties in RDF graphs are known as "shapes". Schemas in these languages list the various shapes that certain targeted nodes must satisfy for the graph to conform to the schema. Using SHACL, we propose in this paper a novel use of shapes, by which a set of shapes is used to extract a subgraph from an RDF graph, the so-called shape fragment. Our proposed mechanism fits in the framework of Linked Data Fragments. In this paper, (i) we define our extraction mechanism formally, building on recently proposed SHACL formalizations; (ii) we establish correctness properties, which relate shape fragments to notions of provenance for database queries; (iii) we compare shape fragments with SPARQL queries; (iv) we discuss implementation options; and (v) we present initial experiments demonstrating that shape fragments are a feasible new idea.
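The idea of using a shape to extract a subgraph can be illustrated with a deliberately simplified sketch. It assumes a "shape" is just a target class plus a set of required predicates, and that triples are plain Python tuples; both are assumptions made for illustration and are far weaker than SHACL or the formal definition in the paper. The function name `shape_fragment` is hypothetical.

```python
RDF_TYPE = "rdf:type"

def shape_fragment(triples, target_class, required_predicates):
    """Toy extraction: keep the required-predicate triples of every
    target-class node that has all of the required predicates."""
    required = set(required_predicates)
    # Target nodes: subjects typed with the target class.
    targets = {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    fragment = set()
    for node in targets:
        node_triples = [t for t in triples if t[0] == node]
        present = {p for _, p, _ in node_triples}
        # Only conforming nodes contribute triples to the fragment.
        if required <= present:
            fragment.update(t for t in node_triples if t[1] in required)
    return fragment

graph = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:email", "alice@example.org"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:bob", "ex:name", "Bob"),
]
fragment = shape_fragment(graph, "ex:Person", {"ex:name", "ex:email"})
```

In this toy graph only `ex:alice` has both required predicates, so the fragment contains her name and email triples and nothing about `ex:bob`.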
Weaviate is an open-source search engine powered by ML, vectors, graphs, and GraphQL
Bob van Luijt's career in technology started at age 15, building websites to help people sell toothbrushes online. Not many 15-year-olds do that. Apparently, this gave van Luijt enough of a head start to arrive at the confluence of technology trends today. Van Luijt went on to study arts but ended up working full time in technology anyway. In 2015, when Google introduced its RankBrain algorithm, the quality of search results jumped up.
What is Search Engine Optimization (SEO)? - Discover How to Make
An entrepreneur or freelancer has two main strategies to tap into when marketing online. Search Engine Optimization (SEO), which attempts to rank your website on search engines "organically", and Search Engine Marketing (SEM), which ranks your website in search results in exchange for money. Both strategies can be used to build a business successfully, but which one is right for you? A great way to market your business is to use search engines to help your customers find you online. You will need a sales-focused website (e.g., one aimed at creating contact rather than one aimed at assuring customers that you are who you say you are) if you use this strategy; otherwise, your efforts will likely be wasted. You have two ways to use search engines to help people find your website: search engine optimization (SEO) and search engine marketing (SEM).
The Web Is Your Oyster -- Knowledge-Intensive NLP against a Very Large Web Corpus
Piktus, Aleksandra, Petroni, Fabio, Karpukhin, Vladimir, Okhonko, Dmytro, Broscheit, Samuel, Izacard, Gautier, Lewis, Patrick, Oğuz, Barlas, Grave, Edouard, Yih, Wen-tau, Riedel, Sebastian
In order to address the increasing demands of real-world applications, the research for knowledge-intensive NLP (KI-NLP) should advance by capturing the challenges of a truly open-domain environment: web scale knowledge, lack of structure, inconsistent quality, and noise. To this end, we propose a new setup for evaluating existing KI-NLP tasks in which we generalize the background corpus to a universal web snapshot. We repurpose KILT, a standard KI-NLP benchmark initially developed for Wikipedia, and ask systems to use a subset of CCNet - the Sphere corpus - as a knowledge source. In contrast to Wikipedia, Sphere is orders of magnitude larger and better reflects the full diversity of knowledge on the Internet. We find that despite potential gaps of coverage, challenges of scale, lack of structure and lower quality, retrieval from Sphere enables a state-of-the-art retrieve-and-read system to match and even outperform Wikipedia-based models on several KILT tasks - even if we aggressively filter content that looks like Wikipedia. We also observe that while a single dense passage index over Wikipedia can outperform a sparse BM25 version, on Sphere this is not yet possible. To facilitate further research into this area, and minimise the community's reliance on proprietary black box search engines, we will share our indices, evaluation metrics and infrastructure.
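For readers less familiar with the sparse side of the dense-versus-BM25 comparison above, here is a minimal sketch of Okapi BM25 scoring. It assumes whitespace-tokenized documents, the common defaults `k1=1.5` and `b=0.75`, and a non-negative IDF variant; none of these settings come from the paper.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is length-normalized with b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "dense retrieval with neural encoders".split(),
    "sparse retrieval with bm25 term weighting".split(),
    "tree planting search engine".split(),
]
scores = bm25_scores("sparse bm25".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Only the second document contains either query term, so it is the only one with a non-zero score.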
Free SEO Tools & Search Engine Optimization Software Application - Discover How to Make
Tools to help you develop and market your site. If you need feedback or have any burning questions please ask in the neighborhood online forum so we can get them sorted out. Includes site map, glossary, and flying start checklist. Tips on how to purchase traffic from search engines. Discover how to track your success with natural SEO and pay per click ads.
Best of Both Worlds: A Hybrid Approach for Multi-Hop Explanation with Declarative Facts
Storks, Shane, Gao, Qiaozi, Reganti, Aishwarya, Thattai, Govind
Language-enabled AI systems can answer complex, multi-hop questions with high accuracy, but supporting answers with evidence is a more challenging task that is important for transparency and trustworthiness to users. Prior work in this area typically makes a trade-off between efficiency and accuracy; state-of-the-art deep neural network systems are too cumbersome to be useful in large-scale applications, while the fastest systems lack reliability. In this work, we integrate fast syntactic methods with powerful semantic methods for multi-hop explanation generation based on declarative facts. Our best system, which learns a lightweight operation to simulate multi-hop reasoning over pieces of evidence and fine-tunes language models to re-rank generated explanation chains, outperforms a purely syntactic baseline from prior work by up to 7% in gold explanation retrieval rate.
Towards Unsupervised Dense Information Retrieval with Contrastive Learning
Izacard, Gautier, Caron, Mathilde, Hosseini, Lucas, Riedel, Sebastian, Bojanowski, Piotr, Joulin, Armand, Grave, Edouard
Information retrieval is an important component in natural language processing, for knowledge intensive tasks such as question answering and fact checking. Recently, information retrieval has seen the emergence of dense retrievers, based on neural networks, as an alternative to classical sparse methods based on term-frequency. These models have obtained state-of-the-art results on datasets and tasks where large training sets are available. However, they do not transfer well to new domains or applications with no training data, and are often outperformed by term-frequency methods such as BM25 which are not supervised. Thus, a natural question is whether it is possible to train dense retrievers without supervision. In this work, we explore the limits of contrastive learning as a way to train unsupervised dense retrievers, and show that it leads to strong retrieval performance. More precisely, we show on the BEIR benchmark that our model outperforms BM25 on 11 out of 15 datasets. Furthermore, when a few thousand examples are available, we show that fine-tuning our model on these leads to strong improvements compared to BM25. Finally, when used as pre-training before fine-tuning on the MS-MARCO dataset, our technique obtains state-of-the-art results on the BEIR benchmark.
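The abstract does not say which contrastive objective is used, so the following is only a sketch of one common choice, the InfoNCE loss with in-batch negatives. It assumes embeddings are plain lists of floats, that `keys[i]` is the positive for `queries[i]`, and a temperature of 0.05; all of these are illustrative assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def info_nce_loss(queries, keys, temperature=0.05):
    """Mean InfoNCE loss; every other key in the batch acts as a negative."""
    losses = []
    for i, q in enumerate(queries):
        logits = [dot(q, k) / temperature for k in keys]
        # Numerically stable log-sum-exp over all keys in the batch.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the positive key.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(queries, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss(queries, [[0.0, 1.0], [1.0, 0.0]])
```

The loss is close to zero when each query is most similar to its own positive (`aligned`) and large when the positives are swapped (`swapped`), which is the signal that pulls matching pairs together during training.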
CODER: An efficient framework for improving retrieval through COntextualized Document Embedding Reranking
Zerveas, George, Rekabsaz, Navid, Cohen, Daniel, Eickhoff, Carsten
We present a framework for improving the performance of a wide class of retrieval models at minimal computational cost. It utilizes precomputed document representations extracted by a base dense retrieval method and involves training a model to jointly score a large set of retrieved candidate documents for each query, while potentially transforming on the fly the representation of each document in the context of the other candidates as well as the query itself. When scoring a document representation based on its similarity to a query, the model is thus aware of the representation of its "peer" documents. We show that our approach leads to substantial improvement in retrieval performance over the base method and over scoring candidate documents in isolation from one another, as in a pair-wise training setting. Crucially, unlike term-interaction rerankers based on BERT-like encoders, it incurs a negligible computational overhead on top of any first-stage method at run time, allowing it to be easily combined with any state-of-the-art dense retrieval method. Finally, concurrently considering a set of candidate documents for a given query enables additional valuable capabilities in retrieval, such as score calibration and mitigating societal biases in ranking.
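The difference between scoring a candidate in isolation and scoring it alongside its peers can be shown with a small list-wise softmax. This is not the CODER model: the learned transformation of document representations is not described here and is omitted, and the function names below are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scores_in_isolation(query, candidates):
    # Each score depends only on the query and that one document.
    return [dot(query, d) for d in candidates]

def scores_as_list(query, candidates):
    # Softmax over the candidate list: each score also depends on the peers.
    logits = scores_in_isolation(query, candidates)
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def listwise_loss(query, candidates, positive_index):
    # Cross-entropy of the relevant document against the whole list.
    return -math.log(scores_as_list(query, candidates)[positive_index])

query = [1.0, 0.0]
small_list = [[1.0, 0.0], [0.0, 1.0]]
big_list = small_list + [[2.0, 0.0]]  # add a strong competing candidate
```

Adding the competing candidate leaves the isolated score of the first document unchanged but lowers its list-wise score, so a list-wise loss sees the peers while a per-document score does not.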
Text Mining Through Label Induction Grouping Algorithm Based Method
Saleem, Gulshan, Ahmed, Nisar, Qamar, Usman
The main focus of information retrieval methods is to provide accurate and efficient results that are also cost-effective. LINGO (Label Induction Grouping Algorithm) is a clustering algorithm that aims to present search results as quality clusters, but it has a few limitations. In this paper, we focus on achieving more meaningful results and improving the overall performance of the algorithm. LINGO works in two main steps: cluster label induction using the Latent Semantic Indexing (LSI) technique, and cluster content discovery using the Vector Space Model (VSM). As LINGO uses VSM in cluster content discovery, our task is to replace VSM with LSI for cluster content discovery and to analyze the feasibility of using LSI with Okapi BM25. The next task is to compare the results of the modified method with those of the original LINGO method. The research is applied to five different text-based data sets to get more reliable results for every method. Research results show that LINGO produces 40-50% better results when using LSI for content discovery. Theoretical evidence also suggests that using Okapi BM25 as the scoring method in LSI (LSI+Okapi BM25) for cluster content discovery, instead of VSM, yields better cluster generation in terms of scalability and performance than either VSM or LSI alone.
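The VSM-based cluster content discovery step mentioned above can be sketched as assigning documents to already-induced cluster labels by cosine similarity. The sketch assumes raw term-frequency vectors, whitespace tokenization, and a similarity threshold of 0.2; these are illustrative assumptions rather than settings from the paper, and the label induction step itself is not shown.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return num / den if den else 0.0

def assign_to_labels(labels, docs, threshold=0.2):
    """Map each cluster label to the indices of sufficiently similar documents."""
    doc_vecs = [Counter(d.lower().split()) for d in docs]
    clusters = {}
    for label in labels:
        label_vec = Counter(label.lower().split())
        clusters[label] = [
            i for i, dv in enumerate(doc_vecs) if cosine(label_vec, dv) >= threshold
        ]
    return clusters

labels = ["search engine", "tree planting"]
docs = [
    "open source search engine",
    "tree planting organizations",
    "shopping feature",
]
clusters = assign_to_labels(labels, docs)
```

A document may match several labels or none at all; here the third document shares no terms with either label and stays unassigned.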
Tree-planting search engine Ecosia launches Shopping feature for refurbished and sustainable products
Ecosia, the search engine that uses its profits to plant trees, is launching a new shopping feature. The company, which was founded in 2009, donates its expendable funds to tree-planting organizations. It claims to have planted 130 million trees across 30 countries around the world. Ecosia Shopping recommends products on Amazon, Kelkoo and Idealo, and other shopping partners that have been sustainably made, are reused, or have been refurbished. The feature is available now in the UK, Germany and France.