Information Retrieval
The Top 10 Search Engines Today
In SEO, the focus is so often on Google: 'How do I rank higher in the Google SERPs?' or 'How can I get more rich snippets on Google?' Google is certainly one of the most popular search engines, but it is not the only one. Different search engines have different audience demographics and different pros and cons, so when you optimize your website, you don't want to miss out on a significant share of a particular market. In this article, we list the ten search engines most widely used today, with their pros and cons, and look at whether Google really is the most popular.
Predicting Document Coverage for Relation Extraction
Singhania, Sneha, Razniewski, Simon, Weikum, Gerhard
This paper presents a new task of predicting the coverage of a text document for relation extraction (RE): does the document contain many relational tuples for a given entity? Coverage predictions are useful in selecting the best documents for knowledge base construction with large input corpora. To study this problem, we present a dataset of 31,366 diverse documents for 520 entities. We analyze the correlation of document coverage with features like length, entity mention frequency, Alexa rank, language complexity and information retrieval scores. Each of these features has only moderate predictive power. We employ methods combining features with statistical models like TF-IDF and language models like BERT. The model combining features and BERT, HERB, achieves an F1 score of up to 46%. We demonstrate the utility of coverage predictions on two use cases: KB construction and claim refutation.
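The abstract names document length and entity mention frequency among the features it correlates with coverage. As a rough illustration only, the sketch below computes those two surface features for one document/entity pair. The function name `coverage_features` and the exact definitions (word tokens, case-insensitive whole-word matching, mentions per 100 tokens) are assumptions, not the paper's feature extraction; the Alexa rank, language complexity and IR-score features are not reproduced.

```python
import re


def coverage_features(document: str, entity: str) -> dict:
    """Surface features for one (document, entity) pair.

    Hypothetical feature definitions chosen for illustration.
    """
    text = document.lower()
    tokens = re.findall(r"\w+", text)
    length = len(tokens)
    # Case-insensitive, whole-word mentions of the entity string.
    pattern = r"\b" + re.escape(entity.lower()) + r"\b"
    mentions = len(re.findall(pattern, text))
    return {
        "length": float(length),
        "entity_mentions": float(mentions),
        # Mentions per 100 tokens; 0.0 for an empty document.
        "mention_rate": 100.0 * mentions / length if length else 0.0,
    }


features = coverage_features(
    "Marie Curie was a physicist. Curie won two Nobel Prizes.", "Curie"
)
print(features)  # {'length': 10.0, 'entity_mentions': 2.0, 'mention_rate': 20.0}
```

Features like these would then be fed, alongside text representations, into a statistical model; which model and how the features are weighted is left open here.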
Recommending Multiple Positive Citations for Manuscript via Content-Dependent Modeling and Multi-Positive Triplet
Given the rapidly increasing number of academic papers, searching for and citing appropriate references has become a non-trivial task when writing papers. Recommending a handful of candidate papers for a manuscript before publication could ease the burden on authors and help reviewers check the completeness of the cited sources. Conventional approaches to citation recommendation generally recommend one ground-truth citation for a query context from an input manuscript and do not consider co-citation recommendation. However, a piece of context often needs to be supported by two or more co-citation pairs. Here, we propose a novel scientific paper model for citation recommendation, the Multi-Positive BERT Model for Citation Recommendation (MP-BERT4CR), equipped with a series of Multi-Positive Triplet objectives to recommend multiple positive citations for a query context. The proposed approach has three advantages. First, the multi-positive objectives are effective at recommending multiple positive candidates. Second, we adopt noise distributions built from historical co-citation frequencies, so that MP-BERT4CR is effective not only at recommending high-frequency co-citation pairs but also, with significantly improved performance, at retrieving low-frequency ones. Third, we propose a dynamic context sampling strategy that captures the "macro-scoped" citing intents of a manuscript and makes the citation embeddings content-dependent, which further improves performance. Single- and multiple-positive recommendation experiments show that MP-BERT4CR delivers significant improvements. MP-BERT4CR is also more effective than prior work at retrieving the full list of co-citations and historically low-frequency co-citation pairs.
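The exact MP-BERT4CR objectives are not given in the abstract. The sketch below shows one generic way a triplet loss can be extended to several positives: average a hinge loss over every (positive, negative) pair for a single anchor, here the query-context embedding. The function names, the squared Euclidean distance, and the default margin are assumptions for illustration, and the co-citation-based noise distribution is not modelled.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def multi_positive_triplet_loss(anchor, positives, negatives, margin=1.0):
    """Mean hinge loss max(0, margin + d(a, p) - d(a, n)) over all (p, n) pairs."""
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    total = 0.0
    for p in positives:
        d_pos = squared_distance(anchor, p)
        for n in negatives:
            d_neg = squared_distance(anchor, n)
            total += max(0.0, margin + d_pos - d_neg)
    return total / (len(positives) * len(negatives))


context = [0.0, 0.0]                 # query-context embedding (anchor)
cited = [[0.0, 1.0], [1.0, 0.0]]     # two co-cited papers (positives)
print(multi_positive_triplet_loss(context, cited, [[3.0, 0.0]]))  # 0.0
print(multi_positive_triplet_loss(context, cited, [[0.0, 0.5]]))  # 1.75
```

Averaging over all pairs keeps every co-cited paper in the gradient signal; a hardest-negative variant would instead take the maximum over negatives.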
Parallel Logic Programming: A Sequel
Dovier, Agostino, Formisano, Andrea, Gupta, Gopal, Hermenegildo, Manuel V., Pontelli, Enrico, Rocha, Ricardo
Multi-core and highly-connected architectures have become ubiquitous, and this has brought renewed interest in language-based approaches to the exploitation of parallelism. Since its inception, logic programming has been recognized as a programming paradigm with great potential for automated exploitation of parallelism. The comprehensive survey of the first twenty years of research in parallel logic programming, published in 2001, has since served as a fundamental reference for researchers and developers. Its contents remain largely valid today, but the field has continued to evolve at a fast pace in the years since. Many of these achievements and much ongoing research have been driven by rapid technological innovation, which has led to advances such as very large clusters, the wide diffusion of multi-core processors, the game-changing role of general-purpose graphics processing units, and the ubiquitous adoption of cloud computing. This has been paralleled by significant advances within logic programming, such as tabling, more powerful static analysis and verification, the rapid growth of Answer Set Programming, and, in general, more mature implementations and systems. This survey reviews research in parallel logic programming since 2001, thus providing a natural continuation of the previous survey. The goal of the survey is to serve not only as a reference for researchers and developers of logic programming systems, but also as engaging reading for anyone interested in logic and as a useful source for researchers in parallel systems outside logic programming. Under consideration in Theory and Practice of Logic Programming (TPLP).
Improve your search engine optimisation strategy with brand mentions
Over the past few decades, the role of search engine optimisation (SEO) strategies in day-to-day business marketing has increased exponentially, both in importance and application. In modern times, businesses have found it essential to invest in SEO due to the large part search engines play in connecting businesses with their customers. Strong organic online traffic …
How to build a custom NER Model?
Named Entity Recognition (NER) is a Natural Language Processing technique used to extract proper entities from a given text and classify them into pre-defined classes. Put simply, NER extracts entities such as person names, location names, and company names from text. NER plays an important role in information retrieval. After reading a text, humans can naturally recognize common entities such as a person's name or a date. For a computer to do the same, we have to help it learn the task. To do so, we can draw on Natural Language Processing (NLP) and Machine Learning (ML).
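The article goes on to build a learned model. As a much simpler baseline that shows what the task's input and output look like, the sketch below tags entities by looking them up in a hand-made gazetteer. This rule-based technique is swapped in for illustration, not the article's method, and the dictionary `GAZETTEER` and the function `tag_entities` are hypothetical.

```python
import re

# Hypothetical gazetteer: surface form (lowercase) -> entity class.
GAZETTEER = {
    "ada lovelace": "PERSON",
    "new york": "LOCATION",
    "acme corp": "ORGANIZATION",
}


def tag_entities(text, gazetteer):
    """Return (surface form, class, start offset) tuples in text order."""
    found = []
    taken = [False] * len(text)  # characters already covered by a tagged entity
    # Longer names first, so a multi-word entry beats any shorter entry it contains.
    for name in sorted(gazetteer, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(taken[m.start():m.end()]):
                continue  # overlaps a longer entity that was already tagged
            for i in range(m.start(), m.end()):
                taken[i] = True
            found.append((m.group(0), gazetteer[name], m.start()))
    return sorted(found, key=lambda item: item[2])


print(tag_entities("Acme Corp opened an office in New York.", GAZETTEER))
# [('Acme Corp', 'ORGANIZATION', 0), ('New York', 'LOCATION', 30)]
```

A lookup table cannot recognize names it has never seen, which is exactly the gap a trained NER model is meant to close.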
FACOS: Finding API Relevant Contents on Stack Overflow with Semantic and Syntactic Analysis
Luong, Kien, Hadi, Mohammad, Thung, Ferdian, Fard, Fatemeh, Lo, David
Collecting API examples, usages, and mentions relevant to a specific API method from discussions on venues such as Stack Overflow is not a trivial problem. It requires effort to correctly recognize whether a discussion refers to the API method that developers or tools are searching for. The content of a thread, which consists of both text paragraphs describing the API method's role in the discussion and code snippets containing the API invocation, may refer to the given API method. Leveraging this observation, we develop FACOS, a context-specific algorithm that captures the semantic and syntactic information of the paragraphs and code snippets in a discussion. FACOS combines a syntactic word-based score with a score from a predictive model fine-tuned from CodeBERT. FACOS outperforms the state-of-the-art approach by 13.9% in terms of F1-score.
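The abstract says FACOS combines a syntactic word-based score with the score of a fine-tuned CodeBERT model but does not say how. The sketch below assumes a simple weighted average: the word-based part is the fraction of the dotted API name's parts (for example class and method) that appear as words in the post, and `model_probability` stands in for the classifier output, which is not reproduced. `syntactic_score`, `combined_score`, and the `weight` parameter are hypothetical.

```python
import re


def syntactic_score(api_method, post_text):
    """Fraction of the dotted API name's parts found as whole words in the post."""
    parts = [p.lower() for p in api_method.split(".") if p]
    words = set(re.findall(r"\w+", post_text.lower()))
    if not parts:
        return 0.0
    return sum(p in words for p in parts) / len(parts)


def combined_score(api_method, post_text, model_probability, weight=0.5):
    """Weighted average of the word-based score and a classifier probability.

    model_probability stands in for the output of a fine-tuned model (e.g. CodeBERT),
    which is not reproduced here; weight is a hypothetical mixing parameter.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * syntactic_score(api_method, post_text) + (1 - weight) * model_probability


post = "Call list.add(x) on your ArrayList before iterating."
print(syntactic_score("ArrayList.add", post))                        # 1.0
print(combined_score("ArrayList.add", post, model_probability=0.6))  # 0.8
```

The word-based score alone cannot tell `ArrayList.add` from an unrelated `add`, which is why a learned semantic score is combined with it.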
Cross-language Information Retrieval
Galuščáková, Petra, Oard, Douglas W., Nair, Suraj
Two key assumptions shape the usual view of ranked retrieval: (1) that the searcher can choose words for their query that might appear in the documents that they wish to see, and (2) that ranking retrieved documents will suffice because the searcher will be able to recognize those which they wished to find. When the documents to be searched are in a language not known by the searcher, neither assumption is true. In such cases, Cross-Language Information Retrieval (CLIR) is needed. This chapter reviews the state of the art for cross-language information retrieval and outlines some open research questions.
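One classical way to relax the first assumption is dictionary-based query translation: map each query term into the document language, then rank documents with an ordinary monolingual matcher. The sketch below is a toy version; the tiny English-German lexicon `EN_DE`, the function names, and the term-count ranking are assumptions chosen for illustration and are not taken from the chapter.

```python
from collections import Counter

# Hypothetical toy English -> German lexicon; a term may have several translations.
EN_DE = {
    "red": ["rot", "rote", "rotes"],
    "house": ["haus"],
    "garden": ["garten"],
}


def translate_query(query, lexicon):
    """Replace each query term by all its translations; keep unknown terms as-is."""
    terms = []
    for word in query.lower().split():
        terms.extend(lexicon.get(word, [word]))
    return terms


def rank(query_terms, documents):
    """Rank document ids by total occurrences of the translated terms (ties: by id)."""
    scored = []
    for doc_id, text in documents.items():
        counts = Counter(text.lower().split())
        score = sum(counts[t] for t in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [doc_id for _, doc_id in scored]


docs = {
    "d1": "das rote haus",
    "d2": "der garten",
    "d3": "ein rotes haus im garten",
}
query = translate_query("red house", EN_DE)
print(query)              # ['rot', 'rote', 'rotes', 'haus']
print(rank(query, docs))  # ['d1', 'd3']
```

Keeping every listed translation trades precision for recall; the second assumption (that the searcher can recognize relevant documents) is not addressed by translation of the query alone.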
Recent Advances in Automated Question Answering In Biomedical Domain
The objective of automated Question Answering (QA) systems is to answer user queries in a time-efficient manner. The answers are usually found either in databases (or knowledge bases) or in a collection of documents commonly referred to as the corpus. In the past few decades, knowledge acquisition has proliferated, and consequently the number of new scientific articles in biomedicine has grown exponentially. As a result, it has become difficult to keep track of all the information in the domain, even for domain experts. With improvements in commercial search engines, users can type in their queries and get a small set of documents most relevant to answering them, and in some cases relevant snippets from those documents. However, manually looking for the required information or answers may still be tedious and time-consuming. This has necessitated the development of efficient QA systems that aim to find exact and precise answers to natural language questions in the biomedical domain. In this paper, we introduce the basic methodologies used to develop general-domain QA systems, followed by a thorough investigation of different aspects of biomedical QA systems, including benchmark datasets and several proposed approaches using both structured databases and collections of texts. We also examine the limitations of current systems and explore potential avenues for further advancement.
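The gap between returning documents and returning a precise answer can be illustrated with a crude snippet-selection step: split the retrieved documents into sentences and return the one sharing the most content words with the question. This lexical-overlap heuristic, the stopword list, and the names `content_words` and `best_snippet` are assumptions for illustration; the systems surveyed in the paper use far richer methods.

```python
import re
from typing import List, Optional

# Hypothetical minimal stopword list; real systems use larger ones.
STOPWORDS = {"a", "an", "the", "is", "of", "in", "what", "which", "does", "do"}


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"\w+", text.lower()) if w not in STOPWORDS}


def best_snippet(question: str, documents: List[str]) -> Optional[str]:
    """Return the sentence sharing the most content words with the question, or None."""
    q_words = content_words(question)
    best, best_overlap = None, 0
    for doc in documents:
        # Naive sentence split after ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", doc.strip()):
            overlap = len(q_words & content_words(sentence))
            if overlap > best_overlap:  # strict '>' keeps the first sentence on ties
                best, best_overlap = sentence, overlap
    return best


corpus = ["Aspirin irreversibly inhibits cyclooxygenase. It is widely used as an analgesic."]
print(best_snippet("Which enzyme does aspirin inhibit?", corpus))
# Aspirin irreversibly inhibits cyclooxygenase.
```

Even in this example the match rests only on "aspirin": without stemming, "inhibit" does not match "inhibits", and nothing links "enzyme" to "cyclooxygenase". Closing such gaps is what dedicated biomedical QA systems aim to do.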