AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Computerized databases have made searching traditional indexes of print publications easier, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when the information sought comes in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for searching to proceed at human speed.

Let's build a Full-Text Search engine - Artem Krylysov

#artificialintelligence Jul-28-2020, 21:32:56 GMT

Full-Text Search is one of those tools people use every day without realizing it. If you ever googled "golang coverage report" or tried to find "indoor wireless camera" on an e-commerce website, you used some kind of full-text search. Full-Text Search (FTS) is a technique for searching text in a collection of documents. A document can refer to a web page, a newspaper article, an email message, or any structured text. Today we are going to build our own FTS engine.
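Most full-text search engines are built around an inverted index. This is a map from each term to the documents that contain it, so a query only touches the documents listed under its terms instead of scanning the whole collection. The following is a minimal Python sketch, not the article's own code. It assumes a deliberately naive tokenizer (lowercase, split on non-alphanumeric ASCII characters, no stemming or stop-word removal) and AND semantics for multi-word queries. The function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def build_index(docs: list[str]) -> dict[str, set[int]]:
    """Map each token to the set of document IDs (list positions) that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return the sorted IDs of documents containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return sorted(result)


if __name__ == "__main__":
    docs = [
        "Indoor wireless camera with night vision",
        "Outdoor wired camera",
        "Golang coverage report tutorial",
    ]
    idx = build_index(docs)
    print(search(idx, "wireless camera"))  # [0]
    print(search(idx, "camera"))           # [0, 1]
```

Intersecting posting sets is the simplest query strategy. Practical engines usually also rank the matching documents, for example with a TF-IDF-style score.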

artificial intelligence, information retrieval, natural language

Industry: Information Technology > Services > e-Commerce Services (0.55)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.41)

COVID-19 Knowledge Graph: Accelerating Information Retrieval and Discovery for Scientific Literature

Wise, Colby, Ioannidis, Vassilis N., Calvo, Miguel Romero, Song, Xiang, Price, George, Kulkarni, Ninad, Brand, Ryan, Bhatia, Parminder, Karypis, George

arXiv.org Artificial Intelligence Jul-24-2020

The coronavirus disease (COVID-19) has claimed the lives of over 350,000 people and infected more than 6 million people worldwide. Several search engines have surfaced to provide researchers with additional tools to find and retrieve information from the rapidly growing corpora on COVID-19. These engines lack extraction and visualization tools necessary to retrieve and interpret complex relations inherent to scientific literature. Moreover, because these engines mainly rely upon semantic information, their ability to capture complex global relationships across documents is limited, which reduces the quality of similarity-based article recommendations for users. In this work, we present the COVID-19 Knowledge Graph (CKG), a heterogeneous graph for extracting and visualizing complex relationships between COVID-19 scientific articles. The CKG combines semantic information with document topological information for the application of similar document retrieval. The CKG is constructed using the latent schema of the data, and then enriched with biomedical entity information extracted from the unstructured text of articles using scalable AWS technologies to form relations in the graph. Finally, we propose a document similarity engine that leverages low-dimensional graph embeddings from the CKG with semantic embeddings for similar article retrieval. Analysis demonstrates the quality of relationships in the CKG and shows that it can be used to uncover meaningful information in COVID-19 scientific articles. The CKG helps power www.cord19.aws and is publicly available.
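The abstract does not say exactly how the graph embeddings and semantic embeddings are combined. A simple scheme for similar-document retrieval over two embedding spaces works as follows: L2-normalize each vector, concatenate them with a weight `alpha`, and rank by cosine similarity. The sketch below assumes that scheme, a hypothetical `alpha` parameter, and toy vectors. It is not the CKG authors' method.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def combine(semantic: list[float], graph: list[float], alpha: float = 0.5) -> list[float]:
    """Concatenate both normalized embeddings; alpha weights the graph part.

    For nonzero inputs, these square-root weights make the dot product of two
    combined vectors equal (1 - alpha) * semantic cosine + alpha * graph cosine.
    """
    s = [math.sqrt(1 - alpha) * x for x in l2_normalize(semantic)]
    g = [math.sqrt(alpha) * x for x in l2_normalize(graph)]
    return s + g


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity, defined as 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(query_id, semantic, graph, k=5, alpha=0.5):
    """Rank all other documents by the cosine similarity of their combined embeddings."""
    combined = {doc: combine(semantic[doc], graph[doc], alpha) for doc in semantic}
    query = combined[query_id]
    scored = [(cosine(query, vec), doc) for doc, vec in combined.items() if doc != query_id]
    # Highest score first; ties broken by document ID for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc for _, doc in scored[:k]]
```

Setting `alpha = 0` gives purely semantic retrieval, and `alpha = 1` gives purely graph-based retrieval. Values in between trade the two signals off against each other.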

information retrieval, natural language, relation

2007.12731

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > France (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Search Engine Marketing (SEM) - TimesPost

#artificialintelligence Jul-23-2020, 11:56:04 GMT

Search Engine Marketing (SEM) is a digital marketing strategy that improves a site's visibility in search engine result pages (SERPs). Ranking in SERPs is very important, and SEM makes it easier to get there. SEM is one of the best marketing tools for boosting and sustaining traffic to a site, because it is a cost-effective way to gain immediate visibility. To build an effective business presence on the internet, you need substantial traffic to your site, and routine SEO tips alone will not achieve much progress. Instead, try effective SEM techniques and watch for a striking rise in traffic.

artificial intelligence, natural language, search engine marketing

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.96)

Graph integration of structured, semistructured and unstructured data for data journalism

Balalau, Oana, Conceição, Catarina, Galhardas, Helena, Manolescu, Ioana, Merabti, Tayeb, You, Jingmao, Youssef, Youssr

arXiv.org Artificial Intelligence Jul-23-2020

Nowadays, journalism is facilitated by the existence of large amounts of digital data sources, including many Open Data ones. Such data sources are extremely heterogeneous, ranging from highly structured (relational databases) and semi-structured (JSON, XML, HTML) data to graphs (e.g., RDF) and text. Journalists (and other classes of users lacking advanced IT expertise, such as most non-governmental organizations or small public administrations) need to be able to make sense of such heterogeneous corpora, even if they lack the ability to define and deploy custom extract-transform-load workflows. These workflows are difficult to set up not only for arbitrary heterogeneous inputs, but also because users may want to add (or remove) datasets to (from) the corpus. We describe a complete approach for integrating dynamic sets of heterogeneous data sources along the lines described above. We cover the challenges we faced in making such graphs useful and allowing their integration to scale, and the solutions we proposed for these problems. Our approach is implemented within the ConnectionLens system; we validate it through a set of experiments.
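The abstract does not describe ConnectionLens's internals, so the following is only a rough Python sketch of the general idea. Relational rows and nested JSON are each turned into nodes and labeled edges of one graph. Value nodes from any source are then connected through shared entity nodes. The sketch swaps in a hypothetical gazetteer-style substring match where a real system would use an entity extractor. It covers only relational and JSON inputs, and every class, function, and value in it is hypothetical.

```python
import itertools


class DataGraph:
    """Directed, edge-labeled graph; leaf nodes hold string values."""

    def __init__(self):
        self.values = {}  # node id -> value string, or None for internal nodes
        self.edges = []   # (source id, label, target id)
        self._ids = itertools.count()

    def add_node(self, value=None):
        node = next(self._ids)
        self.values[node] = value
        return node

    def add_edge(self, source, label, target):
        self.edges.append((source, label, target))


def add_table(graph, rows):
    """Relational rows: one node per tuple, one edge labeled by column per value."""
    for row in rows:
        tuple_node = graph.add_node()
        for column, value in row.items():
            graph.add_edge(tuple_node, column, graph.add_node(str(value)))


def add_json(graph, obj):
    """JSON-like data: dicts and lists become internal nodes, scalars become leaves."""
    if isinstance(obj, dict):
        node = graph.add_node()
        for key, child in obj.items():
            graph.add_edge(node, key, add_json(graph, child))
        return node
    if isinstance(obj, list):
        node = graph.add_node()
        for child in obj:
            graph.add_edge(node, "item", add_json(graph, child))
        return node
    return graph.add_node(str(obj))


def link_entities(graph, known_entities):
    """Stand-in for entity extraction: link every value that mentions a known name."""
    entity_nodes = {}
    for node, value in list(graph.values.items()):  # snapshot: new entity nodes are skipped
        if value is None:
            continue
        for entity in known_entities:
            if entity.lower() in value.lower():
                if entity not in entity_nodes:
                    entity_nodes[entity] = graph.add_node(entity)
                graph.add_edge(node, "mentions", entity_nodes[entity])
    return entity_nodes


def values_mentioning(graph, entity_node):
    """Values, from any source, that were linked to the given entity node."""
    return [graph.values[src] for src, label, dst in graph.edges
            if label == "mentions" and dst == entity_node]
```

Linking through a shared entity node is what lets a table row and a JSON article become connected, even though neither source refers to the other.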

information retrieval, machine learning, natural language

2007.12488

Country:

Africa > Central African Republic (0.14)
Europe > France > Île-de-France > Paris > Paris (0.05)
North America > United States > Florida > Hillsborough County > Tampa (0.04)

Genre: Research Report (0.50)

Industry: Media > News (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

DSC Data Science Search Engine

#artificialintelligence Jul-22-2020, 03:00:31 GMT

Data Science Fails – There's No Such Thing as a Free Lunch. While this latest DSC podcast isn't about sandwiches, it is related to lunch, specifically the no free lunch theorem. In short, the theorem states that, averaged over all possible problems, no learning algorithm outperforms any other, so no single algorithm is best at learning everything. This means that you can't know in advance which algorithm will work best on your data.
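Because you can't know in advance which algorithm will work best, the practical response is to compare candidates empirically on held-out data. The sketch below shows this hold-out model selection in Python with two deliberately simple classifiers on hypothetical one-dimensional toy data. The function names, the data, and the choice of accuracy as the metric are all assumptions for illustration.

```python
def fit_majority(train):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    most_common = max(set(labels), key=labels.count)
    return lambda x: most_common


def fit_nearest_neighbor(train):
    """1-nearest neighbor on a single numeric feature."""
    def predict(x):
        _, label = min(train, key=lambda pair: abs(pair[0] - x))
        return label
    return predict


def accuracy(model, data):
    """Fraction of (x, y) pairs the model labels correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


def select_model(candidates, train, validation):
    """Fit every candidate on train and keep the one with the best validation accuracy."""
    scores = {name: accuracy(fit(train), validation) for name, fit in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Hypothetical toy data: the label is 1 when the feature is at least 0.3.
    train = [(i / 10, int(i >= 3)) for i in range(10)]
    validation = [(i / 10 + 0.02, int(i >= 3)) for i in range(10)]
    candidates = {"majority": fit_majority, "1-nearest neighbor": fit_nearest_neighbor}
    print(select_model(candidates, train, validation))
```

On data where the labels carry little signal, the ranking of these two candidates could flip. That is why the comparison has to be rerun for each new dataset.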

information retrieval, machine learning, natural language

Technology:

Information Technology > Data Science (0.95)
Information Technology > Information Management > Search (0.40)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)

Managing Data in Massive-Scale Vector Search Engine

#artificialintelligence Jul-22-2020, 03:00:30 GMT

Searching a raw data file is a brute-force search. It computes the distance between the query vector and every stored (original) vector and returns the nearest k vectors. Search efficiency can be greatly increased if the search is instead based on an index file in which the vectors are indexed, although building the index requires additional disk space and is usually time-consuming. So what are the differences between raw data files and index files? Put simply, a raw data file records every single vector together with its unique ID. An index file records vector clustering results such as the index type, the cluster centroids, and the vectors in each cluster.
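The contrast can be sketched in plain Python. Brute force corresponds to scanning the raw vectors. For the index, the sketch assumes an IVF-style (inverted file) scheme, which matches the cluster centroids and per-cluster vectors mentioned above. The scheme clusters the vectors with k-means, stores the centroids plus the IDs in each cluster, and at query time scans only the `nprobe` clusters nearest the query. This is an assumption about the kind of index meant, not the engine's actual code, and all names are hypothetical. IVF search is approximate: a true neighbor that sits in an unprobed cluster is missed.

```python
import math
import random


def brute_force_knn(vectors, query, k):
    """Raw-data-file search: measure the distance from the query to every stored vector."""
    return sorted(vectors, key=lambda vid: math.dist(vectors[vid], query))[:k]


def kmeans(points, n_clusters, iterations=10, seed=0):
    """Plain Lloyd's k-means; returns the cluster centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    for _ in range(iterations):
        clusters = [[] for _ in range(n_clusters)]
        for p in points:
            nearest = min(range(n_clusters), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            [sum(coords) / len(cluster) for coords in zip(*cluster)] if cluster else centroids[c]
            for c, cluster in enumerate(clusters)
        ]
    return centroids


def build_ivf_index(vectors, n_clusters):
    """Index-file style: cluster centroids plus the vector IDs stored in each cluster."""
    centroids = kmeans(list(vectors.values()), n_clusters)
    buckets = [[] for _ in centroids]
    for vid, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))
        buckets[nearest].append(vid)
    return centroids, buckets


def ivf_search(vectors, index, query, k, nprobe=1):
    """Scan only the nprobe clusters whose centroids are closest to the query."""
    centroids, buckets = index
    probed = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))[:nprobe]
    candidates = [vid for c in probed for vid in buckets[c]]
    return sorted(candidates, key=lambda vid: math.dist(vectors[vid], query))[:k]
```

Raising `nprobe` trades speed for recall. When every cluster is probed, the result matches brute force.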

artificial intelligence, information retrieval, natural language

Technology:

Information Technology > Information Management > Search (0.40)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Elasticsearch for Data Science just got way easier

#artificialintelligence Jul-21-2020, 06:20:45 GMT

Elasticsearch is a feature-rich, open-source search engine built on top of Apache Lucene, one of the most important full-text search libraries on the market. Elasticsearch is best known for its expansive and versatile REST API, which includes efficient wrappers for full-text search, sorting, and aggregation tasks. This makes it much easier to add such capabilities to existing backends without complex re-engineering. Since its introduction in 2010, Elasticsearch has gained a lot of traction in the software engineering domain. By 2016 it had become the most popular enterprise search engine according to the DBMS knowledge base DB-Engines, surpassing the industry-standard Apache Solr (which is also built on top of Lucene). One of the things that makes Elasticsearch so popular is the ecosystem it generated. Engineers across the world developed open-source Elasticsearch integrations and extensions, and Elastic (the company behind the Elasticsearch project) absorbed many of these projects into its stack.
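To show what the REST API looks like in practice, here is a hedged Python sketch using only the standard library. It builds a query DSL body that combines the three capabilities named above: a full-text `match` query, sorting, and a `terms` aggregation. It then POSTs the body to the `_search` endpoint. The index name `products` and the fields `title`, `price`, and `brand` are hypothetical. The sketch assumes a local cluster at `http://localhost:9200` with security disabled, a numeric `price` field, and a `keyword`-typed `brand` field (a `terms` aggregation on an analyzed `text` field fails by default).

```python
import json
import urllib.request


def build_search_body(text, field="title", sort_field="price", agg_field="brand", size=10):
    """Query DSL body: full-text match on one field, ascending sort, terms aggregation."""
    return {
        "size": size,
        "query": {"match": {field: text}},
        "sort": [{sort_field: {"order": "asc"}}],
        "aggs": {"by_" + agg_field: {"terms": {"field": agg_field}}},
    }


def search(base_url, index, body):
    """POST the body to <base_url>/<index>/_search and return the parsed JSON response."""
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `search("http://localhost:9200", "products", build_search_body("indoor wireless camera"))` returns the parsed response. Matching documents appear under `response["hits"]["hits"]`, and per-brand counts under `response["aggregations"]["by_brand"]["buckets"]`.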

information retrieval, machine learning, natural language

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.77)

COVID-19 Literature Knowledge Graph Construction and Drug Repurposing Report Generation

Wang, Qingyun, Li, Manling, Wang, Xuan, Parulian, Nikolaus, Han, Guangxing, Ma, Jiawei, Tu, Jingxuan, Lin, Ying, Zhang, Haoran, Liu, Weili, Chauhan, Aabhas, Guan, Yingjun, Li, Bangzheng, Li, Ruisong, Song, Xiangchen, Ji, Heng, Han, Jiawei, Chang, Shih-Fu, Pustejovsky, James, Rah, Jasmine, Liem, David, Elsayed, Ahmed, Palmer, Martha, Voss, Clare, Schneider, Cynthia, Onyshkevych, Boyan

arXiv.org Artificial Intelligence Jul-20-2020

To combat COVID-19, both clinicians and scientists need to digest the vast amount of relevant biomedical knowledge in the literature to understand the disease mechanism and the related biological functions. We have developed a novel and comprehensive knowledge discovery framework, COVID-KG, to extract fine-grained multimedia knowledge elements (entities, relations and events) from scientific literature. We then exploit the constructed multimedia knowledge graphs (KGs) for question answering and report generation, using drug repurposing as a case study. Our framework also provides detailed contextual sentences, subfigures and knowledge subgraphs as evidence. All of the data, KGs, reports, resources and shared services are publicly available.
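One simple way a knowledge graph can surface drug repurposing candidates while keeping evidence attached is a two-hop path query. It asks which drugs act on something that is in turn linked to the target disease, and returns the supporting sentence for each hop. The Python sketch below assumes that formulation. It is not COVID-KG's actual QA pipeline. The relation labels, entity names, and evidence strings are placeholders, not extracted data.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Triples (head, relation, tail), each carrying the sentence it was extracted from."""

    def __init__(self):
        self.out_edges = defaultdict(list)  # head -> [(relation, tail, evidence)]

    def add(self, head, relation, tail, evidence):
        self.out_edges[head].append((relation, tail, evidence))

    def two_hop(self, target, first_relation, second_relation):
        """Find paths head -first_relation-> middle -second_relation-> target.

        Returns (head, middle, [evidence for hop 1, evidence for hop 2]) tuples.
        """
        results = []
        for head, edges in self.out_edges.items():
            for relation, middle, evidence in edges:
                if relation != first_relation:
                    continue
                # .get avoids inserting keys into the defaultdict while iterating over it.
                for relation2, tail, evidence2 in self.out_edges.get(middle, []):
                    if relation2 == second_relation and tail == target:
                        results.append((head, middle, [evidence, evidence2]))
        return results
```

Returning the evidence sentences alongside each path mirrors the abstract's point that answers should come with supporting context.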

data mining, information retrieval, machine learning

2007.00576

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Illinois (0.05)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Data Science > Data Mining (0.88)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.62)

Understanding TF-IDF in NLP.

#artificialintelligence Jul-12-2020, 17:36:47 GMT

TF-IDF, short for term frequency–inverse document frequency, is a numerical statistic intended to reflect how important a word is to a document in a collection, or corpus. It is often used as a weighting factor in information retrieval, text mining, and user modelling. The TF-IDF value increases proportionally to the number of times a word appears in the document and is offset by the number of documents in the corpus that contain the word, which helps to adjust for the fact that some words appear more frequently in general. TF-IDF is often preferred over a binary bag-of-words representation, in which each word is represented only as 1 or 0 depending on whether it appears in a sentence. TF-IDF instead gives each word its own weight, which expresses how important that word is relative to the others. Consider three sentences, and take the word "Good" in Sentence 1. Term frequency is defined as TF(t) = (number of times term t appears in a document) / (total number of terms in the document). "Good" appears once in Sentence 1, so TF("Good") for Sentence 1 is 1 divided by the total number of terms in Sentence 1. The fact that "Good" appears 3 times across all three sentences does not enter TF. It matters only for IDF, which depends on how many sentences contain the word.
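The definitions above translate directly into a few lines of Python. This sketch treats each sentence as a document given as a list of tokens. It uses the unsmoothed form idf(t) = ln(N / df(t)), where N is the number of documents and df(t) is the number that contain t. Libraries often use smoothed variants (for example, scikit-learn's `TfidfVectorizer` applies IDF smoothing by default), so their numbers will differ. The toy sentences are hypothetical, not the article's.

```python
import math
from collections import Counter


def tf(term, doc_tokens):
    """Term frequency: occurrences of term divided by the document's length in tokens."""
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term, corpus):
    """Inverse document frequency ln(N / df); 0.0 for terms that appear in no document."""
    df = sum(1 for doc in corpus if term in doc)
    return math.log(len(corpus) / df) if df else 0.0


def tf_idf(term, doc_tokens, corpus):
    return tf(term, doc_tokens) * idf(term, corpus)


if __name__ == "__main__":
    corpus = [["good", "movie"], ["not", "a", "good", "movie"], ["did", "not", "like"]]
    for term in ("good", "like"):
        print(term, [round(tf_idf(term, doc, corpus), 3) for doc in corpus])
```

A word that appears in every document gets an IDF of ln(1) = 0, so its TF-IDF is zero no matter how often it occurs. This is the "offset" described above.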

frequency, information retrieval, natural language

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.36)

A Startup Is Testing the Subscription Model for Search Engines

WIRED Jul-5-2020, 14:00:00 GMT

In November 2017, Sridhar Ramaswamy, the head of Google's $95 billion advertising arm, left the company after a scandal concerning advertisements for major corporations found on YouTube videos that put children in questionable situations. Ramaswamy told The New York Times that shortly after that incident, he decided he needed to do something different in his life, because "an ad-supported model had limitations." Ramaswamy's startup company, Neeva, is that "something different." Though it, too, is a search engine, it seeks to sidestep some of Google's problems by avoiding ads altogether. Ramaswamy says that the new engine won't show ads and won't collect or profit from user data; instead, it will charge its users a subscription fee. This story originally appeared on Ars Technica, a trusted source for technology news, tech policy analysis, reviews, and more. Ars is owned by WIRED's parent company, Condé Nast.

artificial intelligence, information retrieval, natural language

Industry: Information Technology > Security & Privacy (0.71)

Technology:

Information Technology > Information Management > Search (0.78)
Information Technology > Communications (0.73)
Information Technology > Security & Privacy (0.71)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.64)
