AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

BERT based patent novelty search by training claims to their own description

Freunek, Michael, Bodmer, André

arXiv.org Machine Learning, Mar-4-2021

In this paper we present a method to concatenate patent claims to their own description. By applying this method, BERT is trained to associate claims with suitable descriptions. Such a trained BERT (claim-to-description-BERT) may be able to identify novelty-relevant descriptions for patents. In addition, we introduce a new scoring scheme, relevance scoring or novelty scoring, to process the output of BERT in a meaningful way. We tested the method on patent applications by training BERT on the first claims of patents and corresponding descriptions. BERT's output has been processed according to the relevance score and the results compared with the cited X documents in the search reports. The test showed that BERT has scored some of the cited X documents as highly relevant.

bert, description piece, patent, (16 more...)

2103.01126

Country:

Europe > Switzerland > Bern > Bern (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)
Europe > Germany (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Law > Intellectual Property & Technology Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Brave Search is a privacy-first search engine

PCWorld, Mar-3-2021, 18:34:00 GMT

Browser privacy is a big deal, as Google and other companies use your search data to serve you ads while you surf the web. While most users accept that tradeoff, others believe strongly in maintaining their own data privacy. If you're one of them, Brave Software can help. On Wednesday the company said it's launching a search engine to compete with Google and Bing, with privacy as its first priority. Brave is buying Tailcat, an open search engine, and will add it to what it's calling Brave Search, a forthcoming search engine.

brave search, privacy-first search engine, search engine, (3 more...)

Industry: Information Technology > Security & Privacy (0.98)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Security & Privacy (0.98)

Brave is developing its own privacy-focused search engine

Engadget, Mar-3-2021, 16:20:59 GMT

Privacy-focused browser Brave is working on its own search engine. It has bought Tailcat, an open-source engine created by a team who worked on the defunct anti-tracking browser and search engine Cliqz, to power Brave Search. The company will allow others to use Brave Search tech to build their own search engines. Brave says the search engine will provide an alternative to Google Search and Chrome. It's developing Brave Search using the same principles as its browser, which now has more than 25 million monthly active users.

brave search, engine, search engine, (4 more...)

Industry: Information Technology > Services (0.40)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Data Augmentation for Abstractive Query-Focused Multi-Document Summarization

Pasunuru, Ramakanth, Celikyilmaz, Asli, Galley, Michel, Xiong, Chenyan, Zhang, Yizhe, Bansal, Mohit, Gao, Jianfeng

arXiv.org Artificial Intelligence, Mar-2-2021

The progress in Query-focused Multi-Document Summarization (QMDS) has been limited by the lack of sufficient large-scale high-quality training datasets. We present two QMDS training datasets, which we construct using two data augmentation methods: (1) transferring the commonly used single-document CNN/Daily Mail summarization dataset to create the QMDSCNN dataset, and (2) mining search-query logs to create the QMDSIR dataset. These two datasets have complementary properties, i.e., QMDSCNN has real summaries but queries are simulated, while QMDSIR has real queries but simulated summaries. To cover both these real summary and query aspects, we build abstractive end-to-end neural network models on the combined datasets that yield new state-of-the-art transfer results on DUC datasets. We also introduce new hierarchical encoders that enable a more efficient encoding of the query together with multiple documents. Empirical results demonstrate that our data augmentation and encoding methods outperform baseline models on automatic metrics, as well as on human evaluations along multiple attributes.

dataset, query, summarization, (15 more...)

2103.01863

Country:

North America > United States > New York > Kings County > New York City (0.14)
North America > Mexico > Sinaloa (0.05)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
Media (0.68)
Government > Regional Government > North America Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.93)

Making Enterprise Search Personal - Coruzant Technologies

#artificialintelligence, Mar-1-2021, 10:06:43 GMT

Knowledge management providers are now looking to build systems that are more tailored to the needs of their customers. In technical parlance, this is known as the behavioral model for information retrieval system design. With these models, users search for a product or service, and the results often include related offerings that are better matched to the user's intent. Homing in on this kind of personalization is at the crux of the new experience economy of customer service and at the forefront of Enterprise Search advancements. One of the key requirements for forward-looking knowledge management is the capacity to extract data from the typically hundreds or thousands of data silos scattered throughout a company and crawl them to create meaningful insights.

coruzant technology, customer experience, information, (10 more...)

Technology:

Information Technology > Knowledge Management (0.78)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.75)

Building a Complete AI Based Search Engine with Elasticsearch, Kubeflow and Katib

#artificialintelligence, Feb-26-2021, 20:20:56 GMT

Building search systems is hard. Preparing them to work with machine learning is really hard. Developing a complete search engine framework integrated with AI is really really hard. In this post, we'll build a search engine from scratch and discuss how to further optimize results by adding a machine learning layer using Kubeflow and Katib. This new layer will be capable of retrieving results that take the user's context into account, and it is the main focus of this article. As we'll see, thanks to Kubeflow and Katib, the final result is quite simple, efficient and easy to maintain. To understand the concepts in practice, we'll implement the system hands-on. As it's built on top of Kubernetes, you can use any infrastructure you like (given appropriate adaptations).

elasticsearch, katib, ranking model, (14 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.92)

Approximate Knowledge Graph Query Answering: From Ranking to Binary Classification

van Bakel, Ruud, Aleksiev, Teodor, Daza, Daniel, Alivanistos, Dimitrios, Cochez, Michael

arXiv.org Artificial Intelligence, Feb-22-2021

Large, heterogeneous datasets are characterized by missing or even erroneous information. This is more evident when they are the product of community effort or automatic fact extraction methods from external sources, such as text. A special case of the aforementioned phenomenon can be seen in knowledge graphs, where this mostly appears in the form of missing or incorrect edges and nodes. Structured querying on such incomplete graphs will result in incomplete sets of answers, even if the correct entities exist in the graph, since one or more edges needed to match the pattern are missing. To overcome this problem, several algorithms for approximate structured query answering have been proposed. Inspired by modern Information Retrieval metrics, these algorithms produce a ranking of all entities in the graph, and their performance is further evaluated based on how high in this ranking the correct answers appear. In this work we take a critical look at this way of evaluation. We argue that performing a ranking-based evaluation is not sufficient to assess methods for complex query answering. To solve this, we introduce Message Passing Query Boxes (MPQB), which takes binary classification metrics back into use and shows the effect this has on the recently proposed query embedding method MPQE.
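
The contrast drawn in this abstract, between ranking-based evaluation and binary classification metrics, can be illustrated with a minimal sketch. The function names and the toy entity identifiers below are assumptions made for illustration; this is not the evaluation code of MPQB or MPQE, only the standard definitions of the two metric families.

```python
from typing import List, Set, Tuple


def reciprocal_rank(ranking: List[str], correct: Set[str]) -> float:
    """Return 1 / position of the first correct entity, or 0.0 if none is ranked."""
    for position, entity in enumerate(ranking, start=1):
        if entity in correct:
            return 1.0 / position
    return 0.0


def hits_at_k(ranking: List[str], correct: Set[str], k: int) -> float:
    """Fraction of the correct entities that appear in the top k of the ranking."""
    if not correct:
        return 0.0
    return len(set(ranking[:k]) & correct) / len(correct)


def precision_recall_f1(predicted: Set[str], correct: Set[str]) -> Tuple[float, float, float]:
    """Binary classification view: the method returns an answer set, not a ranking."""
    true_positives = len(predicted & correct)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(correct) if correct else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy query: entities "b" and "d" are the true answers.
ranking = ["a", "b", "c", "d"]
answers = {"b", "d"}
print(reciprocal_rank(ranking, answers))
print(hits_at_k(ranking, answers, k=3))
print(precision_recall_f1({"a", "b"}, answers))
```

A ranking metric only rewards placing correct entities near the top of a list that contains every entity in the graph, whereas the binary metrics require the method to commit to which entities are answers, so false positives are penalized directly.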

classification, graph, query, (15 more...)

2102.11389

Country:

Europe > Netherlands > North Holland > Amsterdam (0.05)
Oceania > Australia > New South Wales > Sydney (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Netherlands > South Holland > Leiden (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Unsupervised Meta Learning for One Shot Title Compression in Voice Commerce

Mukherjee, Snehasish

arXiv.org Artificial Intelligence, Feb-21-2021

Product title compression for voice and mobile commerce is a well-studied problem with several supervised models proposed so far. However, these models have two major limitations: they are not designed to generate compressions dynamically based on cues at inference time, and they do not transfer well to different categories at test time. To address these shortcomings, we model title compression as a meta learning problem where we ask: can we learn a title compression model given only 1 example compression? We adopt an unsupervised approach to meta training by proposing an automatic task generation algorithm that models the observed label generation process as the outcome of 4 unobserved processes. We create parameterized approximations to each of these 4 latent processes to get a principled way of generating random compression rules, which are treated as different tasks. For our main meta learner, we use two models, M1 and M2. M1 is a task-agnostic embedding generator whose output feeds into M2, which is a task-specific label generator. We pre-train M1 on a novel unsupervised segment rank prediction task that allows us to treat M1 as a segment generator that also learns to rank segments during the meta-training process. Our experiments on 16000 crowd-generated meta-test examples show that our unsupervised meta training regime is able to acquire a learning algorithm for different tasks after seeing only 1 example for each task. Further, we show that our model, trained end to end as a black box meta learner, outperforms non-parametric approaches. Our best model obtains an F1 score of 0.8412, beating the baseline by a large margin of 25 F1 points.

compression, dataset, title compression, (13 more...)

2102.1076

Country:

North America > United States > California > Santa Cruz County > Santa Cruz (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(8 more...)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Scaling Creative Inspiration with Fine-Grained Functional Facets of Product Ideas

Hope, Tom, Tamari, Ronen, Kang, Hyeonsu, Hershcovich, Daniel, Chan, Joel, Kittur, Aniket, Shahaf, Dafna

arXiv.org Artificial Intelligence, Feb-19-2021

Web-scale repositories of products, patents and scientific papers offer an opportunity for creating automated systems that scour millions of ideas and assist users in discovering inspirations and solutions. Yet the common representation of ideas is in the form of raw textual descriptions, lacking important structure that is required for supporting creative innovation. Prior work has pointed to the importance of functional structure -- capturing the mechanisms and purposes of inventions -- for allowing users to discover structural connections across ideas and creatively adapt existing technologies. However, the use of functional representations was either coarse and limited in expressivity, or dependent on curated knowledge bases with poor coverage and significant manual effort from users. To help bridge this gap and unlock the potential of large-scale idea mining, we propose a novel computational representation that automatically breaks up products into fine-grained functional facets. We train a model to extract these facets from a challenging real-world corpus of invention descriptions, and represent each product as a set of facet embeddings. We design similarity metrics that support granular matching between functional facets across ideas, and use them to build a novel functional search capability that enables expressive queries for mechanisms and purposes. We construct a graph capturing hierarchical relations between purposes and mechanisms across an entire corpus of products, and use the graph to help problem-solvers explore the design space around a focal problem and view related problem perspectives. In empirical user studies, our approach leads to a significant boost in search accuracy and in the quality of creative inspirations, outperforming strong baselines and state-of-the-art representations of product texts by 50-60%.
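
The idea of representing each product as a set of facet embeddings and matching facets granularly can be sketched as follows. The abstract does not specify the similarity metrics, so the scoring rule here (each query facet takes its best cosine match among the product's facets, and the matches are averaged) is an assumption chosen for illustration, as are the function names and the toy two-dimensional vectors.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def facet_match_score(query_facets: List[Vector], product_facets: List[Vector]) -> float:
    """Average, over query facets, of the best-matching product facet (assumed rule)."""
    if not query_facets or not product_facets:
        return 0.0
    best = [max(cosine(q, p) for p in product_facets) for q in query_facets]
    return sum(best) / len(best)


def rank_products(query_facets: List[Vector],
                  products: Dict[str, List[Vector]]) -> List[Tuple[str, float]]:
    """Sort products by descending facet match score."""
    scored = [(name, facet_match_score(query_facets, facets))
              for name, facets in products.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical embeddings: the query asks for two distinct facets.
query = [[1.0, 0.0], [0.0, 1.0]]
catalog = {"p1": [[1.0, 0.0], [0.0, 1.0]], "p2": [[1.0, 0.0]]}
print(rank_products(query, catalog))
```

Scoring per facet rather than per whole-description embedding is what lets a query single out one mechanism or purpose; a product that covers only one of the two requested facets scores lower than one that covers both.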

inspiration, mechanism, representation, (14 more...)

2102.09761

Country:

Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Maryland (0.04)
(3 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Energy (1.00)
Consumer Products & Services (1.00)
Government (0.68)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(3 more...)

Information Ranking Using Optimum-Path Forest

Ascenção, Nathalia Q., Afonso, Luis C. S., Colombo, Danilo, Oliveira, Luciano, Papa, João P.

arXiv.org Artificial Intelligence, Feb-15-2021

The task of learning to rank has been widely studied by the machine learning community, mainly due to its use and great importance in information retrieval, data mining, and natural language processing. Therefore, ranking accurately and learning to rank are crucial tasks. Context-Based Information Retrieval systems have been of great importance to reduce the effort of finding relevant data. Such systems have evolved by using machine learning techniques to improve their results, but they are mainly dependent on user feedback. Although information retrieval has been addressed in different works along with classifiers based on Optimum-Path Forest (OPF), these have so far not been applied to the learning to rank task. Therefore, the main contribution of this work is to evaluate classifiers based on Optimum-Path Forest, in such a context. Experiments were performed considering the image retrieval and ranking scenarios, and the performance of OPF-based approaches was compared to the well-known SVM-Rank pairwise technique and a baseline based on distance calculation. The experiments showed competitive results concerning precision and outperformed traditional techniques in terms of computational load.
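
A distance-based ranking baseline evaluated by precision, of the general kind this abstract mentions, might look like the sketch below. The abstract gives no details of the baseline, so the use of Euclidean distance, precision at a cutoff k, and all names and feature vectors here are assumptions for illustration only; the OPF classifiers and SVM-Rank are not reproduced.

```python
import math
from typing import Dict, List, Sequence, Set

Vector = Sequence[float]


def euclidean(u: Vector, v: Vector) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def rank_by_distance(query: Vector, items: Dict[str, Vector]) -> List[str]:
    """Order item names from nearest to farthest from the query vector."""
    return sorted(items, key=lambda name: euclidean(query, items[name]))


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k ranked items that are relevant."""
    if k <= 0:
        return 0.0
    return sum(1 for name in ranking[:k] if name in relevant) / k


# Hypothetical image descriptors and relevance labels.
images = {"a": [0.0, 0.0], "b": [1.0, 1.0], "c": [5.0, 5.0]}
ranking = rank_by_distance([0.0, 0.0], images)
print(ranking)
print(precision_at_k(ranking, {"a", "c"}, k=2))
```

A learned ranker would replace `rank_by_distance` with a model trained on relevance judgments, while the precision computation stays the same.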

classification, classifier, image retrieval, (15 more...)

doi: 10.1109/IJCNN48605.2020.9207689

2102.07917

Country:

South America > Brazil > Bahia > Salvador (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Europe > Italy (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.47)
