AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

A Search Engine for Scientific Publications: a Cybersecurity Case Study

Oliveira, Nuno, Sousa, Norberto, Praça, Isabel

arXiv.org Artificial Intelligence | Jun-30-2021

Cybersecurity is a very challenging research topic nowadays, as digitalization increases the interaction of people, software and services on the Internet by means of technology devices and the networks connected to it. The field is broad and has a lot of unexplored ground across numerous disciplines such as management, psychology, and data science. Its large disciplinary spectrum and many significant research topics generate a considerable amount of information, making it hard for us to find what we are looking for when researching a particular subject. This work proposes a new search engine for scientific publications which combines information retrieval and reading comprehension algorithms to extract answers from a collection of domain-specific documents. Although applied to the context of cybersecurity, the proposed solution exhibited great generalization capabilities and can be easily adapted to other knowledge domains.
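
The retrieve-then-read idea in this abstract can be illustrated with a minimal sketch. The abstract does not name the algorithms the authors use, so everything here is an assumption: TF-IDF stands in for the retrieval step, a plain term-overlap heuristic stands in for the reading comprehension model, and the function names and toy corpus are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(query, docs):
    """Return document indices ordered by a simple TF-IDF score (best first)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)
    scored = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        length = max(len(toks), 1)
        score = sum(
            (tf[t] / length) * math.log((1 + n) / (1 + df[t]))
            for t in query_terms
            if t in tf
        )
        scored.append((score, idx))
    # Highest score first; ties are broken by original position.
    return [idx for _, idx in sorted(scored, key=lambda p: (-p[0], p[1]))]


def extract_answer(query, doc):
    """Stand-in 'reader': the sentence sharing the most terms with the query."""
    q = set(tokenize(query))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]
    return max(sentences, key=lambda s: len(q & set(tokenize(s))))


# Hypothetical toy corpus, not taken from the paper.
DOCS = [
    "Phishing emails trick users into revealing credentials. Training reduces the click rate.",
    "Ransomware encrypts files and demands payment. Offline backups limit the damage.",
    "Firewalls filter network traffic. They are configured with rule sets.",
]

if __name__ == "__main__":
    question = "How does ransomware demand payment"
    best = rank_documents(question, DOCS)[0]
    print(extract_answer(question, DOCS[best]))
```

A real system of this kind would presumably replace `extract_answer` with a trained reading comprehension model; the two-stage split is the only part this sketch is meant to show.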

application, corpus, search engine, (12 more...)

2107.00082

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Portugal > Porto > Porto (0.04)

Genre:

Overview (1.00)
Research Report (0.65)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.96)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Leveraging Language to Learn Program Abstractions and Search Heuristics

#artificialintelligence | Jun-23-2021, 01:16:23 GMT

Inductive program synthesis, or inferring programs from examples of desired behavior, offers a general paradigm for building interpretable, robust, and generalizable machine learning systems. Effective program synthesis depends on two key ingredients: a strong library of functions from which to build programs, and an efficient search strategy for finding programs that solve a given task. We introduce LAPS (Language for Abstraction and Program Search), a technique for using natural language annotations to guide joint learning of libraries and neurally-guided search models for synthesis. When integrated into a state-of-the-art library learning system (DreamCoder), LAPS produces higher-quality libraries and improves search efficiency and generalization on three domains – string editing, image composition, and abstract reasoning about scenes – even when no natural language hints are available at test time.

learn program abstraction, leveraging language, program abstraction and search heuristic, (2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)

Brave's privacy-focused search engine is available in beta

Engadget | Jun-22-2021, 16:00:13 GMT

You can now try Brave's search engine for yourself. Brave has launched a beta Search feature both as an option in all its browsers as well as through the web for everyone else. As you'd expect, it's billed as a privacy- and transparency-oriented platform that doesn't track your activity or use "secret" algorithms to curate results. You'll eventually have the option of an ad-free version if you're willing to pay, and Brave will make Search available for other engines. The site index is independent, although Brave noted that image searches and some other features will lean on Microsoft's Bing.

privacy-focused search engine

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

Information Retrieval for ZeroSpeech 2021: The Submission by University of Wroclaw

Chorowski, Jan, Ciesielski, Grzegorz, Dzikowski, Jarosław, Łańcucki, Adrian, Marxer, Ricard, Opala, Mateusz, Pusz, Piotr, Rychlikowski, Paweł, Stypułkowski, Michał

arXiv.org Artificial Intelligence | Jun-22-2021

In this paper we present our submission which tries to address all four tasks. We extend the baseline solution in several directions: we refine the intermediate representations, extracted with CPC, to directly improve the ABX scores. We show that such representations can be used to perform simple fuzzy lookups in a large dataset, and even extract some common patterns that serve as pseudo-words. Our approach to the semantic word similarity task is also based on pseudo-words. We build on the unsupervised representations of speech proposed by the organizers as a baseline, derived from CPC and clustered with the k-means algorithm. We demonstrate that simple methods of refining those representations can narrow the gap, or even improve upon the solutions which use a high computational budget. The results lead to the conclusion that the CPC-derived representations are still too noisy for training language models, but stable ...

dataset, language model, representation, (16 more...)

doi: 10.21437/Interspeech.2021-1465

2106.11603

Country:

Europe > Poland > Lower Silesia Province > Wroclaw (0.40)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Maryland > Baltimore (0.04)
(4 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.65)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)

Towards a Query-Optimal and Time-Efficient Algorithm for Clustering with a Faulty Oracle

Peng, Pan, Zhang, Jiapeng

arXiv.org Machine Learning | Jun-18-2021

Motivated by applications in crowdsourced entity resolution in databases, signed edge prediction in social networks and correlation clustering, Mazumdar and Saha [NIPS 2017] proposed an elegant theoretical model for studying clustering with a faulty oracle. In this model, given a set of $n$ items which belong to $k$ unknown groups (or clusters), our goal is to recover the clusters by asking pairwise queries to an oracle. This oracle answers queries of the form ``do items $u$ and $v$ belong to the same cluster?''. However, the answer to each pairwise query errs with probability $\varepsilon$, for some $\varepsilon\in(0,\frac12)$. Mazumdar and Saha provided two algorithms under this model: one is query-optimal but time-inefficient (i.e., running in quasi-polynomial time), the other is time-efficient (i.e., running in polynomial time) but query-suboptimal. Larsen, Mitzenmacher and Tsourakakis [WWW 2020] then gave a new time-efficient algorithm for the special case of $2$ clusters, which is query-optimal if the bias $\delta:=1-2\varepsilon$ of the model is large. It was left as an open question whether one can obtain a query-optimal, time-efficient algorithm for the general case of $k$ clusters and other regimes of $\delta$. In this paper, we make progress on the above question and provide a time-efficient algorithm with nearly-optimal query complexity (up to a factor of $O(\log^2 n)$) for all constant $k$ and any $\delta$ in the regime when information-theoretic recovery is possible. Our algorithm is built on a connection to the stochastic block model.
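
The query model described above can be simulated with a short sketch. This is only the oracle, not the clustering algorithm from the paper. Two points are assumptions of the sketch rather than statements from the abstract: answers are cached so that repeating a query returns the same (possibly wrong) answer, and `eps = 0` is allowed purely as a sanity check even though the abstract restricts $\varepsilon$ to $(0,\frac12)$. The class and function names are hypothetical.

```python
import random


class FaultyOracle:
    """Answers 'are u and v in the same cluster?' and errs with probability eps."""

    def __init__(self, labels, eps, seed=0):
        if not 0 <= eps < 0.5:
            raise ValueError("eps must lie in [0, 1/2)")
        self.labels = labels          # ground-truth cluster id of each item
        self.eps = eps
        self.rng = random.Random(seed)
        self.cache = {}               # assumption: answers are persistent
        self.queries = 0              # number of distinct pairs asked so far

    def same_cluster(self, u, v):
        key = (min(u, v), max(u, v))  # the query is symmetric in u and v
        if key not in self.cache:
            self.queries += 1
            truth = self.labels[u] == self.labels[v]
            flip = self.rng.random() < self.eps
            self.cache[key] = truth != flip  # flip the true answer on error
        return self.cache[key]


def bias(eps):
    """Bias delta := 1 - 2 * eps of the model, as defined in the abstract."""
    return 1 - 2 * eps


if __name__ == "__main__":
    labels = [i % 3 for i in range(30)]      # 30 items in 3 hidden clusters
    oracle = FaultyOracle(labels, eps=0.2, seed=42)
    pairs = [(u, v) for u in range(30) for v in range(u + 1, 30)]
    wrong = sum(
        oracle.same_cluster(u, v) != (labels[u] == labels[v]) for u, v in pairs
    )
    print(f"empirical error rate: {wrong / len(pairs):.3f}, bias: {bias(0.2):.1f}")
```

The `queries` counter is the quantity that query complexity bounds refer to: an algorithm is charged once per distinct pair it asks about.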

clustering, faulty oracle, query-optimal and time-efficient algorithm

2106.10374

Genre: Research Report (0.40)

Technology:

Information Technology > Communications > Social Media (0.53)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.53)

How to Extract Relevant Keywords with KeyBERT

#artificialintelligence | Jun-17-2021, 20:20:36 GMT

There are many powerful techniques that perform keyword extraction. However, they are mainly based on the statistical properties of the text and don't necessarily take into account the semantic aspects of the full document. KeyBERT is a minimal and easy-to-use keyword extraction technique that aims at solving this issue. It leverages the BERT language model and relies on the transformers library. So go check the author's repo (and clone it) if you're interested in using it.

extract relevant keyword, keybert, keyword, (8 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Query Embedding on Hyper-relational Knowledge Graphs

Alivanistos, Dimitrios, Berrendorf, Max, Cochez, Michael, Galkin, Mikhail

arXiv.org Artificial Intelligence | Jun-17-2021

Multi-hop logical reasoning is an established problem in the field of representation learning on knowledge graphs (KGs). It subsumes both one-hop link prediction and other, more complex types of logical queries. Existing algorithms operate only on classical, triple-based graphs, whereas modern KGs often employ a hyper-relational modeling paradigm. In this paradigm, typed edges may have several key-value pairs known as qualifiers that provide fine-grained context for facts. In queries, this context modifies the meaning of relations, and usually reduces the answer set. Hyper-relational queries are often observed in real-world KG applications, and existing approaches for approximate query answering cannot make use of qualifier pairs. In this work, we bridge this gap and extend the multi-hop reasoning problem to hyper-relational KGs, making it possible to tackle this new type of complex query. Building upon recent advancements in Graph Neural Networks and query embedding techniques, we study how to embed and answer hyper-relational conjunctive queries. Besides that, we propose a method to answer such queries and demonstrate in our experiments that qualifiers improve query answering on a diverse set of query patterns.
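
To make the notion of qualifiers concrete, here is a small sketch of a hyper-relational fact store and a one-hop query. It uses exact symbolic matching, which is not the embedding-based approach the paper studies; it only illustrates how qualifier pairs narrow the answer set. The `Fact` type, the `answer` function and the toy facts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """A typed edge (head, relation, tail) plus key-value qualifier pairs."""
    head: str
    relation: str
    tail: str
    qualifiers: frozenset = frozenset()  # set of (key, value) tuples


def answer(facts, head, relation, qualifiers=None):
    """Return the tails of facts matching head, relation and every given qualifier."""
    wanted = set((qualifiers or {}).items())
    return {
        f.tail
        for f in facts
        if f.head == head and f.relation == relation and wanted <= f.qualifiers
    }


# Hypothetical toy knowledge graph.
FACTS = [
    Fact("alice", "studied_at", "uni_a", frozenset({("degree", "bsc")})),
    Fact("alice", "studied_at", "uni_b", frozenset({("degree", "phd"), ("year", "2020")})),
    Fact("bob", "studied_at", "uni_a", frozenset({("degree", "phd")})),
]

if __name__ == "__main__":
    # Triple-only query: qualifiers are ignored.
    print(answer(FACTS, "alice", "studied_at"))
    # Hyper-relational query: the qualifier restricts the answers.
    print(answer(FACTS, "alice", "studied_at", {"degree": "phd"}))
```

The triple-only query returns both universities, while adding the `degree` qualifier leaves a single answer, which is the "reduces the answer set" behaviour the abstract describes.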

graph, query, representation, (17 more...)

2106.08166

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Africa > Ethiopia > Addis Ababa > Addis Ababa (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.77)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.69)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

This High Schooler Created a Drug Discovery Search Engine

#artificialintelligence | Jun-14-2021, 21:30:28 GMT

Between his mom's place in Manhattan, his dad in Queens, and his high school in the Bronx, Noah Getz is on the subway a lot. It gives him time to read and to think. Our first coronavirus summer was waning, and he'd been wrestling with a weighty science problem: using machine learning to hunt down tiny molecules that may help treat Alzheimer's. Thus far, his AI had been spitting out results that were "almost comically bad." The problem was that the algorithms Getz was using did their best when they had massive amounts of data to sift through and discover patterns in. Getz' data set was far smaller; he was working with one lab at Mount Sinai, not a multinational pharmaceutical company with a galaxy-sized drug library.

algorithm, compound, getz, (9 more...)

AI-Alerts: 2021 > 2021-06 > AAAI AI-Alert for Jun 15, 2021 (1.00)

Country:

North America > United States > New York > Bronx County > New York City (0.25)
Oceania > New Zealand > North Island > Waikato (0.05)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.38)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.70)

Bilateral Personalized Dialogue Generation with Dynamic Persona-Aware Fusion

Li, Bin, Sun, Bin, Li, Shutao

arXiv.org Artificial Intelligence | Jun-14-2021

Generating personalized responses is one of the major challenges in natural human-robot interaction. Current research in this field mainly focuses on generating responses consistent with the robot's pre-assigned persona, while ignoring the user's persona. Such responses may be inappropriate or even offensive, which may lead to a bad user experience. Therefore, we propose a bilateral personalized dialogue generation (BPDG) method with dynamic persona-aware fusion via multi-task transfer learning to generate responses consistent with both personas. The proposed method aims to accomplish three learning tasks: 1) an encoder is trained with dialogue utterances augmented with the corresponding personalized attributes and relative position (language model task), 2) a dynamic persona-aware fusion module predicts the persona presence to adaptively fuse the contextual and bilateral persona encodings (persona prediction task) and 3) a decoder generates natural, fluent and personalized responses (dialogue generation task). To make the generated responses more personalized and bilateral persona-consistent, the Conditional Mutual Information Maximum (CMIM) criterion is adopted to select the final response from the generated candidates. The experimental results show that the proposed method outperforms several state-of-the-art methods in terms of both automatic and manual evaluations.

dialogue, information, persona, (14 more...)

2106.07857

Country:

Asia > China > Guangdong Province > Guangzhou (0.05)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Jiangsu Province > Nanjing (0.04)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
(3 more...)

Goal-Aware Neural SAT Solver

Ozolins, Emils, Freivalds, Karlis, Draguns, Andis, Gaile, Eliza, Zakovskis, Ronalds, Kozlovics, Sergejs

arXiv.org Artificial Intelligence | Jun-14-2021

Modern neural networks obtain information about the problem and calculate the output solely from the input values. We argue that this is not always optimal, and that the network's performance can be significantly improved by augmenting it with a query mechanism that allows the network to make several solution trials at run time and get feedback on the loss value of each trial. To demonstrate the capabilities of the query mechanism, we formulate an unsupervised (not dependent on labels) loss function for the Boolean Satisfiability Problem (SAT) and theoretically show that it allows the network to extract rich information about the problem. We then propose a neural SAT solver with a query mechanism called QuerySAT and show that it outperforms the neural baseline on a wide range of SAT tasks and the classical baselines on the SHA-1 preimage attack and 3-SAT tasks.
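
A label-free loss for SAT can be sketched as follows. The abstract does not give the formula, so the relaxation used here is an assumption: each variable takes a value in [0, 1], a clause's value is one minus the product of how strongly each literal is violated, and the loss is the negative sum of log clause values. The "query" step is reduced to scoring a few hand-written trial assignments, with no neural network involved. Clauses use the common signed-integer convention (`-2` means "not x2"); all names are hypothetical.

```python
import math


def clause_value(clause, x):
    """Soft truth value of a clause: 1 minus the product of literal violations."""
    unsatisfied = 1.0
    for lit in clause:
        p = x[abs(lit) - 1]                    # soft value of the variable in [0, 1]
        unsatisfied *= (1.0 - p) if lit > 0 else p
    return 1.0 - unsatisfied


def sat_loss(clauses, x, tiny=1e-10):
    """Negative log of the product of clause values; needs no labels.

    `tiny` keeps the logarithm finite when a clause is fully violated.
    """
    return -sum(math.log(clause_value(c, x) + tiny) for c in clauses)


def best_trial(clauses, trials):
    """Score several trial assignments and return the one with the lowest loss."""
    return min(trials, key=lambda x: sat_loss(clauses, x))


# (x1 or not x2) and (x2 or x3)
CLAUSES = [[1, -2], [2, 3]]

if __name__ == "__main__":
    trials = [[0.0, 1.0, 0.0], [0.5, 0.5, 0.5], [1.0, 1.0, 0.0]]
    for t in trials:
        print(t, round(sat_loss(CLAUSES, t), 3))
    print("best:", best_trial(CLAUSES, trials))
```

Because the loss is computed from the formula alone, it can be evaluated on any candidate assignment at run time, which is what makes per-trial feedback of the kind described in the abstract possible.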

formula, query mechanism, solver, (14 more...)

2106.07162

Country:

Europe > Latvia (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Finland > Uusimaa > Helsinki (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
