AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular pieces of information no longer meet the needs of many people. Computerized databases have aided the searching of traditional indexes of print publications, but the process still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where the information being sought is held in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at human speed.

NeuroQL: A Neuro-Symbolic Language and Dataset for Inter-Subjective Reasoning

Papoulias, Nick

arXiv.org Artificial Intelligence, Mar-13-2023

We present a new AI task and baseline solution for Inter-Subjective Reasoning. We define inter-subjective information to be a mixture of objective and subjective information, possibly shared by different parties. Examples include commodities and their objective properties, as reported by IR (Information Retrieval) systems, that need to be cross-referenced with subjective user reviews from an online forum. For an AI system to successfully reason about both, it needs to be able to combine symbolic reasoning about objective facts with the shared consensus found in subjective user reviews. To this end we introduce the NeuroQL dataset and DSL (Domain-specific Language) as a baseline solution for this problem. NeuroQL is a neuro-symbolic language that extends logical unification with neural primitives for extraction and retrieval. It can function as a target for automatic translation of inter-subjective questions (posed in natural language) into the neuro-symbolic code that can answer them.

information retrieval, logic & formal reasoning, machine learning, (18 more...)

2303.07146

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Italy > Tuscany > Florence (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.93)

Unifying Vision, Text, and Layout for Universal Document Processing

Tang, Zineng, Yang, Ziyi, Wang, Guoxin, Fang, Yuwei, Liu, Yang, Zhu, Chenguang, Zeng, Michael, Zhang, Cha, Bansal, Mohit

arXiv.org Artificial Intelligence, Mar-13-2023

We propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats, including document understanding and generation. UDOP leverages the spatial correlation between textual content and document image to model image, text, and layout modalities with one uniform representation. With a novel Vision-Text-Layout Transformer, UDOP unifies pretraining and multi-domain downstream tasks into a prompt-based sequence generation scheme. UDOP is pretrained on both large-scale unlabeled document corpora using innovative self-supervised objectives and diverse labeled data. UDOP also learns to generate document images from text and layout modalities via masked image reconstruction. To the best of our knowledge, this is the first time in the field of document AI that one model simultaneously achieves high-quality neural document editing and content customization. Our method sets the state-of-the-art on 8 Document AI tasks, e.g., document understanding and QA, across diverse data domains like finance reports, academic papers, and websites. UDOP ranks first on the leaderboard of the Document Understanding Benchmark.

information retrieval, machine learning, natural language, (19 more...)

2212.02623

Country:

Oceania > Australia (0.04)
North America > United States > North Carolina (0.04)
North America > United States > California (0.04)
Europe > Spain (0.04)

Genre: Research Report (0.50)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

A Framework for Combining Entity Resolution and Query Answering in Knowledge Bases

Fagin, Ronald, Kolaitis, Phokion G., Lembo, Domenico, Popa, Lucian, Scafoglieri, Federico

arXiv.org Artificial Intelligence, Mar-13-2023

We propose a new framework for combining entity resolution and query answering in knowledge bases (KBs) with tuple-generating dependencies (tgds) and equality-generating dependencies (egds) as rules. We define the semantics of the KB in terms of special instances that involve equivalence classes of entities and sets of values. Intuitively, the former collect all entities denoting the same real-world object, while the latter collect all alternative values for an attribute. This approach allows us to both resolve entities and bypass possible inconsistencies in the data. We then design a chase procedure that is tailored to this new framework and has the feature that it never fails; moreover, when the chase procedure terminates, it produces a universal solution, which in turn can be used to obtain the certain answers to conjunctive queries. We finally discuss challenges arising when the chase does not terminate.

combining entity resolution, knowledge base

doi: 10.24963/kr.2023/23

2303.07469

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.60)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.60)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.60)

ALIST: Associative Logic for Inference, Storage and Transfer. A Lingua Franca for Inference on the Web

Nuamah, Kwabena, Bundy, Alan

arXiv.org Artificial Intelligence, Mar-12-2023

Recent developments in support for constructing knowledge graphs have led to a rapid rise in their creation both on the Web and within organisations. Added to existing sources of data, including relational databases, APIs, etc., there is a strong demand for techniques to query these diverse sources of knowledge. While formal query languages, such as SPARQL, exist for querying some knowledge graphs, users are required to know which knowledge graphs they need to query and the unique resource identifiers of the resources they need. Although alternative techniques in neural information retrieval embed the content of knowledge graphs in vector spaces, they fail to provide the representation and query expressivity needed (e.g. inability to handle non-trivial aggregation functions such as regression). We believe that a lingua franca, i.e. a formalism, that enables such representational flexibility will increase the ability of intelligent automated agents to combine diverse data sources by inference. Our work proposes a flexible representation (alists) to support intelligent federated querying of diverse knowledge sources. Our contribution includes (1) a formalism that abstracts the representation of queries from the specific query language of a knowledge graph; (2) a representation to dynamically curate data and functions (operations) to perform non-trivial inference over diverse knowledge sources; (3) a demonstration of the expressiveness of alists to represent the diversity of representational formalisms, including SPARQL queries, and more generally first-order logic expressions.

information retrieval, logic & formal reasoning, natural language, (19 more...)

2303.06691

Country:

Europe > Spain > Andalusia > Seville Province (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
North America > United States > Missouri > Jackson County > Kansas City (0.04)
(2 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.67)

Another Generic Setting for Entity Resolution: Basic Theory

Guo, Xiuzhan, Berrill, Arthur, Kulkarni, Ajinkya, Belezko, Kostya, Luo, Min

arXiv.org Artificial Intelligence, Mar-12-2023

Benjelloun et al. [BGSWW] considered the Entity Resolution (ER) problem as the generic process of matching and merging entity records judged to represent the same real-world object. They treated the functions for matching and merging entity records as black boxes and introduced four important properties that enable efficient generic ER algorithms. In this paper, we study the properties that match and merge functions share, model the matching and merging black boxes for ER in a partial groupoid based on those properties, and show that a partial groupoid provides another generic setting for ER. The natural partial order on a partial groupoid is defined when the partial groupoid satisfies Idempotence and Catenary associativity. Given a partial order on a partial groupoid, the least upper bound and compatibility ($LU_{pg}$ and $CP_{pg}$) properties are equivalent to Idempotence, Commutativity, Associativity, and Representativity, and the partial order must be the natural one we defined when the domain of the partial operation is reflexive. The partiality of a partial groupoid can be reduced using connected components and clique covers of its domain graph, and a noncommutative partial groupoid can be mapped homomorphically to a commutative one if it has partial idempotent semigroup-like structures. In a finitely generated partial groupoid $(P,D,\circ)$ with no further conditions required, the ER of interest consists of the full elements in $P$. If $(P,D,\circ)$ satisfies Idempotence and Catenary associativity, then the ER consists of the maximal elements in $P$, which are full elements and form the ER defined in [BGSWW]. Furthermore, in this case, since there is a transitive binary order, we consider ER as "sorting, selecting, and querying the elements in a finitely generated partial groupoid."
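The match-and-merge view of ER described in this abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: records are modelled as sets of (attribute, value) pairs, the hypothetical `match` rule fires when two records share any pair, and the hypothetical `merge` is set union, which is only ever applied to matching pairs (one way to read merging as a partial operation). The loop is a naive fixed-point procedure, not the authors' algorithm.

```python
from typing import FrozenSet, List, Optional, Tuple

# A record is modelled as a set of (attribute, value) pairs (illustrative assumption).
Record = FrozenSet[Tuple[str, str]]


def match(a: Record, b: Record) -> bool:
    """Hypothetical match rule: records match if they share any (attribute, value) pair."""
    return bool(a & b)


def merge(a: Record, b: Record) -> Record:
    """Hypothetical merge rule: union of the pairs; only called on matching records."""
    return a | b


def resolve(records: List[Record]) -> List[Record]:
    """Repeatedly merge matching records until no two remaining records match."""
    pending = list(records)
    resolved: List[Record] = []
    while pending:
        current = pending.pop()
        partner: Optional[Record] = next(
            (r for r in resolved if match(current, r)), None
        )
        if partner is None:
            # No match among the resolved records: keep it as its own entity.
            resolved.append(current)
        else:
            # Replace the partner by the merged record and re-examine the result.
            resolved.remove(partner)
            pending.append(merge(current, partner))
    return resolved


# Toy data (invented for this example).
ann_a: Record = frozenset({("name", "Ann"), ("email", "a@x.org")})
ann_b: Record = frozenset({("email", "a@x.org"), ("phone", "555")})
ann_c: Record = frozenset({("phone", "555"), ("name", "A. Smith")})
bob: Record = frozenset({("name", "Bob")})

entities = resolve([ann_a, ann_b, ann_c, bob])
```

With this union-based merge, the three Ann records collapse into one entity through chained matches, while the Bob record stays separate.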

information retrieval, machine learning, natural language, (18 more...)

2303.06629

Country: Europe > Netherlands (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.62)

A Theoretical Analysis Of Nearest Neighbor Search On Approximate Near Neighbor Graph

Shrivastava, Anshumali, Song, Zhao, Xu, Zhaozhuo

arXiv.org Artificial Intelligence, Mar-10-2023

Graph-based algorithms have demonstrated state-of-the-art performance in the nearest neighbor search (NN-Search) problem. These empirical successes call for theoretical results that guarantee the search quality and efficiency of these algorithms. However, there exists a practice-to-theory gap in graph-based NN-Search algorithms. Current theoretical literature focuses on greedy search on an exact near neighbor graph, while practitioners use an approximate near neighbor graph (ANN-Graph) to reduce the preprocessing time. This work bridges this gap by presenting theoretical guarantees for solving NN-Search via greedy search on an ANN-Graph for low-dimensional and dense vectors. To build this bridge, we leverage several novel tools from computational geometry. Our results quantify the trade-offs associated with the approximation while building a near neighbor graph. We hope our results will open the door to more provably efficient graph-based NN-Search algorithms.
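As a rough illustration of what greedy search on a near neighbor graph means, the sketch below walks from a start node to whichever neighbor is closest to the query and stops when no neighbor improves on the current node. The function name `greedy_search`, the toy points, and the hand-built chain graph are assumptions for illustration; how the graph is constructed (exactly or approximately) is outside the sketch, and this is not the authors' analysis or implementation.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def greedy_search(
    points: List[Point],
    graph: Dict[int, List[int]],
    query: Point,
    start: int,
) -> int:
    """Greedy walk: move to the neighbor nearest the query until none is closer."""
    current = start
    current_dist = math.dist(points[current], query)
    while True:
        best = min(
            graph[current],
            key=lambda n: math.dist(points[n], query),
            default=None,
        )
        if best is None:
            return current  # isolated node: nothing to move to
        best_dist = math.dist(points[best], query)
        if best_dist >= current_dist:
            return current  # local minimum of the distance to the query
        current, current_dist = best, best_dist


# Toy example: five points on a line, each linked to its immediate neighbors.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

found = greedy_search(points, graph, query=(3.2, 0.0), start=0)
```

On this chain the walk reaches the true nearest neighbor of the query. On a sparser or approximate graph the walk can stop at a local minimum instead, which is the kind of quality-versus-preprocessing trade-off the abstract refers to.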

information retrieval, machine learning, natural language, (20 more...)

2303.0621

Country: Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

A Survey on Event-based News Narrative Extraction

Norambuena, Brian Keith, Mitra, Tanushree, North, Chris

arXiv.org Artificial Intelligence, Mar-10-2023

Narratives are fundamental to our understanding of the world, providing us with a natural structure for knowledge representation over time. Computational narrative extraction is a subfield of artificial intelligence that makes heavy use of information retrieval and natural language processing techniques. Despite the importance of computational narrative extraction, relatively little scholarly work exists on synthesizing previous research and strategizing future research in the area. In particular, this article focuses on extracting news narratives from an event-centric perspective. Extracting narratives from news data has multiple applications in understanding the evolving information landscape. This survey presents an extensive study of research in the area of event-based news narrative extraction. In particular, we screened over 900 articles that yielded 54 relevant articles. These articles are synthesized and organized by representation model, extraction criteria, and evaluation approaches. Based on the reviewed studies, we identify recent trends, open challenges, and potential research lines.

information retrieval, machine learning, natural language, (21 more...)

doi: 10.1145/3584741

2302.08351

Country:

North America > United States > Washington > King County > Seattle (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
(32 more...)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Media > News (1.00)
Government (1.00)
Law Enforcement & Public Safety (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)

Malvertising in Google search results delivering stealers

#artificialintelligence, Mar-9-2023, 17:24:02 GMT

In recent months, we observed an increase in the number of malicious campaigns that use Google Advertising to distribute and deliver malware. At least two different stealers, Rhadamanthys and RedLine, were abusing the search engine promotion plan to deliver malicious payloads to victims' machines. They seem to use the same technique of mimicking a website associated with well-known software such as Notepad and Blender 3D. The threat actors create copies of legitimate software websites while employing typosquatting (exploiting incorrectly spelled popular brands and company names as URLs) or combosquatting (using popular brands and company names combined with arbitrary words as URLs) to make the sites look like the real thing to the end user: the domain names allude to the original software or vendor. The design and content of the fake web pages look the same as those of the originals.
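The two naming tricks defined above can be told apart with a simple heuristic, sketched here under stated assumptions: the brand list, the `max_typos` threshold, and the function `classify_label` are hypothetical and do not come from the article, and the input is assumed to be a single domain label with the top-level domain already stripped. A label containing a brand plus extra text is flagged as combosquatting, and a label within a small edit distance of a brand is flagged as typosquatting.

```python
from typing import Iterable


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def classify_label(label: str, brands: Iterable[str], max_typos: int = 2) -> str:
    """Classify a domain label against known brand names (heuristic sketch)."""
    label = label.lower()
    brands = [b.lower() for b in brands]
    if label in brands:
        return "exact"  # same string as a brand; says nothing about who owns the domain
    if any(brand in label for brand in brands):
        return "combosquatting"  # brand name combined with arbitrary extra words
    if any(0 < edit_distance(label, brand) <= max_typos for brand in brands):
        return "typosquatting"  # close misspelling of a brand name
    return "unrelated"


# Illustrative brand list (assumption for this example).
BRANDS = ["blender", "notepad"]

labels = ["blender", "blendr", "blender-download", "example"]
results = {label: classify_label(label, BRANDS) for label in labels}
```

A check like this only flags look-alike names. It cannot tell whether a flagged domain is actually malicious, so it would serve as a first filter at most.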

exe process, malicious payload, website, (15 more...)

Industry: Information Technology > Security & Privacy (0.90)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.61)

GlobalNER: Incorporating Non-local Information into Named Entity Recognition

Hsu, Chiao-Wei, Su, Keh-Yih

arXiv.org Artificial Intelligence, Mar-6-2023

Nowadays, many Natural Language Processing (NLP) tasks call for incorporating knowledge external to the local information to further improve performance. However, there is little related work on Named Entity Recognition (NER), which is one of the foundations of NLP. Specifically, no studies have been conducted on query generation and re-ranking for retrieving related information for the purpose of improving NER. This work demonstrates the effectiveness of a DNN-based query generation method and a mention-aware re-ranking architecture based on BERTScore, particularly for NER. In the end, a state-of-the-art performance of 61.56 micro-F1 on the WNUT17 dataset is achieved.

computational linguistic, information retrieval, machine learning, (19 more...)

2303.02915

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
(20 more...)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Implementation of a noisy hyperlink removal system: A semantic and relatedness approach

Taghandiki, Kazem, Ehsan, Elnaz Rezaei

arXiv.org Artificial Intelligence, Mar-6-2023

As the volume of data on the web grows, the web structure graph, which is a graph representation of the web, continues to evolve. The structure of this graph has gradually shifted from content-based to non-content-based. Furthermore, spam data, such as noisy hyperlinks, in the web structure graph adversely affect the speed and efficiency of information retrieval and link mining algorithms. Previous works in this area have focused on removing noisy hyperlinks using structural and string approaches. However, these approaches may incorrectly remove useful links or be unable to detect noisy hyperlinks in certain circumstances. In this paper, a data collection of hyperlinks is initially constructed using an interactive crawler. The semantic and relatedness structure of the hyperlinks is then studied through semantic web approaches and tools such as the DBpedia ontology. Finally, the removal process of noisy hyperlinks is carried out using a reasoner on the DBpedia ontology. Our experiments demonstrate the accuracy and ability of semantic web technologies to remove noisy hyperlinks.

data mining, information retrieval, machine learning, (22 more...)

2303.03321

Country:

Europe > Spain > Galicia > Madrid (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Asia > Afghanistan (0.04)

Genre:

Workflow (0.68)
Research Report (0.64)

Industry:

Information Technology > Security & Privacy (1.00)
Leisure & Entertainment (0.68)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Web > Semantic Web (1.00)
(3 more...)
