AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

A Multi-Granularity Matching Attention Network for Query Intent Classification in E-commerce Retrieval

Yuan, Chunyuan, Qiu, Yiming, Li, Mingming, Hu, Haiqing, Wang, Songlin, Xu, Sulong

arXiv.org Artificial Intelligence, Mar-28-2023

Query intent classification, which aims to help customers find desired products, has become an essential component of e-commerce search. Existing query intent classification models either design more elaborate models to enhance the representation learning of queries or explore label graphs and multi-task learning to help models learn external information. However, these models cannot capture multi-granularity matching features from queries and categories, which makes it hard for them to bridge the gap in expression between informal queries and categories. This paper proposes a Multi-granularity Matching Attention Network (MMAN), which contains three modules: a self-matching module, a char-level matching module, and a semantic-level matching module to comprehensively extract features from the query and a query-category interaction matrix. In this way, the model can eliminate the difference in expression between queries and categories for query intent classification. We conduct extensive offline and online A/B experiments, and the results show that MMAN significantly outperforms strong baselines, which demonstrates its superiority and effectiveness. MMAN has been deployed in production and brings great commercial value for our company.

category, information retrieval, machine learning, (16 more...)

doi: 10.1145/3543873.3584639

2303.1587

Country:

North America > United States > Texas > Travis County > Austin (0.05)
North America > United States > New York > New York County > New York City (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Services > e-Commerce Services (0.73)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

Large-scale Training Data Search for Object Re-identification

Yao, Yue, Lei, Huan, Gedeon, Tom, Zheng, Liang

arXiv.org Artificial Intelligence, Mar-28-2023

We consider a scenario where we have access to the target domain, but cannot afford on-the-fly training data annotation, and instead would like to construct an alternative training set from a large-scale data pool such that a competitive model can be obtained. We propose a search and pruning (SnP) solution to this training data search problem, tailored to object re-identification (re-ID), an application aiming to match the same object captured by different cameras. Specifically, the search stage identifies and merges clusters of source identities which exhibit distributions similar to that of the target domain. The second stage, subject to a budget, then selects identities and their images from the Stage I output, to control the size of the resulting training set for efficient training. The two steps provide us with training sets 80% smaller than the source pool while achieving similar or even higher re-ID accuracy. These training sets are also shown to be superior to a few existing search methods such as random sampling and greedy sampling under the same budget on training data size. If we lift the budget, training sets resulting from the first stage alone allow even higher re-ID accuracy. We provide interesting discussions on the specificity of our method to the re-ID problem and particularly its role in bridging the re-ID domain gap. The code is available at https://github.com/yorkeyao/SnP.
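
The two stages described in this abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). It assumes source identities are already grouped into clusters, uses the squared Euclidean distance between mean feature vectors as a stand-in for distribution similarity, and uses uniform random sampling as a placeholder for the budgeted selection. All function names and data are hypothetical.

```python
import random


def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search(clusters, target_feats, top_m):
    """Stage I (assumed form): merge the top_m clusters closest to the target.

    Each cluster is a dict mapping an identity name to its feature vector.
    """
    target_mean = mean(target_feats)
    ranked = sorted(
        clusters,
        key=lambda c: sq_dist(mean(list(c.values())), target_mean),
    )
    merged = {}
    for cluster in ranked[:top_m]:
        merged.update(cluster)
    return merged


def prune(identities, budget, seed=0):
    """Stage II (placeholder): keep at most `budget` identities at random."""
    ids = sorted(identities)
    if len(ids) <= budget:
        return ids
    return sorted(random.Random(seed).sample(ids, budget))


# Hypothetical toy data: three source clusters and two target features.
clusters = [
    {"id1": [0.0, 0.0], "id2": [0.2, 0.0]},
    {"id3": [5.0, 5.0], "id4": [5.2, 5.0]},
    {"id5": [0.1, 0.1], "id6": [0.0, 0.3]},
]
target = [[0.1, 0.0], [0.0, 0.2]]

merged = search(clusters, target, top_m=2)  # drops the far-away cluster
kept = prune(merged, budget=3)              # enforces the size budget
```

Here the cluster containing `id3` and `id4` lies far from the target and is discarded in the first stage, and the second stage trims the remaining four identities down to the budget of three.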

accuracy, information retrieval, machine learning, (21 more...)

2303.16186

Country:

Asia > Middle East > Israel (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.82)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.86)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.67)

Hierarchical Video-Moment Retrieval and Step-Captioning

Zala, Abhay, Cho, Jaemin, Kottur, Satwik, Chen, Xilun, Oğuz, Barlas, Mehdad, Yasher, Bansal, Mohit

arXiv.org Artificial Intelligence, Mar-28-2023

There is growing interest in searching for information from large video corpora. Prior works have studied relevant tasks, such as text-based video retrieval, moment retrieval, video summarization, and video captioning in isolation, without an end-to-end setup that can jointly search from video corpora and generate summaries. Such an end-to-end setup would allow for many interesting applications, e.g., a text-based search that finds a relevant video from a video corpus, extracts the most relevant moment from that video, and segments the moment into important steps with captions. To address this, we present the HiREST (HIerarchical REtrieval and STep-captioning) dataset and propose a new benchmark that covers hierarchical information retrieval and visual/textual stepwise summarization from an instructional video corpus. HiREST consists of 3.4K text-video pairs from an instructional video dataset, where 1.1K videos have annotations of moment spans relevant to the text query and a breakdown of each moment into key instruction steps with captions and timestamps (totaling 8.6K step captions). Our hierarchical benchmark consists of video retrieval, moment retrieval, and two novel tasks: moment segmentation and step captioning. In moment segmentation, models break down a video moment into instruction steps and identify start-end boundaries. In step captioning, models generate a textual summary for each step. We also present starting-point task-specific and end-to-end joint baseline models for our new benchmark. While the baseline models show some promising results, there is still substantial room for future improvement by the community. Project website: https://hirest-cvpr2023.github.io

information retrieval, machine learning, natural language, (17 more...)

2303.16406

Country:

North America > United States > North Carolina (0.04)
Asia > China > Hong Kong (0.04)

Genre:

Workflow (0.66)
Research Report (0.50)
Instructional Material > Course Syllabus & Notes (0.45)

Industry: Education > Educational Technology (0.96)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Distributed Subweb Specifications for Traversing the Web

Bogaerts, Bart, Ketsman, Bas, Zeboudj, Younes, Aamer, Heba, Taelman, Ruben, Verborgh, Ruben

arXiv.org Artificial Intelligence, Mar-27-2023

Link Traversal-based Query Processing (LTQP), in which a SPARQL query is evaluated over a web of documents rather than a single dataset, is often seen as a theoretically interesting yet impractical technique. However, at a time when the hypercentralization of data has increasingly come under scrutiny, a decentralized Web of Data with a simple document-based interface is appealing, as it enables data publishers to control their data and access rights. While LTQP allows evaluating complex queries over such webs, it suffers from performance issues (due to the high number of documents containing data) as well as information quality concerns (due to the many sources providing such documents). In existing LTQP approaches, the burden of finding sources to query is entirely in the hands of the data consumer. In this paper, we argue that to solve these issues, data publishers should also be able to suggest sources of interest and guide the data consumer towards relevant and trustworthy data. We introduce a theoretical framework that enables such guided link traversal and study its properties. We illustrate with a theoretical example that this can improve query results and reduce the number of network requests. We evaluate our proposal experimentally on a virtual linked web with specifications and indeed observe that not just the data quality but also the efficiency of querying improves.
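
The effect of publisher guidance on link traversal can be sketched with a toy in-memory web. This is only an illustration of the idea, not the paper's framework: the `WEB` dictionary, the `suggested` field, the URLs, and the `traverse` function are all hypothetical, and no real SPARQL evaluation or network access takes place.

```python
from collections import deque

# Hypothetical toy web: each document lists its triples, every outgoing
# link, and the subset of links its publisher recommends following.
WEB = {
    "https://alice.example/profile": {
        "triples": [("alice", "knows", "bob")],
        "links": ["https://bob.example/profile", "https://spam.example/ads"],
        "suggested": ["https://bob.example/profile"],
    },
    "https://bob.example/profile": {
        "triples": [("bob", "knows", "carol")],
        "links": ["https://carol.example/profile"],
        "suggested": ["https://carol.example/profile"],
    },
    "https://carol.example/profile": {
        "triples": [("carol", "name", "Carol")],
        "links": [],
        "suggested": [],
    },
    "https://spam.example/ads": {
        "triples": [("alice", "knows", "mallory")],
        "links": [],
        "suggested": [],
    },
}


def traverse(seed, guided):
    """Breadth-first link traversal starting at `seed`.

    Returns (collected triples, number of documents fetched). With
    guided=True only publisher-suggested links are followed.
    """
    seen, queue, triples = {seed}, deque([seed]), []
    fetched = 0
    while queue:
        url = queue.popleft()
        doc = WEB.get(url)
        if doc is None:
            continue  # dangling link: nothing to fetch
        fetched += 1
        triples.extend(doc["triples"])
        for link in doc["suggested" if guided else "links"]:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return triples, fetched


all_triples, all_fetched = traverse("https://alice.example/profile", guided=False)
guided_triples, guided_fetched = traverse("https://alice.example/profile", guided=True)
```

In this toy setup the unguided traversal fetches four documents and picks up a triple from the untrusted source, while the guided traversal fetches three and never visits it, which mirrors the two benefits the abstract claims (fewer requests, better data quality).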

artificial intelligence, natural language, specification, (14 more...)

2302.14411

Country:

South America > Peru > Lima Department > Lima Province > Lima (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report (0.81)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Databases (1.00)
Information Technology > Data Science (1.00)

An ontology-aided, natural language-based approach for multi-constraint BIM model querying

Yin, Mengtian, Tang, Llewellyn, Webster, Chris, Xu, Shen, Li, Xiongyi, Ying, Huaquan

arXiv.org Artificial Intelligence, Mar-27-2023

Being able to efficiently retrieve the required building information is critical for construction project stakeholders to carry out their engineering and management activities. Natural language interface (NLI) systems are emerging as a time- and cost-effective way to query Building Information Models (BIMs). However, the existing methods cannot logically combine different constraints to perform fine-grained queries, which limits the usability of natural language (NL)-based BIM queries. This paper presents a novel ontology-aided semantic parser to automatically map natural language queries (NLQs) that contain different attribute and relational constraints into computer-readable code for querying complex BIM models. First, a modular ontology was developed to represent NL expressions of Industry Foundation Classes (IFC) concepts and relationships, and was then populated with entities from target BIM models to assimilate project-specific information. Then, the ontology-aided semantic parser progressively extracts concepts, relationships, and value restrictions from NLQs to fully identify constraint conditions, resulting in standard SPARQL queries with reasoning rules to successfully retrieve IFC-based BIM models. The approach was evaluated on 225 NLQs collected from BIM users, with a 91% accuracy rate. Finally, a case study about the design-checking of a real-world residential building demonstrates the practical value of the proposed approach in the construction industry.
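
The final step of such a pipeline, turning already-extracted constraints into one SPARQL query, might look roughly like the sketch below. It is an assumption-laden illustration rather than the paper's parser: the `build_sparql` helper, the `ex:` namespace, and the property names are hypothetical and do not come from IFC or from the authors' ontology, and only numeric attribute constraints joined by logical AND are handled.

```python
def build_sparql(concept, constraints):
    """Combine numeric attribute constraints with AND into a SELECT query.

    `constraints` is a list of (property, operator, number) tuples.
    The `ex:` namespace is a placeholder, not a real BIM vocabulary.
    """
    allowed_ops = {"<", "<=", ">", ">=", "=", "!="}
    lines = [
        "PREFIX ex: <http://example.org/bim#>",
        "SELECT ?e WHERE {",
        f"  ?e a ex:{concept} .",
    ]
    filters = []
    for i, (prop, op, value) in enumerate(constraints):
        if op not in allowed_ops:
            raise ValueError(f"unsupported operator: {op}")
        # One triple pattern per constrained attribute, one variable each.
        lines.append(f"  ?e ex:{prop} ?v{i} .")
        filters.append(f"?v{i} {op} {value}")
    if filters:
        lines.append("  FILTER (" + " && ".join(filters) + ")")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical constraints, as if extracted from a question such as
# "walls taller than 3 m with a fire rating of at least 2".
query = build_sparql("Wall", [("height", ">", 3.0), ("fireRating", ">=", 2)])
print(query)
```

Keeping each constraint as a separate triple pattern plus a single combined `FILTER` makes it straightforward to add or drop constraints without restructuring the query.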

artificial intelligence, information retrieval query processing, natural language, (17 more...)

2303.15116

Country:

Asia > China > Hong Kong (0.05)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report (1.00)

Industry:

Construction & Engineering (1.00)
Banking & Finance > Real Estate (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.68)

Sejarah dan Perkembangan Teknik Natural Language Processing (NLP) Bahasa Indonesia: Tinjauan tentang sejarah, perkembangan teknologi, dan aplikasi NLP dalam bahasa Indonesia

Amien, Mukhlis

arXiv.org Artificial Intelligence, Mar-27-2023

This study provides an overview of the history of the development of Natural Language Processing (NLP) in the context of the Indonesian language, with a focus on the basic technologies, methods, and practical applications that have been developed. This review covers developments in basic NLP technologies such as stemming, part-of-speech tagging, and related methods; practical applications in cross-language information retrieval systems, information extraction, and sentiment analysis; and methods and techniques used in Indonesian language NLP research, such as machine learning, statistics-based machine translation, and conflict-based approaches. This study also explores the application of NLP in Indonesian language industry and research and identifies challenges and opportunities in Indonesian language NLP research and development. Recommendations for future Indonesian language NLP research and development include developing more efficient methods and technologies, expanding NLP applications, increasing sustainability, further research into the potential of NLP, and promoting interdisciplinary collaboration. It is hoped that this review will help researchers, practitioners, and the government to understand the development of Indonesian language NLP and identify opportunities for further research and development.

information retrieval, machine learning, natural language, (17 more...)

2304.02746

Country:

Asia > Indonesia (1.00)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > Vietnam > Hanoi > Hanoi (0.04)

Genre:

Overview (1.00)
Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases

Ren, Hongyu, Galkin, Mikhail, Cochez, Michael, Zhu, Zhaocheng, Leskovec, Jure

arXiv.org Artificial Intelligence, Mar-26-2023

Complex logical query answering (CLQA) is a recently emerged task of graph machine learning that goes beyond simple one-hop link prediction and solves a far more complex task of multi-hop logical reasoning over massive, potentially incomplete graphs in a latent space. The task has received significant traction in the community; numerous works have expanded the field along theoretical and practical axes to tackle different types of complex queries and graph modalities with efficient systems. In this paper, we provide a holistic survey of CLQA with a detailed taxonomy studying the field from multiple angles, including graph types (modality, reasoning domain, background semantics), modeling aspects (encoder, processor, decoder), supported queries (operators, patterns, projected variables), datasets, evaluation metrics, and applications. Refining the CLQA task, we introduce the concept of Neural Graph Databases (NGDBs). Extending the idea of graph databases (graph DBs), an NGDB consists of a Neural Graph Storage and a Neural Graph Engine. Inside Neural Graph Storage, we design a graph store, a feature store, and further embed information in a latent embedding store using an encoder. Given a query, Neural Query Engine learns how to perform query planning and execution in order to efficiently retrieve the correct results by interacting with the Neural Graph Storage. Compared with traditional graph DBs, NGDBs allow for a flexible and unified modeling of features in diverse modalities using the embedding store. Moreover, when the graph is incomplete, they can provide robust retrieval of answers which a normal graph DB cannot recover. Finally, we point out promising directions, unsolved problems, and applications of NGDB for future research.

machine learning, natural language, question answering, (22 more...)

2303.14617

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > Virginia (0.04)

Genre:

Workflow (1.00)
Instructional Material (0.65)

Industry:

Education (1.00)
Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Thistle: A Vector Database in Rust

Windsor, Brad, Choi, Kevin

arXiv.org Artificial Intelligence, Mar-25-2023

We present Thistle, a fully functional vector database. Thistle is an entry into the domain of latent knowledge use in answering search queries, an ongoing research topic at both start-ups and search engine companies. We implement Thistle with several well-known algorithms, and benchmark results on the MS MARCO dataset. Results help clarify the latent knowledge domain as well as the growing Rust ML ecosystem.
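
For readers unfamiliar with the term, the core operation of a vector database can be sketched in a few lines. The example below is a generic illustration, not Thistle's code: Thistle is written in Rust and the abstract does not name its algorithms, so this sketch assumes the simplest possible approach, exact brute-force search by cosine similarity, and the `TinyVectorStore` class and document IDs are hypothetical.

```python
import math


class TinyVectorStore:
    """Exact (brute-force) nearest-neighbour search by cosine similarity."""

    def __init__(self):
        self._items = []  # list of (doc_id, unit-length vector)

    @staticmethod
    def _normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot index or query a zero vector")
        return [x / norm for x in vec]

    def add(self, doc_id, vec):
        """Store a vector, normalized once so search is a plain dot product."""
        self._items.append((doc_id, self._normalize(vec)))

    def search(self, query, k=3):
        """Return the k most similar (doc_id, score) pairs, best first."""
        q = self._normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, v)), doc_id)
            for doc_id, v in self._items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(doc_id, score) for score, doc_id in scored[:k]]


store = TinyVectorStore()
store.add("doc-a", [1.0, 0.0])
store.add("doc-b", [0.0, 1.0])
store.add("doc-c", [1.0, 1.0])
top = store.search([0.9, 0.1], k=2)
```

Brute-force search compares the query against every stored vector, so its cost grows linearly with the collection; production systems typically trade some exactness for speed with approximate nearest-neighbour indexes.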

information retrieval, machine learning, thistle, (20 more...)

2303.1678

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Information Management > Search (0.89)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Knowledge Graphs: Opportunities and Challenges

Peng, Ciyuan, Xia, Feng, Naseriparsa, Mehdi, Osborne, Francesco

arXiv.org Artificial Intelligence, Mar-24-2023

With the explosive growth of artificial intelligence (AI) and big data, it has become vitally important to organize and represent the enormous volume of knowledge appropriately. As graph data, knowledge graphs accumulate and convey knowledge of the real world. It is well recognized that knowledge graphs effectively represent complex information; hence, they have rapidly gained the attention of academia and industry in recent years. Thus, to develop a deeper understanding of knowledge graphs, this paper presents a systematic overview of this field. Specifically, we focus on the opportunities and challenges of knowledge graphs. We first review the opportunities of knowledge graphs in terms of two aspects: (1) AI systems built upon knowledge graphs; (2) potential application fields of knowledge graphs. Then, we thoroughly discuss severe technical challenges in this field, such as knowledge graph embeddings, knowledge acquisition, knowledge graph completion, knowledge fusion, and knowledge reasoning. We expect that this survey will shed new light on future research and the development of knowledge graphs.

artificial intelligence, information retrieval query processing, natural language, (12 more...)

2303.13948

Country:

Oceania > Australia > Queensland > Brisbane (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Buckinghamshire > Milton Keynes (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Media (0.93)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.46)

Applications of statistical causal inference in software engineering

Siebert, Julien

arXiv.org Artificial Intelligence, Mar-23-2023

This paper focuses on the application of one type of empirical method, namely statistical causal inference (SCI, see section 2). Such methods have their roots in a number of applied fields (from AI to econometrics) and aim to provide a framework for making valid inferences about causal effects based on interventional or observational data. More specifically, we focus on SCI methods that use graphical models as developed by Pearl and colleagues [1, 2]. This framework has been shown to be equivalent to the potential-outcomes framework (also called the Neyman-Rubin Causal Model [3]) but enriches it by making use of an explicit causal structure called a graphical causal model. Making assumptions about causal effects explicit through a graphical structure has several advantages. First, it helps determine whether causal effects can be estimated and how they might be estimated (see section 2).

information retrieval, machine learning, natural language, (15 more...)

doi: 10.1016/j.infsof.2023.107198

2211.11482

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Ohio > Cuyahoga County > Cleveland (0.04)
Europe > Germany > Rhineland-Palatinate > Kaiserslautern (0.04)

Genre:

Overview (1.00)
Instructional Material > Course Syllabus & Notes (0.94)
Research Report > Experimental Study (0.93)
Workflow (0.92)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.56)
