AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular pieces of information no longer meet the needs of many people. Computerized databases have made it easier to search traditional indexes of print publications, but the process still usually means searching one database after another and then turning to other methods for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when the information being sought is spread across widely different formats, and the universe of knowledge is too big and growing too rapidly for searching to succeed at human speed.

Visually Grounded Keyword Detection and Localisation for Low-Resource Languages

Olaleye, Kayode Kolawole

arXiv.org Artificial Intelligence, Feb-1-2023

This study investigates the use of Visually Grounded Speech (VGS) models for keyword localisation in speech. The study focusses on two main research questions: (1) Is keyword localisation possible with VGS models? (2) Can keyword localisation be done cross-lingually in a real low-resource setting? Four methods for localisation are proposed and evaluated on an English dataset, with the best-performing method achieving an accuracy of 57%. A new dataset containing spoken captions in the Yoruba language is also collected and released for cross-lingual keyword localisation. The cross-lingual model obtains a precision of 16% in actual keyword localisation, and this performance can be improved by initialising from a model pretrained on English data. The study presents a detailed analysis of the model's success and failure modes and highlights the challenges of using VGS models for keyword localisation in low-resource settings.

information retrieval, machine learning, natural language, (22 more...)

2302.00765

Country:

Africa > Nigeria (0.04)
North America > United States > New York (0.04)
Asia > China (0.04)
Africa > Zimbabwe > Masvingo (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Leisure & Entertainment > Sports (1.00)
Education (1.00)
Media (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(10 more...)

Automated Sentiment and Hate Speech Analysis of Facebook Data by Employing Multilingual Transformer Models

Manuvie, Ritumbra, Chatterjee, Saikat

arXiv.org Artificial Intelligence, Jan-31-2023

In recent years, there has been a heightened consensus, both within academia and in public discourse, that Social Media Platforms (SMPs) amplify the spread of hateful and negative-sentiment content. Researchers have identified how hateful content, political propaganda, and targeted messaging contributed to real-world harms including insurrections against democratically elected governments, genocide, and the breakdown of social cohesion due to heightened negative discourse towards certain communities in parts of the world. To counter these issues, SMPs have created semi-automated systems that can help identify toxic speech. In this paper we analyse the statistical distribution of hateful and negative-sentiment content within a representative Facebook dataset (n = 604,703) scraped from 648 public Facebook pages which identify themselves as proponents (and followers) of far-right Hindutva actors. These pages were identified manually using keyword searches on Facebook and on CrowdTangle and classified as far-right Hindutva pages based on page names, page descriptions, and discourses shared on these pages. We employ state-of-the-art, open-source XLM-T multilingual transformer-based language models to perform sentiment and hate speech analysis of the textual content shared on these pages over a period of 5.5 years. The results show the statistical distributions of the predicted sentiment and hate speech labels, the top actors, and the top page categories. We further discuss the benchmark performance and limitations of these pre-trained language models.

information retrieval, machine learning, natural language, (20 more...)

2301.13668

Country:

Asia > India (0.05)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > California > San Mateo County > Menlo Park (0.04)
Europe > Netherlands > South Holland > The Hague (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Media (1.00)
Information Technology > Services (1.00)
Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.54)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.38)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Archive TimeLine Summarization (ATLS): Conceptual Framework for Timeline Generation over Historical Document Collections

Gutehrlé, Nicolas, Doucet, Antoine, Jatowt, Adam

arXiv.org Artificial Intelligence, Jan-31-2023

Archive collections are nowadays mostly available through search engine interfaces, which allow a user to retrieve documents by issuing queries. The study of these collections may, however, be impaired by some aspects of search engines, such as the overwhelming number of documents returned or the lack of contextual knowledge provided. New methods that can work independently of or in combination with search engines are therefore required to access these collections. In this position paper, we propose to extend TimeLine Summarization (TLS) methods to archive collections to assist in their study. We provide an overview of existing TLS methods and describe a conceptual framework for an Archive TimeLine Summarization (ATLS) system, which aims to generate informative, readable and interpretable timelines.

artificial intelligence, information retrieval, natural language, (17 more...)

2301.13479

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > France (0.04)
(18 more...)

Genre:

Overview (0.86)
Research Report (0.64)

Industry: Media > News (0.47)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Priors are Powerful: Improving a Transformer for Multi-camera 3D Detection with 2D Priors

Feng, Di, Ferroni, Francesco

arXiv.org Artificial Intelligence, Jan-31-2023

Transformer-based approaches have advanced the recent development of multi-camera 3D detection in both academia and industry. In a vanilla transformer architecture, queries are randomly initialised and optimised for the whole dataset, without considering the differences among input frames. In this work, we propose to leverage the predictions from an image backbone, which is often highly optimised for 2D tasks, as priors to the transformer part of a 3D detection network. The method works by (1) augmenting image feature maps with 2D priors, (2) sampling query locations via ray-casting along 2D box centroids, and (3) initialising query features with object-level image features. Experimental results show that 2D priors not only help the model converge faster, but also largely improve the baseline approach by up to 12% in terms of average precision.

detection, machine learning, natural language, (17 more...)

2301.13592

Country:

North America > United States (0.04)
Europe (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.35)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

What Is Vector Similarity Search? How Is It Useful?

#artificialintelligence, Jan-30-2023, 23:49:52 GMT

Modern data search is a complex domain. Vector similarity search, or VSS, represents data with contextual depth and returns more relevant information to the consumers in response to a search query. Let's take a simple example. Search queries like "data science" and "science fiction" refer to different types of content despite both having a common word ("science"). A traditional search technique would match common phrases to return relevant results, which would be inaccurate in this case.
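The "data science" versus "science fiction" contrast can be sketched in a few lines. This is only a toy illustration, not the article's implementation: the three-dimensional vectors, the `EMBEDDINGS` table, and the `nearest` helper are all hypothetical, with values hand-picked so that items with related meaning point in similar directions. A real system would obtain its vectors from a trained embedding model and would typically use an approximate nearest-neighbour index rather than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical hand-made embeddings. The axes loosely stand for
# (data/analytics, fiction/story, science); the values are assumptions
# chosen for illustration, not output of any real model.
EMBEDDINGS = {
    "data science": [0.9, 0.0, 0.4],
    "science fiction": [0.0, 0.9, 0.4],
    "machine learning tutorial": [0.8, 0.1, 0.3],
    "space opera novel": [0.1, 0.95, 0.2],
}

def nearest(query, k=1):
    """Return the k items whose vectors are most similar to the query's."""
    q = EMBEDDINGS[query]
    scored = [
        (cosine_similarity(q, vec), name)
        for name, vec in EMBEDDINGS.items()
        if name != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]

if __name__ == "__main__":
    for query in ("data science", "science fiction"):
        print(query, "->", nearest(query))
```

Although both queries share the word "science", their vectors are far apart under this toy encoding, so each query retrieves the item closest to it in meaning rather than the one that shares a word with it.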

query, representation, similarity search, (12 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.37)

GE-Blender: Graph-Based Knowledge Enhancement for Blender

Lian, Xiaolei, Tang, Xunzhu, Wang, Yue

arXiv.org Artificial Intelligence, Jan-30-2023

Despite the great success of open-domain dialogue generation, unseen entities can have a large impact on the dialogue generation task, degrading the model's performance. Previous research used retrieved knowledge of seen entities as auxiliary data to enhance the representation of the model. Nevertheless, the logical explanation of unseen entities, such as their possible co-occurring or semantically similar words and their entity category, remains unexplored. In this work, we propose an approach to address this challenge. We construct a graph by extracting entity nodes, enhancing the representation of the context of the unseen entity with the entity's 1-hop surrounding nodes. Furthermore, we add a named entity tag prediction task to address the problem that the unseen entity does not exist in the graph. We conduct our experiments on the open dataset Wizard of Wikipedia, and the empirical results indicate that our approach outperforms the state-of-the-art approaches on it.

information retrieval, machine learning, natural language, (16 more...)

2301.1285

Country:

Europe > France (0.04)
Asia > China (0.04)
South America (0.04)
(3 more...)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.70)
Health & Medicine > Therapeutic Area > Immunology (0.48)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)

ContCommRTD: A Distributed Content-based Misinformation-aware Community Detection System for Real-Time Disaster Reporting

Apostol, Elena-Simona, Truică, Ciprian-Octavian, Paschke, Adrian

arXiv.org Artificial Intelligence, Jan-30-2023

Real-time social media data can provide useful information on evolving hazards. Alongside traditional methods of disaster detection, the integration of social media data can considerably enhance disaster management. In this paper, we investigate the problem of detecting geolocation-content communities on Twitter and propose a novel distributed system that provides near real-time information on hazard-related events and their evolution. We show that content-based community analysis leads to better and faster dissemination of reports on hazards. Our distributed disaster reporting system analyzes the social relationships among worldwide geolocated tweets and applies topic modeling to group tweets by topic. Using the user, timestamp, geolocation, retweets, and replies of each tweet, we create a publisher-subscriber distribution model for topics. We use content similarity and the proximity of nodes to create a new model for geolocation-content based communities. Users can subscribe to different topics in specific geographical areas or worldwide and receive real-time reports regarding these topics. As misinformation can lead to increased damage if propagated in hazard-related tweets, we propose a new deep learning model to detect fake news. The misinformed tweets are then removed from display. We also show empirically the scalability of the proposed system.

information retrieval, machine learning, real time system, (23 more...)

2301.12984

Country:

Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.04)
North America > United States > South Carolina (0.04)
Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Media > News (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.95)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.70)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
(3 more...)

GPU-based Private Information Retrieval for On-Device Machine Learning Inference

#artificialintelligence, Jan-29-2023, 14:55:35 GMT

GPU-based Private Information Retrieval for On-Device Machine Learning Inference | Maximilian Lam, Jeff Johnson, Wenjie Xiong, Kiwan Maeng, Udit Gupta, Minsoo Rhu, Hsien-Hsin S. Lee, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks, Edward Suh | Computer science, Information Retrieval, Machine learning, nVidia, nVidia V100, Security

artificial intelligence, gpu-based private information retrieval, information management, (2 more...)

Industry: Information Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.99)

DocILE 2023 Teaser: Document Information Localization and Extraction

Šimsa, Štěpán, Šulc, Milan, Skalický, Matyáš, Patel, Yash, Hamdi, Ahmed

arXiv.org Artificial Intelligence, Jan-29-2023

The lack of data for information extraction (IE) from semi-structured business documents is a real problem for the IE community. Publications relying on large-scale datasets use only proprietary, unpublished data due to the sensitive nature of such documents. Publicly available datasets are mostly small and domain-specific. The absence of a large-scale public dataset or benchmark hinders the reproducibility and cross-evaluation of published methods. The DocILE 2023 competition, hosted as a lab at the CLEF 2023 conference and as an ICDAR 2023 competition, will run the first major benchmark for the tasks of Key Information Localization and Extraction (KILE) and Line Item Recognition (LIR) from business documents. With thousands of annotated real documents from open sources, a hundred thousand generated synthetic documents, and nearly a million unlabeled documents, the DocILE lab comes with the largest publicly available dataset for KILE and LIR. We are looking forward to contributions from the Computer Vision, Natural Language Processing, Information Retrieval, and other communities. The data, baselines, code and up-to-date information about the lab and competition are available at https://docile.rossum.ai/.

artificial intelligence, information retrieval, natural language, (16 more...)

2301.12394

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
Europe > United Kingdom (0.04)
Europe > France (0.04)
(2 more...)

Genre: Research Report (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

G-Rank: Unsupervised Continuous Learn-to-Rank for Edge Devices in a P2P Network

Gold, Andrew, Pouwelse, Johan

arXiv.org Artificial Intelligence, Jan-29-2023

Ranking algorithms in traditional search engines are powered by enormous training data sets that are meticulously engineered and curated by a centralized entity. Decentralized peer-to-peer (p2p) networks such as torrenting applications and Web3 protocols deliberately eschew centralized databases and computational architectures when designing services and features. As such, robust search-and-rank algorithms designed for such domains must be engineered specifically for decentralized networks, and must be lightweight enough to operate on consumer-grade personal devices such as a smartphone or laptop computer. We introduce G-Rank, an unsupervised ranking algorithm designed exclusively for decentralized networks. We demonstrate that accurate, relevant ranking results can be achieved in fully decentralized networks without any centralized data aggregation, feature engineering, or model training. Furthermore, we show that such results are obtainable with minimal data preprocessing and computational overhead, and can still return highly relevant results even when a user's device is disconnected from the network. G-Rank is highly modular in design, is not limited to categorical data, and can be implemented in a variety of domains with minimal modification. The results herein show that unsupervised ranking models designed for decentralized p2p networks are not only viable, but worthy of further research.

data mining, machine learning, node, (21 more...)

2301.1253

Country:

Europe > Netherlands > South Holland > Delft (0.04)
North America > United States > Hawaii (0.04)
North America > United States > California > Yolo County > Davis (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Data Science > Data Mining (1.00)
(4 more...)
