AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

Evaluating Named Entity Recognition: Comparative Analysis of Mono- and Multilingual Transformer Models on Brazilian Corporate Earnings Call Transcriptions

Abilio, Ramon, Coelho, Guilherme Palermo, da Silva, Ana Estela Antunes

arXiv.org Artificial IntelligenceMar-18-2024

Named Entity Recognition (NER) is a Natural Language Processing technique for extracting information from textual documents. However, much of the existing research on NER has been centered around English-language documents, leaving a gap in the availability of datasets tailored to the financial domain in Portuguese. This study addresses the need for NER within the financial domain, focusing on Portuguese-language texts extracted from earnings call transcriptions of Brazilian banks. By curating a comprehensive dataset comprising 384 transcriptions and leveraging weak supervision techniques for annotation, we evaluate the performance of monolingual models trained on Portuguese (BERTimbau and PTT5) and multilingual models (mBERT and mT5). Notably, we introduce a novel approach that reframes the token classification task as a text generation problem, enabling fine-tuning and evaluation of T5 models. Following the fine-tuning of the models, we conduct an evaluation on the test dataset, employing performance and error metrics. Our findings reveal that BERT-based models consistently outperform T5-based models. Furthermore, while the multilingual models exhibit comparable macro F1-scores, BERTimbau demonstrates superior performance over PTT5. A manual analysis of sentences generated by PTT5 and mT5 unveils a degree of similarity ranging from 0.89 to 1.0, between the original and generated sentences. However, critical errors emerge as both models exhibit discrepancies, such as alterations to monetary and percentage values, underscoring the importance of accuracy and consistency in the financial domain. Despite these challenges, PTT5 and mT5 achieve impressive macro F1-scores of 98.52% and 98.85%, respectively, with our proposed approach. Furthermore, our study sheds light on notable disparities in memory and time consumption for inference across the models.

dataset, subword, transcription, (13 more...)

arXiv.org Artificial Intelligence

2403.12212

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York (0.04)
South America > Brazil > São Paulo (0.04)
(5 more...)

Genre: Research Report > New Finding (0.68)

Industry: Banking & Finance > Trading (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Bias, Skew, and Search Engines Are Sufficient to Explain Online Toxicity

Communications of the ACMMar-15-2024, 14:06:24 GMT

U.S. political discourse seems to have fissioned into discrete bubbles, each reflecting its own distorted image of the world. Many blame machine-learning algorithms that purportedly maximize "engagement"--serving up content that keeps YouTube or Facebook users watching videos or scrolling through their feeds--for radicalizing users or strengthening their partisanship. Sociologist Shoshana Zuboff15 even argues that "surveillance capitalism" uses optimized algorithmic feedback for "automated behavioral modification" at scale, writing the "music" that users then "dance" to. There is debate whether such algorithms in fact maximize engagement (their objective functions also typically contain other desiderata). More recent research3 offers an alternative explanation, suggesting that people consume this content because they want it, independent of the algorithm.

algorithm, interface, social media algorithm, (7 more...)

Communications of the ACM

Country: North America > United States (0.06)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Add feedback

Learning to Search Efficiently in High Dimensions Zhen Li Tong Zhang Yihong Gong Thomas S. Huang

Neural Information Processing SystemsMar-15-2024, 13:08:35 GMT

High dimensional similarity search in large scale databases becomes an important challenge due to the advent of Internet. For such applications, specialized data structures are required to achieve computational efficiency. Traditional approaches relied on algorithmic constructions that are often data independent (such as Locality Sensitive Hashing) or weakly dependent (such as kd-trees, k-means trees). While supervised learning algorithms have been applied to related problems, those proposed in the literature mainly focused on learning hash codes optimized for compact embedding of the data rather than search efficiency. Consequently such an embedding has to be used with linear scan or another search algorithm.

algorithm, computational cost, selection function, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts (0.04)
Asia > China (0.04)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.95)

Add feedback

Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity

Neural Information Processing SystemsMar-15-2024, 04:59:16 GMT

Given a set V of n elements we wish to linearly order them using pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: The number of disagreements (loss) and the query complexity (number of pairwise preference labels).

algorithm, denote, probability, (15 more...)

Neural Information Processing Systems

Country:

Asia > Middle East > Israel > Haifa District > Haifa (0.04)
Asia > Afghanistan > Parwan Province > Charikar (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.61)

Add feedback

An Exact Algorithm for F Measure Maximization

Neural Information Processing SystemsMar-15-2024, 03:59:59 GMT

The F-measure, originally introduced in information retrieval, is nowadays routinely used as a performance metric for problems such as binary classification, multi-label classification, and structured output prediction. Optimizing this measure remains a statistically and computationally challenging problem, since no closed-form maximizer exists. Current algorithms are approximate and typically rely on additional assumptions regarding the statistical distribution of the binary response variables. In this paper, we present an algorithm which is not only computationally efficient but also exact, regardless of the underlying distribution. The algorithm requires only a quadratic number of parameters of the joint distribution (with respect to the number of binary responses). We illustrate its practical performance by means of experimental results for multi-label classification.

algorithm, classification, f-measure, (15 more...)

Neural Information Processing Systems

Country:

Europe > Poland > Greater Poland Province > Poznań (0.05)
Europe > Germany (0.04)
Europe > Belgium > Flanders (0.04)
Asia > Taiwan (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.48)

Add feedback

Active Ranking using Pairwise Comparisons Kevin G. Jamieson Robert D. Nowak University of Wisconsin University of Wisconsin Madison, WI53706, USA

Neural Information Processing SystemsMar-15-2024, 03:58:01 GMT

This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects).

pairwise comparison, query, ranking, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.50)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Add feedback

Linear Submodular Bandits and their Application to Diversified Retrieval

Neural Information Processing SystemsMar-15-2024, 00:46:52 GMT

Diversified retrieval and online learning are two core research areas in the design of modern information retrieval systems. In this paper, we propose the linear submodular bandits problem, which is an online learning setting for optimizing a general class of feature-rich submodular utility models for diversified retrieval.

algorithm, lsbg reedy, recommendation, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Middle East (0.04)
Africa > Middle East (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.68)

Industry: Education > Educational Setting (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.55)
Information Technology > Data Science > Data Mining > Big Data (0.49)

Add feedback

MAGPIE: Multi-Task Media-Bias Analysis Generalization for Pre-Trained Identification of Expressions

Horych, Tomáš, Wessel, Martin, Wahle, Jan Philip, Ruas, Terry, Waßmuth, Jerome, Greiner-Petter, André, Aizawa, Akiko, Gipp, Bela, Spinde, Timo

arXiv.org Artificial IntelligenceMar-15-2024

Media bias detection poses a complex, multifaceted problem traditionally tackled using single-task models and small in-domain datasets, consequently lacking generalizability. To address this, we introduce MAGPIE, the first large-scale multi-task pre-training approach explicitly tailored for media bias detection. To enable pre-training at scale, we present Large Bias Mixture (LBM), a compilation of 59 bias-related tasks. MAGPIE outperforms previous approaches in media bias detection on the Bias Annotation By Experts (BABE) dataset, with a relative improvement of 3.3% F1-score. MAGPIE also performs better than previous models on 5 out of 8 tasks in the Media Bias Identification Benchmark (MBIB). Using a RoBERTa encoder, MAGPIE needs only 15% of finetuning steps compared to single-task approaches. Our evaluation shows, for instance, that tasks like sentiment and emotionality boost all learning, all tasks enhance fake news detection, and scaling tasks leads to the best results. MAGPIE confirms that MTL is a promising approach for addressing media bias detection, enhancing the accuracy and efficiency of existing models. Furthermore, LBM is the first available resource collection focused on media bias MTL.

computational linguistic, dataset, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2403.0791

Country:

Europe > Germany > Lower Saxony > Gottingen (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
(32 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)
Media > News (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Add feedback

A comprehensive study on Frequent Pattern Mining and Clustering categories for topic detection in Persian text stream

Zafarani-Moattar, Elnaz, Kangavari, Mohammad Reza, Rahmani, Amir Masoud

arXiv.org Artificial IntelligenceMar-15-2024

Topic detection is a complex process and depends on language because it somehow needs to analyze text. There have been few studies on topic detection in Persian, and the existing algorithms are not remarkable. Therefore, we aimed to study topic detection in Persian. The objectives of this study are: 1) to conduct an extensive study on the best algorithms for topic detection, 2) to identify necessary adaptations to make these algorithms suitable for the Persian language, and 3) to evaluate their performance on Persian social network texts. To achieve these objectives, we have formulated two research questions: First, considering the lack of research in Persian, what modifications should be made to existing frameworks, especially those developed in English, to make them compatible with Persian? Second, how do these algorithms perform, and which one is superior? There are various topic detection methods that can be categorized into different categories. Frequent pattern and clustering are selected for this research, and a hybrid of both is proposed as a new category. Then, ten methods from these three categories are selected. All of them are re-implemented from scratch, changed, and adapted with Persian. These ten methods encompass different types of topic detection methods and have shown good performance in English. The text of Persian social network posts is used as the dataset. Additionally, a new multiclass evaluation criterion, called FS, is used in this paper for the first time in the field of topic detection. Approximately 1.4 billion tokens are processed during experiments. The results indicate that if we are searching for keyword-topics that are easily understandable by humans, the hybrid category is better. However, if the aim is to cluster posts for further analysis, the frequent pattern category is more suitable.

category, detection, topic detection, (15 more...)

arXiv.org Artificial Intelligence

2403.10237

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Asia > Taiwan (0.04)
Asia > Middle East > Iran > East Azerbaijan Province > Tabriz (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology > Services (0.87)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Add feedback

Query Complexity of Derivative-Free Optimization

Neural Information Processing SystemsMar-14-2024, 19:36:55 GMT

This paper provides lower bounds on the convergence rate of Derivative Free Optimization (DFO) with noisy function evaluations, exposing a fundamental and unavoidable gap between the performance of algorithms with access to gradients and those with access to only function evaluations. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions. A distinctive feature of the algorithm is that it uses only Boolean-valued function comparisons, rather than function evaluations. This makes the algorithm useful in an even wider range of applications, such as optimization based on paired comparisons from human subjects, for example. We also show that regardless of whether DFO is based on noisy function evaluations or Boolean-valued function comparisons, the convergence rate is the same.

apple, comparison oracle, oracle, (13 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin > Dane County > Madison (0.14)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.47)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.40)

Add feedback