AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular bits of information no longer meet the needs of many people. Computerized databases have made it easier to search traditional indexes of print publications, but the process still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where the information being sought is spread across widely different formats, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

A Comprehensive Guide for Interview Questions on Classical NLP

#artificialintelligence, Dec-10-2022, 00:55:53 GMT

This article was published as a part of the Data Science Blogathon. It is common knowledge that natural language processing (NLP) is one of the most popular and competitive fields in the current global IT sector. All of the top organizations and budding startups are on the lookout for candidates with strong NLP-related skills. NLP is the field at the intersection of linguistics, computer science, and artificial intelligence. It is the technology that allows machines to understand, analyze, manipulate, and interpret human languages.

artificial intelligence, information retrieval, natural language, (18 more...)

Country:

Asia > India > Andhra Pradesh (0.05)
Asia > India > Uttar Pradesh (0.04)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.31)

DualNER: A Dual-Teaching framework for Zero-shot Cross-lingual Named Entity Recognition

Zeng, Jiali, Jiang, Yufan, Yin, Yongjing, Wang, Xu, Lin, Binghuai, Cao, Yunbo

arXiv.org Artificial Intelligence, Dec-10-2022

We present DualNER, a simple and effective framework to make full use of both annotated source language corpus and unlabeled target language text for zero-shot cross-lingual named entity recognition (NER). In particular, we combine two complementary learning paradigms of NER, i.e., sequence labeling and span prediction, into a unified multi-task framework. After obtaining a sufficient NER model trained on the source data, we further train it on the target data in a dual-teaching manner, in which the pseudo-labels for one task are constructed from the prediction of the other task. Moreover, based on the span prediction, an entity-aware regularization is proposed to enhance the intrinsic cross-lingual alignment between the same entities in different languages. Experiments and analysis demonstrate the effectiveness of our DualNER. Code is available at https://github.com/lemon0830/dualNER.
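
The abstract says that pseudo-labels for one task are built from the other task's predictions. A minimal sketch of what that requires at the data level, assuming a standard BIO tagging scheme and half-open (start, end, type) spans, is a converter in each direction. The function names and the example sentence are hypothetical; the abstract does not describe how DualNER itself performs the conversion.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (start, end_exclusive, type) spans."""
    spans, start, kind = [], None, None
    # A sentinel "O" closes an entity that runs to the end of the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


def spans_to_bio(spans, length):
    """Convert (start, end_exclusive, type) spans into a BIO tag sequence."""
    tags = ["O"] * length
    for start, end, kind in spans:
        tags[start] = f"B-{kind}"
        for i in range(start + 1, end):
            tags[i] = f"I-{kind}"
    return tags


# Sequence-labeling predictions become pseudo-labels for the span task ...
seq_prediction = ["B-PER", "I-PER", "O", "B-LOC"]
span_pseudo_labels = bio_to_spans(seq_prediction)

# ... and span predictions become pseudo-labels for the sequence task.
span_prediction = [(0, 2, "PER"), (3, 4, "LOC")]
seq_pseudo_labels = spans_to_bio(span_prediction, length=4)
```

In this sketch a stray `I-` tag with no preceding `B-` tag is simply ignored, which is one of several reasonable conventions.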

computational linguistic, information retrieval, natural language, (16 more...)

arXiv.org Artificial Intelligence

2211.08104

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(6 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)

Information retrieval in single cell chromatin analysis using TF-IDF transformation methods

Zandigohar, Mehrdad, Dai, Yang

arXiv.org Artificial Intelligence, Dec-9-2022

Single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) assesses genome-wide chromatin accessibility in thousands of cells to reveal regulatory landscapes at high resolution. However, the analysis presents challenges due to the high dimensionality and sparsity of the data. Several methods have been developed, including transformation techniques such as term frequency-inverse document frequency (TF-IDF) and dimension reduction methods such as singular value decomposition (SVD), factor analysis, and autoencoders. Yet a comprehensive study of these methods has not been performed, and it is not clear what the best practice is when analyzing scATAC-seq data. We compared several scenarios for transformation and dimension reduction, as well as SVD-based feature analysis, to investigate potential enhancements in scATAC-seq information retrieval. Additionally, we investigate whether autoencoders benefit from the TF-IDF transformation. Our results reveal that the TF-IDF transformation generally leads to improved clustering and biologically relevant feature extraction.
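
To make the TF-IDF transformation concrete, here is a minimal sketch that treats each cell as a "document" and each accessible peak as a "term". It assumes one common variant (term frequency normalized by the cell's total count, smoothed IDF of log(1 + N / df)). The abstract does not say which variants the authors compared, and the toy matrix is made up for illustration.

```python
import math


def tfidf(counts):
    """Apply TF-IDF to a cell-by-peak count matrix given as a list of rows."""
    n_cells = len(counts)
    n_peaks = len(counts[0])
    # Document frequency: number of cells in which each peak is accessible.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_peaks)]
    # Smoothed inverse document frequency; peaks seen in no cell get weight 0.
    idf = [math.log(1 + n_cells / d) if d else 0.0 for d in df]
    weighted = []
    for row in counts:
        total = sum(row)
        # Term frequency: each count divided by the cell's total count.
        weighted.append(
            [(c / total) * w if total else 0.0 for c, w in zip(row, idf)]
        )
    return weighted


# Toy binary accessibility matrix: 3 cells (rows) by 4 peaks (columns).
cells = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [1, 0, 0, 1],
]
weighted = tfidf(cells)
```

Peak 0 is open in every cell, so it receives a lower weight than the cell-specific peaks. That down-weighting of ubiquitous features is the property that makes the transformation useful before dimension reduction.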

autoencoder, information retrieval, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2212.05184

Country: North America > United States > Illinois > Cook County > Chicago (0.05)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.85)

How AI search is overcoming the unstructured data challenge

#artificialintelligence, Dec-8-2022, 16:05:07 GMT

With 80 per cent of company data being unstructured, including text, images and video, getting the most possible value from rising amounts of these assets is proving a challenge across all business sectors. Businesses often meet pitfalls in keyword search capabilities that fail to properly take context, formats or languages into account, leaving users with insufficient results. To solve this challenge, Barcelona-headquartered data startup Nuclia is delivering an API that leverages what company CEO and co-founder Eudald Camprubi has named 'AI search as a service', capable of finding and indexing data across any source. An end-to-end solution, it can extract data from file repositories, audio, video, URLs and databases, split it into paragraphs, and present an index that shows exactly where any chosen piece of information is in the file. This is based on continuously trained language models, the creation of which owes much to data annotation.

ai search, nuclia, unstructured data, (9 more...)

Country:

North America > United States (0.06)
Europe > United Kingdom (0.06)
Europe > Portugal > Lisbon > Lisbon (0.06)
Europe > France (0.06)

Technology:

Information Technology > Information Management > Search (0.55)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.39)

Few-Shot Preference Learning for Human-in-the-Loop RL

Hejna, Joey, Sadigh, Dorsa

arXiv.org Artificial Intelligence, Dec-6-2022

While reinforcement learning (RL) has become a more popular approach for robotics, designing sufficiently informative reward functions for complex tasks has proven to be extremely difficult due to their inability to capture human intent and policy exploitation. Preference-based RL algorithms seek to overcome these challenges by directly learning reward functions from human feedback. Unfortunately, prior work either requires an unreasonable number of queries implausible for any human to answer or overly restricts the class of reward functions to guarantee the elicitation of the most informative queries, resulting in models that are insufficiently expressive for realistic robotics tasks. Contrary to most works that focus on query selection to minimize the amount of data required for learning reward functions, we take an opposite approach: expanding the pool of available data by viewing human-in-the-loop RL through the more flexible lens of multi-task learning. Motivated by the success of meta-learning, we pre-train preference models on prior task data and quickly adapt them for new tasks using only a handful of queries. Empirically, we reduce the amount of online feedback needed to train manipulation policies in Meta-World by 20×, and demonstrate the effectiveness of our method on a real Franka Panda Robot. Moreover, this reduction in query-complexity allows us to train robot policies from actual human users. Videos of our results and code can be found at https://sites.google.com/view/few-shot-preference-rl/home.
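
As a rough illustration of learning a reward function from pairwise preferences, the sketch below fits a linear reward with the Bradley-Terry model, a formulation commonly used in preference-based RL. This is an assumption: the abstract does not state the authors' model, and the sketch covers only the preference-fitting step, not the meta-learning pre-training. The feature vectors and preference labels are invented.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reward(w, features):
    """Linear reward: dot product of weights and segment features."""
    return sum(wi * fi for wi, fi in zip(w, features))


def fit_reward(pairs, dim, lr=0.1, steps=200):
    """Gradient ascent on the Bradley-Terry log-likelihood.

    Each pair is (features_a, features_b, a_preferred), with a_preferred
    equal to 1 if the human preferred segment a and 0 otherwise.
    """
    w = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for a, b, a_preferred in pairs:
            # Modeled probability that a is preferred over b.
            p = sigmoid(reward(w, a) - reward(w, b))
            for k in range(dim):
                grad[k] += (a_preferred - p) * (a[k] - b[k])
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w


# Hypothetical queries: the human likes feature 0 and dislikes feature 1.
pairs = [
    ([1.0, 0.0], [0.0, 1.0], 1),
    ([0.0, 2.0], [2.0, 0.0], 0),
    ([1.0, 1.0], [0.0, 1.0], 1),
    ([1.0, 2.0], [1.0, 0.0], 0),
]
w = fit_reward(pairs, dim=2)
```

With only four queries the learned weights already have the right signs, which hints at why starting from a good pre-trained initialization could reduce the number of queries needed on a new task.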

machine learning, natural language, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2212.03363

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)

Retrieval as Attention: End-to-end Learning of Retrieval and Reading within a Single Transformer

Jiang, Zhengbao, Gao, Luyu, Araki, Jun, Ding, Haibo, Wang, Zhiruo, Callan, Jamie, Neubig, Graham

arXiv.org Artificial Intelligence, Dec-4-2022

Systems for knowledge-intensive tasks such as open-domain question answering (QA) usually consist of two stages: efficient retrieval of relevant documents from a large corpus and detailed reading of the selected documents to generate answers. Retrievers and readers are usually modeled separately, which necessitates a cumbersome implementation and is hard to train and adapt in an end-to-end fashion. In this paper, we revisit this design and eschew the separate architecture and training in favor of a single Transformer that performs Retrieval as Attention (ReAtt), and end-to-end training solely based on supervision from the end QA task. We demonstrate for the first time that a single model trained end-to-end can achieve both competitive retrieval and QA performance, matching or slightly outperforming state-of-the-art separately trained retrievers and readers. Moreover, end-to-end adaptation significantly boosts its performance on out-of-domain datasets in both supervised and unsupervised settings, making our model a simple and adaptable solution for knowledge-intensive tasks. Code and models are available at https://github.com/jzbjyb/ReAtt.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2212.02027

Country:

North America > Dominican Republic (0.04)
Europe > Austria (0.04)
Oceania > Australia (0.04)
(9 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Melody transcription via generative pre-training

Donahue, Chris, Thickstun, John, Liang, Percy

arXiv.org Artificial Intelligence, Dec-4-2022

Despite the central role that melody plays in music perception, it remains an open challenge in music information retrieval to reliably detect the notes of the melody present in an arbitrary music recording. A key challenge in melody transcription is building methods which can handle broad audio containing any number of instrument ensembles and musical styles - existing strategies work well for some melody instruments or styles but not all. To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio, thereby improving performance on melody transcription by 20% relative to conventional spectrogram features. Another obstacle in melody transcription is a lack of training data - we derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music. The combination of generative pre-training and a new dataset for this task results in 77% stronger performance on melody transcription relative to the strongest available baseline. By pairing our new melody transcription approach with solutions for beat detection, key estimation, and chord recognition, we build Sheet Sage, a system capable of transcribing human-readable lead sheets directly from music audio. Audio examples can be found at https://chrisdonahue.com/sheetsage and code at https://github.com/chrisdonahue/sheetsage.

information retrieval, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2212.01884

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > India > Karnataka > Bengaluru (0.04)

Genre: Research Report (0.82)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

Moving Beyond Downstream Task Accuracy for Information Retrieval Benchmarking

Santhanam, Keshav, Saad-Falcon, Jon, Franz, Martin, Khattab, Omar, Sil, Avirup, Florian, Radu, Sultan, Md Arafat, Roukos, Salim, Zaharia, Matei, Potts, Christopher

arXiv.org Artificial Intelligence, Dec-2-2022

Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks. Unfortunately, some dimensions of this progress are illusory: the majority of the popular IR benchmarks today focus exclusively on downstream task accuracy and thus conceal the costs incurred by systems that trade away efficiency for quality. Latency, hardware cost, and other efficiency considerations are paramount to the deployment of IR systems in user-facing settings. We propose that IR benchmarks structure their evaluation methodology to include not only metrics of accuracy, but also efficiency considerations such as query latency and the corresponding cost budget for a reproducible hardware setting. For the popular IR benchmarks MS MARCO and XOR-TyDi, we show how the best choice of IR system varies according to how these efficiency considerations are chosen and weighed. We hope that future benchmarks will adopt these guidelines toward more holistic IR evaluation.
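
The point that the best system depends on the efficiency budget can be sketched with a small selection routine. Everything below is hypothetical: the system names, accuracy figures, latencies, and costs are invented for illustration and are not results from the paper or from any benchmark.

```python
from dataclasses import dataclass


@dataclass
class Result:
    system: str
    accuracy: float       # any higher-is-better quality metric
    latency_ms: float     # per-query latency on a fixed hardware setting
    cost_per_hour: float  # price of that hardware setting


def best_under_budget(results, max_latency_ms, max_cost_per_hour):
    """Return the most accurate system that fits both budgets, or None."""
    eligible = [
        r for r in results
        if r.latency_ms <= max_latency_ms
        and r.cost_per_hour <= max_cost_per_hour
    ]
    return max(eligible, key=lambda r: r.accuracy, default=None)


# Made-up numbers, chosen only to show the trade-off.
results = [
    Result("lexical", accuracy=0.19, latency_ms=10, cost_per_hour=0.5),
    Result("dense", accuracy=0.33, latency_ms=60, cost_per_hour=1.0),
    Result("reranker", accuracy=0.40, latency_ms=400, cost_per_hour=3.0),
]

tight = best_under_budget(results, max_latency_ms=50, max_cost_per_hour=1.0)
medium = best_under_budget(results, max_latency_ms=100, max_cost_per_hour=2.0)
loose = best_under_budget(results, max_latency_ms=1000, max_cost_per_hour=5.0)
```

Ranking by accuracy alone would always pick the same system. Once latency and cost limits are added, each budget in this toy example selects a different one.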

artificial intelligence, information retrieval, natural language, (16 more...)

arXiv.org Artificial Intelligence

2212.0134

Country:

North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Fast Online Hashing with Multi-Label Projection

Jia, Wenzhe, Cao, Yuan, Liu, Junwei, Gui, Jie

arXiv.org Artificial Intelligence, Dec-2-2022

Hashing has been widely researched as a solution to the large-scale approximate nearest neighbor search problem owing to its time and storage superiority. In recent years, a number of online hashing methods have emerged, which can update the hash functions to adapt to new stream data and realize dynamic retrieval. However, existing online hashing methods are required to update the whole database with the latest hash functions when a query arrives, which leads to low retrieval efficiency as the stream data continuously grows. In addition, these methods ignore the supervision relationship among the examples, especially in the multi-label case. In this paper, we propose a novel Fast Online Hashing (FOH) method which only updates the binary codes of a small part of the database. To be specific, we first build a query pool in which the nearest neighbors of each central point are recorded. When a new query arrives, only the binary codes of the corresponding potential neighbors are updated. In addition, we create a similarity matrix which takes the multi-label supervision information into account and introduce a multi-label projection loss to further preserve the similarity among the multi-label data. The experimental results on two common benchmarks show that the proposed FOH achieves query times up to 6.28 seconds lower than state-of-the-art baselines while retaining competitive retrieval accuracy.
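
For readers unfamiliar with hashing-based nearest neighbor search, the sketch below shows the basic mechanism: vectors are mapped to short binary codes, and candidates are ranked by Hamming distance. It uses fixed random hyperplanes, a classic locality-sensitive hashing scheme, as a stand-in. FOH learns and updates its hash functions online, and the abstract does not give enough detail to reproduce that here. The database vectors are invented.

```python
import random


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw one random hyperplane (normal vector) per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(vec, planes):
    """Binary code: one bit per hyperplane, set by the side vec falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def hamming(a, b):
    """Number of positions at which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def search(query, codes, planes, k=1):
    """Return indices of the k database codes closest to the query code."""
    q = encode(query, planes)
    ranked = sorted(range(len(codes)), key=lambda i: (hamming(q, codes[i]), i))
    return ranked[:k]


planes = make_hyperplanes(n_bits=8, dim=3)
database = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
    [-1.0, 0.0, 0.5],
]
codes = [encode(v, planes) for v in database]
nearest = search(database[2], codes, planes, k=1)
```

Comparing 8-bit codes is far cheaper than comparing the original vectors, which is the time and storage advantage the abstract refers to. The cost that FOH targets arises when the hash functions change and stored codes must be recomputed.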

artificial intelligence, information retrieval, natural language, (18 more...)

arXiv.org Artificial Intelligence

2212.03112

Country: Asia > China (0.05)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.86)

Proceedings of the 1st International Workshop on Reading Music Systems

Calvo-Zaragoza, Jorge, Hajič, Jan jr., Pacha, Alexander

arXiv.org Artificial Intelligence, Dec-1-2022

The International Workshop on Reading Music Systems (WoRMS) is a workshop that tries to connect researchers who develop systems for reading music, such as in the field of Optical Music Recognition, with other researchers and practitioners that could benefit from such systems, like librarians or musicologists. The relevant topics of interest for the workshop include, but are not limited to: Music reading systems; Optical music recognition; Datasets and performance evaluation; Image processing on music scores; Writer identification; Authoring, editing, storing and presentation systems for music scores; Multi-modal systems; Novel input-methods for music to produce written music; Web-based Music Information Retrieval services; Applications and projects; Use-cases related to written music. These are the proceedings of the 1st International Workshop on Reading Music Systems, held in Paris on the 20th of September 2018.

machine learning, pattern recognition, recognition, (17 more...)

arXiv.org Artificial Intelligence

2301.10062

Country:

North America > Canada > Quebec > Montreal (0.14)
Europe > Austria > Vienna (0.14)
Europe > Finland > Uusimaa > Helsinki (0.04)
(23 more...)

Genre:

Research Report (0.64)
Instructional Material (0.45)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)
Education > Curriculum > Subject-Specific Education (0.66)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management (1.00)
Information Technology > Communications (1.00)
(8 more...)
