AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular pieces of information no longer meet the needs of many people. Computerized databases have aided searches of traditional print-publication indexes, but a search still usually means working through one database after another and then turning to other methods for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when the information being sought is spread across widely different formats, and when the universe of knowledge is too big and growing too rapidly for searching to succeed at human speed.

AKEM: Aligning Knowledge Base to Queries with Ensemble Model for Entity Recognition and Linking

Lu, Di, Liang, Zhongping, Yuan, Caixia, Wang, Xiaojie

arXiv.org Artificial Intelligence, Sep-12-2023

This paper presents a novel approach to address the Entity Recognition and Linking Challenge at NLPCC 2015. The task involves extracting named entity mentions from short search queries and linking them to entities within a reference Chinese knowledge base. To tackle this problem, we first expand the existing knowledge base and utilize external knowledge to identify candidate entities, thereby improving the recall rate. Next, we extract features from the candidate entities and utilize Support Vector Regression and Multiple Additive Regression Tree as scoring functions to filter the results. Additionally, we apply rules to further refine the results and enhance precision. Our method is computationally efficient and achieves an F1 score of 0.535.
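The filtering and evaluation steps described in this abstract can be illustrated with a minimal sketch. It assumes candidates are (mention, entity id) pairs and that each scoring function returns a value in [0, 1]; the names `ensemble_filter`, `f1_score`, and the 0.5 threshold are hypothetical, and the scorers are plain callables standing in for the Support Vector Regression and Multiple Additive Regression Tree models, which are not reproduced here. The set-level F1 below may also differ from the exact NLPCC 2015 evaluation protocol.

```python
from typing import Callable, Iterable, Set, Tuple

# Hypothetical representation of a candidate link: (mention, entity_id).
Candidate = Tuple[str, str]
Scorer = Callable[[Candidate], float]


def ensemble_filter(
    candidates: Iterable[Candidate],
    scorers: Iterable[Scorer],
    threshold: float = 0.5,
) -> Set[Candidate]:
    """Keep candidates whose mean score across all scorers reaches the threshold."""
    scorer_list = list(scorers)
    kept: Set[Candidate] = set()
    for cand in candidates:
        mean_score = sum(score(cand) for score in scorer_list) / len(scorer_list)
        if mean_score >= threshold:
            kept.add(cand)
    return kept


def f1_score(predicted: Set[Candidate], gold: Set[Candidate]) -> float:
    """Harmonic mean of precision and recall over sets of predicted and gold links."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Averaging is only one way to combine the two scorers; a weighted sum or a learned combiner would fit the same interface.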

aligning knowledge base, ensemble model, entity recognition, (2 more...)

arXiv:2309.06175

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.53)

Annotating Data for Fine-Tuning a Neural Ranker? Current Active Learning Strategies are not Better than Random Selection

Althammer, Sophia, Zuccon, Guido, Hofstätter, Sebastian, Verberne, Suzan, Hanbury, Allan

arXiv.org Artificial Intelligence, Sep-12-2023

Search methods based on Pretrained Language Models (PLM) have demonstrated great effectiveness gains compared to statistical and early neural ranking models. However, fine-tuning PLM-based rankers requires a great amount of annotated training data. Annotating data involves a large manual effort and thus is expensive, especially in domain-specific tasks. In this paper we investigate fine-tuning PLM-based rankers under limited training data and budget. We investigate two scenarios: fine-tuning a ranker from scratch, and domain adaptation starting with a ranker already fine-tuned on general data, and continuing fine-tuning on a target dataset. We observe a great variability in effectiveness when fine-tuning on different randomly selected subsets of training data. This suggests that it is possible to achieve effectiveness gains by actively selecting a subset of the training data that has the most positive effect on the rankers. This way, it would be possible to fine-tune effective PLM rankers at a reduced annotation budget. To investigate this, we adapt existing Active Learning (AL) strategies to the task of fine-tuning PLM rankers and investigate their effectiveness, also considering annotation and computational costs. Our extensive analysis shows that AL strategies do not significantly outperform random selection of training subsets in terms of effectiveness. We further find that gains provided by AL strategies come at the expense of more assessments (thus higher annotation costs) and AL strategies underperform random selection when comparing effectiveness given a fixed annotation cost. Our results highlight that "optimal" subsets of training data that provide high effectiveness at low annotation cost do exist, but current mainstream AL strategies applied to PLM rankers are not capable of identifying them.
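The comparison in this abstract, random selection versus an active learning strategy under a fixed annotation budget, can be sketched as follows. This is a generic uncertainty-sampling illustration, not the specific strategies evaluated in the paper: it assumes the current ranker produces a relevance probability per unlabeled query id, and the function names and the 0.5 decision boundary are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def random_selection(pool_ids: Sequence[str], budget: int, seed: int = 0) -> List[str]:
    """Baseline: draw `budget` items uniformly at random, without replacement."""
    rng = random.Random(seed)  # seeded so repeated runs pick the same subset
    return rng.sample(list(pool_ids), min(budget, len(pool_ids)))


def uncertainty_selection(scores: Dict[str, float], budget: int) -> List[str]:
    """Pick the `budget` items whose predicted probability is closest to 0.5.

    `scores` maps an item id to the current ranker's relevance probability;
    items near 0.5 are the ones the ranker is least certain about.
    """
    ranked = sorted(scores, key=lambda item_id: abs(scores[item_id] - 0.5))
    return ranked[:budget]
```

Both functions return the same number of items for a given budget, which is the fixed-cost setting the abstract uses when comparing strategies.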

effectiveness, plm ranker, ranker, (17 more...)

arXiv:2309.06131

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.05)
(18 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Improving Information Extraction on Business Documents with Specific Pre-Training Tasks

Douzon, Thibault, Duffner, Stefan, Garcia, Christophe, Espinas, Jérémy

arXiv.org Artificial Intelligence, Sep-11-2023

Transformer-based Language Models are widely used in Natural Language Processing related tasks. Thanks to their pre-training, they have been successfully adapted to Information Extraction in business documents. However, most pre-training tasks proposed in the literature for business documents are too generic and not sufficient to learn more complex structures. In this paper, we use LayoutLM, a language model pre-trained on a collection of business documents, and introduce two new pre-training tasks that further improve its capacity to extract relevant information. The first is aimed at better understanding the complex layout of documents, and the second focuses on numeric values and their order of magnitude. These tasks force the model to learn better-contextualized representations of the scanned documents. We further introduce a new post-processing algorithm to decode BIESO tags in Information Extraction that performs better with complex entities. Our method significantly improves extraction performance on both public (from 93.88 to 95.50 F1 score) and private (from 84.35 to 84.84 F1 score) datasets composed of expense receipts, invoices, and purchase orders.
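For readers unfamiliar with BIESO tagging (Begin, Inside, End, Single, Outside), the sketch below shows a plain strict decoder that turns a tag sequence into entity spans. It is a common baseline, not the new post-processing algorithm the paper introduces; the function name `decode_bieso` and the `PREFIX-LABEL` tag format (for example `B-TOTAL`) are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def decode_bieso(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Decode BIESO tags into (label, start, end) spans, with `end` exclusive.

    Strict decoding: a multi-token span is emitted only for a well-formed
    B-X, I-X..., E-X run; any malformed sequence is discarded.
    """
    spans: List[Tuple[str, int, int]] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        # "B-TOTAL" -> ("B", "TOTAL"); a bare "O" -> ("O", "")
        prefix, _, tag_label = tag.partition("-")
        if prefix == "S":
            spans.append((tag_label, i, i + 1))
            start, label = None, None
        elif prefix == "B":
            start, label = i, tag_label
        elif prefix == "I" and start is not None and tag_label == label:
            continue  # still inside the open span
        elif prefix == "E" and start is not None and tag_label == label:
            spans.append((tag_label, start, i + 1))
            start, label = None, None
        else:
            # "O", or a tag that breaks the open span: drop the partial span
            start, label = None, None
    return spans
```

A strict decoder like this loses an entire entity when a single tag is mispredicted, which is one reason more tolerant decoding schemes are of interest for long or complex entities.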

arxiv, extraction, information extraction, (13 more...)

doi: 10.1007/978-3-031-06555-2_8

arXiv:2309.05429

Country:

Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > Dominican Republic (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)
Information Technology > Data Science > Data Mining > Text Mining (0.84)

Analysing Cross-Lingual Transfer in Low-Resourced African Named Entity Recognition

Beukman, Michael, Fokam, Manuel

arXiv.org Artificial Intelligence, Sep-11-2023

Transfer learning has led to large gains in performance for nearly all NLP tasks while making downstream models easier and faster to train. This has also been extended to low-resourced languages, with some success. We investigate the properties of cross-lingual transfer learning between ten low-resourced languages, from the perspective of a named entity recognition task. We specifically investigate how much adaptive fine-tuning and the choice of transfer language affect zero-shot transfer performance. We find that models that perform well on a single language often do so at the expense of generalising to others, while models with the best generalisation to other languages suffer in individual language performance. Furthermore, the amount of data overlap between the source and target datasets is a better predictor of transfer performance than either the geographical or genetic distance between the languages.

computational linguistic, dataset, transfer performance, (15 more...)

arXiv:2309.05311

Country:

Africa > West Africa (0.14)
Africa > East Africa (0.14)
Africa > Niger (0.04)
(23 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Is Google's Search Engine Smart or Sneaky? A Trial Court Judge Will Decide

WIRED, Sep-10-2023, 11:00:00 GMT

A family member's hurried Google search for a last-second visa to visit New Zealand recently caused a headache and provided a timely reminder of why Google faces a landmark US antitrust trial next week. Tapping on the first link took us off to a website that after a few swipes charged $118 for the necessary paperwork. Only later did it emerge that we'd paid a so-called "internet-based travel technology company" and not a government agency, and been fleeced for more than double the required cost. Fortunately, our panicked refund demand was fulfilled, but the miscue highlights a major frustration with Google that helped land it in court. The stacks of ads above its search results, like the visa link we clicked on, too often knock users off course from the information that they are seeking.

competition, google, sneaky, (2 more...)

Country:

Oceania > New Zealand (0.26)
North America > United States > Colorado (0.08)
Oceania > Guam (0.06)
(4 more...)

Industry:

Law > Litigation (1.00)
Information Technology > Services (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Knowledge Graphs in Practice: Characterizing their Users, Challenges, and Visualization Opportunities

Li, Harry, Appleby, Gabriel, Brumar, Camelia Daniela, Chang, Remco, Suh, Ashley

arXiv.org Artificial Intelligence, Sep-6-2023

This study presents insights from interviews with nineteen Knowledge Graph (KG) practitioners who work in both enterprise and academic settings on a wide variety of use cases. Through this study, we identify critical challenges experienced by KG practitioners when creating, exploring, and analyzing KGs that could be alleviated through visualization design. Our findings reveal three major personas among KG practitioners - KG Builders, Analysts, and Consumers - each with distinct expertise and needs. We discover that KG Builders would benefit from schema enforcers, while KG Analysts need customizable query builders that provide interim query results. For KG Consumers, we identify a lack of efficacy for node-link diagrams, and the need for tailored domain-specific visualizations to promote KG adoption and comprehension. Lastly, we find that implementing KGs effectively in practice requires both technical and social solutions that are not addressed with current tools, technologies, and collaborative workflows. From the analysis of our interviews, we distill several visualization research directions to improve KG usability, including knowledge cards that balance digestibility and discoverability, timeline views to track temporal changes, interfaces that support organic discovery, and semantic explanations for AI and machine learning predictions.

characterizing, knowledge graph, visualization opportunity

arXiv:2304.01311

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.60)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.53)

Internet Explorer: Targeted Representation Learning on the Open Web

Li, Alexander C., Brown, Ellis, Efros, Alexei A., Pathak, Deepak

arXiv.org Artificial Intelligence, Sep-6-2023

Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets. These general-purpose models only capture the knowledge within their pre-training datasets, which are tiny, out-of-date snapshots of the Internet, where billions of images are uploaded each day. We suggest an alternate approach: rather than hoping our static datasets transfer to our desired tasks after large-scale pre-training, we propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand. Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset. It cycles between searching for images on the Internet with text queries, self-supervised training on downloaded images, determining which images were useful, and prioritizing what to search for next. We evaluate Internet Explorer across several datasets and show that it outperforms or matches CLIP oracle performance by using just a single GPU desktop to actively query the Internet for 30-40 hours. Results, visualizations, and videos at https://internet-explorer-ssl.github.io/

dataset, internet explorer, targeted representation learning, (10 more...)

arXiv:2302.14051

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Energy (0.46)
Education (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Communications (1.00)
(6 more...)

CRUISE-Screening: Living Literature Reviews Toolbox

Kusa, Wojciech, Knoth, Petr, Hanbury, Allan

arXiv.org Artificial Intelligence, Sep-4-2023

Keeping up with research and finding related work is still a time-consuming task for academics. Researchers sift through thousands of studies to identify a few relevant ones. Automation techniques can help by increasing the efficiency and effectiveness of this task. To this end, we developed CRUISE-Screening, a web-based application for conducting living literature reviews - a type of literature review that is continuously updated to reflect the latest research in a particular field. CRUISE-Screening is connected to several search engines via an API, which allows for updating the search results periodically. Moreover, it can facilitate the process of screening for relevant publications by using text classification and question answering models. CRUISE-Screening can be used both by researchers conducting literature reviews and by those working on automating the citation screening process to validate their algorithms. The application is open-source: https://github.com/ProjectDoSSIER/cruise-screening, and a demo is available at https://citation-screening.ec.tuwien.ac.at. We discuss the limitations of our tool in Appendix A.

cruise-screening, literature review, systematic review, (13 more...)

doi: 10.1145/3583780.3614736

arXiv:2309.01684

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > West Midlands > Birmingham (0.05)
Asia > Taiwan > Taiwan Province > Taipei (0.04)
(5 more...)

Genre:

Overview (1.00)
Research Report (0.83)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.52)

Keyword search is finally coming to Threads

Engadget, Aug-31-2023, 13:55:16 GMT

A much-demanded feature is finally coming to Threads very soon, Meta CEO Mark Zuckerberg has confirmed. A keyword search function is on the way. Until now, the search box has only been useful for looking up user accounts. In fact, the company is already testing the function in Australia and New Zealand. It plans to roll out keyword search more broadly to English-speaking territories in short order.

keyword search, meta

Country:

Oceania > New Zealand (0.28)
Oceania > Australia (0.28)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)

Simulation-Based Optimization of User Interfaces for Quality-Assuring Machine Learning Model Predictions

Zhang, Yu, Tennekes, Martijn, de Jong, Tim, Curier, Lyana, Coecke, Bob, Chen, Min

arXiv.org Artificial Intelligence, Aug-30-2023

Quality-sensitive applications of machine learning (ML) require quality assurance (QA) by humans before the predictions of an ML model can be deployed. QA for ML (QA4ML) interfaces require users to view a large amount of data and perform many interactions to correct errors made by the ML model. An optimized user interface (UI) can significantly reduce interaction costs. While UI optimization can be informed by user studies evaluating design options, this approach is not scalable because there are typically numerous small variations that can affect the efficiency of a QA4ML interface. Hence, we propose using simulation to evaluate and aid the optimization of QA4ML interfaces. In particular, we focus on simulating the combined effects of human intelligence in initiating appropriate interaction commands and machine intelligence in providing algorithmic assistance for accelerating QA4ML processes. As QA4ML is usually labor-intensive, we use the simulated task completion time as the metric for UI optimization under different interface and algorithm setups. We demonstrate the usage of this UI design method in several QA4ML applications.

application, interface, time cost, (13 more...)

doi: 10.1145/3594552

arXiv:2104.01129

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.28)
North America > United States > California > Santa Clara County > San Jose (0.14)
North America > United States > California > Monterey County > Monterey (0.14)
(22 more...)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (0.68)

Industry: Health & Medicine (0.47)

Technology:

Information Technology > Human Computer Interaction > Interfaces (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
(2 more...)