AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

SciLit: A Platform for Joint Scientific Literature Discovery, Summarization and Citation Generation

Gu, Nianlong, Hahnloser, Richard H. R.

arXiv.org Artificial IntelligenceNov-6-2023

Scientific writing involves retrieving, summarizing, and citing relevant papers, which can be time-consuming processes in large and rapidly evolving fields. By making these processes inter-operable, natural language processing (NLP) provides opportunities for creating end-to-end assistive writing tools. We propose SciLit, a pipeline that automatically recommends relevant papers, extracts highlights, and suggests a reference sentence as a citation of a paper, taking into consideration the user-provided context and keywords. SciLit efficiently recommends papers from large databases of hundreds of millions of papers using a two-stage pre-fetching and re-ranking literature search system that flexibly deals with addition and removal of a paper database. We provide a convenient user interface that displays the recommended papers as extractive summaries and that offers abstractively-generated citing sentences which are aligned with the provided context and which mention the chosen keyword(s). Our assistive tool for literature discovery and scientific writing is available at https://scilit.vercel.app

citation sentence, computational linguistic, keyword, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.18653/v1/2023.acl-demo.22

2306.03535

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Switzerland > Zürich > Zürich (0.05)
Asia > China > Hong Kong (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.94)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)

Add feedback

MAWSEO: Adversarial Wiki Search Poisoning for Illicit Online Promotion

Lin, Zilong, Li, Zhengyi, Liao, Xiaojing, Wang, XiaoFeng, Liu, Xiaozhong

arXiv.org Artificial IntelligenceNov-5-2023

Public Wiki systems are collaborative knowledge bases More specifically, a research question is that, given a query, that anyone can contribute. This open model is user-friendly whether strategic revisions can be made on selected Wiki and powerful, which reduces participation barriers and allows articles (which we call adversarial revisions) to ensure people with different backgrounds to contribute. As that the following goals are achieved simultaneously: 1) a prominent example of public Wiki systems, Wikipedia the ranks of the revised articles are significantly improved is instrumental in making open knowledge that millions of among query results, 2) the revisions cannot be detected people use, redistribute, and contribute to [28]. In another by Wiki vandalism detection even when the detector is instance, Wikidata [83] is a free and open knowledge base blackbox to the adversary, and 3) the content of revisions with 0.1 billion data items that can be read and edited does not arouse any suspicion from Wiki users but can still by both humans and machines. Public Wiki systems have capture their attention by keeping the semantic consistency already served as key knowledge sources in people's daily and topic relevancy of the revised articles.

mawseo, paragraph, revision, (16 more...)

arXiv.org Artificial Intelligence

2304.113

Country:

Europe > Belgium (0.04)
North America > United States > Indiana (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.72)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.46)

Technology:

Information Technology > Knowledge Management (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
(5 more...)

Add feedback

Perturbation-based Active Learning for Question Answering

Luo, Fan, Surdeanu, Mihai

arXiv.org Artificial IntelligenceNov-4-2023

Building a question answering (QA) model with less annotation costs can be achieved by utilizing active learning (AL) training strategy. It selects the most informative unlabeled training data to update the model effectively. Acquisition functions for AL are used to determine how informative each training example is, such as uncertainty or diversity based sampling. In this work, we propose a perturbation-based active learning acquisition strategy and demonstrate it is more effective than existing commonly used strategies.

arxiv preprint arxiv, learning, proceedings, (13 more...)

arXiv.org Artificial Intelligence

2311.02345

Country:

North America > United States > Arizona > Pima County > Tucson (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Czechia > Prague (0.04)

Genre: Research Report (1.00)

Industry:

Education (0.48)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

Multi-EuP: The Multilingual European Parliament Dataset for Analysis of Bias in Information Retrieval

Yang, Jinrui, Baldwin, Timothy, Cohn, Trevor

arXiv.org Artificial IntelligenceNov-3-2023

We present Multi-EuP, a new multilingual benchmark dataset, comprising 22K multi-lingual documents collected from the European Parliament, spanning 24 languages. This dataset is designed to investigate fairness in a multilingual information retrieval (IR) context to analyze both language and demographic bias in a ranking context. It boasts an authentic multilingual corpus, featuring topics translated into all 24 languages, as well as cross-lingual relevance judgments. Furthermore, it offers rich demographic information associated with its documents, facilitating the study of demographic bias. We report the effectiveness of Multi-EuP for benchmarking both monolingual and multilingual IR. We also conduct a preliminary experiment on language bias caused by the choice of tokenization strategy.

dataset, query, retrieval, (15 more...)

arXiv.org Artificial Intelligence

2311.0187

Country:

Europe > Belgium (0.05)
Europe > Middle East > Malta (0.04)
Europe > Ireland (0.04)
(29 more...)

Genre: Research Report (1.00)

Industry: Government > Regional Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.73)

Add feedback

VQPy: An Object-Oriented Approach to Modern Video Analytics

Yu, Shan, Zhu, Zhenting, Chen, Yu, Xu, Hanchen, Zhao, Pengzhan, Wang, Yang, Padmanabhan, Arthi, Latapie, Hugo, Xu, Harry

arXiv.org Artificial IntelligenceNov-3-2023

Video analytics is widely used in contemporary systems and services. At the forefront of video analytics are video queries that users develop to find objects of particular interest. Building upon the insight that video objects (e.g., human, animals, cars, etc.), the center of video analytics, are similar in spirit to objects modeled by traditional object-oriented languages, we propose to develop an object-oriented approach to video analytics. This approach, named VQPy, consists of a frontend$\unicode{x2015}$a Python variant with constructs that make it easy for users to express video objects and their interactions$\unicode{x2015}$as well as an extensible backend that can automatically construct and optimize pipelines based on video objects. We have implemented and open-sourced VQPy, which has been productized in Cisco as part of its DeepVision framework.

query, video, vqpy, (15 more...)

arXiv.org Artificial Intelligence

2311.01623

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)
North America > United States > New York > New York County > New York City (0.05)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.46)

Add feedback

Metaphorical User Simulators for Evaluating Task-oriented Dialogue Systems

Sun, Weiwei, Guo, Shuyu, Zhang, Shuo, Ren, Pengjie, Chen, Zhumin, de Rijke, Maarten, Ren, Zhaochun

arXiv.org Artificial IntelligenceNov-2-2023

Task-oriented dialogue systems (TDSs) are assessed mainly in an offline setting or through human evaluation. The evaluation is often limited to single-turn or is very time-intensive. As an alternative, user simulators that mimic user behavior allow us to consider a broad set of user goals to generate human-like conversations for simulated evaluation. Employing existing user simulators to evaluate TDSs is challenging as user simulators are primarily designed to optimize dialogue policies for TDSs and have limited evaluation capabilities. Moreover, the evaluation of user simulators is an open challenge. In this work, we propose a metaphorical user simulator for end-to-end TDS evaluation, where we define a simulator to be metaphorical if it simulates user's analogical thinking in interactions with systems. We also propose a tester-based evaluation framework to generate variants, i.e., dialogue systems with different capabilities. Our user simulator constructs a metaphorical user model that assists the simulator in reasoning by referring to prior knowledge when encountering new items. We estimate the quality of simulators by checking the simulated interactions between simulators and variants. Our experiments are conducted using three TDS datasets. The proposed user simulator demonstrates better consistency with manual evaluation than an agenda-based simulator and a seq2seq model on three datasets; our tester framework demonstrates efficiency and has been tested on multiple tasks, such as conversational recommendation and e-commerce dialogues.

evaluation, simulator, user simulator, (14 more...)

arXiv.org Artificial Intelligence

2204.00763

Country:

Asia > China > Shandong Province > Qingdao (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (1.00)

Industry: Consumer Products & Services > Restaurants (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.94)
Information Technology > Information Management (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.94)
(3 more...)

Add feedback

Towards Characterizing the First-order Query Complexity of Learning (Approximate) Nash Equilibria in Zero-sum Matrix Games

Hadiji, Hédi, Sachs, Sarah, van Erven, Tim, Koolen, Wouter M.

arXiv.org Machine LearningNov-2-2023

In the first-order query model for zero-sum $K\times K$ matrix games, players observe the expected pay-offs for all their possible actions under the randomized action played by their opponent. This classical model has received renewed interest after the discovery by Rakhlin and Sridharan that $\epsilon$-approximate Nash equilibria can be computed efficiently from $O(\frac{\ln K}{\epsilon})$ instead of $O(\frac{\ln K}{\epsilon^2})$ queries. Surprisingly, the optimal number of such queries, as a function of both $\epsilon$ and $K$, is not known. We make progress on this question on two fronts. First, we fully characterise the query complexity of learning exact equilibria ($\epsilon=0$), by showing that they require a number of queries that is linear in $K$, which means that it is essentially as hard as querying the whole matrix, which can also be done with $K$ queries. Second, for $\epsilon > 0$, the current query complexity upper bound stands at $O(\min(\frac{\ln(K)}{\epsilon} , K))$. We argue that, unfortunately, obtaining a matching lower bound is not possible with existing techniques: we prove that no lower bound can be derived by constructing hard matrices whose entries take values in a known countable set, because such matrices can be fully identified by a single query. This rules out, for instance, reducing to an optimization problem over the hypercube by encoding it as a binary payoff matrix. We then introduce a new technique for lower bounds, which allows us to obtain lower bounds of order $\tilde\Omega(\log(\frac{1}{K\epsilon})$ for any $\epsilon \leq 1 / (cK^4)$, where $c$ is a constant independent of $K$. We further discuss possible future directions to improve on our techniques in order to close the gap with the upper bounds.

machine learning, natural language, query complexity, (16 more...)

arXiv.org Machine Learning

2304.12768

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.84)

Add feedback

Zero-Shot Medical Information Retrieval via Knowledge Graph Embedding

Wang, Yuqi, Wang, Zeqiang, Wang, Wei, Chen, Qi, Huang, Kaizhu, Nguyen, Anh, De, Suparna

arXiv.org Artificial IntelligenceOct-31-2023

In the era of the Internet of Things (IoT), the retrieval of relevant medical information has become essential for efficient clinical decision-making. This paper introduces MedFusionRank, a novel approach to zero-shot medical information retrieval (MIR) that combines the strengths of pre-trained language models and statistical methods while addressing their limitations. The proposed approach leverages a pre-trained BERT-style model to extract compact yet informative keywords. These keywords are then enriched with domain knowledge by linking them to conceptual entities within a medical knowledge graph. Experimental evaluations on medical datasets demonstrate MedFusion-Rank's superior performance over existing methods, with promising results with a variety of evaluation metrics. MedFusionRank demonstrates efficacy in retrieving relevant information, even from short or single-term queries.

keyword, knowledge graph, zero-shot medical information retrieval, (9 more...)

arXiv.org Artificial Intelligence

2310.20588

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Shaanxi Province > Xi'an (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.94)
Health & Medicine > Therapeutic Area > Oncology (0.46)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Reboost Large Language Model-based Text-to-SQL, Text-to-Python, and Text-to-Function -- with Real Applications in Traffic Domain

Sui, Guanghu, Li, Zhishuai, Li, Ziyue, Yang, Sun, Ruan, Jingqing, Mao, Hangyu, Zhao, Rui

arXiv.org Artificial IntelligenceOct-31-2023

Previous state-of-the-art (SOTA) method achieved a remarkable execution accuracy on the Spider dataset, which is one of the largest and most diverse datasets in the Text-to-SQL domain. However, during our reproduction of the business dataset, we observed a significant drop in performance. We examined the differences in dataset complexity, as well as the clarity of questions' intentions, and assessed how those differences could impact the performance of prompting methods. Subsequently, We develop a more adaptable and more general prompting method, involving mainly query rewriting and SQL boosting, which respectively transform vague information into exact and precise information and enhance the SQL itself by incorporating execution feedback and the query results from the database content. In order to prevent information gaps, we include the comments, value types, and value samples for columns as part of the database description in the prompt. Our experiments with Large Language Models (LLMs) illustrate the significant performance improvement on the business dataset and prove the substantial potential of our method. In terms of execution accuracy on the business dataset, the SOTA method scored 21.05, while our approach scored 65.79. As a result, our approach achieved a notable performance improvement even when using a less capable pre-trained language model. Last but not the least, we also explore the Text-to-Python and Text-to-Function options, and we deeply analyze the pros and cons among them, offering valuable insights to the community.

database, llm, query, (14 more...)

arXiv.org Artificial Intelligence

2310.18752

Country:

Asia > China > Shanghai > Shanghai (0.05)
Asia > China > Beijing > Beijing (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Cologne (0.04)

Genre: Research Report (1.00)

Industry: Transportation (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.35)

Add feedback

Born for Auto-Tagging: Faster and better with new objective functions

Liu, Chiung-ju, Shieh, Huang-Ting

arXiv.org Artificial IntelligenceOct-31-2023

Keyword extraction is a task of text mining. It is applied to increase search volume in SEO and ads. Implemented in auto-tagging, it makes tagging on a mass scale of online articles and photos efficiently and accurately. BAT is invented for auto-tagging which served as awoo's AI marketing platform (AMP). awoo AMP not only provides service as a customized recommender system but also increases the converting rate in E-commerce. The strength of BAT converges faster and better than other SOTA models, as its 4-layer structure achieves the best F scores at 50 epochs. In other words, it performs better than other models which require deeper layers at 100 epochs. To generate rich and clean tags, awoo creates new objective functions to maintain similar ${\rm F_1}$ scores with cross-entropy while enhancing ${\rm F_2}$ scores simultaneously. To assure the even better performance of F scores awoo revamps the learning rate strategy proposed by Transformer \cite{Transformer} to increase ${\rm F_1}$ and ${\rm F_2}$ scores at the same time.

category, epoch, negative sample, (16 more...)

arXiv.org Artificial Intelligence

2206.07264

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
North America > United States > California > San Francisco County > San Francisco (0.14)
Asia > China (0.04)
(9 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Information Management > Search (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.66)

Add feedback