AITopics | Information Retrieval

Collaborating Authors

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound byte? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

News Overviews Instructional Materials AI-Alerts Classics

Automated Query Reformulation for Efficient Search based on Query Logs From Stack Overflow

Cao, Kaibo, Chen, Chunyang, Baltes, Sebastian, Treude, Christoph, Chen, Xiang

arXiv.org Artificial IntelligenceFeb-10-2021

As a popular Q&A site for programming, Stack Overflow is a treasure for developers. However, the amount of questions and answers on Stack Overflow make it difficult for developers to efficiently locate the information they are looking for. There are two gaps leading to poor search results: the gap between the user's intention and the textual query, and the semantic gap between the query and the post content. Therefore, developers have to constantly reformulate their queries by correcting misspelled words, adding limitations to certain programming languages or platforms, etc. As query reformulation is tedious for developers, especially for novices, we propose an automated software-specific query reformulation approach based on deep learning. With query logs provided by Stack Overflow, we construct a large-scale query reformulation corpus, including the original queries and corresponding reformulated ones. Our approach trains a Transformer model that can automatically generate candidate reformulated queries when given the user's original query. The evaluation results show that our approach outperforms five state-of-the-art baselines, and achieves a 5.6% to 33.5% boost in terms of $\mathit{ExactMatch}$ and a 4.8% to 14.4% boost in terms of $\mathit{GLEU}$.

query, query reformulation, reformulation, (15 more...)

arXiv.org Artificial Intelligence

2102.00826

Country:

Asia > China > Jiangsu Province > Nanjing (0.04)
Oceania > Australia > South Australia > Adelaide (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Spain (0.04)

Genre: Research Report > New Finding (0.66)

Industry: Information Technology (0.46)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(3 more...)

Add feedback

A Knowledge Compilation Map for Conditional Preference Statements-based Languages

Fargier, Hélène, Mengin, Jérôme

arXiv.org Artificial IntelligenceFeb-8-2021

Conditional preference statements have been used to compactly represent preferences over combinatorial domains. They are at the core of CP-nets and their generalizations, and lexicographic preference trees. Several works have addressed the complexity of some queries (optimization, dominance in particular). We extend in this paper some of these results, and study other queries which have not been addressed so far, like equivalence, thereby contributing to a knowledge compilation map for languages based on conditional preference statements. We also introduce a new parameterised family of languages, which enables to balance expressiveness against the complexity of some queries.

expressiveness, formula, var, (16 more...)

arXiv.org Artificial Intelligence

2102.04107

Country:

Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
North America > Canada > Quebec > Capitale-Nationale Region > Québec (0.04)
North America > Canada > Quebec > Capitale-Nationale Region > Quebec City (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)

Add feedback

Problematic Machine Behavior: A Systematic Literature Review of Algorithm Audits

Bandy, Jack

arXiv.org Artificial IntelligenceFeb-3-2021

While algorithm audits are growing rapidly in commonality and public importance, relatively little scholarly work has gone toward synthesizing prior work and strategizing future research in the area. This systematic literature review aims to do just that, following PRISMA guidelines in a review of over 500 English articles that yielded 62 algorithm audit studies. The studies are synthesized and organized primarily by behavior (discrimination, distortion, exploitation, and misjudgement), with codes also provided for domain (e.g. search, vision, advertising, etc.), organization (e.g. Google, Facebook, Amazon, etc.), and audit method (e.g. sock puppet, direct scrape, crowdsourcing, etc.). The review shows how previous audit studies have exposed public-facing algorithms exhibiting problematic behavior, such as search algorithms culpable of distortion and advertising algorithms culpable of discrimination. Based on the studies reviewed, it also suggests some behaviors (e.g. discrimination on the basis of intersectional identities), domains (e.g. advertising algorithms), methods (e.g. code auditing), and organizations (e.g. Twitter, TikTok, LinkedIn) that call for future audit attention. The paper concludes by offering the common ingredients of successful audits, and discussing algorithm auditing in the context of broader research working toward algorithmic justice.

algorithm, audit, discrimination, (13 more...)

arXiv.org Artificial Intelligence

2102.04256

Country:

North America > United States > New York > New York County > New York City (0.14)
North America > United States > Minnesota (0.04)
North America > United States > Virginia (0.04)
(12 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.92)

Industry:

Media > News (1.00)
Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Services (1.00)
(8 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.49)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)

Add feedback

Focusing Knowledge-based Graph Argument Mining via Topic Modeling

Abels, Patrick, Ahmadi, Zahra, Burkhardt, Sophie, Schiller, Benjamin, Gurevych, Iryna, Kramer, Stefan

arXiv.org Artificial IntelligenceFeb-3-2021

Decision-making usually takes five steps: identifying the problem, collecting data, extracting evidence, identifying pro and con arguments, and making decisions. Focusing on extracting evidence, this paper presents a hybrid model that combines latent Dirichlet allocation and word embeddings to obtain external knowledge from structured and unstructured data. We study the task of sentence-level argument mining, as arguments mostly require some degree of world knowledge to be identified and understood. Given a topic and a sentence, the goal is to classify whether a sentence represents an argument in regard to the topic. We use a topic model to extract topic- and sentence-specific evidence from the structured knowledge base Wikidata, building a graph based on the cosine similarity between the entity word vectors of Wikidata and the vector of the given sentence. Also, we build a second graph based on topic-specific articles found via Google to tackle the general incompleteness of structured knowledge bases. Combining these graphs, we obtain a graph-based model which, as our evaluation shows, successfully capitalizes on both structured and unstructured data.

graph, knowledge graph, wikidata, (12 more...)

arXiv.org Artificial Intelligence

2102.02086

Country:

North America > United States (0.14)
Europe > Germany > Rheinland-Pfalz > Mainz (0.05)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry:

Law (0.46)
Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.91)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.87)

Add feedback

Google's search engine not as good as its competitors for news, research finds

The GuardianFeb-2-2021, 16:30:34 GMT

Australians trying to stay up to date with the news by searching online may be better off ditching Google and using its competitors, research by Monash University has shown. On Australia Day "Grace Tame" was the most popular search term used on Google – reflecting the fact that she had just been made Australian of the Year. The top 50 results delivered by Google included only 70% of professional news websites, compared with 94% for the same search term on Bing and 82% on Ecosia. Last Sunday Australians rushing to find out more about the suddenly announced coronavirus lockdown in Perth made "perth lockdown" the most popular search term. Google delivered only 80% of news websites in the top 50, compared with 90% from Bing and 86% from Ecosia.

ecosia, google, search term, (11 more...)

The Guardian

Country: Oceania > Australia (0.31)

Industry:

Media > News (1.00)
Health & Medicine > Therapeutic Area > Immunology (0.37)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.44)

Add feedback

Taxonomic survey of Hindi Language NLP systems

Desai, Nikita P., Prof., null, Dabhi, Vipul K.

arXiv.org Artificial IntelligenceJan-30-2021

The field of Natural language processing can be formally defined as - "A theoretically motivated range of computational techniques for analyzing and representing naturally occurring texts at one or more levels of linguistic analysis for the purpose of achieving human-like language processing for a range of tasks or applications"[69]. The naturally occurring text can be in written or spoken form.A wide array of domains contribute to NLP development like linguistics, computer science and psychology.The linguistics field helps to understand the formal structure of language while computer science domain helps to find efficient internal representations and data structures.The study of "Psychology" can be useful to understand the methodology used by humans for dealing with languages. NLP can be considered to be having two distinct focus namely (1)Natural Language Generation(NLG) and (2)Natural Language Understanding(NLU). The NLG deals with planning to use the representation of language to decide what should be generated at each point in interaction, while NLU needs to analyze language and decide which is best way to represent it meaningfully.We, in this survey paper, concentrate on area of NLU for written text.Hence the NLP henceforth might be considered as NLU and vice versa. Motivation for designing Indian NLP systems Hindi and English are the official languages in central government of India(GOI). Indian community faces a "Digital Divide" due to dominance of English as mode of communication in higher education, judiciary, corporate sector and Public administration at Central level whereas the government in states work in their respective regional languages [67].The expansion of Internet has inter-connected the socioeconomic environment of the world and redefined the concept of global culture.As per a report in 2017 by the companies kpmg and Google

application, hindi, proceedings, (16 more...)

arXiv.org Artificial Intelligence

2102.00214

Country:

Asia > India > Tamil Nadu > Chennai (0.04)
Asia > Indonesia > Bali (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
(9 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry: Government > Regional Government > Asia Government > India Government (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
(10 more...)

Add feedback

BERTa\'u: Ita\'u BERT for digital customer service

Finardi, Paulo, Viegas, José Dié, Ferreira, Gustavo T., Mansano, Alex F., Caridá, Vinicius F.

arXiv.org Artificial IntelligenceJan-28-2021

In the last few years, three major topics received increased interest: deep learning, NLP and conversational agents. Bringing these three topics together to create an amazing digital customer experience and indeed deploy in production and solve real-world problems is something innovative and disruptive. We introduce a new Portuguese financial domain language representation model called BERTa\'u. BERTa\'u is an uncased BERT-base trained from scratch with data from the Ita\'u virtual assistant chatbot solution. Our novel contribution is that BERTa\'u pretrained language model requires less data, reached state-of-the-art performance in three NLP tasks, and generates a smaller and lighter model that makes the deployment feasible. We developed three tasks to validate our model: information retrieval with Frequently Asked Questions (FAQ) from Ita\'u bank, sentiment analysis from our virtual assistant data, and a NER solution. All proposed tasks are real-world solutions in production on our environment and the usage of a specialist model proved to be effective when compared to Google BERT multilingual and the DPRQuestionEncoder from Facebook, available at Hugging Face. The BERTa\'u improves the performance in 22% of FAQ Retrieval MRR metric, 2.1% in Sentiment Analysis F1 score, 4.4% in NER F1 score and can also represent the same sequence in up to 66% fewer tokens when compared to "shelf models".

berta, digital customer service

arXiv.org Artificial Intelligence

2101.12015

Genre: Frequently Asked Questions (FAQ) (0.73)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.73)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.73)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.53)

Add feedback

DP-Cryptography

Communications of the ACMJan-26-2021, 13:21:32 GMT

On Feb 15, 2019, John Abowd, chief scientist at the U.S. Census Bureau, announced the results of a reconstruction attack that they proactively launched using data released under the 2010 Decennial Census.19 The decennial census released billions of statistics about individuals like "how many people of the age 10-20 live in New York City" or "how many people live in four-person households." Using only the data publicly released in 2010, an internal team was able to correctly reconstruct records of address (by census block), age, gender, race, and ethnicity for 142 million people (about 46% of the U.S. population), and correctly match these data to commercial datasets circa 2010 to associate personal-identifying information such as names for 52 million people (17% of the population). This is not specific to the U.S. Census Bureau--such attacks can occur in any setting where statistical information in the form of deidentified data, statistics, or even machine learning models are released. That such attacks are possible was predicted over 15 years ago by a seminal paper by Irit Dinur and Kobbi Nissim12--releasing a sufficiently large number of aggregate statistics with sufficiently high accuracy provides sufficient information to reconstruct the underlying database with high accuracy. The practicality of such a large-scale reconstruction by the U.S. Census Bureau underscores the grand challenge that public organizations, industry, and scientific research faces: How can we safely disseminate results of data analysis on sensitive databases? An emerging answer is differential privacy. An algorithm satisfies differential privacy (DP) if its output is insensitive to adding, removing or changing one record in its input database. DP is considered the "gold standard" for privacy for a number of reasons. It provides a persuasive mathematical proof of privacy to individuals with several rigorous interpretations.25,26 The DP guarantee is composable and repeating invocations of differentially private algorithms lead to a graceful degradation of privacy.

differential privacy, privacy, privacy guarantee, (13 more...)

Communications of the ACM

Country:

North America > United States > New York (0.24)
North America > United States > California > Alameda County > Berkeley (0.14)
North America > United States > North Carolina > Durham County > Durham (0.04)
(2 more...)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Databases (1.00)
Information Technology > Data Science (1.00)
(2 more...)

Add feedback

Event-Driven News Stream Clustering using Entity-Aware Contextual Embeddings

Saravanakumar, Kailash Karthik, Ballesteros, Miguel, Chandrasekaran, Muthu Kumar, McKeown, Kathleen

arXiv.org Artificial IntelligenceJan-26-2021

We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm. Our model uses a combination of sparse and dense document representations, aggregates document-cluster similarity along these multiple representations and makes the clustering decision using a neural classifier. The weighted document-cluster similarity model is learned using a novel adaptation of the triplet loss into a linear classification objective. We show that the use of a suitable fine-tuning objective and external knowledge in pre-trained transformer models yields significant improvements in the effectiveness of contextual embeddings for clustering. Our model achieves a new state-of-the-art on a standard stream clustering dataset of English documents.

proceedings, representation, similarity, (15 more...)

arXiv.org Artificial Intelligence

2101.11059

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
North America > United States > New York > New York County > New York City (0.04)
(15 more...)

Genre: Research Report (1.00)

Industry:

Media > News (0.48)
Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.46)

Add feedback

Towards Entity Alignment in the Open World: An Unsupervised Approach

Zeng, Weixin, Zhao, Xiang, Tang, Jiuyang, Li, Xinyi, Luo, Minnan, Zheng, Qinghua

arXiv.org Artificial IntelligenceJan-25-2021

Entity alignment (EA) aims to discover the equivalent entities in different knowledge graphs (KGs). It is a pivotal step for integrating KGs to increase knowledge coverage and quality. Recent years have witnessed a rapid increase of EA frameworks. However, state-of-the-art solutions tend to rely on labeled data for model training. Additionally, they work under the closed-domain setting and cannot deal with entities that are unmatchable. To address these deficiencies, we offer an unsupervised framework that performs entity alignment in the open world. Specifically, we first mine useful features from the side information of KGs. Then, we devise an unmatchable entity prediction module to filter out unmatchable entities and produce preliminary alignment results. These preliminary results are regarded as the pseudo-labeled data and forwarded to the progressive learning framework to generate structural representations, which are integrated with the side information to provide a more comprehensive view for alignment. Finally, the progressive learning framework gradually improves the quality of structural embeddings and enhances the alignment performance by enriching the pseudo-labeled data with alignment results from the previous round. Our solution does not require labeled data and can effectively filter out unmatchable entities. Comprehensive experimental evaluations validate its superiority.

alignment, alignment result, information, (11 more...)

arXiv.org Artificial Intelligence

2101.10535

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
Asia > Japan (0.04)
Asia > China > Hunan Province > Changsha (0.04)

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.69)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.46)

Add feedback