AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular pieces of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by separate methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where the information being sought is held in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at human speed.

Recommendation or Discrimination?: Quantifying Distribution Parity in Information Retrieval Systems

Khaziev, Rinat, Casavant, Bryce, Washabaugh, Pearce, Winecoff, Amy A., Graham, Matthew

arXiv.org Machine Learning | Sep-13-2019

Information retrieval (IR) systems often leverage query data to suggest relevant items to users. This introduces the possibility of unfairness if the query (i.e., input) and the resulting recommendations unintentionally correlate with latent factors that are protected variables (e.g., race, gender, and age). For instance, a visual search system for fashion recommendations may pick up on features of the human models rather than fashion garments when generating recommendations. In this work, we introduce a statistical test for "distribution parity" in the top-K IR results, which assesses whether a given set of recommendations is fair with respect to a specific protected variable. We evaluate our test using both simulated and empirical results. First, using artificially biased recommendations, we demonstrate the trade-off between statistically detectable bias and the size of the search catalog. Second, we apply our test to a visual search system for fashion garments, specifically testing for recommendation bias based on the skin tone of fashion models. Our distribution parity test can help ensure that IR systems' results are fair and produce a good experience for all users.
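The idea of checking whether a top-K result set mirrors the catalog with respect to a protected variable can be illustrated with a standard goodness-of-fit test. The sketch below is not the authors' distribution parity test; it substitutes a plain Pearson chi-square statistic, and the group names, catalog shares, and counts are all hypothetical. The closed-form p-value used here is valid only for a two-group protected variable (one degree of freedom).

```python
import math


def chi_square_statistic(observed_counts, expected_shares):
    """Pearson chi-square statistic comparing top-K group counts
    with the group shares expected from the catalog."""
    k = sum(observed_counts.values())
    stat = 0.0
    for group, share in expected_shares.items():
        expected = k * share
        observed = observed_counts.get(group, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


def p_value_two_groups(stat):
    """Upper-tail p-value of a chi-square variable with 1 degree of
    freedom, i.e. a protected variable with exactly two groups."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical catalog in which both groups are equally represented.
catalog_shares = {"group_a": 0.5, "group_b": 0.5}

# Hypothetical top-50 result sets: one close to parity, one heavily skewed.
balanced_top_k = {"group_a": 26, "group_b": 24}
skewed_top_k = {"group_a": 48, "group_b": 2}

balanced_stat = chi_square_statistic(balanced_top_k, catalog_shares)
skewed_stat = chi_square_statistic(skewed_top_k, catalog_shares)

print(balanced_stat, p_value_two_groups(balanced_stat))
print(skewed_stat, p_value_two_groups(skewed_stat))
```

A small p-value for the skewed result set suggests that its group composition is unlikely under the catalog shares, which is the kind of signal a parity test is meant to surface.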

information retrieval, machine learning, natural language, (20 more...)

arXiv:1909.06429

Country: North America > United States > New York (0.15)

Genre: Research Report (0.64)

Industry:

Law (0.68)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

d-blink: Distributed End-to-End Bayesian Entity Resolution

Marchant, Neil G., Steorts, Rebecca C., Kaplan, Andee, Rubinstein, Benjamin I. P., Elazar, Daniel N.

arXiv.org Machine Learning | Sep-13-2019

Entity resolution (ER) (record linkage or de-duplication) is the process of merging together noisy databases, often in the absence of a unique identifier. A major advancement in ER methodology has been the application of Bayesian generative models. Such models provide a natural framework for clustering records to unobserved (latent) entities, while providing exact uncertainty quantification and tight performance bounds. Despite these advancements, existing models do not scale to realistically sized databases (larger than 1000 records) and they do not incorporate probabilistic blocking. In this paper, we propose "distributed Bayesian linkage" or d-blink -- the first scalable and distributed end-to-end Bayesian model for ER, which propagates uncertainty in blocking, matching and merging. We make several novel contributions, including: (i) incorporating probabilistic blocking directly into the model through auxiliary partitions; (ii) support for missing values; (iii) a partially-collapsed Gibbs sampler; and (iv) a novel perturbation sampling algorithm (leveraging the Vose-Alias method) that enables fast updates of the entity attributes. Finally, we conduct experiments on five data sets which show that d-blink can achieve significant efficiency gains -- in excess of 300× -- when compared to existing non-distributed methods.
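The abstract mentions the Vose-Alias method, a general technique for drawing from a fixed discrete distribution in constant time after a linear-time setup. The sketch below shows the generic alias method only; it is not d-blink's perturbation sampler, and the function names and example weights are hypothetical.

```python
import random


def build_alias_table(weights):
    """Build Vose alias tables (prob, alias) for non-negative weights."""
    n = len(weights)
    total = sum(weights)
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]      # chance of keeping column s itself
        alias[s] = l             # otherwise fall through to l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover columns are full up to floating-point error.
    for i in large + small:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def sample(prob, alias, rng):
    """Draw one index in O(1): pick a column, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


def implied_probabilities(prob, alias):
    """Recover the distribution encoded by the tables (for checking)."""
    n = len(prob)
    out = [0.0] * n
    for i in range(n):
        out[i] += prob[i] / n
        out[alias[i]] += (1.0 - prob[i]) / n
    return out


weights = [0.1, 0.2, 0.3, 0.4]   # hypothetical attribute-value weights
prob, alias = build_alias_table(weights)
rng = random.Random(0)
draws = [sample(prob, alias, rng) for _ in range(10)]
print(implied_probabilities(prob, alias), draws)
```

The setup cost is paid once per distribution, so the method is attractive when many draws are taken from the same weights.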

information retrieval, machine learning, natural language, (20 more...)

arXiv:1909.06039

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.84)

Google verticals, machine learning and no-click searches expected to have the biggest impacts on SEO - Search Engine Land

#artificialintelligence | Sep-12-2019, 00:41:18 GMT

Google entering verticals and competing directly against publishers, advancements in machine learning and AI, and zero-click searches are the trends most likely to affect SEO in the next three years, according to a SparkToro survey of over 1,500 SEOs. Trends that are here to stay? Respondents were presented with a list of choices and asked, "How much of an impact do you believe the following trends will have on SEO in the next 3 years?" Options were ranked on a zero-to-four scale, with zero meaning "no impact" and four meaning "huge impact." The trends that respondents rated least likely to affect SEO included outcomes from US Congressional and Department of Justice investigations, visual search advances and "content-nudging" products such as Google Discover.

artificial intelligence, information retrieval, natural language, (15 more...)

Country: North America > United States (1.00)

Industry: Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)

Accelerating Column Generation via Flexible Dual Optimal Inequalities with Application to Entity Resolution

Lokhande, Vishnu Suresh, Wang, Shaofei, Singh, Maneesh, Yarkony, Julian

arXiv.org Artificial Intelligence | Sep-12-2019

In this paper, we introduce a new optimization approach to Entity Resolution. Traditional approaches tackle entity resolution with hierarchical clustering, which does not benefit from a formal optimization formulation. In contrast, we model entity resolution as correlation-clustering, which we treat as a weighted set-packing problem and write as an integer linear program (ILP). In this case, sources in the input data correspond to elements, and entities in the output data correspond to sets/clusters. We tackle optimization of weighted set packing by relaxing integrality in our ILP formulation. The set of potential sets/clusters cannot be explicitly enumerated, thus motivating optimization via column generation. In addition to the novel formulation, we also introduce new dual optimal inequalities (DOI), which we call flexible dual optimal inequalities, that tightly lower-bound dual variables during optimization and accelerate column generation. We apply our formulation to entity resolution (also called de-duplication of records), and achieve state-of-the-art accuracy on two popular benchmark datasets.
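For orientation, the textbook form of weighted set packing and its column-generation pricing step can be written as follows. This is the generic formulation, not necessarily the exact program or sign convention used in the paper, and the symbols $E$ (elements), $\mathcal{S}$ (candidate sets), $w_S$ (set weights), $x_S$ (selection variables) and $\lambda_e$ (dual variables) are notation assumed here for illustration.

```latex
\begin{align}
\max_{x}\quad & \sum_{S \in \mathcal{S}} w_S \, x_S \\
\text{s.t.}\quad & \sum_{S \in \mathcal{S} \,:\, e \in S} x_S \le 1
    \qquad \forall e \in E, \\
& x_S \in \{0, 1\} \qquad \forall S \in \mathcal{S}.
\end{align}
% LP relaxation: replace x_S \in \{0,1\} with x_S \ge 0.
% With duals \lambda_e \ge 0 on the packing constraints, the pricing
% step looks for a set with positive reduced cost:
\begin{equation}
S^{\star} \in \arg\max_{S \in \mathcal{S}}
    \Big( w_S - \sum_{e \in S} \lambda_e \Big),
\qquad \text{add } S^{\star} \text{ if the value is } > 0 .
\end{equation}
```

Column generation alternates between solving the relaxed program over a small subset of sets and calling the pricing step to add sets, stopping when no set has positive reduced cost.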

information retrieval, machine learning, natural language, (19 more...)

arXiv:1909.0546

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

PhD in Computing Science: Emerging information retrieval challenges when processing real-time data streams at University of Glasgow on FindAPhD.com

#artificialintelligence | Sep-11-2019, 07:34:50 GMT

Eligibility: Full funding is provided for EU/UK students (standard home/EU fees and stipend rates included). Non-EU/UK students can apply; however, they would be required to pay the difference between the home/EU and international fees. Funding is available to cover tuition fees for UK/EU applicants for 3 years, as well as a stipend at the Research Council rate (estimated £15,009 for Session 2019-20). FTE Category A staff submitted: 41.60

artificial intelligence, natural language, real time system, (7 more...)

Technology:

Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Sensing and Signal Processing (0.85)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases

Yu, Tao, Zhang, Rui, Er, He Yang, Li, Suyi, Xue, Eric, Pang, Bo, Lin, Xi Victoria, Tan, Yi Chern, Shi, Tianze, Li, Zihan, Jiang, Youxuan, Yasunaga, Michihiro, Shim, Sungrok, Chen, Tao, Fabbri, Alexander, Li, Zifan, Chen, Luyao, Zhang, Yuwen, Dixit, Shreya, Zhang, Vincent, Xiong, Caiming, Socher, Richard, Lasecki, Walter S, Radev, Dragomir

arXiv.org Artificial Intelligence | Sep-11-2019

CoSQL consists of 30k turns plus 10k annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains. Each dialogue simulates a real-world DB query scenario with a crowd worker as a user exploring the DB and a SQL expert retrieving answers with SQL, clarifying ambiguous questions, or otherwise informing of unanswerable questions. When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow. CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot-value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains. CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction. We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research. The dataset, baselines, and leaderboard will be released at https://yale-lily.github.io/cosql.
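To make "dialogue states grounded in SQL" concrete, the sketch below pairs each user turn with the SQL query that represents the state after that turn and executes it. The table, rows, utterances, and queries are invented for illustration and are not taken from the dataset; only Python's standard `sqlite3` module is used.

```python
import sqlite3

# Hypothetical in-memory database standing in for one dialogue domain.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER, country TEXT)")
conn.executemany(
    "INSERT INTO singer VALUES (?, ?, ?)",
    [("Ana", 25, "Brazil"), ("Ben", 41, "France"), ("Chloe", 33, "France")],
)

# Each turn: (user utterance, SQL query tracking the dialogue state).
# Later queries carry forward constraints from earlier turns.
dialogue = [
    ("How many singers are there?",
     "SELECT COUNT(*) FROM singer"),
    ("Only those older than 30?",
     "SELECT COUNT(*) FROM singer WHERE age > 30"),
    ("Which of them are from France?",
     "SELECT name FROM singer WHERE age > 30 AND country = 'France' "
     "ORDER BY name"),
]


def run_dialogue(connection, turns):
    """Execute the state query for every turn and collect the rows."""
    return [connection.execute(sql).fetchall() for _, sql in turns]


results = run_dialogue(conn, dialogue)
for (utterance, _), rows in zip(dialogue, results):
    print(utterance, "->", rows)
```

Because the state is an executable query rather than a set of slot-value pairs, the same representation applies to any database schema, which is what allows evaluation on unseen databases.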

artificial intelligence, machine learning, natural language, (19 more...)

arXiv:1909.05378

Country:

North America > United States (1.00)
Europe (0.93)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)

Four ways you can use AI to optimize your AdWords campaigns - Search Engine Watch

#artificialintelligence | Sep-9-2019, 18:14:40 GMT

Artificial intelligence (AI) and machine learning algorithms are mainstreaming in a way that was never before possible, and these changes are having a significant influence on the way in which marketers need to approach search advertising. In addition to AdWords itself incorporating AI into its framework, new opportunities are arising that can give marketers an edge over their competitors, or automate lower-level tasks, freeing up more time for strategy. Here are four ways you can start taking advantage of AI to make the most of your AdWords campaigns. Automated machine learning as a solution to the decision of what price to bid on paid advertising is becoming an increasingly popular option as the necessary technologies become available to more firms. Bidding too low means missing out on opportunities to reach leads, while bidding too high means sacrificing ROI.

adword campaign, information retrieval, machine learning, (15 more...)

Industry:

Marketing (1.00)
Information Technology > Services (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

How A New Design And Content Helps Your Website To Rank Higher On Search Engines

#artificialintelligence | Sep-9-2019, 18:14:09 GMT

Are you looking for ways to improve your website's ranking? Trying to figure out how to reach the top of Google's search results pages? Well, the information below is going to be your best friend. A website is the heart of any business in this digital world. The most important share of traffic comes from organic searches.

artificial intelligence, information retrieval, natural language, (11 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.81)

General Fragment Model for Information Artifacts

Fiorini, Sandro Rama, Santos, Wallas Sousa dos, Mesquita, Rodrigo Costa, Lima, Guilherme Ferreira, Moreno, Marcio F.

arXiv.org Artificial Intelligence | Sep-9-2019

The use of semantic descriptions in data-intensive domains requires a systematic model for linking semantic descriptions with their manifestations in fragments of heterogeneous information and data objects. Such information heterogeneity requires a fragment model that is general enough to support the specification of anchors from conceptual models to multiple types of information artifacts. While diverse proposals of anchoring models exist in the literature, they are usually focused on audiovisual information. We propose a generalized fragment model that can be instantiated to different kinds of information artifacts. Our objective is to systematize the way in which fragments and anchors can be described in conceptual models, without committing to a specific vocabulary.

artificial intelligence, information retrieval, natural language, (17 more...)

arXiv:1909.04117

Genre: Research Report (0.40)

Technology:

Information Technology > Information Management (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.48)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)

Endless AI-generated spam risks clogging up Google's search results

#artificialintelligence | Sep-6-2019, 23:47:34 GMT

Over the past year, AI systems have made huge strides in their ability to generate convincing text, churning out everything from song lyrics to short stories. Experts have warned that these tools could be used to spread political disinformation, but there's another target that's equally plausible and potentially more lucrative: gaming Google. Instead of being used to create fake news, AI could churn out infinite blogs, websites, and marketing spam. The content would be cheap to produce and stuffed full of relevant keywords. But like most AI-generated text, it would only have surface meaning, with little correspondence to the real world.

artificial intelligence, information retrieval, natural language, (17 more...)

Industry: Media > News (0.57)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.37)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.37)
