Information Retrieval


Google Search Console unparsable structured data report data issue - Search Engine Land

#artificialintelligence

Google has informed us that you may see a spike in errors in the unparsable structured data report within Google Search Console. This is a bug in the reporting system and you do not need to worry. The issue occurred between January 13, 2020 and January 16, 2020. Google wrote on the data anomalies page: "Some users may see a spike in unparsable structured data errors. This was due to an internal misconfiguration that will be fixed soon, and can be ignored."


On Making A Multilingual Search Engine

#artificialintelligence

You can read more about USE in this paper. Let's first read the data. Because the Quora dataset is huge and processing all of it takes a long time, we will use only 1% of the data, which amounts to about 4,000 questions. Encoding and indexing this sample takes around 3 minutes.
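
The 1% sampling step can be sketched as follows. This is a minimal illustration, not the article's code: the `sample_fraction` helper, the fixed seed, and the placeholder `questions` list of 400,000 items are all assumptions standing in for the real dataset loader.

```python
import random

def sample_fraction(items, fraction=0.01, seed=0):
    """Return a reproducible random subset holding about `fraction` of `items`."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    # Keep at least one item, but never ask for more than the list holds.
    k = min(len(items), max(1, round(len(items) * fraction)))
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same subset
    return rng.sample(items, k)

# Hypothetical stand-in for the real question list (assumed size: 400,000).
questions = [f"question {i}" for i in range(400_000)]
subset = sample_fraction(questions, fraction=0.01)
print(len(subset))
```

Sampling before encoding keeps the expensive embedding step proportional to the subset size rather than to the full dataset.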


University of Warwick Job Search: Research Fellow or Senior Research Fellow (102493-0120)

#artificialintelligence

Research Fellow or Senior Research Fellow (Deep Learning for Health Trajectory Prediction). The full-time fixed term post is available until 31st March 2023 (approximately 3 years). You will work with the Principal Investigator (Dr Leandro Pecchia), the project partners and the Warwick GATEKEEPER team for the successful execution of the project. Further information on the project is available at https://www.gatekeeper-project.eu/ You will have a PhD in Biomedical Engineering or in a relevant discipline (e.g., Computer Science, Information Engineering, Applied Math or similar disciplines). The level of appointment (Research or Senior Research Fellow) will be determined by the successful candidate's skills and experience, including a proven ability and achievement in research and the ability to generate external funding to support research projects.


Privacy concerns over Russia's 'most popular search engine' Yandex as it uses facial recognition

Daily Mail - Science & tech

A Russian search engine is being accused of providing an unregulated facial recognition system to members of the public, violating personal privacy. Experts have slammed the feature as 'poor' and 'creepy' while dubbing it a 'definite privacy concern'. Yandex, much like Google, Bing and other search engines, allows users to input an image and see similar results. But only Yandex, which claims to conduct more than 50 per cent of Russian searches on Android, produces images of the exact same person. MailOnline tested the image search facilities of Yandex, Bing, Google and specialist site TinEye by submitting a photo that was not available online.


Verizon launches 'privacy-focused' search engine leaving some skeptical because of the firm's past

Daily Mail - Science & tech

There is a new internet watchdog in town and it is powered by Verizon. The tech giant released a 'privacy-focused' search engine, called OneSearch, which encrypts searches, leaves results unfiltered and claims to not store or transfer user information. The platform is also Advanced Privacy Mode enabled, meaning all search result links expire within an hour. However, some users are suspicious of the platform, as Verizon has come under fire in the past for tracking customers on the internet without permission.


Query Complexity of Derivative-Free Optimization

Neural Information Processing Systems

Derivative Free Optimization (DFO) is attractive when the objective function's derivatives are not available and evaluations are costly. Moreover, if the function evaluations are noisy, then approximating gradients by finite differences is difficult. This paper gives quantitative lower bounds on the performance of DFO with noisy function evaluations, exposing a fundamental and unavoidable gap between optimization performance based on noisy evaluations versus noisy gradients. This challenges the conventional wisdom that the method of finite differences is comparable to a stochastic gradient. However, there are situations in which DFO is unavoidable, and for such situations we propose a new DFO algorithm that is proved to be near optimal for the class of strongly convex objective functions.
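
To illustrate why finite differences struggle under noise, the sketch below estimates a gradient by forward differences from noisy evaluations. It is not the algorithm proposed in the paper. The test function f(x) = x^2, the Gaussian noise level `sigma`, and all function names are assumptions chosen for illustration.

```python
import random

def noisy_f(x, rng, sigma=0.01):
    """Strongly convex test function f(x) = x^2 observed with Gaussian noise."""
    return x * x + rng.gauss(0.0, sigma)

def forward_difference(x, h, rng, sigma=0.01):
    """Forward-difference gradient estimate built from two noisy evaluations."""
    return (noisy_f(x + h, rng, sigma) - noisy_f(x, rng, sigma)) / h

def rms_error(x, h, trials=2000, sigma=0.01, seed=0):
    """Root-mean-square error of the estimate against the true gradient 2x."""
    rng = random.Random(seed)
    true_grad = 2.0 * x
    total = 0.0
    for _ in range(trials):
        total += (forward_difference(x, h, rng, sigma) - true_grad) ** 2
    return (total / trials) ** 0.5

# A small step amplifies the noise (roughly sigma / h); a large step adds bias (h here).
for h in (1e-4, 1e-2, 1e-1, 1.0):
    print(h, rms_error(1.0, h))
```

The step size has to balance the two error sources, and no choice removes both, which is the informal intuition behind the gap the abstract describes.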


Theoretical Analysis of Heuristic Search Methods for Online POMDPs

Neural Information Processing Systems

Planning in partially observable environments remains a challenging problem, despite significant recent advances in offline approximation techniques. A few online methods have also been proposed recently, and proven to be remarkably scalable, but without the theoretical guarantees of their offline counterparts. Thus it seems natural to try to unify offline and online techniques, preserving the theoretical properties of the former, and exploiting the scalability of the latter. In this paper, we provide theoretical guarantees on an anytime algorithm for POMDPs which aims to reduce the error made by approximate offline value iteration algorithms through the use of an efficient online searching procedure. The algorithm uses search heuristics based on an error analysis of lookahead search, to guide the online search towards reachable beliefs with the most potential to reduce error.


Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks

Neural Information Processing Systems

We propose a model that leverages the millions of clicks received by web search engines to predict document relevance. This allows the comparison of ranking functions when clicks are available but complete relevance judgments are not. After an initial training phase using a set of relevance judgments paired with click data, we show that our model can predict the relevance score of documents that have not been judged. These predictions can be used to evaluate the performance of a search engine, using our novel formalization of the confidence of the standard evaluation metric discounted cumulative gain (DCG), so comparisons can be made across time and datasets. This contrasts with previous methods, which can provide only pairwise relevance judgments between results shown for the same query.
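
For readers unfamiliar with DCG, here is a minimal sketch of one common formulation, with gain 2^rel - 1 and a log2(rank + 1) discount. The abstract does not state which gain and discount the paper uses, so this variant is an assumption, as are the function name `dcg` and the sample relevance lists.

```python
import math

def dcg(relevances, k=None):
    """Discounted cumulative gain of relevance grades listed in ranked order.

    Uses gain 2^rel - 1 and discount log2(rank + 1), with ranks starting at 1.
    """
    if k is not None:
        relevances = relevances[:k]
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )

# Hypothetical graded judgments (0 = not relevant, 3 = highly relevant).
good_ranking = [3, 2, 0]
poor_ranking = [0, 2, 3]
print(dcg(good_ranking), dcg(poor_ranking))
```

The same documents score higher when the most relevant ones appear first, which is what lets DCG compare ranking functions.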


Linear Submodular Bandits and their Application to Diversified Retrieval

Neural Information Processing Systems

Diversified retrieval and online learning are two core research areas in the design of modern information retrieval systems. In this paper, we propose the linear submodular bandits problem, which is an online learning setting for optimizing a general class of feature-rich submodular utility models for diversified retrieval. We present an algorithm, called LSBGREEDY, and prove that it efficiently converges to a near-optimal model. As a case study, we applied our approach to the setting of personalized news recommendation, where the system must recommend small sets of news articles selected from tens of thousands of available articles each day. In a live user study, we found that LSBGREEDY significantly outperforms existing online learning approaches. Papers published at the Neural Information Processing Systems Conference.
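
The sketch below shows the general idea of greedy selection under a submodular coverage utility, which is the kind of objective diversified retrieval optimizes. It is not LSBGREEDY: there is no online learning or exploration here, and the topic weights are fixed by hand rather than learned. The article names, topics, and weights are hypothetical.

```python
def greedy_diverse(articles, weights, k):
    """Pick up to k articles, each time taking the largest marginal topic coverage.

    articles: dict mapping article name -> set of topics it covers
    weights:  dict mapping topic -> utility of covering that topic once
    """
    chosen, covered = [], set()
    for _ in range(min(k, len(articles))):
        best, best_gain = None, 0.0
        for name, topics in articles.items():
            if name in chosen:
                continue
            # Marginal gain counts only topics not covered by earlier picks.
            gain = sum(weights.get(t, 0.0) for t in topics - covered)
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # nothing left adds any utility
            break
        chosen.append(best)
        covered |= articles[best]
    return chosen

articles = {
    "a1": {"sports", "politics"},
    "a2": {"sports"},
    "a3": {"tech"},
    "a4": {"politics", "tech"},
}
weights = {"sports": 1.0, "politics": 0.8, "tech": 0.5}
print(greedy_diverse(articles, weights, k=2))
```

Because covered topics stop contributing, the second pick favors an article on a new topic over a second sports story, which is the diversity effect.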


Active Learning Ranking from Pairwise Preferences with Almost Optimal Query Complexity

Neural Information Processing Systems

Given a set $V$ of $n$ elements we wish to linearly order them using pairwise preference labels which may be non-transitive (due to irrationality or arbitrary noise). The goal is to linearly order the elements while disagreeing with as few pairwise preference labels as possible. Our performance is measured by two parameters: the number of disagreements (loss) and the query complexity (number of pairwise preference labels). Our algorithm adaptively queries at most $O(n \operatorname{poly}(\log n, \varepsilon^{-1}))$ preference labels for a regret of $\varepsilon$ times the optimal loss. This is strictly better, and often significantly better than what non-adaptive sampling could achieve.
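
The loss in this setting can be made concrete with a small sketch that counts how many pairwise labels a given linear order disagrees with. This only illustrates the objective, not the paper's adaptive query algorithm. The function names and the three-element cyclic example are assumptions, and the brute-force search is only feasible for tiny sets.

```python
from itertools import permutations

def count_disagreements(order, preferences):
    """Count labels (u, v), meaning 'u is preferred over v', that `order` violates.

    order: list of elements, most preferred first.
    """
    position = {item: i for i, item in enumerate(order)}
    return sum(1 for u, v in preferences if position[u] > position[v])

def min_loss_bruteforce(elements, preferences):
    """Optimal loss found by trying every linear order (tiny inputs only)."""
    return min(count_disagreements(list(p), preferences)
               for p in permutations(elements))

# Non-transitive labels: a beats b, b beats c, and c beats a.
preferences = [("a", "b"), ("b", "c"), ("c", "a")]
print(count_disagreements(["a", "b", "c"], preferences))
print(min_loss_bruteforce(["a", "b", "c"], preferences))
```

With a preference cycle, every linear order must disagree with at least one label, so the optimal loss is nonzero and regret is measured relative to it.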