 Information Retrieval


Chatbots could one day replace search engines. Here's why that's a terrible idea.

MIT Technology Review

But critics are starting to push back, arguing that the approach is wrong-headed. Asking computers a question and getting an answer in natural language can hide complexity behind a veneer of authority that is not deserved. "We got too bogged down by what we could do; we haven't looked at what we should do," says Chirag Shah at the University of Washington, who works on search technologies. On March 14, Shah and his University of Washington colleague Emily M. Bender, who studies computational linguistics and ethical issues in natural-language processing, published a paper that criticizes what they see as a rush to embrace language models for tasks they are not designed to address.


Here is a CheatSheet for DuckDuckGo - the search engine that doesn't track you - Techglimpse

#artificialintelligence

Wouldn't it be handy to have a cheat sheet for a search engine you use daily, especially one packed with features that make life easier and display instant results? We thought the same, so here is a cheat sheet that can help you get better, more relevant search results easily and quickly. If you are a regular user of DuckDuckGo, scroll down to get the cheat sheet; for others, here is a quick briefing about the search engine. DuckDuckGo is a search engine just like Google, Yahoo, Bing and so on! But there is a great difference, and the difference is what we treasure: privacy!


Building a Knowledge Graph for Job Search using BERT Transformer - DataScienceCentral.com

#artificialintelligence

While the natural language processing (NLP) field has been growing at an exponential rate for the last two years, thanks to the development of transformer-based models, their applications have been limited in scope for the job search field. LinkedIn, the leading company in job search and recruitment, is a good example. While I hold a Ph.D. in Material Science and a Master's in Physics, I am receiving job recommendations such as Technical Program Manager at MongoDB and a Go Developer position at Toptal, both web development companies that are not relevant to my background. This feeling of irrelevancy is shared by many users and is a major source of frustration. In general, traditional job search engines are based on simple keyword and/or semantic similarities that are usually not well suited to providing good job recommendations, since they don't take into account the interlinks between entities.


OCR quality affects perceived usefulness of historical newspaper clippings -- a user study

arXiv.org Artificial Intelligence

Effects of Optical Character Recognition (OCR) quality on historical information retrieval have so far been studied in data-oriented scenarios regarding the effectiveness of retrieval results. Such studies have either focused on the effects of artificially degraded OCR quality (see, e.g., [1-2]) or utilized test collections containing texts based on authentic low quality OCR data (see, e.g., [3]). In this paper the effects of OCR quality are studied in a user-oriented information retrieval setting. Thirty-two users subjectively evaluated the query results of six topics each (out of 30 topics) based on pre-formulated queries using a simulated work task setting. To the best of our knowledge our simulated work task experiment is the first one showing empirically that users' subjective relevance assessments of retrieved documents are affected by a change in the quality of optically read text. Users of historical newspaper collections have so far commented on the effects of OCR'ed data quality mainly in impressionistic ways, and controlled user environments for studying effects of OCR quality on users' relevance assessments of the retrieval results have so far been missing. To remedy this, the National Library of Finland (NLF) set up an experimental query environment for the contents of one Finnish historical newspaper, Uusi Suometar 1869-1918, to be able to compare users' evaluation of search results of two different OCR qualities for digitized newspaper articles. The query interface was able to present the same underlying document to the user based on two alternatives: either based on the lower OCR quality, or based on the higher OCR quality, and the choice was randomized. The users did not know about quality differences in the article texts they evaluated. The main result of the study is that improved optical character recognition quality affects perceived usefulness of historical newspaper articles significantly.
The mean average evaluation score for the improved OCR results was 7.94% higher than the mean average evaluation score of the old OCR results.


Fashion Image Search Engine - AI Summary

#artificialintelligence

Introduction. "Computers are able to see, hear and learn. Welcome to the future." (Dave Waters) In this post, I want to talk about a computer vision use case called Content-Based Image Retrieval, or CBIR for short. In simple words, it means retrieving images relevant to the user's needs from image databases on the basis of low-level visual features. Image Search…
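As a rough illustration of retrieval by low-level visual features, the sketch below ranks images by the overlap of their color histograms. It is not taken from the post: the feature (a coarse joint RGB histogram), the similarity measure (histogram intersection), the function names, and the toy "images" (plain lists of RGB tuples) are all assumptions chosen to keep the example self-contained.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each in 0-255


def color_histogram(pixels: List[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Normalized joint RGB histogram, used here as a simple low-level feature."""
    step = 256 // bins_per_channel
    hist = [0.0] * (bins_per_channel ** 3)
    for r, g, b in pixels:
        # Clamp so that values near 255 never fall outside the last bin.
        rb = min(r // step, bins_per_channel - 1)
        gb = min(g // step, bins_per_channel - 1)
        bb = min(b // step, bins_per_channel - 1)
        hist[(rb * bins_per_channel + gb) * bins_per_channel + bb] += 1.0
    total = float(len(pixels)) or 1.0  # avoid dividing by zero for empty images
    return [count / total for count in hist]


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Similarity in [0, 1]; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def search(query: List[Pixel], database: Dict[str, List[Pixel]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k database entries most similar to the query image."""
    query_hist = color_histogram(query)
    scored = [
        (name, histogram_intersection(query_hist, color_histogram(pixels)))
        for name, pixels in database.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy data: each "image" is just a flat list of pixels.
DATABASE: Dict[str, List[Pixel]] = {
    "red_dress": [(220, 20, 30)] * 90 + [(250, 250, 250)] * 10,
    "blue_jeans": [(20, 40, 200)] * 100,
    "crimson_scarf": [(200, 30, 40)] * 70 + [(20, 20, 20)] * 30,
}
QUERY: List[Pixel] = [(230, 10, 20)] * 100  # a mostly red query image

if __name__ == "__main__":
    for name, score in search(QUERY, DATABASE, top_k=2):
        print(f"{name}: {score:.2f}")
```

A production system would more likely compare learned embeddings than raw color histograms, but the retrieval loop (extract a feature, score every candidate, sort) keeps the same shape.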


Scraping Data from Google Search Using Python and Scrapy - DataScienceCentral.com

#artificialintelligence

Scraping Google SERPs (search engine result pages) is as straightforward or as complicated as the tools we use. For this tutorial, we'll be using Scrapy, a web scraping framework designed for Python. Python and Scrapy combine to create a powerful duo that we can use to scrape almost any website. Scrapy has many useful built-in features that will make scraping Google a walk in the park without compromising any data we would like to scrape. For example, with Scrapy all it takes is a single command to format our data as CSV or JSON files, a process we would have to code ourselves otherwise.


Is Google's Search Engine Powered by Artificial Intelligence?

#artificialintelligence

Have you ever wondered, when you search for something, how the search engine knows what you are looking for so fast? Google's search engine isn't just a line of logic and cookies; Google's search algorithm incorporates machine learning, artificial intelligence, and natural language processing to improve search every day. For search results to be accurate, Google uses an AI algorithm that can understand what the user is trying to say when they make the search query. This algorithm was launched by Google in October 2015, six years ago. In this article I will discuss how AI got its start, the difference between AI, machine learning, and deep learning, how AI changed Google's search engine forever, and what Google's RankBrain and beyond is all about.


SeMI Technologies' search engine opens up new ways to query your data

#artificialintelligence

It is a unique type of AI-first database using machine learning models outputting vectors, also known as embeddings, hence the name vector search …


Batched Dueling Bandits

arXiv.org Machine Learning

The K-armed dueling bandits problem has been widely studied in machine learning due to its applications in search ranking, recommendation systems, sports ranking, etc. [3, 14, 16, 26, 29, 30, 34, 38, 41, 43-46]. It is a variation of the traditional stochastic bandit problem in which feedback is obtained in the form of pairwise preferences. This problem falls under the umbrella of preference learning [39], where the goal is to learn from relative feedback (in our case, given two alternatives, which of the two is preferred). Designing learning algorithms for such relative feedback becomes crucial in domains where qualitative feedback is easily obtained, but real-valued feedback would be arbitrary or not interpretable. We illustrate this using the web-search ranking application. Web-search ranking is an example of a complex information retrieval system, where the goal is to provide a list (usually ranked) of candidate documents to the user of the system in response to a query [25, 27, 33, 42]. Modern day search engines comprise hundreds of parameters which are used to output a ranked list in response to a query. However, manually tuning these parameters can sometimes be infeasible, and online learning frameworks (based on user feedback) have been invaluable in automatically tuning these parameters [31]. These methods do not affect user experience, enable the system to continuously learn about user preferences, and thus continuously adapt to user behavior.
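To make the pairwise-preference feedback model concrete, here is a minimal simulation sketch. It is not the batched algorithm from the paper: it uses naive uniform exploration over all pairs and then picks the arm with the highest average empirical win rate (a Borda-style score). The preference matrix `PREF`, the function names, and the number of duels per pair are illustrative assumptions.

```python
import random
from itertools import combinations
from typing import List


def duel(pref: List[List[float]], i: int, j: int, rng: random.Random) -> bool:
    """Simulate one noisy comparison; pref[i][j] is the chance that arm i beats arm j."""
    return rng.random() < pref[i][j]


def estimate_borda_winner(pref: List[List[float]], duels_per_pair: int,
                          rng: random.Random) -> int:
    """Compare every pair equally often and return the arm with the best win rate.

    This is a naive baseline for illustration only, not an adaptive or
    batched dueling-bandit algorithm.
    """
    k = len(pref)
    if k < 2 or duels_per_pair < 1:
        raise ValueError("need at least two arms and one duel per pair")
    wins = [[0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        for _ in range(duels_per_pair):
            # The learner only observes which arm won, never a numeric reward.
            if duel(pref, i, j, rng):
                wins[i][j] += 1
            else:
                wins[j][i] += 1
    # Average empirical win rate of each arm against all other arms.
    scores = [
        sum(wins[i][j] for j in range(k) if j != i) / (duels_per_pair * (k - 1))
        for i in range(k)
    ]
    return max(range(k), key=scores.__getitem__)


# Hypothetical preference matrix for three rankers; arm 0 is preferred over the others.
PREF = [
    [0.5, 0.8, 0.9],
    [0.2, 0.5, 0.7],
    [0.1, 0.3, 0.5],
]

if __name__ == "__main__":
    print(estimate_borda_winner(PREF, duels_per_pair=500, rng=random.Random(0)))
```

In the web-search setting described above, each arm would correspond to one candidate ranker or parameter setting, and each duel to one user interaction that reveals which of two result lists was preferred.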


Interactive Visual Pattern Search on Graph Data via Graph Representation Learning

arXiv.org Machine Learning

Graphs are a ubiquitous data structure to model processes and relations in a wide range of domains. Examples include control-flow graphs in programs and semantic scene graphs in images. Identifying subgraph patterns in graphs is an important approach to understanding their structural properties. We propose a visual analytics system GraphQ to support human-in-the-loop, example-based, subgraph pattern search in a database containing many individual graphs. To support fast, interactive queries, we use graph neural networks (GNNs) to encode a graph as a fixed-length latent vector representation, and perform subgraph matching in the latent space. Due to the complexity of the problem, it is still difficult to obtain accurate one-to-one node correspondences in the matching results that are crucial for visualization and interpretation. We, therefore, propose a novel GNN for node-alignment called NeuroAlign, to facilitate easy validation and interpretation of the query results. GraphQ provides a visual query interface with a query editor and a multi-scale visualization of the results, as well as a user feedback mechanism for refining the results with additional constraints. We demonstrate GraphQ through two example usage scenarios: analyzing reusable subroutines in program workflows and semantic scene graph search in images. Quantitative experiments show that NeuroAlign achieves 19-29% improvement in node-alignment accuracy compared to baseline GNN and provides up to 100x speedup compared to combinatorial algorithms. Our qualitative study with domain experts confirms the effectiveness for both usage scenarios.