 Information Retrieval


Consensus measure of rankings

arXiv.org Artificial Intelligence

In many information systems, from information retrieval and recommender systems to decision-making systems [1], [2], [3], [4], [5], [6], rankings are widely used to represent preferences over a set of items or candidates and to improve the quality of the services these systems provide. For example, the list of terms a search engine suggests after a user's first few keystrokes is a typical ranking, and this widely adopted service has a great impact on the user's search experience; the list of search results returned for a query is likewise a ranking. A ranking is an ordered sequence of items in which an item with a higher ranking score is preferred over items with lower ranking scores. The consensus of rankings is the degree to which the rankings agree according to certain common patterns. A consensus measure can be used in many information systems to uncover how close or related the rankings are. For example, in group decision making, a group of experts express their preferences over a set of candidates as rankings, and measuring the degree of consensus is very useful for reaching consensus [2]. In information systems with a large volume of items, such as search engines, it is hard to define clearly what the ground truth is, which makes it more difficult to evaluate and compare the rankings the systems return. The consensus measure of rankings, as a tool for understanding how related or close the rankings are, helps engineers and researchers discern which aspects of a ranking system need improvement and detect outliers [7], [8].
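To make "how close or related the rankings are" concrete, here is a minimal sketch that scores agreement with the Kendall rank correlation, averaged over all pairs of rankings. This is a standard, swapped-in statistic chosen for illustration; it is not the pattern-based consensus measure the abstract refers to. The function names `kendall_tau` and `mean_pairwise_tau` are hypothetical, and the sketch assumes every ranking orders the same set of items without ties.

```python
from itertools import combinations


def kendall_tau(ranking_a, ranking_b):
    """Kendall rank correlation between two tie-free rankings of the same items.

    Illustrative stand-in only: returns 1.0 for identical orderings and
    -1.0 for exactly reversed orderings.
    """
    if len(set(ranking_a)) != len(ranking_a) or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must order the same items without repeats")
    if len(ranking_a) < 2:
        raise ValueError("need at least two items to compare orderings")

    # Position of every item in the second ranking.
    pos_b = {item: i for i, item in enumerate(ranking_b)}

    concordant = discordant = 0
    # combinations() yields (x, y) with x ahead of y in ranking_a.
    for x, y in combinations(ranking_a, 2):
        if pos_b[x] < pos_b[y]:
            concordant += 1  # ranking_b agrees on this pair
        else:
            discordant += 1  # ranking_b reverses this pair
    return (concordant - discordant) / (concordant + discordant)


def mean_pairwise_tau(rankings):
    """Average Kendall tau over all pairs of rankings (assumed group score)."""
    pairs = list(combinations(rankings, 2))
    if not pairs:
        raise ValueError("need at least two rankings")
    return sum(kendall_tau(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    experts = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
    print(mean_pairwise_tau(experts))
```

A score near 1 suggests the rankings largely agree, and a ranking whose average correlation with the others is unusually low is a candidate outlier.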


The future of search engines: Researchers combine artificial intelligence, crowdsourcing and supercomputers

#artificialintelligence

The outcome is the result of two powerful forces in the evolution of information retrieval: artificial intelligence--especially natural language processing--and crowdsourcing. Computer algorithms interpret the relationship between the words we type and the vast number of possible web pages based on the frequency of linguistic connections in the billions of texts on which the system has been trained. But that is not the only source of information. The semantic relationships get strengthened by professional annotators who hand-tune results--and the algorithms that generate them--for topics of importance, and by web searchers (us) who, in our clicks, tell the algorithms which connections are the best ones. Despite the incredible, world-changing success of this model, it has its flaws.


Unsupervised, Efficient and Semantic Expertise Retrieval

arXiv.org Artificial Intelligence

We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model to state-of-the-art unsupervised statistical vector space and probabilistic generative approaches. Our proposed log-linear model achieves the retrieval performance levels of state-of-the-art document-centric methods with the low inference cost of so-called profile-centric approaches. It yields a statistically significant improved ranking over vector space and generative models in most cases, matching the performance of supervised methods on various benchmarks. That is, by using solely text we can do as well as methods that work with external evidence and/or relevance feedback. A contrastive analysis of rankings produced by discriminative and generative approaches shows that they have complementary strengths due to the ability of the unsupervised discriminative model to perform semantic matching.


Web Page Ranking using Machine Learning

@machinelearnbot

Example: the list of URLs returned for a search query in a search engine. Experiments are conducted using real web services datasets, and the outcome of the experiments using machine learning confirms an improvement over existing methods in page ranking. The supervised learning algorithms are K-Nearest Neighbour ranking and static ranking. KNN ranking: many supervised learning problems are "classification" problems, and KNN is one of many different classification algorithms. The sheer number of both good and bad pages on the Web has led to an increasing reliance on search engines for the discovery of useful information. Users rely on search engines not only to return pages related to their search query, but also to separate the good from the bad, and order results so that the best pages are suggested first.


Implementing kd-tree for fast range-search, nearest-neighbor search and k-nearest-neighbor search algorithms in 2D in Java and python

@machinelearnbot

The following problem appeared as an assignment in the Coursera course Algorithm-I by Prof. Robert Sedgewick of Princeton University a few years back (and also in the course cos226 offered at Princeton). The problem definition and description are taken from the course website and lectures. The original assignment was to be done in Java; this article describes both the Java implementation and a corresponding Python implementation. The idea is to build a BST with points in the nodes, using the x- and y-coordinates of the points as keys in strictly alternating sequence, starting with the x-coordinates, as shown in the next figure. The following figures and animations show how the 2-d-tree is grown with recursive space-partitioning for a few sample datasets.
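The alternating-key idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's or the course's implementation: the names `Node`, `insert`, `nearest`, and `dist2` are hypothetical, points are assumed to be `(x, y)` tuples, and the nearest-neighbor search uses the usual pruning rule of skipping a subtree when the splitting line is farther away than the best point found so far.

```python
class Node:
    """One node of a 2-d tree; depth decides which coordinate is the key."""

    def __init__(self, point, depth):
        self.point = point
        self.depth = depth
        self.left = None
        self.right = None


def insert(root, point, depth=0):
    """Insert a point, comparing x at even depths and y at odd depths."""
    if root is None:
        return Node(point, depth)
    if point == root.point:
        return root  # ignore duplicates
    axis = depth % 2  # 0 -> x-coordinate, 1 -> y-coordinate
    if point[axis] < root.point[axis]:
        root.left = insert(root.left, point, depth + 1)
    else:
        root.right = insert(root.right, point, depth + 1)
    return root


def dist2(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def nearest(root, target, best=None):
    """Return the stored point closest to target, pruning distant subtrees."""
    if root is None:
        return best
    if best is None or dist2(root.point, target) < dist2(best, target):
        best = root.point
    axis = root.depth % 2
    diff = target[axis] - root.point[axis]
    # Search the side of the splitting line that contains the target first.
    near, far = (root.left, root.right) if diff < 0 else (root.right, root.left)
    best = nearest(near, target, best)
    # The far side can only help if the splitting line is closer than best.
    if diff * diff < dist2(best, target):
        best = nearest(far, target, best)
    return best


if __name__ == "__main__":
    points = [(0.7, 0.2), (0.5, 0.4), (0.2, 0.3), (0.4, 0.7), (0.9, 0.6)]
    root = None
    for p in points:
        root = insert(root, p)
    print(nearest(root, (0.45, 0.65)))
```

Storing the depth in each node keeps the search code short; an alternative is to pass the depth down the recursion, as `insert` does.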


Google appeals against EU's €2.4bn fine over search engine results

The Guardian

Google is appealing against the record €2.4bn (£2.2bn) fine imposed by the European Union for its abuse of its dominance of the search engine market in building its shopping comparison service. The world's most popular internet search engine has launched its appeal after it was fined by the European commission for what was described as an "old school" form of illegality. The Luxembourg-based general court, Europe's second-highest, is expected to take several years before ruling on Google's appeal, which had been widely expected. The Silicon Valley giant had responded to the fine at the time of its announcement by saying that it "respectfully" disagreed with the legal argument being pursued. A spokesman for the commission said: "The commission will defend its decision in court."


Cost Based Optimizer in Apache Spark 2.2 - The Databricks Blog

@machinelearnbot

This is a joint engineering effort between Databricks' Apache Spark engineering team (Sameer Agarwal and Wenchen Fan) and Huawei's engineering team (Ron Hu and Zhenhua Wang). Apache Spark 2.2 recently shipped with a state-of-the-art cost-based optimization framework that collects and leverages a variety of per-column data statistics (e.g., cardinality, number of distinct values, NULL values, max/min, average/max length, etc.) to improve the quality of query execution plans. Leveraging these statistics helps Spark make better decisions in picking the optimal query plan. Examples of these optimizations include selecting the correct build side in a hash-join, choosing the right join type (broadcast hash-join vs. shuffled hash-join) or adjusting a multi-way join order, among others. In this blog, we'll take a deep dive into Spark's Cost Based Optimizer (CBO), discuss how Spark collects and stores these statistics and optimizes queries, and show its performance impact on TPC-DS benchmark queries. At its core, Spark's Catalyst optimizer is a general library for representing query plans as trees and sequentially applying a number of optimization rules to manipulate them.


How Medical Search Technology Relies on Google Alphabet and Big Data

#artificialintelligence

One aspect of Artificial Intelligence is an effort to build machines and to advance technology using Google Alphabet that can learn from environments, from mishaps, and from real-life user experience to help individuals seeking a medical diagnosis. This takes advantage of Google's intelligent medical search engine. A lot of research and testing goes into finding the right path and the right breakthrough. Google CEO Sundar Pichai said in the company's annual Founders' Letter to stockholders back in April, "This is another important step toward creating artificial intelligence that can help us in everything from accomplishing our daily tasks and travels to eventually tackling even bigger challenges like climate change and cancer diagnosis." He cited examples such as voice search, translation tools, and image recognition; he spoke about how Google scientists work to build products that improve over time, making them increasingly useful and helpful to the human race. U.S. Internet users can now search Google for help sorting out medical symptoms and not just actual conditions. While the number of individuals who ask Google to help diagnose ailments may be surprising, Google's mobile site, as well as its iOS and Android apps, now has a feature that proposes to track down information on medical symptoms. Instead of having to search for a medical condition, an individual can search for a certain symptom, such as "I have a pounding headache."


Digital Marketing and Machine Learning Smart Insights

#artificialintelligence

The launch of Google's new machine learning tool, RankBrain, which contributes to search engine results, left many people wondering what impact machine learning would have in the realm of Search Engine Optimization (SEO). With the tech industry going crazy for all things Artificial Intelligence (AI), Natural Language Processing (NLP), machine learning, and chatbots, it's important to know what the technology is, where it's going, and what impact it will have on digital marketing as a whole. This article will explain these concepts as well as share some tips on how to adapt to machine learning. It also explains how businesses can harness AI with a focus on marketing automation and email marketing. Machine learning is, in fact, not new to the tech world.


Text Retrieval and Search Engines Coursera

@machinelearnbot

About this course: Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than a computer system or sensors, and are thus especially valuable for discovering knowledge about people's opinions and preferences, in addition to many other kinds of knowledge that we encode in text. This course will cover search engine technologies, which play an important role in any data mining application involving text data for two reasons. First, while the raw data may be large for any particular problem, it is often a relatively small subset of the data that is relevant, and a search engine is an essential tool for quickly discovering a small subset of relevant text data in a large text collection. Second, search engines are needed to help analysts interpret any patterns discovered in the data by allowing them to examine the relevant original text data to make sense of any discovered pattern.