AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Consensus measure of rankings

Lin, Zhiwei, Li, Yi, Guo, Xiaolian

arXiv.org Artificial Intelligence Sep-21-2017

In many information systems, rankings are widely used to represent preferences over a set of items or candidates, in settings ranging from information retrieval and recommender systems to decision-making systems [1], [2], [3], [4], [5], [6], in order to improve the quality of the services those systems provide. For example, in a search engine, the list of terms suggested after a user's first few keystrokes is a typical ranking, and this widely adopted ranking service has a great impact on the user's search experience; the list of search results returned after a query is issued is also a ranking. A ranking is an ordered sequence of items, in which an item with a higher ranking score is preferred over items with lower ranking scores. The consensus of rankings is the degree to which the rankings agree according to certain common patterns. The consensus measure can be used in many information systems to uncover how close or related the rankings are. For example, in group decision making, a group of experts express their preferences over a set of candidates by using rankings, and a measure of the degree of consensus is very useful for reaching consensus [2]. In many information systems with a large volume of items, such as search engines, it is hard to clearly define what the ground truth is, which makes it more difficult to evaluate and compare the rankings returned by the systems. The consensus measure of rankings, as a tool for understanding how related or close the rankings are, will help engineers and researchers to discern which aspects of a ranking system need to be improved and to detect outliers [7], [8].
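To make the idea of agreement between rankings concrete, here is a minimal sketch that scores a group of rankings by their average pairwise Kendall tau correlation. This is a standard rank correlation swapped in for illustration; it is not the consensus measure proposed in the paper, and the function names are hypothetical. It assumes every ranking is a complete ordering of the same items with no ties.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall tau between two rankings of the same items (no ties).

    Returns 1.0 for identical orderings and -1.0 for reversed ones.
    """
    pos_a = {item: i for i, item in enumerate(rank_a)}
    pos_b = {item: i for i, item in enumerate(rank_b)}
    concordant = discordant = 0
    for x, y in combinations(rank_a, 2):
        # A pair is concordant when both rankings order x and y the same way.
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def mean_pairwise_agreement(rankings):
    """Average Kendall tau over all pairs of rankings (illustrative only)."""
    pairs = list(combinations(rankings, 2))
    return sum(kendall_tau(a, b) for a, b in pairs) / len(pairs)
```

A value near 1 suggests the rankings largely agree, while a ranking whose pairwise scores against all others are low is a candidate outlier.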

information retrieval, machine learning, ranking, (17 more...)

1704.08464

Country:

North America > United States (0.28)
Europe > United Kingdom (0.28)
Europe > Switzerland (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.95)

The future of search engines: Researchers combine artificial intelligence, crowdsourcing and supercomputers

#artificialintelligence Sep-17-2017, 20:25:11 GMT

The outcome is the result of two powerful forces in the evolution of information retrieval: artificial intelligence--especially natural language processing--and crowdsourcing. Computer algorithms interpret the relationship between the words we type and the vast number of possible web pages based on the frequency of linguistic connections in the billions of texts on which the system has been trained. But that is not the only source of information. The semantic relationships get strengthened by professional annotators who hand-tune results--and the algorithms that generate them--for topics of importance, and by web searchers (us) who, in our clicks, tell the algorithms which connections are the best ones. Despite the incredible, world-changing success of this model, it has its flaws.

artificial intelligence, information retrieval, natural language, (14 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.05)
North America > United States > California > Alameda County > Berkeley (0.05)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.05)

Genre: Research Report (0.30)

Industry:

Health & Medicine (0.49)
Government > Regional Government > North America Government > United States Government (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)
Information Technology > Communications > Social Media > Crowdsourcing (0.63)

Unsupervised, Efficient and Semantic Expertise Retrieval

Van Gysel, Christophe, de Rijke, Maarten, Worring, Marcel

arXiv.org Artificial Intelligence Sep-17-2017

We introduce an unsupervised discriminative model for the task of retrieving experts in online document collections. We exclusively employ textual evidence and avoid explicit feature engineering by learning distributed word representations in an unsupervised way. We compare our model to state-of-the-art unsupervised statistical vector space and probabilistic generative approaches. Our proposed log-linear model achieves the retrieval performance levels of state-of-the-art document-centric methods with the low inference cost of so-called profile-centric approaches. It yields a statistically significant improved ranking over vector space and generative models in most cases, matching the performance of supervised methods on various benchmarks. That is, by using solely text we can do as well as methods that work with external evidence and/or relevance feedback. A contrastive analysis of rankings produced by discriminative and generative approaches shows that they have complementary strengths due to the ability of the unsupervised discriminative model to perform semantic matching.

information retrieval, log-linear model, machine learning, (18 more...)

doi: 10.1145/2872427.2882974

1608.06651

Country: North America > Canada (0.28)

Genre: Research Report > Experimental Study (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(3 more...)

Web Page Ranking using Machine Learning

@machinelearnbot Sep-15-2017, 20:37:05 GMT

Example: the list of URLs returned for a search query in a search engine. Experiments are conducted using real web services datasets, and the outcome of the experiments using machine learning confirms an improvement over existing methods in page ranking. The supervised learning algorithms covered are K-Nearest Neighbour (KNN) ranking and static ranking. KNN ranking: many supervised learning problems are "classification" problems, and KNN is one of many different classification algorithms. The sheer number of both good and bad pages on the Web has led to an increasing reliance on search engines for the discovery of useful information. Users rely on search engines not only to return pages related to their search query, but also to separate the good from the bad, and order results so that the best pages are suggested first.

information retrieval, machine learning, ranking, (18 more...)

Industry: Health & Medicine (0.31)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.96)

Implementing kd-tree for fast range-search, nearest-neighbor search and k-nearest-neighbor search algorithms in 2D in Java and python

@machinelearnbot Sep-14-2017, 05:15:11 GMT

The following problem appeared as an assignment in the Coursera course Algorithm-I by Prof. Robert Sedgewick of Princeton University a few years back (and also in the course cos226 offered at Princeton). The problem definition and description are taken from the course website and lectures. The original assignment was to be done in Java; in this article both the Java and a corresponding Python implementation are described. The idea is to build a BST with points in the nodes, using the x- and y-coordinates of the points as keys in strictly alternating sequence, starting with the x-coordinates, as shown in the next figure. The following figures and animations show how the 2-d-tree is grown with recursive space-partitioning for a few sample datasets.
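The alternating-key construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the article's or the course's implementation: the names `Node`, `insert`, `contains`, and `range_search` are hypothetical, and the sketch assumes that points with a smaller coordinate on the current axis go left and all others go right.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class Node:
    point: Point
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(node: Optional[Node], point: Point, depth: int = 0) -> Node:
    """Insert a point, comparing x at even depths and y at odd depths."""
    if node is None:
        return Node(point)
    if point == node.point:
        return node  # ignore duplicates
    axis = depth % 2  # 0 -> x-coordinate, 1 -> y-coordinate
    if point[axis] < node.point[axis]:
        node.left = insert(node.left, point, depth + 1)
    else:
        node.right = insert(node.right, point, depth + 1)
    return node


def contains(node: Optional[Node], point: Point) -> bool:
    """Follow the same alternating comparisons used by insert."""
    depth = 0
    while node is not None:
        if node.point == point:
            return True
        axis = depth % 2
        node = node.left if point[axis] < node.point[axis] else node.right
        depth += 1
    return False


def range_search(node: Optional[Node], lo: Point, hi: Point,
                 depth: int = 0, found: Optional[List[Point]] = None) -> List[Point]:
    """Collect points inside the axis-aligned rectangle [lo, hi] (inclusive)."""
    if found is None:
        found = []
    if node is None:
        return found
    x, y = node.point
    if lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]:
        found.append(node.point)
    axis = depth % 2
    # Prune: visit a subtree only if the rectangle can overlap its half-plane.
    if lo[axis] < node.point[axis]:
        range_search(node.left, lo, hi, depth + 1, found)
    if hi[axis] >= node.point[axis]:
        range_search(node.right, lo, hi, depth + 1, found)
    return found
```

The pruning step in `range_search` is what makes the tree faster than a brute-force scan on well-distributed data: a subtree is skipped whenever the query rectangle lies entirely on the other side of the node's splitting line.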

information retrieval, machine learning, natural language, (14 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.89)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.85)

Google appeals against EU's €2.4bn fine over search engine results

The Guardian Sep-11-2017, 17:44:06 GMT

Google is appealing against the record €2.4bn (£2.2bn) fine imposed by the European Union for its abuse of its dominance of the search engine market in building its shopping comparison service. The world's most popular internet search engine has launched its appeal after it was fined by the European commission for what was described as an "old school" form of illegality. The Luxembourg-based general court, Europe's second-highest, is expected to take several years before ruling on Google's appeal, which had been widely expected. The Silicon Valley giant had responded to the fine at the time of its announcement by saying that it "respectfully" disagreed with the legal argument being pursued. A spokesman for the commission said: "The commission will defend its decision in court."

artificial intelligence, information retrieval, natural language, (9 more...)

Country:

North America > United States > California (0.26)
Europe > United Kingdom (0.06)
Europe > Germany (0.06)
Europe > France (0.06)

Industry:

Government > Regional Government > Europe Government (0.93)
Law (0.75)

Technology:

Information Technology > Information Management > Search (0.95)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.84)

Cost Based Optimizer in Apache Spark 2.2 - The Databricks Blog

@machinelearnbot Sep-5-2017, 08:35:03 GMT

This is a joint engineering effort between Databricks' Apache Spark engineering team (Sameer Agarwal and Wenchen Fan) and Huawei's engineering team (Ron Hu and Zhenhua Wang). Apache Spark 2.2 recently shipped with a state-of-the-art cost-based optimization framework that collects and leverages a variety of per-column data statistics (e.g., cardinality, number of distinct values, NULL values, max/min, average/max length, etc.) to improve the quality of query execution plans. Leveraging these statistics helps Spark make better decisions when picking the optimal query plan. Examples of these optimizations include selecting the correct build side in a hash-join, choosing the right join type (broadcast hash-join vs. shuffled hash-join) or adjusting a multi-way join order, among others. In this blog, we'll take a deep dive into Spark's Cost Based Optimizer (CBO) and discuss how Spark collects and stores these statistics, optimizes queries, and show its performance impact on TPC-DS benchmark queries. At its core, Spark's Catalyst optimizer is a general library for representing query plans as trees and sequentially applying a number of optimization rules to manipulate them.
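As a rough illustration of how size statistics can drive the two join decisions mentioned above (build side and join type), here is a toy sketch in plain Python. It is not Spark's code or API: `TableStats`, `plan_hash_join`, and the threshold value are hypothetical, and a real optimizer estimates sizes from per-column statistics after filters rather than from raw table sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table statistics, as a cost-based optimizer might keep."""
    name: str
    row_count: int
    avg_row_bytes: int

    @property
    def size_bytes(self) -> int:
        return self.row_count * self.avg_row_bytes


# Illustrative threshold only; an assumption for this sketch.
BROADCAST_THRESHOLD_BYTES = 10 * 1024 * 1024


def plan_hash_join(left: TableStats, right: TableStats,
                   threshold: int = BROADCAST_THRESHOLD_BYTES) -> dict:
    """Pick the smaller input as the build side, then choose the join type."""
    if left.size_bytes <= right.size_bytes:
        build, probe = left, right
    else:
        build, probe = right, left
    # A small build side can be copied to every worker; otherwise shuffle both.
    if build.size_bytes <= threshold:
        join_type = "broadcast hash-join"
    else:
        join_type = "shuffled hash-join"
    return {"join_type": join_type, "build_side": build.name, "probe_side": probe.name}
```

With accurate statistics, a small dimension table joined to a large fact table ends up as the broadcast build side; with stale or missing statistics the same rule can pick the wrong side, which is why collecting statistics matters.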

artificial intelligence, information retrieval query processing, natural language, (16 more...)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)

How Medical Search Technology Relies on Google Alphabet and Big Data

#artificialintelligence Aug-28-2017, 16:15:14 GMT

One aspect of Artificial Intelligence is an effort to build machines and to advance technology using Google Alphabet that can learn from environments, from mishaps, and from real-life user experience to help individuals seeking a medical diagnosis. This takes advantage of Google's intelligent medical search engine. A lot of research and testing goes into finding the right path and the right breakthrough. Google CEO Sundar Pichai said in the company's annual Founders' Letter to stockholders back in April, "This is another important step toward creating artificial intelligence that can help us in everything from accomplishing our daily tasks and travels to eventually tackling even bigger challenges like climate change and cancer diagnosis." He cited examples such as voice search, translation tools, and image recognition; he spoke about how Google scientists work to build products that improve over time, making them increasingly useful and helpful to the human race. U.S. Internet users can now search Google for help sorting out medical symptoms and not just actual conditions. While the number of individuals who ask Google to help diagnose ailments may be surprising, Google's mobile site, as well as its iOS and Android apps, now have a feature that proposes to track down information on medical symptoms. Instead of having to search for a medical condition, an individual can search for a certain symptom, such as "I have a pounding headache."

data mining, information retrieval, machine learning, (19 more...)

Country: North America > United States > California (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry:

Information Technology > Services (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
(5 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Applied AI (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
(2 more...)

Digital Marketing and Machine Learning Smart Insights

#artificialintelligence Aug-28-2017, 02:45:12 GMT

The launch of Google's new machine learning tool, RankBrain, which contributes to search engine results, left many people wondering what impact machine learning would have in the realm of Search Engine Optimization (SEO). With the tech industry going crazy for all things Artificial Intelligence (AI), Natural Language Processing (NLP), machine learning, and chatbots, it's important to know what the technology is, where it's going, and what impact it will have on digital marketing as a whole. This article will explain these concepts as well as share some tips on how to adapt to machine learning. It explains how businesses can harness AI with a focus on marketing automation and email marketing. Machine learning is, in fact, not new to the tech world.

information retrieval, machine learning, natural language, (15 more...)

Industry: Marketing (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.84)

Text Retrieval and Search Engines Coursera

@machinelearnbot Aug-17-2017, 03:50:25 GMT

About this course: Recent years have seen a dramatic growth of natural language text data, including web pages, news articles, scientific literature, emails, enterprise documents, and social media such as blog articles, forum posts, product reviews, and tweets. Text data are unique in that they are usually generated directly by humans rather than a computer system or sensors, and are thus especially valuable for discovering knowledge about people's opinions and preferences, in addition to many other kinds of knowledge that we encode in text. This course will cover search engine technologies, which play an important role in any data mining application involving text data for two reasons. First, while the raw data may be large for any particular problem, it is often a relatively small subset of the data that is relevant, and a search engine is an essential tool for quickly discovering a small subset of relevant text data in a large text collection. Second, search engines are needed to help analysts interpret any patterns discovered in the data by allowing them to examine the relevant original text data to make sense of any discovered pattern.

artificial intelligence, information retrieval, natural language, (6 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.40)
Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
