The World Wide Web (WWW) abounds with ever-increasing information on many topics. However, since every user has specific information needs and interests, only a tiny part of the WWW is useful to them. For example, in a family, a mother may wish to "find recipes with salmon as the main ingredient", the father may be interested in "what movie to watch tonight?", and the teenage daughter may be wondering "what is artificial intelligence?". In order for humans to quickly ‘retrieve’ relevant information of interest, they usually search the Web using a search engine such as Google.
Although it sounds simple, information retrieval is a complex field involving many sub-tasks and applications. According to "the father of information retrieval", Gerard Salton, information retrieval is the field concerned with the structure, analysis, organization, storage, searching, and retrieval of information. Applications include, but are not limited to: web search (i.e., searching the WWW), which is the most common type; vertical search, where the search is specialized in a specific topic only (e.g., searching for shoes within the football topic implies someone looking for football shoes); enterprise search, which involves searching for documents in a corporate intranet; image search, which is searching for images similar to a given image; product search, which involves searching for products similar to a given product; desktop search, which is searching for relevant files on a personal computer; and mobile search, which typically takes location and time into account. Users can be searching for different kinds of items, such as webpages, emails, scholarly papers, books, news stories, or even social profiles. Furthermore, with the advent of new technologies and modalities like virtual reality, it is likely that the scope of information retrieval will only increase with time.
Regardless of the type of search and the type of the returned item, the goal of every information retrieval algorithm is to take a search query as input, and to quickly find and output a ranked list of relevant items, i.e., items that contain information that the user was looking for. For example, in our family example, the mother may submit a query of the form "find recipes with salmon" and the expected result is an ordered (ranked) list of recipes containing salmon, ordered by how relevant each recipe is to the query. Although a straightforward approach would be for a retrieval algorithm to simply compare the query text with the recipe text, this approach will not always work due to language ambiguity. For example, when someone submits a query containing the single word "jaguar" it is very difficult for any algorithm to determine whether the user is looking for documents about jaguar the animal or jaguar the vehicle brand. To be effective, an information retrieval system needs to pay special attention to the meaning of queries rather than the actual words used in them.
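To make the "query in, ranked list out" idea concrete, here is a minimal sketch of the straightforward word-matching approach described above, using a simple TF-IDF score. The function names, the stopword list, and the three toy recipe texts are all made up for illustration; real search engines use far more elaborate ranking.

```python
import math
import re
from collections import Counter

# Tiny, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "and", "find", "it", "of", "the", "with"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def rank(query, documents):
    """Return (doc_id, score) pairs for matching documents, best first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for tokens in tokenized.values() for term in set(tokens))
    query_terms = set(tokenize(query))

    scored = []
    for doc_id, tokens in tokenized.items():
        if not tokens:
            continue
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term in tf:
                # Smoothed inverse document frequency: rarer terms weigh more.
                idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
                score += (tf[term] / len(tokens)) * idf
        if score > 0:
            scored.append((doc_id, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy collection, invented for this example.
RECIPES = {
    "grilled_salmon": "Grilled salmon with lemon. Season the salmon and grill it.",
    "salmon_pasta": "Creamy pasta with a little smoked salmon and dill.",
    "beef_stew": "Slow cooked beef stew with carrots and potatoes.",
}

results = rank("find recipes with salmon", RECIPES)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Because this scorer only compares words, a query such as "jaguar" would match documents about the animal and about the car brand in exactly the same way, which is the ambiguity problem described above.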
Along with ambiguity, information retrieval faces a number of other important challenges, e.g., dealing with unstructured information, taking each user's context and expectations into account when returning results, and scaling (search engines typically index billions of items and search them almost instantly to answer each query, while handling more than a trillion queries per year). Researchers are continuing to address these challenges.
- Pigi Kouki
Naver Labs and Foursquare, the US-based search-and-discovery social networking service, will cooperate to develop a point-of-interest (POI) search engine, the companies announced. The two will also work together to find new business models that use the developed engine. Naver Labs will use Foursquare's search data to develop the engine, which is meant to read consumer needs. The US firm's services include Foursquare City Guide, a recommendation and review guide for cities worldwide, and Foursquare Swarm, which lets users share their experiences of places they have visited. It also offers APIs, SDKs, and advertising solutions.
In this post we will focus on configuring the Elasticsearch bit. I have chosen the Wikipedia people dump as the dataset. It contains the wiki pages of a subset of people on Wikipedia. The dataset consists of three columns: URI, name, and text. As the column names suggest, URI is the actual wiki link to that person's page, name is the person's name.
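As a rough illustration of what configuring an index for these three columns could look like, here is a minimal sketch that builds an index definition and bulk-indexing lines as plain Python data. The index name `wiki_people`, the shard settings, the choice of `keyword` for URI and `text` for the other two fields, and the sample row are all assumptions made for this example, not the post's actual configuration.

```python
import json

# Hypothetical index name, chosen for this sketch.
INDEX_NAME = "wiki_people"

# Index definition: URI is stored as an exact-match keyword, while name and
# text are analyzed for full-text search. These type choices are assumptions.
INDEX_BODY = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "URI": {"type": "keyword"},
            "name": {"type": "text"},
            "text": {"type": "text"},
        }
    },
}


def bulk_lines(rows):
    """Yield newline-delimited JSON lines: one action line, then one document."""
    for row in rows:
        yield json.dumps({"index": {"_index": INDEX_NAME}})
        yield json.dumps({"URI": row["URI"], "name": row["name"], "text": row["text"]})


# Made-up sample row, only to show the shape of the data.
SAMPLE_ROW = {
    "URI": "http://example.org/person/1",
    "name": "Example Person",
    "text": "Example Person is a placeholder biography.",
}

print(json.dumps(INDEX_BODY, indent=2))
print("\n".join(bulk_lines([SAMPLE_ROW])))
```

The first printed structure could be sent as the body of an index-creation request, and the lines that follow could be sent to the bulk endpoint; how they are sent (HTTP client, official client library) is left open here.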
You might have heard about ATMs that use facial recognition instead of cards and PINs for authentication. You might also have seen on the news a smart security algorithm that helps police identify suspects and crack criminal cases. Artificial intelligence (AI), the wiz behind these advanced technologies, is permeating our daily lives, from financial services to public safety to healthcare and transportation. YITU Technology, one of China's leading AI startups, has developed solutions that help solve real-world problems. YITU can now perform accurate facial recognition against a database of over 1 billion faces in just one second, and its technology has assisted Chinese law enforcement in criminal investigations.
The internet might seem like a level playing field, but it isn't. Safiya Umoja Noble came face to face with that fact one day when she used Google's search engine to look for subjects her nieces might find interesting. She entered the term "black girls" and got back pages dominated by pornography.
The explosion of user-generated content on the internet during the last decades has left the world of querying multimedia data with unprecedented challenges. There is a demand for this data to be processed and indexed in order to make it available for different types of queries, whilst ensuring acceptable response times.
It was not long ago that artificial intelligence (AI) existed only in the realm of science fiction. Today it is a reality and grows more prominent across industries every day. That includes the internet, where AI has been part of search engine technology for a few years. AI has already considerably affected the algorithms used to rank pages, and that trend will continue into the foreseeable future.