Information Retrieval
Google Built AI Software with Human-Like English Skills
Computers cannot truly understand the English language. To address this, Google has built free AI software dubbed Parsey McParseface, based on its SyntaxNet machine learning framework. The search engine giant announced on Thursday that it will release a new piece of software that can understand written English. More interestingly, Google will release the AI software as open source, which means developers can freely use it to train computer programs to process natural language. Google says its latest innovation will be equipped with Natural Language Understanding (NLU) to parse written English sentences with up to 94 percent accuracy. Google added that trained human linguists achieve only up to 96 percent accuracy. For an AI system, reaching such a milestone is a huge breakthrough in the field of artificial intelligence.
Google Removing Payday Loan Ads From Its Search Engine
Search giant Google announced Wednesday that it would cut payday loan providers from its advertising platforms, citing the potentially damaging effects to borrowers of short-term, high-interest cash loans. "Research has shown that these loans can result in unaffordable payment and high default rates for users so we will be updating our policies globally to reflect that," Google's head of global product policy, David Graff, said in an announcement posted to the company's blog. The average payday loan borrower spends five months of the year in debt, paying more in fees than originally received, according to research compiled by the Pew Charitable Trusts. "Our hope is that fewer people will be exposed to misleading or harmful products," Graff said. The policy change, which follows a similar move by Facebook, won plaudits from advocacy groups concerned with the impact of payday loans on low-income borrowers.
Finding Similar Music using Matrix Factorization
In a previous post I wrote about how to build a 'People Who Like This Also Like ...' feature for displaying lists of similar musicians. My goal was to show how simple Information Retrieval techniques can do a good job calculating lists of related artists. For instance, using BM25 distance on The Beatles shows the most similar artists being John Lennon and Paul McCartney. One interesting technique I didn't cover was using Matrix Factorization methods to reduce the dimensionality of the data before calculating the related artists. This kind of analysis can generate matches that are impossible to find with the techniques in my original post.
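The idea of reducing dimensionality before comparing artists can be sketched roughly as follows. This is only an illustration, not the post's actual code: it uses a truncated SVD as one common matrix factorization choice, and the play counts, the `plays` matrix, and the helper names are all made up for the example. In this toy data John Lennon and Paul McCartney share no listeners, so their raw cosine similarity is zero, yet they land close together in the reduced space because both overlap with The Beatles.

```python
import numpy as np

# Hypothetical toy data: rows are artists, columns are users,
# values are invented play counts (not taken from any real dataset).
artists = ["The Beatles", "John Lennon", "Paul McCartney", "Metallica", "Megadeth"]
plays = np.array([
    [5, 4, 0, 0],
    [4, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 5, 4],
    [0, 0, 4, 5],
], dtype=float)


def cosine_matrix(m):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    return unit @ unit.T


def latent_factors(m, k):
    """Project each row onto the top-k singular directions (truncated SVD)."""
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    return u[:, :k] * s[:k]


def most_similar(name, sims, n=2):
    """Names of the n artists most similar to `name`, excluding itself."""
    i = artists.index(name)
    order = np.argsort(-sims[i])
    return [artists[j] for j in order if j != i][:n]


raw_sims = cosine_matrix(plays)
latent_sims = cosine_matrix(latent_factors(plays, k=2))

print(most_similar("The Beatles", latent_sims))
print("Lennon vs McCartney, raw:", raw_sims[1, 2])
print("Lennon vs McCartney, latent:", latent_sims[1, 2])
```

The number of factors `k` is a tuning choice. With real listening data it would typically be far larger than 2, and the matrix would be sparse and weighted (for example with BM25) before factorizing.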
The search engine for ARGUMENTS: Engineers plan to make a tool to help settle complex political online discussions
These days, many an argument over trivial questions like the year a celebrity died or when a historical event happened can be settled with a quick search on Google. But search engines could soon be used to settle more complex arguments, involving serious political debate. A group of researchers in Germany is looking into ways to use search engines to shed light on the most complex political discussions. In just seconds, digital systems should be able to evaluate millions of documents, such as online discussions on controversial topics like the Transatlantic Trade and Investment Partnership (TTIP). The 'Robust Argumentation Machines' project has been set up by Bielefeld University in Germany.
Drill Data with Apache Drill
Apache Drill is a low-latency distributed query engine for large-scale datasets, including structured and semi-structured/nested data. Inspired by Google's Dremel, Drill is designed to scale to several thousand nodes and query petabytes of data at the interactive speeds that BI/Analytics environments require. Apache Drill includes a distributed execution environment, purpose-built for large-scale data processing. At the core of Apache Drill is the "Drillbit" service, which is responsible for accepting requests from the client, processing the queries, and returning results to the client. When a Drillbit runs on each data node in a cluster, Drill can maximize data locality during query execution without moving data over the network or between nodes.
SlashPixels: an ambitious image search engine for designers
Google is so dominant in the search engine market at large that it becomes hard to launch anything that remotely looks like a search tool. A team of Russian developers decided to still give it a go and focus on a niche market: image search. The team's objective seems very ambitious: create an artificial intelligence based image search engine to help designers find inspiration or resources more easily. They promise that SlashPixels will understand each image that it indexes, thus giving it a big advantage when it comes to sorting the pictures. Unfortunately, none of this exists yet, but you can support the team's IndieGogo campaign to help them build this new tool.
Exploiting Crowd-Based Labels for Domain Focused Information Retrieval
Miniter, John Cory (University of Massachusetts Lowell) | Mehta, Vineet (University of Massachusetts Lowell) | Chandra, Kavitha (University of Massachusetts Lowell)
Information search and retrieval from online sources or social forums is often performed with term based boolean queries. Such queries can produce low relevance documents in situations where the user is interested in retrieving information related to a concept, or belonging to a specific domain. In this work an approach for concept-based information retrieval is presented, which exploits word and document distributions derived from topic modeling performed on data from online sources. Documents acquired from the Reddit and Stack Exchange online social forums are used for extracting concepts, and subsequently training and testing a detector that aids in identifying and retrieving documents associated with the concept of interest. The selection of training sets for our concept based detector is aided by pre-partitioning of documents by online users (or crowd) into concept focused sub-forums, such as sub-reddits. Topics derived from a sample of the overall document set are taken to represent concepts. These topics then form the basis for identifying sub-forums that have a strong correspondence with the concept of interest, and documents within are assigned (noisy) binary labels. The applicability of our approach is demonstrated by creating a domain focused detector for Cyber Security content from Reddit data. The cross utility of this detector is demonstrated by successfully retrieving relevant Cyber Security documents from an alternate test online source: Stack Exchange. Document classification results of the proposed approach are compared favorably with classifications performed by human analysts.
China Investigates Search Engine Baidu After Student Dies Of Cancer
Baidu, China's largest search engine, is under investigation after a college student with a rare form of cancer said it promoted a fraudulent treatment center. Chinese health and Internet authorities have launched an investigation into Baidu, the country's largest search engine, following the death of a college student who accused Baidu of misleading him to a fraudulent cancer treatment. Experts believe the scandal will damage the credibility of Baidu's search results, and its long-term economic prospects. On Monday, news of the government investigation caused Baidu's stock to tumble by nearly 8% on the NASDAQ.
Two great ideas to create a much better search engine
When you do a search for "career objectives" on Google India (www.google.in), the first result showing up is from a US-based job board specializing in data mining and analytical jobs. The Google link in question redirects to a page that does not even contain the string "career objective". In short, Google is pushing a US web site that has nothing to do with "career objectives" as the #1 web site for "career objectives" in India. In addition, Google totally failed to recognize that the web site in question is about analytics and data mining.