Information Retrieval
A cross-language search engine enables English monolingual researchers to find relevant foreign-language documents
"About 6,000 languages are currently spoken in the world today," says Elizabeth Salesky of MIT Lincoln Laboratory's Human Language Technology (HLT) Group. "Within the law enforcement community, there are not enough multilingual analysts who possess the necessary level of proficiency to understand and analyze content across these languages," she continues. This problem of too many languages and too few specialized analysts is one Salesky and her colleagues are now working to solve for law enforcement agencies, but their work has potential application for the Department of Defense and Intelligence Community. The research team is taking advantage of major advances in language recognition, speaker recognition, speech recognition, machine translation, and information retrieval to automate language processing tasks so that the limited number of linguists available for analyzing text and spoken foreign languages can be used more efficiently. "With HLT, an equivalent of 20 times more foreign language analysts are at your disposal," says Salesky.
22% of B2B Salespeople will be Replaced by Search Engines by 2020
In Forrester's US B2B eCommerce Forecast: 2015 to 2020, the analysts write that "74% of B2B buyers research at least one-half of their work purchases online. With that percentage nearly doubling to 56% by 2017, B2B sellers will see a significant volume of offline business move online in the next few years." Taking those facts further, at the Forrester Sales Enablement Forum, a study by Andy Hoar, Principal Analyst at Forrester, revealed that he expected 22% of B2B sales jobs to go by 2020. With enterprise purchases taking place more and more online, the traditional B2B salesperson is being replaced by search engines, YouTube, websites, and similar channels. The diagram above is Forrester's view on what will replace the B2B salesperson.
Measuring Information Retrieval Performance Using Extrapolated Precision
This is a brief overview of my paper "Information Retrieval Performance Measurement Using Extrapolated Pr...," which I'll be presenting on June 8th at the DESI VI workshop at ICAIL 2015. The paper provides a novel method for extrapolating a precision-recall point to a different level of recall, and advocates making performance comparisons by extrapolating results for all systems to the same level of recall if the systems cannot be evaluated at exactly the same recall. Recall, R, is the proportion of the relevant documents retrieved by the information retrieval (IR) system, and precision, P, is the proportion of retrieved documents that are relevant. It is sometimes desirable to have high recall while also having high precision in order to find most of the relevant documents without having a lot of non-relevant documents mixed in, but higher recall is usually accompanied by lower precision. Some IR systems generate a relevance score for each document, allowing the documents to be sorted so that the ones that are deemed most likely to be relevant appear at the top of the list.
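The definitions of recall and precision above can be made concrete with a small sketch. It assumes a ranked list of document IDs and a known set of relevant IDs; the function name `precision_recall_at_k` and the sample data are hypothetical. It only computes measured precision-recall points at each cutoff and does not implement the paper's extrapolation method.

```python
def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """Return (precision, recall) after retrieving the top-k ranked documents."""
    if k <= 0:
        raise ValueError("k must be positive")
    relevant = set(relevant_ids)
    retrieved = ranked_ids[:k]
    # A hit is a retrieved document that is actually relevant.
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ranking produced by sorting on relevance score (best first).
ranked = ["d3", "d7", "d1", "d9", "d4", "d2"]
# Hypothetical ground truth; "d8" is relevant but never retrieved.
relevant = {"d3", "d1", "d2", "d8"}

# One precision-recall point per cutoff depth.
curve = [precision_recall_at_k(ranked, relevant, k)
         for k in range(1, len(ranked) + 1)]
for k, (p, r) in enumerate(curve, start=1):
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

In this toy ranking, going deeper into the list raises recall from 0.25 to 0.75 while precision falls from 1.0 to 0.5, which is the trade-off the paragraph describes.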
Facebook is using artificial intelligence to become a better search engine
Today, Facebook announced Deep Text, an AI engine it's building to understand the meaning and sentiment behind all of the text posted by users to Facebook. In a blog post, Facebook said that it was building the system to help it surface content that people may be interested in, and weed out spam. This might sound like a minor improvement, but it actually has the potential--in theory--to transform the social network most of us use every day into something else we use daily: a powerful search engine. "We want Deep Text to be used in categorizing content within Facebook to facilitate searching for it and also surfacing the right content to users," Hussein Mehanna, an engineering director at Facebook's machine learning team, told Quartz. The universe of that search may not be the whole worldwide web that Google crawls, but it's still massive.
Scalable machine learning with InsightEdge: mobile advertisement clicks prediction - InsightEdge
This blog post provides an introduction to using machine learning algorithms with InsightEdge. We will go through an exercise to predict mobile advertisement click-through rate with Avazu's dataset. There are several compensation models in the online advertising industry. Probably the most notable is CPC (Cost Per Click), in which an advertiser pays a publisher when the ad is clicked. Search engine advertising is one of the most popular forms of CPC. It allows advertisers to bid for ad placement in a search engine's sponsored links when someone searches on a keyword that is related to their business offering.
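As a rough illustration of the quantities involved, the sketch below computes a click-through rate and the amount an advertiser would owe under a CPC model. The function names and all numbers are hypothetical; nothing here uses InsightEdge or the Avazu dataset, and it shows the arithmetic only, not a prediction model.

```python
def click_through_rate(clicks, impressions):
    """Fraction of ad impressions that resulted in a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def cpc_cost(clicks, cost_per_click):
    """Amount the advertiser pays the publisher under Cost Per Click."""
    return clicks * cost_per_click


# Hypothetical campaign figures, chosen only for illustration.
impressions = 1000
clicks = 25
bid = 0.50  # assumed price per click, in arbitrary currency units

ctr = click_through_rate(clicks, impressions)
cost = cpc_cost(clicks, bid)
print(f"CTR = {ctr:.3f}, advertiser pays {cost:.2f}")
```

Because payment under CPC depends only on clicks, an accurate click-through rate estimate is what lets a publisher or ad platform forecast revenue for a given placement.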
What are data scientists interested in? Insights from our search engine data
We've gathered data from our newly created DSC search box, and based on 20,000 search queries over the last four months (most of them in the last 30 days), we discovered that the top queries so far are: The number in parentheses indicates the number of queries over the last four months. Note that some keywords have a high number of queries because they are listed as top queries in one of our popular articles. Starred queries were not promoted in any way. Today we created a new, ad-free data science search engine, where anyone can submit their blog for indexing. We invite you to try it and share it.
Reimagining Search
Ever since Gerard Salton of Cornell University developed the first computerized search engine (Salton's Magical Automatic Retriever of Text, or SMART) in the 1960s, search developers have spent decades essentially refining Salton's idea: take a query string, match it against a collection of documents, then calculate a set of relevant results and display them in a list. All of today's major Internet search engines--including Google, Amazon, and Bing--continue to follow Salton's basic blueprint. Yet as the Web has evolved from a loose-knit collection of academic papers to an ever-expanding digital universe of apps, catalogs, videos, and cat GIFs, users' expectations of search results have shifted. Today, many of us have less interest in sifting through a collection of documents than in getting something done: booking a flight, finding a job, buying a house, making an investment, or any number of other highly focused tasks. Meanwhile, the Web continues to expand at a dizzying pace.
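The basic blueprint described above (take a query string, match it against a collection of documents, score them, and list the results) can be sketched in a few lines. This is a minimal illustration using a simple TF-IDF weighting chosen for brevity; it is not Salton's SMART system or any production engine's algorithm, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def rank(query, documents):
    """Return (doc_index, score) pairs, best match first, skipping non-matches."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(query))
    results = []
    for idx, tokens in enumerate(tokenized):
        term_counts = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term in term_counts:
                # Rarer terms get a larger weight; +1 keeps the weight positive.
                idf = math.log(n_docs / doc_freq[term]) + 1.0
                score += (term_counts[term] / len(tokens)) * idf
        if score > 0:
            results.append((idx, score))

    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


# Hypothetical document collection.
docs = [
    "cheap flights to boston",
    "boston housing market report",
    "how to book cheap flights and hotels",
]
hits = rank("cheap flights", docs)
for idx, score in hits:
    print(f"{score:.3f}  {docs[idx]}")
```

The output is exactly the kind of ranked list of documents the passage describes, which also shows the limitation it points to: the engine returns matching documents, not a completed task such as a booked flight.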