Information Retrieval
A search engine just for science visualizations
In 1973, the statistician Francis Anscombe devised a fascinating demonstration showing why data should always be plotted before it is analyzed. The demonstration consisted of four data sets that had almost identical statistical properties. By this measure they are essentially the same. But when plotted, the data sets look entirely different. Anscombe's quartet, as it has become known, shows how good graphics allow people to analyze data in a different way, to think and talk about it on another level. Most scientists recognize the importance of good graphics for communicating complex ideas.
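The point of the quartet can be checked numerically. The sketch below uses the commonly published values of Anscombe's four data sets and computes the mean, sample variance, and Pearson correlation for each; the function name `summary` and the dictionary layout are illustrative choices, not part of the article.

```python
import math
import statistics

# Commonly published values of Anscombe's quartet (sets I-III share one x series).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summary(xs, ys):
    """Return (mean_x, mean_y, var_x, var_y, pearson_r) for paired samples."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    var_x, var_y = statistics.variance(xs), statistics.variance(ys)
    # Sample covariance, then Pearson's r = cov / (sd_x * sd_y).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)
    r = cov / math.sqrt(var_x * var_y)
    return mean_x, mean_y, var_x, var_y, r


if __name__ == "__main__":
    for name, (xs, ys) in QUARTET.items():
        mean_x, mean_y, var_x, var_y, r = summary(xs, ys)
        print(f"{name:>3}: mean_x={mean_x:.2f} mean_y={mean_y:.2f} "
              f"var_x={var_x:.2f} var_y={var_y:.2f} r={r:.3f}")
```

All four rows come out nearly identical (x mean of 9, y mean of about 7.5, correlation of about 0.82), which is exactly why the summary statistics alone cannot tell the data sets apart and a plot is needed.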
A Semi-supervised learning approach to enhance health care Community-based Question Answering: A case study in alcoholism
Wongchaisuwat, Papis, Klabjan, Diego, Jonnalagadda, Siddhartha R.
Community-based Question Answering (CQA) sites play an important role in addressing health information needs. However, a significant number of posted questions remain unanswered. Automatically answering the posted questions can provide a useful source of information for online health communities. In this study, we developed an algorithm to automatically answer health-related questions based on past questions and answers (QA). We also aimed to understand which information embedded within online health content provides good features for identifying valid answers. Our proposed algorithm uses information retrieval techniques to identify candidate answers from resolved QA. In order to rank these candidates, we implemented a semi-supervised learning algorithm that extracts the best answer to a question. We assessed this approach on a curated corpus from Yahoo! Answers and compared it against a rule-based string similarity baseline. On our dataset, the semi-supervised learning algorithm has an accuracy of 86.2%. UMLS-based (health-related) features used in the model enhance the algorithm's performance by approximately 8%. A reasonably high rate of accuracy is obtained given that the data is considerably noisy. Important features distinguishing a valid answer from an invalid answer include text length, the number of stop words contained in a test question, the distance between the test question and other questions in the corpus, and the number of overlapping health-related terms between questions. Overall, our automated QA system based on historical QA pairs is shown to be effective on the data set in this case study. It is developed for general use in the health care domain and can also be applied to other CQA sites.
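The candidate-retrieval step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain TF-IDF cosine similarity between the new question and previously resolved questions, uses a smoothed IDF variant, and leaves out the semi-supervised ranking stage and the UMLS-based features entirely. The function names and the toy QA pairs (with placeholder answer IDs) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return sparse TF-IDF vectors (dicts) for docs, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF (an assumption of this sketch): log((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(question, resolved_qa, k=2):
    """Return the k (score, answer) pairs whose past question best matches."""
    vectors, idf = tfidf_vectors([q for q, _ in resolved_qa])
    # Terms unseen in the corpus get weight 0 and so cannot contribute to a match.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(question)).items()}
    scored = [(cosine(q_vec, vec), answer) for vec, (_, answer) in zip(vectors, resolved_qa)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical resolved QA pairs; answers are placeholder IDs, not real content.
RESOLVED_QA = [
    ("How can I stop drinking alcohol every night?", "answer-1"),
    ("What are symptoms of alcohol withdrawal?", "answer-2"),
    ("How do I train for a marathon?", "answer-3"),
]

if __name__ == "__main__":
    for score, answer in retrieve_candidates(
        "What symptoms does alcohol withdrawal cause?", RESOLVED_QA
    ):
        print(f"{answer}: {score:.3f}")
```

In a pipeline like the one the abstract describes, the candidates returned here would then be passed to a separate ranking model that selects the best answer; the similarity score alone corresponds more closely to the string-similarity baseline than to the proposed method.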
SEO and Artificial Intelligence? How the Future is Coming Faster than You Think
With search engine optimization (SEO) maintaining its presence as one of the most effective marketing techniques on the Internet, it's no wonder that over time, new technical changes in algorithms have made the task of ranking both sites and keywords more difficult to understand and execute. While this used to be an easy task that involved including keywords in the content of your website, Google has continually advanced its ranking technology to provide higher-quality results to its users. With the launch earlier this year of RankBrain, its new algorithm solution, has Google stepped into the field of artificial intelligence? One might think so, as RankBrain looks at so much more than the keywords of the past, taking SEO to the next level. Understanding RankBrain boils down to recognizing that Google wants to provide users with ranked content that is relevant. The way users search on Google has changed: it now sees complete questions and more complicated keyword searches being entered.
Best kept machine learning secret in security
The allure of using machine learning in data security comes from its ability to generalize attack detection based on historical data and to detect attacks that would not be obvious otherwise. Machine learning in security analytics is gaining widespread adoption, and the security analytics market is projected to hit 7.1 billion by 2020. The biggest challenge in using machine learning for data security has to do with triaging, or prioritizing, alerts effectively. In my last post, I explored how to prevent false alerts in data security. Here, we'll explore how a generalizable algorithm-based system can detect security breaches, using ranking algorithms from the information retrieval domain.
NLP in the Cloud: Measuring the Quality of NLP APIs
Natural Language Processing seems to have become somewhat of a commodity in recent years. More than a few companies have sprung up that offer basic NLP capabilities through a cloud API. If you'd like to know whether a text carries a positive or negative message, or what people or companies it mentions, you can just send it to one of these black boxes, and receive the answer in less than a second. Superficially, all these NLP APIs look more or less the same. Textrazor, AlchemyAPI, Aylien, MeaningCloud and Lexalytics all offer similar services (named entity recognition, sentiment analysis, keyword extraction, topic identification, etc.), and do so through similar interfaces.
China tells search engines to ID paid results after man died
BEIJING - China has issued new regulations demanding that search engines clearly identify paid search results, months after a terminally ill cancer patient complained that he was misled by the giant search engine Baidu. Wei Zexi, a college student who died in April of a rare cancer, had written a long post on a Chinese website detailing how he was led to a Beijing hospital for treatments after searching on Baidu. He said that the treatment turned out to be ineffective and expensive and that later he learned the therapy was yet to be fully approved. Wei accused Baidu of taking money to promote less proven treatments. The Cyberspace Administration of China (CAC) announced on its website Saturday the new regulations, which also ban search engines from showing subversive content and obscene information.
The Divided Kingdom: a machine learning analysis on the Brexit result | MonkeyLearn Blog
Today was a day for the history books. The UK has voted to leave the European Union and opened a deep crack in the heart of Europe. As a consequence of this result, Prime Minister David Cameron will step down by October, calling for fresh leadership. At this point nobody knows the repercussions of these results. Will Brexit hurt the economy of the UK and ignite a new recession?
China Tightens Internet Rules For Search Engines, Announces Fresh Regulations For Paid Ads
In what is being perceived as another attempt to tighten its control over the internet, China's internet regulator on Saturday announced new rules that ban search engines from showing subversive information and obligate them to clearly identify paid results. The new regulations, which will take effect from Aug. 1, come close on the heels of the death of a 21-year-old college student, who is believed to have undergone an unapproved, experimental cancer treatment he found using the search engine Baidu. "Some search results lack objectivity and fairness, go against corporate morals and standards, misleading and influencing people's judgment," the Cyberspace Administration of China -- the country's internet regulator -- reportedly said. "Internet search providers should earnestly accept corporate responsibility toward society, and strengthen their own management in accordance with the law and rules, to provide objective, fair and authoritative search results to users." In addition, search engines would also be required to censor "rumors, obscenities, pornography, violence, murder, terrorism and other illegal information" -- regulations that the Chinese government claims are needed to safeguard the security of its citizens.
Meet RankBrain, the New AI Behind Google's Search Results
As we all know, Google is constantly looking to provide more relevant results for its users -- hence the regular algorithm updates that frequently frustrate the SEO efforts of webmasters and everyone else. The PageRank algorithm that founders Sergey Brin and Larry Page introduced in the early days of Google was a step in the right direction, but it certainly wasn't the ultimate solution for improving the quality of search results. In fact, the search giant recently unveiled a new AI (yes, that stands for artificial intelligence) called RankBrain to help the engine better understand the queries users type into the search field. The real intention of this AI wasn't to change visitors' search engine results pages (SERPs) -- rather, it was to predict them. As a machine-learning system, RankBrain actually teaches itself how to do something instead of needing a human to program it.
Google Chrome may be to blame if your laptop battery keeps dying
We've all been there - you are just in the middle of finishing an important email when your laptop dies and you lose your hard work. But it seems that your laptop alone might not be to blame. A new series of tests conducted by Microsoft suggests that Google Chrome could be the reason your laptop battery is always dying. The tests streamed the same video on four unplugged, identical laptops, each on a different browser. While Google Chrome is the most popular internet browser, the tests by Microsoft - a rival of Google - have shown how much the browser is draining your laptop battery.