 Information Retrieval


Deep Language Modeling for Question Answering using Keras

#artificialintelligence

Question answering has received more focus as large search engines have largely mastered general information retrieval and are starting to cover more edge cases. Question answering is one of those edge cases, because it can involve a lot of syntactic nuance that isn't captured by standard information retrieval models such as LDA or LSI. Hypothetically, deep learning models would be better suited to this type of task because of their ability to capture higher-order syntax. Two papers, "Applying deep learning to answer selection: a study and an open task" (Feng et. Personally, I am a lot lazier than them, and I don't understand CNNs very well, so I would like to use an existing framework to build one of their models and see if I can get similar results. Keras is a popular framework that supports everything we might need to put the model together. The GitHub repository for this project can be found here. See the instructions here on how to install Keras.
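
The answer-selection task named in the paper title above can be illustrated with a small sketch: given a question and a pool of candidate answers, score each candidate against the question and return the best one. The snippet does not show the post's Keras model, so the example below swaps in a plain bag-of-words cosine similarity as a stand-in scorer. The function names (`bow`, `cosine`, `select_answer`) and the sample sentences are hypothetical, and a deep model would replace `bow` with learned representations.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts (a stand-in for a learned representation)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_answer(question, candidates):
    """Return the candidate whose vector is closest to the question's."""
    q = bow(question)
    return max(candidates, key=lambda c: cosine(q, bow(c)))


# Hypothetical data, for illustration only.
candidates = ["install keras with pip", "the weather is sunny today"]
best = select_answer("how do i install keras", candidates)
```

A word-overlap scorer like this only matches shared tokens, which is the kind of limitation the post attributes to standard retrieval models.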


6 Best SEO Practices For Machine Learning - Shane Barker

#artificialintelligence

With the face of SEO constantly evolving, machine learning has become a huge concern for Internet marketers. An exciting post on the Moz blog about machine learning persuaded me to dig deeper into it. Eric Enge very clearly explained how machine learning works and ways Google may be using it. Google has been dominating the search engine world for decades, but this new concept may spark a complete overhaul to the Google spam-fighting algorithm updates. Internet marketers will have to adapt the best SEO practices according to this latest development.


How close are AI systems to human-level intelligence? The Allen AI challenge.

#artificialintelligence

With respect to artificial intelligence, some people are squarely in the "optimist" camp, believing that we are "nearly there" as far as producing human-level intelligence. Microsoft co-founder Paul Allen has been somewhat more prudent: While we have learned a great deal about how to build individual AI systems that do seemingly intelligent things, our systems have always remained brittle--their performance boundaries are rigidly set by their internal assumptions and defining algorithms, they cannot generalize, and they frequently give nonsensical answers outside of their specific focus areas. So Allen does not believe that we will see human-level artificial intelligence in this century. But he nevertheless generously created a foundation aiming to develop such human-level intelligence, the Allen Institute for Artificial Intelligence Science. The Institute is led by Oren Etzioni, who obviously shares some of Allen's "pessimistic" views.


Search Engine Optimisation: Few Things to Know

@machinelearnbot

SEO, or search engine optimisation, is an internet marketing process for improving the placement of your website in the results of search engines like Google and Bing. To make your website search engine friendly, SEO companies use white-hat on-page techniques. In other words, SEO includes a set of rules that blog or website owners follow in order to optimise their websites for search engines. As a business owner, one should know what the benefits of SEO services are. SEO is the best marketing strategy to secure your position in the Google algorithm.


You Could Look It Up by Jack Lynch review – search engines can't do everything

The Guardian

For some years now, the most satisfyingly passive-aggressive way of responding to a factual query on social media has been to reply with a link from the website "Let Me Google That For You". On opening the link, your pesterer sees an animation of their exact query being typed into the Google search field, the "I'm feeling lucky" box being clicked and a page showing what is almost certainly the answer to their question. It is a sadistically elaborate vehicle for a simple message: you are wasting both our time by asking a person something, when you could ask a search engine. But the search engine is hardly infallible. It is commonly assumed these days that all useful information is on the internet, but it isn't.


The Effects of Machine Learning on Rankings and SEO

#artificialintelligence

For a long time, search engines relied on static ranking factors. Webmasters and SEOs who knew what to pay attention to were able to reach the best positions on Google's SERPs. This has changed recently and will keep changing in the future: the increasing use of machine learning techniques leads both to dynamic ranking criteria and, as confusing as it may sound, to a greater influence of human signals. Machine learning is nothing new. Its roots go back to the 1950s.


Approximations and Refinements of Certain Answers via Many-Valued Logics

AAAI Conferences

Computing certain answers is the preferred way of answering queries in scenarios involving incomplete data. This, however, is computationally expensive, so practical systems use efficient techniques based on a particular three-valued logic, even though this often leads to incorrect results. Our goal is to provide a general many-valued framework for correctly approximating certain answers. We do so by defining the semantics of many-valued answers and queries, following the principle that additional knowledge about the input must translate into additional knowledge about the output. This framework lets us compare query outputs and evaluation procedures in terms of their informativeness. For each many-valued logic with a knowledge ordering on its truth values, one can build a syntactic evaluation procedure for all first-order queries, that correctly approximates certain answers; additional truth values are used to refine information about certain answers. For concrete examples, we show that a recently proposed approach fixing some of the inconsistencies of SQL query evaluation is an immediate consequence of our framework, and we further refine it by adding a fourth truth value. We show that no evaluation procedure based on Boolean logic delivers correctness guarantees. Finally, we study the relative power of evaluation procedures based on the informativeness of the answers they produce.
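
As a rough illustration of the gap between three-valued evaluation and certain answers, the sketch below models Kleene-style three-valued logic in Python, using `None` for the "unknown" truth value and for a missing (NULL) data value. This is an assumption-laden toy, not the paper's framework: the helper names (`k_not`, `k_and`, `k_or`, `eq`, `tautology`) are hypothetical. It shows a condition that holds for every possible value of `x`, yet evaluates to unknown when `x` is NULL.

```python
from typing import Optional

Truth = Optional[bool]  # True, False, or None meaning "unknown"


def k_not(a: Truth) -> Truth:
    """Three-valued negation: unknown stays unknown."""
    return None if a is None else (not a)


def k_and(a: Truth, b: Truth) -> Truth:
    """Three-valued conjunction: False dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def k_or(a: Truth, b: Truth) -> Truth:
    """Three-valued disjunction: True dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def eq(x, y) -> Truth:
    """Comparison in the SQL style: any comparison with NULL is unknown."""
    if x is None or y is None:
        return None
    return x == y


def tautology(x) -> Truth:
    """Evaluates `x = 1 OR NOT (x = 1)`, true for every concrete x."""
    return k_or(eq(x, 1), k_not(eq(x, 1)))


known = tautology(2)       # True
missing = tautology(None)  # None (unknown), although every completion gives True
```

A filter that keeps only rows evaluating to `True` would drop the row with the missing value, even though it satisfies the condition however the missing value is filled in.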


Topic Concentration in Query Focused Summarization Datasets

AAAI Conferences

Query-Focused Summarization (QFS) summarizes a document cluster in response to a specific input query. QFS algorithms must combine query relevance assessment, central content identification, and redundancy avoidance. Frustratingly, state of the art algorithms designed for QFS do not significantly improve upon generic summarization methods, which ignore query relevance, when evaluated on traditional QFS datasets. We hypothesize this lack of success stems from the nature of the dataset. We define a task-based method to quantify topic concentration in datasets, i.e., the ratio of sentences within the dataset that are relevant to the query, and observe that the DUC 2005, 2006 and 2007 datasets suffer from very high topic concentration. We introduce TD-QFS, a new QFS dataset with controlled levels of topic concentration. We compare competitive baseline algorithms on TD-QFS and report strong improvement in ROUGE performance for algorithms that properly model query relevance as opposed to generic summarizers. We further present three new and simple QFS algorithms, RelSum, ThresholdSum, and TFIDF-KLSum that outperform state of the art QFS algorithms on the TD-QFS dataset by a large margin.
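
The ratio described above can be sketched in a few lines. This is only a minimal illustration of "fraction of sentences relevant to the query": the abstract does not spell out the task-based procedure the authors use to judge relevance, so the `is_relevant` predicate, the keyword rule, and the sample sentences here are all hypothetical placeholders.

```python
def topic_concentration(sentences, is_relevant):
    """Fraction of sentences judged relevant to the query (0.0 if empty)."""
    if not sentences:
        return 0.0
    relevant = sum(1 for s in sentences if is_relevant(s))
    return relevant / len(sentences)


# Hypothetical document cluster and a naive keyword-based relevance rule.
cluster = [
    "Asthma is often triggered by allergens.",
    "Inhalers deliver asthma medication directly to the lungs.",
    "The clinic opened a new parking lot.",
    "Local elections are scheduled for May.",
]


def mentions_asthma(sentence):
    """Stand-in relevance judgment: does the sentence mention the query term?"""
    return "asthma" in sentence.lower()


ratio = topic_concentration(cluster, mentions_asthma)
```

When the ratio is close to 1, almost any sentence is on topic, so a generic summarizer that ignores the query loses little, which is consistent with the hypothesis stated in the abstract.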


ClaimEval: Integrated and Flexible Framework for Claim Evaluation Using Credibility of Sources

AAAI Conferences

The World Wide Web (WWW) has become a rapidly growing platform consisting of numerous sources which provide supporting or contradictory information about claims (e.g., "Chicken meat is healthy"). In order to decide whether a claim is true or false, one needs to analyze the content of different sources of information on the Web, measure the credibility of information sources, and aggregate all of this information. This is a tedious process, and Web search engines address only part of the overall problem, viz., producing only a list of relevant sources. In this paper, we present ClaimEval, a novel and integrated approach which, given a set of claims to validate, extracts a set of pro and con arguments from the Web information sources, and jointly estimates credibility of sources and correctness of claims. ClaimEval uses Probabilistic Soft Logic (PSL), resulting in a flexible and principled framework which makes it easy to state and incorporate different forms of prior knowledge. Through extensive experiments on real-world datasets, we demonstrate ClaimEval’s capability in determining validity of a set of claims, resulting in improved accuracy compared to state-of-the-art baselines.