 Information Retrieval


Real Estate Search Engine Powered by Artificial Intelligence

#artificialintelligence

ITRealty intelligently analyzes the multiple listing services (MLS) that brokers and agents use to find and list properties, establish contractual offers of compensation among brokers, and accumulate and disseminate information to enable appraisals. "It does not matter anymore how far in the past comparable properties were sold," says Yuriy Setko. "Our algorithms will analyze where the market was for that particular type of property, in that particular neighborhood in the past, and apply the time adjustment percentage to the selling price to give you that property's market price as if it was sold yesterday. You usually have a good number of comparables to see if the asking price is right." It also helps to precisely determine the market price when putting up a property for sale. ITRealty tracks price drops on MLS, along with other listing analysis algorithms, to find "motivated sellers", and draws on supplementary real estate data not found on MLS.
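The time adjustment described in the quote can be illustrated with a small sketch. This is not ITRealty's algorithm, which the excerpt does not describe; it assumes a hypothetical market index for the property type and neighborhood, and the function name and figures are made up for illustration.

```python
def time_adjusted_price(sale_price, index_at_sale, index_now):
    """Scale a past sale price by the change in a (hypothetical) market index.

    index_at_sale and index_now are assumed index values for the same
    property type and neighborhood at the sale date and today.
    """
    if index_at_sale <= 0:
        raise ValueError("index_at_sale must be positive")
    # Time adjustment as a fraction, e.g. 0.10 for a 10% market rise.
    adjustment = index_now / index_at_sale - 1.0
    return sale_price * (1.0 + adjustment)


# A comparable sold for 400,000 when the index stood at 100; the index is now 110.
adjusted = time_adjusted_price(400_000, 100.0, 110.0)
print(round(adjusted))
```

With these made-up numbers, the comparable is treated as if it had sold at today's market level, so older sales remain usable as comparables.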


What do Google and a toddler have in common? Both need to learn good listening skills. - Search Engine Land

#artificialintelligence

At the Sixth International Conference on Learning Representations, Jannis Bulian and Neil Houlsby, researchers at Google AI, presented a paper that shed light on new methods they're testing to improve search results. While publishing a paper certainly doesn't mean the methods are being used, or even will be, it likely increases the odds when the results are highly successful. And when those methods also combine with other actions Google is taking, one can be almost certain. I believe this is happening, and the changes are significant for search engine optimization specialists (SEOs) and content creators. Let's start with the basics and look topically at what's being discussed.


How San Quentin Inmates Built JOLT, a Search Engine for Prison

WIRED

Marcellino Ornelas had been in and out of juvenile hall seven times by the time he finally went to prison at the age of 19 for assault with a firearm. He'd already been kicked out of high school and was working, he says, as the "local drug dealer," with a side gig at a Ross department store. In the past, every time he got out, he'd start dealing soon after. "It was like, this is how I make money. This is who my friends are," Ornelas says.


A Systematic Classification of Knowledge, Reasoning, and Context within the ARC Dataset

arXiv.org Artificial Intelligence

The recent work of Clark et al. (2018) introduces the AI2 Reasoning Challenge (ARC) and the associated ARC dataset that partitions open domain, complex science questions into an Easy Set and a Challenge Set. That paper includes an analysis of 100 questions with respect to the types of knowledge and reasoning required to answer them; however, it does not include clear definitions of these types, nor does it offer information about the quality of the labels. We propose a comprehensive set of definitions of knowledge and reasoning types necessary for answering the questions in the ARC dataset. Using ten annotators and a sophisticated annotation interface, we analyze the distribution of labels across the Challenge Set and statistics related to them. Additionally, we demonstrate that although naive information retrieval methods return sentences that are irrelevant to answering the query, sufficient supporting text is often present in the (ARC) corpus. Evaluating with human-selected relevant sentences improves the performance of a neural machine comprehension model by 42 points.


A Hidden Instagram Feature Shows Users Time Spent in the App - Search Engine Journal

#artificialintelligence

Instagram is testing a hidden "Usage Insights" feature, which shows users how much time they spend in the app. This feature was discovered by a computer science student who has a history of uncovering new features in Instagram before they're rolled out to the public. All that we have to go on at this time is the screenshot shared by Jade M. Wong, so it's unclear how detailed these insights are. It's also unclear whether or not this feature will be widely released, although it's not something that's out of the realm of possibility. Earlier this week, Google announced it will be rolling out a feature designed to keep users informed about how much time they're spending on YouTube.


What negative SEO is and is not - Search Engine Land

#artificialintelligence

Today we are starting a six-part series on Negative SEO. The series will be broken into three areas and will show how negative search engine optimization (SEO) has an effect on links, content and user signals. Positive SEO under this broader view would be any tactic performed with the intent to positively impact rankings for a uniform resource locator (URL), and possibly its host domain, by manipulating a variable within the links, content or user signals areas. Negative SEO would be any tactic performed with the intent to negatively impact rankings for a URL, and possibly its host domain, by manipulating a variable within the links, content or user signal buckets. If you can accidentally hurt your rankings by shifting a variable, then it would logically suggest that an external entity shifting that same variable associated with your site could result in a ranking decrease or outright deindexation.


Exploiting Textual and Citation Information to Identify and Summarize Influential Publications

AAAI Conferences

Given a group of publications, we investigate the problem of identifying the papers with the most impact on others. We refer to these papers as influential in the sense that they introduce new concepts and language that will affect how future articles are written. In this paper we propose a weighted PageRank algorithm that uses textual information from articles and information from the citation graph to rank the impact of publications, then we automatically summarize these publications and extract important keywords. We show that our algorithm outperforms default citation-based techniques in ranking influential papers (those that won a best paper award) by no less than 2% in F1-score and NDCG. We also show that our algorithm outperforms previous graph-based keyword extraction techniques by no less than 1.5% in F1-score.
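As a rough illustration of the general idea, the sketch below runs a weighted PageRank over a toy citation graph. It is not the authors' method: the abstract does not say how the weights are derived, so the edge weights here are assumed to be positive text-similarity scores, and the paper names and values are hypothetical.

```python
def weighted_pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a weighted, directed citation graph.

    edges maps (citing_paper, cited_paper) -> positive weight. In this
    sketch the weight stands in for a textual-similarity score.
    """
    nodes = sorted({node for pair in edges for node in pair})
    n = len(nodes)

    # Total outgoing weight per paper, used to normalize each edge.
    out_weight = {node: 0.0 for node in nodes}
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread uniformly.
        dangling = sum(rank[node] for node in nodes if out_weight[node] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        rank = new_rank
    return rank


# Hypothetical citation graph: B and C cite A, C and D cite B.
toy_edges = {
    ("B", "A"): 0.9,
    ("C", "A"): 0.8,
    ("C", "B"): 0.2,
    ("D", "B"): 0.5,
}
ranks = weighted_pagerank(toy_edges)
for paper in sorted(ranks, key=ranks.get, reverse=True):
    print(paper, round(ranks[paper], 3))
```

Normalizing by each paper's total outgoing weight means a citation with high textual similarity passes on a larger share of the citing paper's rank than a weakly related one.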


Ambiguity Aware Arabic Document Indexing and Query Expansion: A Morphological Knowledge Learning-Based Approach

AAAI Conferences

In this paper, we propose a morphology-based Arabic Information Retrieval (IR) system. Arabic is an inflectional and derivational language and Arabic texts are highly ambiguous at the morphological level. However, short diacritics have a central role in understanding Arabic texts. That is, we propose to build a morphological knowledge base from huge vocalized corpora to reduce the ambiguity of Arabic documents. This base may be used both for the morphological indexing of queries and documents and for the morphological enrichment of queries. Indeed, it stores (i) the morpho-syntactic attributes of Arabic words; and (ii) the morphological relations between Arabic tokens. It also represents the Arabic lexicon at several levels (e.g. stems, lemmas and words). We focus on morphological analysis and disambiguation and their impact on information retrieval. We perform experiments that study the problem of indexing units and morphology-based query expansion in Arabic IR.


Search Engine Optimization Tutorial for Beginners

#artificialintelligence

This course centers around the technical steps you need to take to put your online assets (website, blog site, online store, etc.) in the best possible light in the eyes of search engines, more specifically Google and Bing-Yahoo. This course covers things you absolutely must do to have even a shot at getting to page one of these search engines organically. I follow these lectures with additional guidance to help you construct and display your web page assets so that they not only pass scrutiny when being crawled by "Spiders" in the service of these search engines, but also receive high marks from these activities, which will help you to place higher in the search engine rankings when organic searches are conducted by the public looking for online information. There are many "website designers" out there today completing sites using templates, widgets, etc. And the whole world is out there using similar keywords and keyword phrases trying to get found.


Cross-lingual Document Retrieval using Regularized Wasserstein Distance

arXiv.org Machine Learning

Many information retrieval algorithms rely on the notion of a good distance that allows objects of different nature to be compared efficiently. Recently, a new promising metric called Word Mover's Distance was proposed to measure the divergence between text passages. In this paper, we demonstrate that this metric can be extended to incorporate term-weighting schemes and provide more accurate and computationally efficient matching between documents using entropic regularization. We evaluate the benefits of both extensions in the task of cross-lingual document retrieval (CLDR). Our experimental results on eight CLDR problems suggest that the proposed methods achieve remarkable improvements in terms of Mean Reciprocal Rank compared to several baselines.
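The combination of term weights and entropic regularization can be sketched with plain Sinkhorn iterations. This is a minimal illustration, not the paper's implementation: the two-dimensional "shared" word embeddings, the term weights, and the regularization strength are all made-up assumptions, and a very small `reg` relative to the costs would underflow in this naive version.

```python
import math


def sinkhorn_distance(a, b, cost, reg=0.1, iterations=200):
    """Entropy-regularized transport cost between two weighted bags of words.

    a, b: term weights of each document (each sums to 1).
    cost: cost[i][j] is the distance between word i of the first document
    and word j of the second.
    """
    kernel = [[math.exp(-c / reg) for c in row] for row in cost]
    rows, cols = range(len(a)), range(len(b))
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(iterations):
        # Alternately rescale rows and columns to match the term weights.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in cols) for i in rows]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in rows) for j in cols]
    # Cost of the resulting transport plan diag(u) * kernel * diag(v).
    return sum(u[i] * kernel[i][j] * v[j] * cost[i][j] for i in rows for j in cols)


# Hypothetical cross-lingual embeddings in a shared 2-D space.
embedding = {
    "cat": (1.0, 0.0), "chat": (0.9, 0.1),
    "car": (0.0, 1.0), "voiture": (0.1, 0.9),
}


def cost_matrix(words_a, words_b):
    return [[math.dist(embedding[x], embedding[y]) for y in words_b] for x in words_a]


query_words, query_weights = ["cat", "car"], [0.7, 0.3]
doc_words = ["chat", "voiture"]
cost = cost_matrix(query_words, doc_words)

close = sinkhorn_distance(query_weights, [0.7, 0.3], cost)  # similar term weights
far = sinkhorn_distance(query_weights, [0.1, 0.9], cost)    # mostly about "voiture"
print(round(close, 3), round(far, 3))
```

In this toy setup the French document whose term weights mirror the English query gets the smaller distance, which is the ranking behavior a retrieval system would rely on.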