AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular bits of information no longer meet the needs of many people. Computerized databases have aided searches of traditional indexes of print publications, but such searches still usually require time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Binary Embedding with Additive Homogeneous Kernels

Kim, Saehoon (Pohang University of Science and Technology (POSTECH)) | Choi, Seungjin (Pohang University of Science and Technology (POSTECH))

AAAI Conferences | Feb-14-2017

Binary embedding transforms vectors in Euclidean space into the vertices of Hamming space such that the Hamming distance between binary codes reflects a particular distance metric. In machine learning, the similarity metrics induced by Mercer kernels are frequently used, leading to the development of binary embedding with Mercer kernels (BE-MK), where the approximate nearest neighbor search is performed in a reproducing kernel Hilbert space (RKHS). Kernelized locality-sensitive hashing (KLSH), one of the representative BE-MK methods, uses kernel PCA to embed data points into a Euclidean space, followed by random hyperplane binary embedding. In general, it works well when the query and the data points in the database follow the same probability distribution. The streaming data environment, however, continuously requires KLSH to update the leading eigenvectors of the Gram matrix, which can be costly or hard to carry out in practice. In this paper we present a completely randomized binary embedding that works with a family of additive homogeneous kernels, referred to as BE-AHK. The proposed algorithm is easy to implement and is built on Vedaldi and Zisserman's work on explicit feature maps for additive homogeneous kernels. We show that BE-AHK is able to preserve kernel values by developing upper and lower bounds on its Hamming distance, which guarantee that approximate nearest neighbor search can be solved efficiently. Numerical experiments demonstrate that BE-AHK yields similarity-preserving binary codes in terms of additive homogeneous kernels and is superior to existing methods when training data and queries are generated from different distributions. Moreover, when a large code size is allowed, the performance of BE-AHK is comparable to that of KLSH in general cases.
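The random hyperplane step mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's BE-AHK algorithm: it omits the kernel feature map entirely and only shows sign-of-random-projection codes on plain Euclidean vectors. The function names, the seed, and the code length are assumptions made for illustration.

```python
import math
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits Gaussian normal vectors, one hyperplane per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def embed(x, planes):
    """Binary code for x: bit i records which side of hyperplane i x lies on."""
    return [1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) >= 0 else 0
            for w in planes]


def hamming(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(u != v for u, v in zip(a, b))


def estimated_angle(a, b):
    """Angle estimate: the fraction of differing bits approximates angle / pi."""
    return math.pi * hamming(a, b) / len(a)


if __name__ == "__main__":
    planes = random_hyperplanes(dim=3, n_bits=2000, seed=1)
    cx = embed([1.0, 0.0, 0.0], planes)
    cy = embed([0.0, 1.0, 0.0], planes)
    # Orthogonal inputs, so the estimate should land near pi / 2.
    print(estimated_angle(cx, cy))
```

Longer codes tighten the angle estimate at the cost of storage, which is in line with the abstract's remark that results improve when a large code size is allowed.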

information retrieval, machine learning, natural language, (20 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.55)

A Theoretical Analysis of First Heuristics of Crowdsourced Entity Resolution

Mazumdar, Arya (University of Massachusetts Amherst) | Saha, Barna (University of Massachusetts Amherst)

AAAI Conferences | Feb-14-2017

Entity resolution (ER) is the task of identifying all records in a database that refer to the same underlying entity and are therefore duplicates of each other. Due to the inherent ambiguity of data representation and poor data quality, ER is a challenging task for any automated process. As a remedy, human-powered ER via crowdsourcing has become popular in recent years. Using the crowd to answer queries is costly and time consuming. Furthermore, crowd answers can often be faulty. Therefore, crowd-based ER methods aim to minimize human participation without sacrificing quality, and they actively use a computer-generated similarity matrix. While some of these methods perform well in practice, no theoretical analysis exists for them, and their worst-case performance does not reflect the experimental findings. This creates a disparity in the understanding of the popular heuristics for this problem. In this paper, we make the first attempt to close this gap. We provide a thorough analysis of the prominent heuristic algorithms for crowd-based ER. We justify experimental observations with our analysis and information-theoretic lower bounds.
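To make the idea of saving crowd queries concrete, here is a minimal sketch of one style of heuristic: ask about pairs in decreasing order of machine similarity and skip any pair whose answer already follows from earlier answers by transitivity. It is an illustrative assumption, not necessarily one of the algorithms analyzed in the paper, and it assumes crowd answers are always correct, which the abstract notes is often not the case. The names `resolve`, `similarity`, and `ask_crowd` are hypothetical.

```python
def resolve(records, similarity, ask_crowd):
    """Cluster records using crowd answers; return (clusters, queries asked)."""
    parent = {r: r for r in records}

    def find(r):
        # Union-find lookup with path halving.
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    distinct = set()  # pairs of cluster roots the crowd said are different
    pairs = [(a, b) for i, a in enumerate(records) for b in records[i + 1:]]
    pairs.sort(key=lambda p: similarity(*p), reverse=True)

    queries = 0
    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra == rb or frozenset((ra, rb)) in distinct:
            continue  # answer already implied by transitivity
        queries += 1
        if ask_crowd(a, b):
            parent[rb] = ra  # merge the two clusters
            # Re-point recorded non-matches from the old root to the new one.
            distinct = {frozenset(ra if r == rb else r for r in pair)
                        for pair in distinct}
        else:
            distinct.add(frozenset((ra, rb)))

    clusters = {}
    for r in records:
        clusters.setdefault(find(r), []).append(r)
    return list(clusters.values()), queries


if __name__ == "__main__":
    recs = ["a1", "a2", "a3", "b1", "b2"]
    same = lambda x, y: x[0] == y[0]  # toy ground truth: first letter
    groups, asked = resolve(recs, lambda x, y: 1.0 if same(x, y) else 0.0, same)
    print(groups, asked)
```

In this toy run the similarity scores are perfectly informative, so the crowd is asked about far fewer pairs than the 10 that exist; with noisier scores the saving shrinks.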

artificial intelligence, machine learning, natural language, (21 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States (0.28)

Industry: Leisure & Entertainment (0.68)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.30)

elasticsearchr – a Lightweight Elasticsearch Client for R

#artificialintelligence | Feb-13-2017, 15:15:34 GMT

Elasticsearch is a distributed NoSQL document store, search engine, and column-oriented database, whose fast (near real-time) reads and powerful aggregation engine make it an excellent choice as an 'analytics database' for R&D, production use, or both. Installation is simple: it ships with sensible default settings that allow it to work effectively out of the box, and all interaction is made via a set of intuitive and extremely well-documented RESTful APIs. I've been using it for two years now and I am evangelical. The elasticsearchr package implements a simple Domain-Specific Language (DSL) for indexing, deleting, querying, sorting, and aggregating data in Elasticsearch from within R. The main purpose of this package is to remove the labour involved in assembling HTTP requests to Elasticsearch's REST APIs and processing the responses. Instead, users of this package need only send data frames to, and receive data frames from, Elasticsearch resources.
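As a rough illustration of the request-assembly labour the package is meant to remove, the sketch below builds the newline-delimited JSON body that Elasticsearch's bulk endpoint expects from a list of row dictionaries. It is written in Python rather than R and is not elasticsearchr's own code; the function name `bulk_index_body` and the index name `iris` are assumptions made for illustration.

```python
import json


def bulk_index_body(index_name, rows):
    """Build an NDJSON bulk body: one action line plus one document line per row."""
    lines = []
    for row in rows:
        # Action line telling Elasticsearch which index receives the document.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line holding the document itself.
        lines.append(json.dumps(row))
    # The bulk format requires every line, including the last, to end in a newline.
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    rows = [
        {"species": "setosa", "sepal_length": 5.1},
        {"species": "virginica", "sepal_length": 6.3},
    ]
    print(bulk_index_body("iris", rows), end="")
```

A client would then POST this body to the cluster's `_bulk` endpoint with the content type `application/x-ndjson` and parse the JSON response, which is the kind of step the package hides behind data frames.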

elasticsearch, information retrieval, natural language, (18 more...)

Technology:

Information Technology > Information Management > Search (0.36)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.35)

How to build a search engine - Part 2: Configuring elasticsearch

@machinelearnbot | Feb-13-2017, 12:30:03 GMT

In this post we will focus on configuring the elasticsearch part. I have chosen the Wikipedia people dump as the dataset. It contains the wiki pages of a subset of people on Wikipedia. The dataset consists of three columns: URI, name, and text. As the column names suggest, URI is the actual wiki link to that person's page, name is the person's name.

artificial intelligence, information retrieval, natural language, (8 more...)

Technology:

Information Technology > Communications (1.00)
Information Technology > Information Management > Search (0.62)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.51)

How Search Engines Use Machine Learning for Pattern Detection

AITopics Original Links | Feb-12-2017, 11:30:06 GMT

Search engines use machine learning for pattern detection. While it's impossible to explain in one short article how machine learning influences our lives, understanding the basics of machine learning can give you some insight into search algorithm updates, such as Google's Panda update. To predict the outcome of future tests, scripts can use supervised learning on past outcomes to define a hypothetical prediction line. The three images below show how plotted examples define averages. These averages are more likely to represent some truth as the training set grows.
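A "prediction line" learned from past outcomes can be sketched with ordinary least squares on one input variable. The article excerpt does not say which fitting method it has in mind, so least squares is an assumption here, as are the function names and the toy data.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for points (xs[i], ys[i])."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Read a prediction for a new input off the fitted line."""
    return slope * x + intercept


if __name__ == "__main__":
    # Past outcomes (toy data lying exactly on y = 2x + 1).
    slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(predict(4, slope, intercept))
```

With noisy real-world outcomes the fitted line averages over the examples, and the fit generally becomes more reliable as the training set grows.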

information retrieval, natural language, search engine use machine learning, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.61)

Creative Commons' New Search Engine Makes It Easy To Find Free-To-Use Images

Forbes - Tech | Feb-8-2017, 09:50:02 GMT

Credit: "Busted" by Jason Scragz is licensed under CC BY 2.0. You copied an image you saw on the internet onto your blog. You didn't think you were doing anything wrong, but it turns out you were. How can you avoid all this by finding images that are free to use? Creative Commons is here to help you out. How can you find these images? Google's Advanced Image Search has a drop-down box that allows you to restrict a search by different types of Creative Commons license.

information retrieval, natural language, new search engine make, (8 more...)

Country: North America > United States > New York > New York County > New York City (0.07)

Industry: Information Technology (1.00)

Technology:

Information Technology > Information Management > Search (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.46)

Facebook Search Now Recognizes Objects in Photos - Search Engine Journal

#artificialintelligence | Feb-4-2017, 22:51:16 GMT

Facebook's artificial intelligence (AI) team has built a visual search system that can recognize content that appears in photos and return relevant search results. Facebook originally created the platform, called Lumos, so that its visually impaired users could understand the content of photos. But Facebook recognized that everyone could benefit from this type of visual search system. Facebook's image search system can detect and segment objects, scenes, animals, places, and clothes that appear in images or videos, and understand them. For instance, let's say you search for "black shirt photo."

artificial intelligence, information retrieval, natural language, (9 more...)

Industry: Information Technology > Services (0.73)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

ATOL: A Framework for Automated Analysis and Categorization of the Darkweb Ecosystem

Ghosh, Shalini (SRI International) | Porras, Phillip (SRI International) | Yegneswaran, Vinod (SRI International) | Nitz, Ken (SRI International) | Das, Ariyam (University of California, Los Angeles)

AAAI Conferences | Feb-4-2017

We present a framework for automated analysis and categorization of .onion websites in the darkweb to facilitate analyst situational awareness of new content that emerges from this dynamic landscape. Over the last two years, our team has developed a large-scale darkweb crawling infrastructure called OnionCrawler that acquires new onion domains on a daily basis, and crawls and indexes millions of pages from these new and previously known .onion sites. It stores this data in a research repository designed to help better understand Tor’s hidden service ecosystem. The analysis component of our framework is called Automated Tool for Onion Labeling (ATOL), which introduces a two-stage thematic labeling strategy: (1) it learns descriptive and discriminative keywords for different categories, and (2) it uses these terms to map onion site content to a set of thematic labels. We also present empirical results of ATOL and our ongoing experimentation with it, as we have gained experience applying it to the entirety of our darkweb repository, now over 70 million indexed pages. We find that ATOL can perform site-level thematic label assignment more accurately than keyword-based schemes developed by domain experts: we expand the analyst-provided keywords using an automatic keyword discovery algorithm, and get a 12% gain in accuracy by using a machine learning classification model. We also show how ATOL can discover categories on previously unlabeled onions and discuss applications of ATOL in supporting various analyses and investigations of the darkweb.
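The two-stage strategy described above can be mimicked with a toy sketch: first pick, for each category, the words that are both frequent in that category and concentrated there, then label new text by keyword overlap. This is not ATOL's keyword discovery algorithm or its classification model; the scoring rule, the function names, and the sample documents are all assumptions made for illustration.

```python
from collections import Counter


def learn_keywords(labeled_docs, top_k=3):
    """Stage 1: choose top_k frequent, category-specific words per category."""
    per_category = {}
    total = Counter()
    for category, text in labeled_docs:
        words = text.lower().split()
        per_category.setdefault(category, Counter()).update(words)
        total.update(words)
    keywords = {}
    for category, counts in per_category.items():
        # Score = in-category count times the share of all uses in this category.
        ranked = sorted(
            counts,
            key=lambda w: (counts[w] * counts[w] / total[w], w),
            reverse=True,
        )
        keywords[category] = set(ranked[:top_k])
    return keywords


def label(text, keywords):
    """Stage 2: assign the category whose keywords overlap the text most."""
    words = set(text.lower().split())
    scores = {cat: len(words & kws) for cat, kws in keywords.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    docs = [
        ("market", "buy cheap goods market vendor"),
        ("market", "vendor goods shipping market"),
        ("forum", "forum thread post reply"),
        ("forum", "reply thread forum discussion"),
    ]
    kws = learn_keywords(docs)
    print(label("new vendor selling goods", kws))
```

Text that shares no keyword with any category is left unlabeled (`None`) rather than forced into a category.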

category, keyword, onion site, (17 more...)

Workshops at the Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States (0.14)

Genre: Research Report > New Finding (0.47)

Industry:

Information Technology > Security & Privacy (1.00)
Law Enforcement & Public Safety (0.93)
Government (0.88)
Law (0.68)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science (1.00)
Information Technology > Communications (1.00)
(4 more...)

Building an end-end search engine

@machinelearnbot | Feb-3-2017, 22:50:17 GMT

In analytics, we retrieve information from various data sources, which can be structured or unstructured. The biggest challenge is retrieving information from unstructured data, mainly text. Machine learning can help overcome this challenge. Different algorithms have been designed for different platforms, but here we will discuss one technique that can be applied in Python. The process is best explained with an example.

artificial intelligence, information retrieval, natural language, (10 more...)

Country: North America > United States (0.16)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.56)

Zuckerberg charity buys AI search engine to battle disease

Daily Mail - Science & tech | Jan-24-2017, 23:55:03 GMT

A charitable foundation backed by Mark Zuckerberg and his wife said Monday it has bought a Canadian artificial intelligence startup as part of a mission to eradicate disease. The Chan Zuckerberg Initiative did not disclose financial terms of the deal to acquire Toronto-based Meta, which uses AI to quickly read and comprehend scientific papers and then provide insights to researchers. Meta capabilities will be unified in a tool made available for free to scientists. Meta artificial intelligence can analyze insights across millions of papers, finding connections and patterns at scales and speeds impossible for humans to match unassisted. In the field of biomedicine alone, thousands of research papers are published daily.

artificial intelligence, information retrieval, natural language, (14 more...)

Country:

North America > Canada > Ontario > Toronto (0.26)
North America > United States > California > San Francisco County > San Francisco (0.06)

Industry: Information Technology > Services (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.52)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)
