AITopics | Information Retrieval

Information Retrieval

Our accustomed systems for retrieving particular bits of information no longer meet the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but it still usually requires time-consuming serial searching of one database after another, followed by other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age when the information being sought comes in widely different formats, and the universe of knowledge is simply too big and growing too rapidly for searching to proceed successfully at a human's slow speed.

Non-metric Similarity Graphs for Maximum Inner Product Search

Morozov, Stanislav, Babenko, Artem

Neural Information Processing Systems, Dec-31-2018

In this paper we address the problem of Maximum Inner Product Search (MIPS), which is currently the computational bottleneck in a large number of machine learning applications. Although it is similar to nearest neighbor search (NNS), MIPS has been shown to be more challenging, because the inner product is not a proper metric. We propose to solve MIPS using similarity graphs, i.e., graphs in which each vertex is connected to the vertices that are most similar to it under some similarity function. The similarity-graph framework was originally proposed for metric spaces; in this paper we extend it naturally to the non-metric MIPS scenario. We demonstrate that, unlike existing approaches, similarity graphs do not require any data transformation to reduce MIPS to NNS and should be applied to the original data. Moreover, we explain why such a reduction is detrimental for similarity graphs. Through an extensive comparison with existing approaches, we show that the proposed method is a game-changer in terms of the runtime/accuracy trade-off for MIPS.
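
To make "similarity graph under the inner product" concrete, here is a minimal sketch, not the authors' implementation: the graph is built by brute force (each vertex linked to its k neighbours with the largest raw inner product), and queries use a plain greedy walk rather than whatever beam or heuristics the paper uses. The function names and the parameter k are illustrative assumptions. The point it shows is that neither construction nor search transforms the data to reduce MIPS to NNS.

```python
import random


def inner(u, v):
    """Inner product of two equal-length vectors (not a metric)."""
    return sum(a * b for a, b in zip(u, v))


def build_ip_graph(data, k):
    """Link each vertex to the k other vertices with the largest inner product.

    Brute force, O(n^2) inner products: only suitable for a toy example.
    """
    graph = []
    for i, x in enumerate(data):
        scored = sorted(((inner(x, y), j) for j, y in enumerate(data) if j != i),
                        reverse=True)
        graph.append([j for _, j in scored[:k]])
    return graph


def greedy_mips(data, graph, query, start=0):
    """Greedy walk using the raw inner product (no reduction to NNS).

    Moves to any neighbour whose inner product with the query beats the current
    best, and stops when no neighbour of the current vertex improves on it.
    """
    current, best = start, inner(data[start], query)
    improved = True
    while improved:
        improved = False
        for j in graph[current]:
            score = inner(data[j], query)
            if score > best:
                current, best, improved = j, score, True
    return current, best


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
    graph = build_ip_graph(data, k=8)
    query = [rng.gauss(0, 1) for _ in range(4)]
    print(greedy_mips(data, graph, query))
```

Greedy search on a sparse graph is approximate: it can stop at a local maximum. With k = n - 1 (a complete graph) it always returns the exact MIPS answer.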

information retrieval, machine learning, natural language

Neural Information Processing Systems

Country: North America > United States > New York (0.29)

Genre: Research Report (0.34)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

Norm-Ranging LSH for Maximum Inner Product Search

Yan, Xiao, Li, Jinfeng, Dai, Xinyan, Chen, Hongzhi, Cheng, James

Neural Information Processing Systems, Dec-31-2018

Neyshabur and Srebro proposed SIMPLE-LSH, the state-of-the-art hashing-based algorithm for maximum inner product search (MIPS). We found that the performance of SIMPLE-LSH, both in theory and in practice, suffers from long tails in the 2-norm distribution of real datasets. We propose NORM-RANGING LSH, which addresses the excessive normalization caused by these long tails by partitioning a dataset into sub-datasets and building a hash index for each sub-dataset independently. We prove that NORM-RANGING LSH achieves lower query time complexity than SIMPLE-LSH under mild conditions. We also show that the idea of dataset partitioning can improve another hashing-based MIPS algorithm. Experiments show that NORM-RANGING LSH probes far fewer items than SIMPLE-LSH at the same recall, significantly benefiting MIPS-based applications.
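
The partitioning idea can be sketched as follows. SIMPLE-LSH scales every item by the global maximum norm U and appends sqrt(1 - ||x/U||^2), so all items become unit vectors and MIPS turns into a cosine search that sign random projections can hash; with a long-tailed norm distribution most items get scaled far below 1. The sketch below, under simplifying assumptions, sorts items by norm, cuts them into ranges and scales each range by its own maximum. Sharing one set of hyperplanes across ranges, the Hamming-radius probe and the final exact re-ranking are assumptions made for brevity; the paper's own rule for merging results across sub-indexes is not reproduced, and all function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def simple_lsh_item(x, max_norm):
    """SIMPLE-LSH item transform: scale into the unit ball, then append
    sqrt(1 - ||x||^2) so the transformed item has exactly unit norm."""
    scaled = [a / max_norm for a in x]
    return scaled + [math.sqrt(max(0.0, 1.0 - dot(scaled, scaled)))]


def simple_lsh_query(q):
    """SIMPLE-LSH query transform: normalise, then append a zero coordinate."""
    n = norm(q)
    return [a / n for a in q] + [0.0]


def sign_hash(v, planes):
    """Sign random projection: one bit per random hyperplane."""
    return tuple(1 if dot(p, v) >= 0 else 0 for p in planes)


def build_index(data, num_ranges, num_bits, rng):
    """Norm ranging: sort items by 2-norm, cut them into at most num_ranges
    groups and scale each group by its OWN maximum norm, not the global one.
    (Hyperplanes are shared across groups here purely for brevity.)"""
    planes = [[rng.gauss(0, 1) for _ in range(len(data[0]) + 1)]
              for _ in range(num_bits)]
    order = sorted(range(len(data)), key=lambda i: norm(data[i]))
    size = math.ceil(len(order) / num_ranges)
    index = []
    for start in range(0, len(order), size):
        ids = order[start:start + size]
        local_max = max(norm(data[i]) for i in ids) or 1.0
        index.append({i: sign_hash(simple_lsh_item(data[i], local_max), planes)
                      for i in ids})
    return planes, index


def search(data, planes, index, q, radius):
    """Collect items within `radius` Hamming distance in every sub-index, then
    re-rank the candidates by exact inner product (a simplification: hash
    scores from different sub-indexes are not directly comparable)."""
    code = sign_hash(simple_lsh_query(q), planes)
    candidates = [i for codes in index for i, c in codes.items()
                  if sum(a != b for a, b in zip(code, c)) <= radius]
    if not candidates:
        return None, 0
    return max(candidates, key=lambda i: dot(data[i], q)), len(candidates)
```

The second value returned by `search` is the number of probed items, the quantity the abstract compares at equal recall.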

information retrieval, machine learning, natural language

Neural Information Processing Systems

Country: Asia (0.14)

Technology:

Information Technology > Information Management > Search (0.71)
Information Technology > Artificial Intelligence > Machine Learning (0.68)
Information Technology > Data Science (0.68)

Query Complexity of Bayesian Private Learning

Xu, Kuang

Neural Information Processing Systems, Dec-31-2018

We study the query complexity of Bayesian Private Learning: a learner wishes to locate a random target within an interval by submitting queries, in the presence of an adversary who observes all of her queries but not the responses. How many queries are necessary and sufficient for the learner to estimate the target accurately while simultaneously concealing it from the adversary? Our main result is a query complexity lower bound that is tight to first order. We show that if the learner wants to estimate the target within an error of $\epsilon$, while ensuring that no adversary estimator can achieve a constant additive error with probability greater than $1/L$, then the query complexity is on the order of $L\log(1/\epsilon)$ as $\epsilon \to 0$. Our result demonstrates that increased privacy, as captured by $L$, comes at the expense of a multiplicative increase in query complexity. The proof builds on Fano's inequality and properties of certain proportional-sampling estimators.
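
As a baseline for the $L\log(1/\epsilon)$ result, it helps to see the non-private case. The sketch below, an illustration rather than anything from the paper, runs plain bisection on [0, 1), where each query asks "is the target below x?": it needs about $\log_2(1/\epsilon)$ queries, but the query points themselves converge on the target, so an adversary who only watches the queries learns the target almost as precisely as the learner. The result above says that hiding the target to privacy level $L$ multiplies the query count by roughly a factor of $L$; the exact constants are not reproduced here. Function names are hypothetical.

```python
import math


def bisection_queries(epsilon):
    """Queries plain bisection needs to shrink [0, 1) to length <= epsilon."""
    return math.ceil(math.log2(1.0 / epsilon))


def bisect_target(target, epsilon):
    """Non-private bisection; each query x asks 'is the target below x?'.

    Returns the estimate and the list of query points, which is exactly
    what an eavesdropping adversary would observe.
    """
    lo, hi, queries = 0.0, 1.0, []
    while hi - lo > epsilon:
        mid = (lo + hi) / 2
        queries.append(mid)
        if target < mid:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2, queries


estimate, observed = bisect_target(0.3141, 1e-3)
# The learner is accurate, but the last observed query is also within
# about epsilon of the target, so this strategy offers no privacy.
print(estimate, len(observed), observed[-1])
```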

artificial intelligence, machine learning, natural language

Neural Information Processing Systems

Country: North America > United States (0.68)

Genre: Research Report > New Finding (0.68)

Industry:

Education (0.46)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

SMART goals for SEO - Search Engine Land

#artificialintelligence, Dec-29-2018, 12:50:50 GMT

As 2017 comes to a close, many SEOs will be looking forward and setting some goals for their campaigns in 2018. In this post, I am going to take a look at the SMART goals methodology that can help you set and achieve aggressive, yet realistic goals. SMART goals set out a series of criteria that can be used for setting marketing objectives. This is all wrapped up in the clever mnemonic acronym -- Specific, Measurable, Achievable, Realistic and Timelined -- which makes SMART goals so easy to remember. Specific objectives are crucial to success in any marketing campaign.

information retrieval, natural language, smart goal

#artificialintelligence

Industry: Marketing (0.53)

Technology:

Information Technology > Information Management > Search (0.87)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.41)

How Google took on China--and lost

#artificialintelligence, Dec-27-2018, 17:50:17 GMT

Google's first foray into Chinese markets was a short-lived experiment. Google China's search engine was launched in 2006 and abruptly pulled from mainland China in 2010 amid a major hack of the company and disputes over censorship of search results. But in August 2018, the investigative journalism website The Intercept reported that the company was working on a secret prototype of a new, censored Chinese search engine, called Project Dragonfly. Amid a furor from human rights activists and some Google employees, US Vice President Mike Pence called on the company to kill Dragonfly, saying it would "strengthen Communist Party censorship and compromise the privacy of Chinese customers." In mid-December, The Intercept reported that Google had suspended its development efforts in response to complaints from the company's own privacy team, who learned about the project from the investigative website's reporting. Observers talk as if the decision about whether to reenter the world's largest market is up to Google: will it compromise its principles and censor search the way China wants?

artificial intelligence, information retrieval, natural language

#artificialintelligence

Country:

Asia > China (1.00)
North America > United States > California > San Francisco County > San Francisco (0.14)

Industry:

Law > Civil Rights & Constitutional Law (1.00)
Information Technology > Services (1.00)
Government > Regional Government > Asia Government > China Government (1.00)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.57)

Sequence to Sequence Learning for Query Expansion

Zaiem, Salah, Sadat, Fatiha

arXiv.org Machine Learning, Dec-25-2018

As far as we are aware, using sequence-to-sequence algorithms for query expansion has not yet been explored in the Information Retrieval literature or in the Question-Answering literature. We tried to fill this gap with a custom Query Expansion system trained and tested on open datasets. One distinctive feature of our engine compared to classic ones is that it does not need the documents to expand the input query. We test our expansions on three different tasks: Information Retrieval, Answer preselection and Text classification. Our method yielded a slight improvement in performance on all three tasks.
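
The sequence-to-sequence model itself is beyond a short example, so the sketch below takes its output as given: `expansion_terms` stands in for terms a trained model would generate from the query alone, without looking at any documents. It only illustrates the downstream step of appending those terms to the query at a reduced weight and scoring documents with a simple TF-IDF-style term match. The weight of 0.5, the smoothed IDF formula and all function names are assumptions for illustration, not the authors' setup.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def expand_query(query, expansion_terms, expansion_weight=0.5):
    """Original terms keep weight 1.0; generated terms are added at a lower weight.

    `expansion_terms` is a hypothetical stand-in for seq2seq model output.
    """
    weights = {t: 1.0 for t in tokenize(query)}
    for t in expansion_terms:
        weights.setdefault(t.lower(), expansion_weight)
    return weights


def tfidf_score(query_weights, doc_tokens, doc_freq, num_docs):
    """Sum over query terms of weight * tf * smoothed idf."""
    tf = Counter(doc_tokens)
    score = 0.0
    for term, weight in query_weights.items():
        if tf[term]:
            idf = math.log((num_docs + 1) / (doc_freq[term] + 1)) + 1.0
            score += weight * tf[term] * idf
    return score


def rank(docs, query_weights):
    """Return document indices ordered by score, plus the raw scores."""
    tokenized = [tokenize(d) for d in docs]
    doc_freq = Counter(t for toks in tokenized for t in set(toks))
    scores = [tfidf_score(query_weights, toks, doc_freq, len(docs))
              for toks in tokenized]
    return sorted(range(len(docs)), key=lambda i: -scores[i]), scores


docs = ["cheap flights to paris", "low cost airfare deals", "paris museum guide"]
print(rank(docs, expand_query("cheap flights", ["airfare", "low", "cost"])))
```

Without expansion the second document shares no term with "cheap flights" and scores zero; the generated terms let it match.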

computational linguistic, expansion, query

arXiv.org Machine Learning

1812.10119

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Quicker ADC : Unlocking the hidden potential of Product Quantization with SIMD

André, Fabien, Kermarrec, Anne-Marie, Scouarnec, Nicolas Le

arXiv.org Artificial Intelligence, Dec-21-2018

Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a foundation of many multimedia retrieval systems. A common approach is to rely on Product Quantization, which allows large vector databases to be stored in memory and also allows efficient distance computations. Yet the performance of nearest neighbor search implementations based on Product Quantization is limited by the many memory accesses they perform. Following this observation, André et al. proposed more efficient implementations of $m\times 4$ product quantizers (PQ) that leverage specific SIMD instructions. Quicker ADC contributes additional implementations that are not limited to $m\times 4$ codes and that rely on AVX-512, the latest revision of the SIMD instruction set. In doing so, Quicker ADC faces the challenge of efficiently using 5-, 6- and 7-bit shuffles that do not align to computer bytes or words. To this end, we introduce (i) irregular product quantizers combining sub-quantizers of different granularity and (ii) split tables allowing lookup tables larger than registers. We evaluate Quicker ADC with multiple indexes, including Inverted Multi-Indexes and IVF HNSW, and show that it outperforms the FAISS PQ implementation and its optimization (i.e., Polysemous codes) for numerous configurations. Finally, we open-source a fork of FAISS that includes Quicker ADC at http://github.com/technicolor-research/faiss-quickeradc.
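
For readers unfamiliar with the baseline being accelerated, here is a scalar, pure-Python sketch of product quantization with asymmetric distance computation (ADC). Each vector is cut into m sub-vectors, each sub-vector is stored as the index of its nearest centroid in a per-subspace codebook (k = 16 centroids gives the $m\times 4$-bit codes mentioned above), and a query precomputes one lookup table per subspace so that each database distance becomes m table lookups and additions. It does not reproduce any SIMD, irregular-quantizer or split-table technique from the paper; the small Lloyd's k-means and all function names are assumptions for illustration.

```python
import random


def split(v, m):
    """Cut a vector into m equal-length sub-vectors (len(v) must divide by m)."""
    d = len(v) // m
    return [v[i * d:(i + 1) * d] for i in range(m)]


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters, rng):
    """Plain Lloyd's k-means; returns k centroids."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sqdist(p, centroids[c]))
            buckets[nearest].append(p)
        for c, bucket in enumerate(buckets):
            if bucket:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(bucket) for col in zip(*bucket)]
    return centroids


def train_pq(data, m, k, rng, iters=10):
    """One codebook of k centroids per sub-space; k = 16 gives m x 4-bit codes."""
    subspaces = zip(*(split(v, m) for v in data))
    return [kmeans(list(sub), k, iters, rng) for sub in subspaces]


def encode(v, codebooks):
    """Store each sub-vector as the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sqdist(sub, cb[c]))
            for sub, cb in zip(split(v, len(codebooks)), codebooks)]


def adc_tables(query, codebooks):
    """ADC: per sub-space, squared distance from the raw (unquantized) query
    sub-vector to every centroid. Computed once per query."""
    return [[sqdist(sub, c) for c in cb]
            for sub, cb in zip(split(query, len(codebooks)), codebooks)]


def adc_distance(code, tables):
    """Approximate squared distance to one database vector: m table lookups.
    This memory-bound lookup-and-add loop is what SIMD variants try to speed up."""
    return sum(table[c] for table, c in zip(tables, code))
```

A query is then answered by scanning the codes, for example `min(range(len(codes)), key=lambda i: adc_distance(codes[i], tables))`.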

data mining, information retrieval, machine learning

arXiv.org Artificial Intelligence

1812.09162

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Google to 'shut down plans' for censored Chinese search engine

Daily Mail - Science & tech, Dec-19-2018, 08:33:07 GMT

Google has been forced to abandon its plans for a specialist Chinese search engine that censors results in line with the demands of China's strict government, reports have claimed. The firm is believed to have shut down an internal data analysis system that was being used to develop the search engine, known as Dragonfly. According to a report from The Intercept, this has 'effectively ended' the entire project. Members of Google's privacy team raised concerns about the project back in August, and according to sources close to the project, it is now extremely unlikely that the search engine can be built without the system.

artificial intelligence, information retrieval, natural language

Daily Mail - Science & tech

Country: Asia > China (0.48)

Industry:

Law > Civil Rights & Constitutional Law (0.70)
Information Technology > Services (0.52)
Government (0.52)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)

Efficient Autotuning of Hyperparameters in Approximate Nearest Neighbor Search

Jääsaari, Elias, Hyvönen, Ville, Roos, Teemu

arXiv.org Machine Learning, Dec-18-2018

Approximate nearest neighbor algorithms are used to speed up nearest neighbor search in a wide array of applications. However, current indexing methods feature several hyperparameters that need to be tuned to reach an acceptable accuracy--speed trade-off. A grid search in the parameter space is often impractically slow due to a time-consuming index-building procedure. Therefore, we propose an algorithm for automatically tuning the hyperparameters of indexing methods based on randomized space-partitioning trees. In particular, we present results using randomized k-d trees, random projection trees and randomized PCA trees. The tuning algorithm adds minimal overhead to the index-building process but is able to find the optimal hyperparameters accurately. We demonstrate that the algorithm is significantly faster than existing approaches, and that the indexing methods used are competitive with the state-of-the-art methods in query time while being faster to build.
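
The abstract does not spell out how the tuning avoids repeated index builds. One idea consistent with it, shown here as an assumption rather than as the authors' algorithm, is to build a random projection forest once with the largest setting and then evaluate every smaller setting by reusing a subset of that index: the first t trees of a T-tree forest are themselves a valid t-tree forest. The sketch tunes only the number of trees, measuring recall against exact k-NN on a sample of queries and returning the smallest forest that meets a target recall. Depth, any voting threshold and the paper's other index types are not covered, and all function names are hypothetical.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_rp_tree(data, ids, depth, rng):
    """Random projection tree: split at the median of a random projection."""
    if depth == 0 or len(ids) <= 1:
        return list(ids)  # leaf: indices of the points it holds
    direction = [rng.gauss(0, 1) for _ in range(len(data[0]))]
    ordered = sorted(ids, key=lambda i: dot(data[i], direction))
    mid = len(ordered) // 2
    threshold = dot(data[ordered[mid]], direction)
    return (direction, threshold,
            build_rp_tree(data, ordered[:mid], depth - 1, rng),
            build_rp_tree(data, ordered[mid:], depth - 1, rng))


def leaf_for(tree, q):
    """Route a query down to a leaf (internal nodes are tuples, leaves lists)."""
    while isinstance(tree, tuple):
        direction, threshold, left, right = tree
        tree = left if dot(q, direction) < threshold else right
    return tree


def knn(data, ids, q, k):
    return sorted(ids, key=lambda i: sqdist(data[i], q))[:k]


def tune_num_trees(data, queries, max_trees, depth, k, target_recall, rng):
    """Build the largest forest once, then score every forest size t by using
    only its first t trees, so no index is rebuilt while tuning."""
    all_ids = range(len(data))
    trees = [build_rp_tree(data, all_ids, depth, rng) for _ in range(max_trees)]
    hits = [0] * max_trees  # hits[t - 1]: true neighbours found with t trees
    for q in queries:
        truth = set(knn(data, all_ids, q, k))
        candidates = set()
        for t, tree in enumerate(trees):
            candidates.update(leaf_for(tree, q))
            hits[t] += len(truth & set(knn(data, candidates, q, k)))
    recalls = [h / (k * len(queries)) for h in hits]
    for t, recall in enumerate(recalls, start=1):
        if recall >= target_recall:
            return t, recalls  # smallest forest meeting the target
    return None, recalls
```

Because the candidate set only grows as trees are added, recall is non-decreasing in t, so the first t that reaches the target is also the cheapest one to query.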

information retrieval, machine learning, natural language

arXiv.org Machine Learning

doi: 10.1007/978-3-030-16145-3_46

1812.07484

Country: Europe > Finland (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.48)

Google's China search engine project 'effectively ended': report

FOX News, Dec-17-2018, 19:33:23 GMT

Google has been forced to shut down and "effectively end" its controversial China search engine project, code-named Project Dragonfly, after members of the company's privacy team raised complaints, according to a new report. The tech giant led by CEO Sundar Pichai was forced to close a data analysis system it was using for the project, according to The Intercept, which cited two sources familiar with the matter. The Intercept was also the outlet that first reported that Google had been considering launching the app-based search engine. Google has not yet responded to a request for comment from Fox News.

artificial intelligence, information retrieval, natural language

FOX News

Country:

Asia > China (0.83)
North America > United States (0.80)

Industry:

Media > News (1.00)
Law (0.80)
Information Technology (0.74)
Government > Regional Government > North America Government > United States Government (0.64)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)
