AITopics | Information Retrieval

Information Retrieval

Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Fast Amortized Inference and Learning in Log-linear Models with Randomly Perturbed Nearest Neighbor Search

Mussmann, Stephen, Levy, Daniel, Ermon, Stefano

arXiv.org Machine Learning, Jul-11-2017

Inference in log-linear models is often a bottleneck in natural language processing and computer vision tasks when the output space is feasibly enumerable but very large. We propose a method to perform inference in log-linear models with sublinear amortized cost. Our idea hinges on using Gumbel random variable perturbations and a pre-computed Maximum Inner Product Search data structure to access the most-likely elements in sublinear amortized time.
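The Gumbel perturbation idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes a log-linear model whose unnormalized log-probabilities are already available as a list of scores, and it replaces the pre-computed Maximum Inner Product Search structure with an exhaustive argmax, so it runs in linear time and shows only the sampling identity, not the paper's sublinear method. The function names `gumbel_max_sample`, `softmax`, and `empirical_distribution` are hypothetical.

```python
import math
import random


def gumbel_max_sample(scores, rng):
    """Return index i with probability proportional to exp(scores[i]).

    Adds independent Gumbel(0, 1) noise to every score and takes the argmax.
    The exhaustive loop is an assumption of this sketch; it stands in for a
    nearest-neighbor or inner-product search structure.
    """
    best_index, best_value = -1, -math.inf
    for i, score in enumerate(scores):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() may return exactly 0.0
            u = rng.random()
        gumbel = -math.log(-math.log(u))
        if score + gumbel > best_value:
            best_index, best_value = i, score + gumbel
    return best_index


def softmax(scores):
    """Exact normalized probabilities, used here only as a reference."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def empirical_distribution(scores, num_samples, seed=0):
    """Fraction of Gumbel-max draws that land on each index."""
    rng = random.Random(seed)
    counts = [0] * len(scores)
    for _ in range(num_samples):
        counts[gumbel_max_sample(scores, rng)] += 1
    return [c / num_samples for c in counts]
```

With enough draws, `empirical_distribution(scores, n)` should approach `softmax(scores)`, which is the property that lets an argmax search stand in for explicit normalization.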

information retrieval, machine learning, natural language, (18 more...)

1707.03372

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.82)
(2 more...)

Deep Character-Level Click-Through Rate Prediction for Sponsored Search

Edizel, Bora, Mantrach, Amin, Bai, Xiao

arXiv.org Machine Learning, Jul-7-2017

Predicting the click-through rate of an advertisement is a critical component of online advertising platforms. In sponsored search, the click-through rate estimates the probability that a displayed advertisement is clicked by a user after she submits a query to the search engine. Commercial search engines typically rely on machine learning models trained with a large number of features to make such predictions. This inevitably requires a lot of engineering effort to define, compute, and select the appropriate features. In this paper, we propose two novel approaches (one working at character level and the other working at word level) that use deep convolutional neural networks to predict the click-through rate of a query-advertisement pair. Specifically, the proposed architectures only consider the textual content appearing in a query-advertisement pair as input, and produce as output a click-through rate prediction. By comparing the character-level model with the word-level model, we show that language representation can be learnt from scratch at character level when trained on enough data. Through extensive experiments using billions of query-advertisement pairs of a popular commercial search engine, we demonstrate that both approaches significantly outperform a baseline model built on well-selected text features and a state-of-the-art word2vec-based approach. Finally, by combining the predictions of the deep models introduced in this study with the prediction of the model in production of the same commercial search engine, we significantly improve the accuracy and the calibration of the click-through rate prediction of the production system.

prediction, proceedings, representation, (15 more...)

doi: 10.1145/3077136.3080811

1707.02158

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
North America > United States > California > Santa Clara County > Palo Alto (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Industry:

Marketing (1.00)
Information Technology > Services (0.66)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Music, AI and Health: What is Music Information Retrieval?

#artificialintelligence, Jul-4-2017, 10:55:08 GMT

Sync Project has made products like Sync Music Bot and Unwind using cutting-edge music technology with the aim to develop personalized music as medicine. Let's take a look at Music Information Retrieval, the technological "glue" that allows us to synthesize advances in AI, music production and licensed music distribution to derive insights into music listening behavior and physiology. When we listen to a song, we hear a coherent blend of numerous instruments to create melody, harmony, and rhythm. When a computer listens to that same song, it "hears" a binary code--digital data represented by either "0" or "1"--that corresponds to the complex waveform of the audio file. When we listen to a song, we can easily determine what the genre is, if it's a happy or sad song, what instruments are present.

machine learning, music information retrieval, natural language, (4 more...)

Industry:

Media > Music (0.98)
Leisure & Entertainment (0.98)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.64)
Information Technology > Artificial Intelligence > Machine Learning (0.40)

Artificial Intelligence and the Future of Search Engines

#artificialintelligence, Jul-3-2017, 03:40:08 GMT

It was not long ago that Artificial Intelligence (AI) was only in the realm of science fiction. Today, it has become a reality and is growing more prominent in many different industries every day. This includes the internet, as AI in search engine technology has been around for a few years. The algorithms used to rank pages have already been affected considerably by AI, and that trend will continue into the foreseeable future. Currently, Google's RankBrain, an AI process used to help set search engine rankings, is having a major impact, which is only expected to expand.

artificial intelligence, information retrieval, natural language, (11 more...)

Country:

Asia > China (0.34)
Europe > Norway (0.06)
North America > United States (0.05)
North America > Canada (0.05)

Industry:

Education > Educational Setting > Higher Education (0.32)
Government > Regional Government > Asia Government (0.31)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.87)

To tackle Google's power, regulators have to go after its ownership of data

The Guardian, Jul-2-2017, 00:50:02 GMT

The problem with regulating technology companies is that, faced with tough new rules, they can eventually innovate their way out, often by switching to newer, unregulated technologies. The risk of targeted regulation informed by little other than economic doctrines might even be fuelling a corporate quest for eternal disruption: instead of surrendering to the regulators, technology firms prefer to abandon their old business model. It's through this lens that we should interpret the likely fallout from the €2.4bn fine imposed on Alphabet, Google's parent company, by the European commission. It arrives after a lengthy, seven-year investigation into whether the company abused its dominance to promote its own online shopping service above search results. The commission's case seems sound; the sad fate of small online retailers, unable to compete with Alphabet over the past decade, suggests as much.

alphabet, information retrieval, natural language, (15 more...)

Country: Europe (0.72)

Industry:

Information Technology (1.00)
Government > Regional Government > Europe Government (0.62)
Retail > Online (0.56)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.40)

Bolt: Accelerated Data Mining with Fast Vector Compression

Blalock, Davis W, Guttag, John V

arXiv.org Machine Learning, Jun-30-2017

Vectors of data are at the heart of machine learning and data mining. Recently, vector quantization methods have shown great promise in reducing both the time and space costs of operating on vectors. We introduce a vector quantization algorithm that can compress vectors over 12x faster than existing techniques while also accelerating approximate vector operations such as distance and dot product computations by up to 10x. Because it can encode over 2GB of vectors per second, it makes vector quantization cheap enough to employ in many more circumstances. For example, using our technique to compute approximate dot products in a nested loop can multiply matrices faster than a state-of-the-art BLAS implementation, even when our algorithm must first compress the matrices. In addition to showing the above speedups, we demonstrate that our approach can accelerate nearest neighbor search and maximum inner product search by over 100x compared to floating point operations and up to 10x compared to other vector quantization methods. Our approximate Euclidean distance and dot product computations are not only faster than those of related algorithms with slower encodings, but also faster than Hamming distance computations, which have direct hardware support on the tested platforms. We also assess the errors of our algorithm's approximate distances and dot products, and find that it is competitive with existing, slower vector quantization algorithms.
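To make the idea of operating on compressed vectors concrete, here is a minimal sketch of generic product quantization with lookup tables for approximate dot products. It is not the Bolt algorithm itself, only the broader family of vector quantization the abstract refers to. It assumes the vector length is divisible by the number of subspaces `m`, and all function names are hypothetical.

```python
import random


def split(vec, m):
    """Split vec into m contiguous subvectors (len(vec) must divide by m)."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def kmeans(points, k, iters, rng):
    """Plain Lloyd's algorithm; returns k centroids (requires k <= len(points))."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            groups[j].append(p)
        for j, group in enumerate(groups):
            if group:  # keep the old centroid if a cluster goes empty
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def train_codebooks(data, m, k, iters=10, seed=0):
    """Learn one codebook of k centroids for each of the m subspaces."""
    rng = random.Random(seed)
    subs = [split(v, m) for v in data]
    return [kmeans([s[i] for s in subs], k, iters, rng) for i in range(m)]


def encode(vec, codebooks):
    """Replace each subvector by the index of its nearest centroid."""
    return [min(range(len(cb)), key=lambda c: sq_dist(sub, cb[c]))
            for sub, cb in zip(split(vec, len(codebooks)), codebooks)]


def decode(codes, codebooks):
    """Reconstruct an approximate vector from its codes."""
    out = []
    for c, cb in zip(codes, codebooks):
        out.extend(cb[c])
    return out


def dot_tables(query, codebooks):
    """Precompute the dot product of each query subvector with every centroid."""
    return [[dot(sub, c) for c in cb]
            for sub, cb in zip(split(query, len(codebooks)), codebooks)]


def approx_dot(codes, tables):
    """Approximate dot product using m table lookups instead of a full scan."""
    return sum(t[c] for c, t in zip(codes, tables))
```

Once `dot_tables` has been built for a query, each encoded vector costs only `m` lookups and additions, which is where the speedup of quantization-based methods comes from; the price is the reconstruction error introduced by `encode`.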

information retrieval, machine learning, natural language, (20 more...)

doi: 10.1145/3097983.3098195

1706.10283

Country: North America (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
(3 more...)

In Search of an Entity Resolution OASIS: Optimal Asymptotic Sequential Importance Sampling

Marchant, Neil G., Rubinstein, Benjamin I. P.

arXiv.org Machine Learning, Jun-25-2017

Entity resolution (ER) presents unique challenges for evaluation methodology. While crowdsourcing platforms acquire ground truth, sound approaches to sampling must drive labelling efforts. In ER, extreme class imbalance between matching and non-matching records can lead to enormous labelling requirements when seeking statistically consistent estimates for rigorous evaluation. This paper addresses this important challenge with the OASIS algorithm: a sampler and F-measure estimator for ER evaluation. OASIS draws samples from a (biased) instrumental distribution, chosen to ensure estimators with optimal asymptotic variance. As new labels are collected, OASIS updates this instrumental distribution via a Bayesian latent variable model of the annotator oracle, to quickly focus on unlabelled items providing more information. We prove that the resulting estimates of F-measure, precision, and recall converge to the true population values. Thorough comparisons of sampling methods on a variety of ER datasets demonstrate significant labelling reductions of up to 83% without loss of estimate accuracy.
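The role of a biased instrumental distribution can be sketched with a static importance-weighted F-measure estimator. This is a simplification and not the OASIS algorithm: the sampling distribution here is fixed in advance rather than updated adaptively from collected labels, and it simply over-samples predicted matches by a hypothetical `boost` factor. All function names are illustrative.

```python
import random


def true_f_measure(labels, preds, alpha=0.5):
    """F-measure over the whole pool; alpha=0.5 gives the usual F1 score."""
    tp = sum(1 for l, p in zip(labels, preds) if l and p)
    predicted_pos = sum(preds)
    actual_pos = sum(labels)
    return tp / (alpha * predicted_pos + (1 - alpha) * actual_pos)


def biased_sample(preds, budget, boost=10.0, seed=0):
    """Draw `budget` indices with replacement, favoring predicted matches.

    Returns the sampled indices and the instrumental distribution q.
    """
    rng = random.Random(seed)
    raw = [boost if p else 1.0 for p in preds]
    total = sum(raw)
    q = [r / total for r in raw]
    sampled = rng.choices(range(len(preds)), weights=q, k=budget)
    return sampled, q


def weighted_f_measure(sampled, labels, preds, q, alpha=0.5):
    """Importance-weighted ratio estimator of the F-measure.

    Each sampled item i is weighted by p(i) / q(i), where p is uniform over
    the pool. In practice labels[i] would come from querying an annotator
    only for the sampled items.
    """
    n = len(labels)
    numerator = denominator = 0.0
    for i in sampled:
        w = (1.0 / n) / q[i]
        numerator += w * labels[i] * preds[i]
        denominator += w * (alpha * preds[i] + (1 - alpha) * labels[i])
    return numerator / denominator if denominator > 0 else float("nan")
```

Because matches are rare, uniform sampling spends most of the labelling budget on obvious non-matches; shifting probability mass toward likely matches and correcting with the weights keeps the estimate consistent while using labels more efficiently.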

information retrieval, machine learning, natural language, (20 more...)

1703.00617

Country: Oceania > Australia (0.28)

Genre: Research Report (1.00)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.66)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.61)

Digital Marketing Tips For Small Businesses 2015 - Booming

#artificialintelligence, Jun-23-2017, 02:40:09 GMT

Today, businesses have more ways – and places – than ever to market themselves. Your local digital marketing strategy should specifically target and appeal to potential customers in your geographic area. Many local companies have used some form of digital marketing online even if they are not aware of it. This is an important local digital marketing tip for any business. But for local businesses, it can be even more essential. Customers who are looking for a restaurant, store or other local business are likely to do a search on their phone or mobile device. If you don't have a mobile-optimized site, not only will it be difficult for them to interact with your site, but it will also be difficult for them to find it in the first place. If you want local customers, either on mobile or desktop, to find you, you have to have a comprehensive search strategy.

digital marketing tip, local digital marketing strategy, small business 2015, (6 more...)

Industry: Marketing (1.00)

Technology:

Information Technology > Communications > Social Media (0.89)
Information Technology > Information Management > Search (0.74)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.54)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.38)

Query Complexity of Clustering with Side Information

Mazumdar, Arya, Saha, Barna

arXiv.org Machine Learning, Jun-23-2017

Suppose we are given a set of $n$ elements to be clustered into $k$ (unknown) clusters, and an oracle/expert labeler that can interactively answer pair-wise queries of the form, "do two elements $u$ and $v$ belong to the same cluster?". The goal is to recover the optimum clustering by asking the minimum number of queries. In this paper, we initiate a rigorous theoretical study of this basic problem of query complexity of interactive clustering, and provide strong information-theoretic lower bounds, as well as nearly matching upper bounds. Most clustering problems come with a similarity matrix, which is used by an automated process to cluster similar points together. Our main contribution in this paper is to show the dramatic power of side information (that is, a similarity matrix) in reducing the query complexity of clustering. A similarity matrix represents noisy pair-wise relationships, such as one computed by some function on attributes of the elements. A natural noisy model is where similarity values are drawn independently from some arbitrary probability distribution $f_+$ when the underlying pair of elements belong to the same cluster, and from some $f_-$ otherwise. We show that given such a similarity matrix, the query complexity reduces drastically from $\Theta(nk)$ (no similarity matrix) to $O(\frac{k^2\log{n}}{\mathcal{H}^2(f_+\|f_-)})$, where $\mathcal{H}^2$ denotes the squared Hellinger divergence. Moreover, this is also information-theoretically optimal within an $O(\log{n})$ factor. Our algorithms are all efficient and parameter free, i.e., they work without any knowledge of $k$, $f_+$ and $f_-$, and only depend logarithmically on $n$. Along the way, our work also reveals an intriguing connection to popular community detection models such as the stochastic block model, significantly generalizes them, and opens up many avenues for interesting future research.
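The baseline without side information can be sketched as follows: each new element is compared against one representative of every cluster found so far, which uses at most $nk$ same-cluster queries. This illustrates only the simple upper bound for the no-similarity-matrix setting, not the paper's algorithms that exploit a similarity matrix. The names `cluster_with_oracle` and `same_cluster` are hypothetical.

```python
def cluster_with_oracle(elements, same_cluster):
    """Recover clusters using only pairwise same-cluster queries.

    `same_cluster(u, v)` is the oracle: it returns True when u and v belong
    to the same cluster. Returns the clusters and the number of queries made.
    """
    clusters = []
    queries = 0
    for x in elements:
        for cluster in clusters:
            queries += 1
            # One query against the cluster's first member is enough,
            # because the oracle's answers are assumed to be noiseless.
            if same_cluster(cluster[0], x):
                cluster.append(x)
                break
        else:
            # No existing cluster matched, so x starts a new one.
            clusters.append([x])
    return clusters, queries
```

For example, with `truth = {i: i % 3 for i in range(12)}` and the oracle `lambda u, v: truth[u] == truth[v]`, the function recovers the three residue classes while staying within the $nk = 36$ query budget.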

algorithm, probability, query, (12 more...)

1706.07719

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.05)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.66)

A Signaling Game Approach to Databases Querying and Interaction

McCamish, Ben, Termehchy, Arash, Touri, Behrouz

arXiv.org Artificial Intelligence, Jun-22-2017

As most database users cannot precisely express their information needs, it is challenging for database management systems to understand them. We propose a novel formal framework for representing and understanding information needs in database querying and exploration. Our framework considers querying as a collaboration between the user and the database management system to establish a mutual language for representing information needs. We formalize this collaboration as a signaling game, where each mutual language is an equilibrium for the game. A query interface is more effective if it establishes a less ambiguous mutual language faster. We discuss some equilibria, strategies, and the convergence in this game. In particular, we propose a reinforcement learning mechanism and analyze it within our framework. We prove that this adaptation mechanism for the query interface improves the effectiveness of answering queries stochastically speaking, and converges almost surely. We extend our results to the cases in which the user also modifies her strategy during the interaction.
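One way to picture such an adaptation mechanism is a reward-proportional update in which the interface keeps a score for every (query, intent) pair and reinforces interpretations the user accepts. The sketch below assumes a Roth-Erev-style rule and a user with a fixed strategy; the abstract does not specify the mechanism, so this is an illustrative stand-in rather than the authors' method. The class name, the example queries, and the intents are all hypothetical.

```python
import random


class ReinforcedInterface:
    """Query interface that learns which intent each query most likely means."""

    def __init__(self, queries, intents, initial=1.0, seed=0):
        self.intents = list(intents)
        # Accumulated reward per (query, intent) pair, starting out uniform.
        self.reward = {q: {e: initial for e in self.intents} for q in queries}
        self.rng = random.Random(seed)

    def probabilities(self, query):
        """Current mixed strategy: intent probabilities for a query."""
        row = self.reward[query]
        total = sum(row.values())
        return {e: r / total for e, r in row.items()}

    def interpret(self, query):
        """Sample an interpretation in proportion to accumulated reward."""
        probs = self.probabilities(query)
        weights = [probs[e] for e in self.intents]
        return self.rng.choices(self.intents, weights=weights)[0]

    def feedback(self, query, intent, payoff):
        """Reinforce the chosen interpretation by the payoff it earned."""
        self.reward[query][intent] += payoff


def simulate(rounds=1000, seed=0):
    """Play the game against a user whose query-to-intent mapping is fixed."""
    user_meaning = {"jaguar": "animal", "python": "language"}  # hypothetical
    intents = ["animal", "car", "language"]
    interface = ReinforcedInterface(user_meaning, intents, seed=seed)
    queries = list(user_meaning)
    for t in range(rounds):
        query = queries[t % len(queries)]
        guess = interface.interpret(query)
        payoff = 1.0 if guess == user_meaning[query] else 0.0
        interface.feedback(query, guess, payoff)
    return interface
```

After enough rounds, `simulate().probabilities("jaguar")` should put most of its mass on the intent the user actually means, which is the sense in which the two sides settle on a mutual language.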

machine learning, natural language, reinforcement learning, (22 more...)

1603.04068

Country: North America > United States (0.93)

Genre: Research Report (0.81)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Game Theory (1.00)
Information Technology > Databases (1.00)
(3 more...)
