 Information Retrieval


The Search Engine Showdown is Far from Over

#artificialintelligence

Back in the 1990s, the search engine category was a hot space. Yahoo, Netscape, AOL, Ask Jeeves, AltaVista, Google Search, MSN and others were vying to capture the dominant position. Over time, nearly all of them fizzled out. The post-2000 era belonged to Google Search, the undisputed winner of the space until quite recently. Now the tide is turning, and Google Search's crown is under threat.


Training a Named Entity Recognition Model Without Data

#artificialintelligence

Named Entity Recognition (NER) is the task of recognizing entity names, such as person names, locations, and organizations, within a text. This task serves as a fundamental module for various NLP applications including chatbots, search engines, and translation systems. We can find NER datasets for generic entities easily, but obtaining data for specific domains can be challenging. Labeling NER data is more difficult than simple text classification, making it challenging to create large-scale domain-specific NER datasets. In this post, I will demonstrate how to train an NER model without any labeled data.


Will A.I. Kill the Internet?

Slate

This week, Felix Salmon, Emily Peck, and Elizabeth Spiers discuss Microsoft's attempt to break into artificial intelligence-assisted search with a revamp of its Bing search engine. They also talk about record-high profits for oil companies and Bed Bath & Beyond's financial shenanigans.


MatKB: Semantic Search for Polycrystalline Materials Synthesis Procedures

arXiv.org Artificial Intelligence

In this paper, we present a novel approach to knowledge extraction and retrieval using Natural Language Processing (NLP) techniques for materials science. Our goal is to automatically mine structured knowledge from millions of research articles in the field of polycrystalline materials and make it easily accessible to the broader community. The proposed method leverages NLP techniques such as entity recognition and document classification to extract relevant information and build an extensive knowledge base from a collection of 9.5 million publications. The resulting knowledge base is integrated into a search engine, which enables users to search for information about specific materials, properties, and experiments with greater precision than traditional search engines like Google. We hope our results can enable materials scientists to quickly locate desired experimental procedures, compare their differences, and even inspire them to design new experiments.


Fast Gumbel-Max Sketch and its Applications

arXiv.org Artificial Intelligence

The well-known Gumbel-Max Trick for sampling elements from a categorical distribution (or more generally a non-negative vector) and its variants have been widely used in areas such as machine learning and information retrieval. To sample a random element $i$ in proportion to its positive weight $v_i$, the Gumbel-Max Trick first computes a Gumbel random variable $g_i$ for each positive weight element $i$, and then samples the element $i$ with the largest value of $g_i+\ln v_i$. Recently, applications including similarity estimation and weighted cardinality estimation have required generating $k$ independent Gumbel-Max variables from high dimensional vectors. However, this is computationally expensive for a large $k$ (e.g., hundreds or even thousands) when using the traditional Gumbel-Max Trick. To solve this problem, we propose a novel algorithm, FastGM, which reduces the time complexity from $O(kn^+)$ to $O(k \ln k + n^+)$, where $n^+$ is the number of positive elements in the vector of interest. FastGM stops computing Gumbel random variables early for many elements, especially for those with small weights. We perform experiments on a variety of real-world datasets and the experimental results demonstrate that FastGM is orders of magnitude faster than state-of-the-art methods without sacrificing accuracy or incurring additional expenses.
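The traditional Gumbel-Max Trick described in this abstract can be sketched in a few lines of Python. This is only the naive $O(kn^+)$ baseline, not FastGM itself, whose early-stopping procedure the abstract does not spell out. The function names `gumbel_max_sample` and `gumbel_max_sketch` are hypothetical, chosen for illustration.

```python
import math
import random


def gumbel_max_sample(weights, rng):
    """Return index i with probability v_i / sum(v) via the Gumbel-Max Trick."""
    best_index, best_score = None, -math.inf
    for i, v in enumerate(weights):
        if v <= 0:
            continue  # only positive-weight elements take part
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() draws from [0, 1)
            u = rng.random()
        g = -math.log(-math.log(u))  # standard Gumbel random variable g_i
        score = g + math.log(v)      # g_i + ln v_i
        if score > best_score:
            best_index, best_score = i, score
    return best_index


def gumbel_max_sketch(weights, k, rng):
    """Naive baseline: k independent samples, costing O(k * n+) time."""
    return [gumbel_max_sample(weights, rng) for _ in range(k)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Element 2 should be picked roughly three times as often as element 0.
    print(gumbel_max_sketch([1.0, 0.0, 3.0], 10, rng))
```

Each of the $k$ samples touches every positive element once, which is where the $O(kn^+)$ cost that FastGM aims to reduce comes from.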


Massively Multilingual Language Models for Cross Lingual Fact Extraction from Low Resource Indian Languages

arXiv.org Artificial Intelligence

Massive knowledge graphs like Wikidata attempt to capture world knowledge about multiple entities. Recent approaches concentrate on automatically enriching these KGs from text. However, a lot of information present in the form of natural text in low resource languages is often missed. Cross Lingual Information Extraction aims at extracting factual information in the form of English triples from low resource Indian Language text. Despite its massive potential, progress on this task lags behind that of Monolingual Information Extraction. In this paper, we propose the task of Cross Lingual Fact Extraction (CLFE) from text and devise an end-to-end generative approach for it, which achieves an overall F1 score of 77.46.


Neural Approaches to Multilingual Information Retrieval

arXiv.org Artificial Intelligence

Providing access to information across languages has been a goal of Information Retrieval (IR) for decades. While progress has been made on Cross Language IR (CLIR) where queries are expressed in one language and documents in another, the multilingual (MLIR) task to create a single ranked list of documents across many languages is considerably more challenging. This paper investigates whether advances in neural document translation and pretrained multilingual neural language models enable improvements in the state of the art over earlier MLIR techniques. The results show that although combining neural document translation with neural ranking yields the best Mean Average Precision (MAP), 98% of that MAP score can be achieved with an 84% reduction in indexing time by using a pretrained XLM-R multilingual language model to index documents in their native language, and that 2% difference in effectiveness is not statistically significant. Key to achieving these results for MLIR is to fine-tune XLM-R using mixed-language batches from neural translations of MS MARCO passages.
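For readers unfamiliar with the metric, Mean Average Precision can be computed as in the minimal sketch below. It uses the common textbook definition and assumes each ranked list contains no duplicate document IDs; the paper's exact evaluation setup (cutoff depth, relevance judgments) is not given in the abstract, and the function names are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids):
    """Average of precision@rank taken at each rank where a relevant doc appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Dividing by all relevant docs penalizes relevant docs that were never retrieved.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """runs: list of (ranked_ids, relevant_ids) pairs, one pair per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


if __name__ == "__main__":
    runs = [
        (["d1", "d2", "d3"], {"d1", "d3"}),  # AP = (1/1 + 2/3) / 2
        (["d2", "d1"], {"d1"}),              # AP = 1/2
    ]
    print(mean_average_precision(runs))
```

In the multilingual setting described above, each ranked list would mix documents from several languages, but the metric itself is computed the same way.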


Query Processing on Tensor Computation Runtimes

arXiv.org Artificial Intelligence

The huge demand for computation in artificial intelligence (AI) is driving unparalleled investments in hardware and software systems for AI. This leads to an explosion in the number of specialized hardware devices, which are now offered by major cloud vendors. By hiding the low-level complexity through a tensor-based interface, tensor computation runtimes (TCRs) such as PyTorch allow data scientists to efficiently exploit the exciting capabilities offered by the new hardware. In this paper, we explore how database management systems can ride the wave of innovation happening in the AI space. We design, build, and evaluate Tensor Query Processor (TQP): TQP transforms SQL queries into tensor programs and executes them on TCRs. TQP is able to run the full TPC-H benchmark by implementing novel algorithms for relational operators on the tensor routines. At the same time, TQP can support various hardware while only requiring a fraction of the usual development effort. Experiments show that TQP can improve query execution time by up to 10$\times$ over specialized CPU- and GPU-only systems. Finally, TQP can accelerate queries mixing ML predictions and SQL end-to-end, and deliver up to 9$\times$ speedup over CPU baselines.
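To give a flavor of what expressing a relational query as a tensor program can look like, the sketch below evaluates a query of the form `SELECT key, SUM(value) ... WHERE pred > threshold GROUP BY key` using only array operations. It is an illustration under stated assumptions, not TQP's actual algorithm: NumPy stands in for a tensor runtime such as PyTorch, and the function name, column names, and query are hypothetical.

```python
import numpy as np


def filter_group_sum(keys, values, predicate_column, threshold):
    """Filter rows, then sum `values` per distinct key, using array ops only."""
    mask = predicate_column > threshold  # WHERE clause as a boolean mask
    kept_keys = keys[mask]
    kept_values = values[mask]
    # GROUP BY: map every surviving row to the index of its distinct key.
    groups, inverse = np.unique(kept_keys, return_inverse=True)
    # SUM: accumulate the values into one bucket per group.
    sums = np.bincount(inverse, weights=kept_values, minlength=len(groups))
    return groups, sums


if __name__ == "__main__":
    keys = np.array([0, 1, 0, 2, 1])
    values = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
    qty = np.array([2, 1, 3, 5, 4])
    groups, sums = filter_group_sum(keys, values, qty, threshold=1)
    print(groups, sums)
```

Because every step is a whole-array operation with no per-row Python loop, the same program shape can in principle be dispatched to whatever hardware the tensor runtime supports.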


From Traditional Adaptive Data Caching to Adaptive Context Caching: A Survey

arXiv.org Artificial Intelligence

Context information is in demand more than ever with the rapid increase in the number of context-aware Internet of Things applications developed worldwide. Research in context and context-awareness is being conducted to broaden its applicability in light of many practical and technical challenges. One of the challenges is improving performance when responding to a large number of context queries. Context Management Platforms that infer and deliver context to applications measure this problem using Quality of Service (QoS) parameters. Although caching is a proven way to improve QoS, transiency of context and features such as variability and heterogeneity of context queries pose an additional real-time cost management problem. This paper presents a critical survey of the state-of-the-art in adaptive data caching with the objective of developing a body of knowledge in cost- and performance-efficient adaptive caching strategies. We comprehensively survey a large number of research publications and evaluate, compare, and contrast different techniques, policies, approaches, and schemes in adaptive caching. Our critical analysis is motivated by the focus on adaptively caching context as a core research problem. A formal definition for adaptive context caching is then proposed, followed by identified features and requirements of a well-designed, objective optimal adaptive context caching strategy.


Microsoft's Bing search engine and Edge browser to use AI in challenge to Google

The Japan Times

REDMOND, Washington – Microsoft is revamping its Bing search engine and Edge browser with artificial intelligence, the company said Tuesday, signaling its ambition to retake the lead in consumer technology markets where it has fallen behind. The maker of the Windows operating system is staking its future on AI through billions of dollars of investment as it directly challenges Alphabet's Google, which for years has outpaced Microsoft in search and browser technology. Now, Microsoft is rolling out an intelligent chatbot to live alongside Bing's search results, putting AI that can summarize web pages, synthesize disparate sources, even compose emails and translate them into more consumers' hands. Microsoft expects every percentage point of share it gains will bring in another $2 billion in search advertising revenue.