Information Retrieval
Progressively Optimized Bi-Granular Document Representation for Scalable Embedding Based Retrieval
Xiao, Shitao, Liu, Zheng, Han, Weihao, Zhang, Jianjin, Li, Chaozhuo, Shao, Yingxia, Lian, Defu, Xie, Xing, Sun, Hao, Deng, Denvy, Zhang, Liangjie, Zhang, Qi
Ad-hoc search calls for the selection of appropriate answers from a massive-scale corpus. Embedding-based retrieval (EBR) has become a promising solution, in which deep learning based document representation and ANN search techniques are combined to handle this task. However, a major challenge is that the ANN index can be too large to fit into memory, given the considerable size of the answer corpus. In this work, we tackle this problem with Bi-Granular Document Representation, where the lightweight sparse embeddings are indexed and kept on standby in memory for coarse-grained candidate search, and the heavyweight dense embeddings are hosted on disk for fine-grained post verification. To maximize retrieval accuracy, a Progressive Optimization framework is designed. The sparse embeddings are learned first, for high-quality candidate search. Conditioned on the candidate distribution induced by the sparse embeddings, the dense embeddings are then learned to optimize the discrimination of the ground truth from the shortlisted candidates. In addition, two techniques, contrastive quantization and locality-centric sampling, are introduced for the learning of the sparse and dense embeddings respectively, and both contribute substantially to performance. Thanks to these features, our method effectively handles massive-scale EBR with strong advantages in accuracy: up to +4.3% recall gain on a million-scale corpus, and up to +17.5% recall gain on a billion-scale corpus. Our method has also been applied to a major sponsored search platform, with substantial gains in revenue (+1.95%), recall (+1.01%) and CTR (+0.49%).
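The two-stage flow described in this abstract (coarse candidate search over lightweight in-memory embeddings, then fine-grained verification with heavyweight embeddings fetched from disk) can be illustrated with a minimal sketch. This is not the authors' method: the sign-bit codes and Hamming distance below are simple stand-ins for the learned lightweight embeddings, the disk fetch is simulated by a list lookup, and all names (`sign_code`, `two_stage_search`, `load_dense`) are hypothetical.

```python
import heapq


def sign_code(vec):
    """Pack the sign bits of a vector into one int (toy lightweight code, an assumption)."""
    code = 0
    for i, x in enumerate(vec):
        if x > 0:
            code |= 1 << i
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def dot(u, v):
    """Inner product used as the fine-grained relevance score."""
    return sum(x * y for x, y in zip(u, v))


def two_stage_search(query, codes, load_dense, n_candidates, top_k):
    """Coarse search over in-memory codes, then rerank with full embeddings.

    load_dense(i) stands in for reading document i's dense embedding from disk.
    """
    q_code = sign_code(query)
    # Stage 1: shortlist the documents whose codes are closest to the query code.
    candidates = heapq.nsmallest(
        n_candidates, range(len(codes)), key=lambda i: hamming(q_code, codes[i])
    )
    # Stage 2: load dense embeddings only for the shortlist and score them exactly.
    scored = [(dot(query, load_dense(i)), i) for i in candidates]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:top_k]]


# Toy corpus: in a real system the dense vectors would live on disk.
dense_store = [
    [0.9, 0.8, -0.1, -0.7],
    [0.2, 0.1, -0.9, -0.3],
    [-0.8, -0.6, 0.7, 0.9],
    [0.7, 0.9, 0.2, -0.5],
]
codes = [sign_code(v) for v in dense_store]
query = [1.0, 1.0, -0.2, -0.6]

result = two_stage_search(query, codes, dense_store.__getitem__, n_candidates=2, top_k=1)
print(result)
```

Only `n_candidates` dense embeddings are read per query, so the memory-resident structure is the small code table. The shortlist bounds what the second stage can recover, which is why the abstract stresses learning the dense embeddings conditioned on the candidates that the lightweight embeddings produce.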
Amazon.com: Integrated Search Marketing Solution & Organic Search: Search Engine Optimization, Social Media, and Email Marketing: Winning Formula for SERP Dominance eBook : LLC, Thincr: Books
No need to purchase multiple books to master SEO, Social Media and Email Marketing. This book is written to help you get on top of your online marketing campaigns in no time! This 3-in-1 Integrated Search Marketing guide offers a coherent online marketing solution across 3 major online marketing channels. Driven by the theory of attitude change and persuasion, and designed for quick access to proven search marketing tactics in one central location, it provides top strategic solutions for SEO, Social Media and Email Marketing. The book investigates and analyzes internal and external alignments between business goals and online marketing media. It conceptualizes an online marketing solution based on product type and its life cycle, while illustrating the pricing strategy for promotion and market positioning as well as the marketing strategy based on the adoption of the BCG matrix. In addition, it offers tips on the tactical execution of Cognitively Reasonable Price (CRP).
Why artificial intelligence still needs a human touch
Opinion, 5 April 2017, Nick Ismail. How do people distinguish between fact and falsehood? This is perhaps one of the most debated questions of the past year. Google and Facebook are both in the spotlight for disseminating so-called "fake news", despite the artificial intelligence (AI) systems that these companies have developed and deployed on their platforms. If AI is currently struggling to discern fact from fiction, could it be that human intelligence is still a necessary component for the continued successful integration of AI? In a much simpler time, Google was a search engine that indexed websites. Today, the search giant is evolving towards giving users summarised answers to their billions of questions.
Mining Adverse Drug Reactions from Unstructured Mediums at Scale
Haq, Hasham Ul, Kocaman, Veysel, Talby, David
Adverse drug reactions / events (ADR/ADE) have a major impact on patient health and health care costs. Detecting ADRs as early as possible and sharing them with regulators, pharma companies, and healthcare providers can prevent morbidity and save many lives. While most ADRs are not reported via formal channels, they are often documented in a variety of unstructured conversations such as social media posts by patients, customer support call transcripts, or CRM notes of meetings between healthcare providers and pharma sales reps. In this paper, we propose a natural language processing (NLP) solution that detects ADRs in such unstructured free-text conversations, which improves on previous work in three ways. First, a new Named Entity Recognition (NER) model obtains new state-of-the-art accuracy for ADR and Drug entity extraction on the ADE, CADEC, and SMM4H benchmark datasets (91.75%, 78.76%, and 83.41% F1 scores respectively). Second, two new Relation Extraction (RE) models, one based on BioBERT and the other using crafted features over a Fully Connected Neural Network (FCNN), are shown to perform on par with existing state-of-the-art models, and to outperform them when trained with a supplementary clinician-annotated RE dataset. Third, a new text classification model, for deciding if a conversation includes an ADR, obtains new state-of-the-art accuracy on the CADEC dataset (86.69% F1 score). The complete solution is implemented as a unified NLP pipeline in a production-grade library built on top of Apache Spark, making it natively scalable and able to process millions of batch or streaming records on commodity clusters.
Topical Classification of Food Safety Publications with a Knowledge Base
Sowinski, Piotr, Wasielewska-Michniewska, Katarzyna, Ganzha, Maria, Paprzycki, Marcin
The vast body of scientific publications makes it increasingly challenging to find those that are relevant to a given research question, and to make informed decisions on their basis. This becomes extremely difficult without the use of automated tools. Here, one possible area for improvement is automatic classification of publication abstracts according to their topic. This work introduces a novel, knowledge base-oriented publication classifier. The proposed method focuses on achieving scalability and easy adaptability to other domains. Classification speed and accuracy are shown to be satisfactory in the very demanding field of food safety. Further development and evaluation of the method are needed, as the proposed approach shows much potential.
Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction
Hilprecht, Benjamin, Binnig, Carsten
In this paper, we introduce zero-shot cost models, which enable learned cost estimation that generalizes to unseen databases. In contrast to state-of-the-art workload-driven approaches, which require executing a large set of training queries on every new database, zero-shot cost models allow a learned cost model to be instantiated out-of-the-box without expensive training data collection. To enable such zero-shot cost models, we suggest a new learning paradigm based on pre-trained cost models. As core contributions to support the transfer of such a pre-trained cost model to unseen databases, we introduce a new model architecture and representation technique for encoding query workloads as input to those models. As we will show in our evaluation, zero-shot cost estimation can provide more accurate cost estimates than state-of-the-art models for a wide range of (real-world) databases without requiring any query executions on unseen databases. Furthermore, we show that zero-shot cost models can be used in a few-shot mode that further improves their quality by retraining them with just a small number of additional training queries on the unseen database.
Top 10 Keyword Extraction API
This article is brought to you by the Eden AI team. We allow you to test and use in production a large number of AI engines from different providers directly through our API and platform. If you are a solution provider and want to integrate Eden AI, contact us at: contact@edenai.co Intro: In this article, we are going to see how you can easily integrate a Keyword Extraction engine in your project and how to choose and access the right engine according to your data. Definition: Keyword extraction (a
What is Event Knowledge Graph: A Survey
Guan, Saiping, Cheng, Xueqi, Bai, Long, Zhang, Fujun, Li, Zixuan, Zeng, Yutao, Jin, Xiaolong, Guo, Jiafeng
Besides entity-centric knowledge, usually organized as a Knowledge Graph (KG), events are also an essential kind of knowledge in the world, which has triggered the emergence of event-centric knowledge representation forms such as the Event KG (EKG). The EKG plays an increasingly important role in many machine learning and artificial intelligence applications, such as intelligent search, question-answering, recommendation, and text generation. This paper provides a comprehensive survey of EKG from history, ontology, instance, and application views. Specifically, to characterize EKG thoroughly, we focus on its history, definitions, schema induction, acquisition, related representative graphs/systems, and applications. The development processes and trends are studied therein. We further summarize prospective directions to facilitate future research on EKG.
COVID-19 will Accelerate High-Efficiency AI in 2021
Francisco Webber: My interest in information technology arose during my medical studies when I worked with the Vienna General Hospital and could not find relevant patient information hidden in data silos. After that experience, I spent a lot of time exploring search engine technologies, looking for the ultimate search engine. The state-of-the-art approaches and the statistical modelling-based information retrieval theories behind them did not satisfy me, and at some point, I turned towards neurosciences. The Hierarchical Temporal Memory (HTM) theory, described by Jeff Hawkins for the first time in his book On Intelligence, gave me the idea that a new interpretation of how the brain processes information could be the code breaker of all hurdles encountered by natural language understanding solutions. And if yes, will that make language computable?
Baihe: SysML Framework for AI-driven Databases
Pfadler, Andreas, Zhu, Rong, Chen, Wei, Huang, Botong, Zeng, Tianjing, Ding, Bolin, Zhou, Jingren
We present Baihe, a SysML Framework for AI-driven Databases. Using Baihe, an existing relational database system may be retrofitted to use learned components for query optimization or other common tasks, such as learned structures for indexing. To ensure the practicality and real-world applicability of Baihe, its high-level architecture is based on the following requirements: separation from the core system, minimal third party dependencies, robustness, stability and fault tolerance, as well as stability and configurability. Based on the high-level architecture, we then describe a concrete implementation of Baihe for PostgreSQL and present example use cases for learned query optimizers. To serve both practitioners and researchers in the DB and AI4DB communities, Baihe for PostgreSQL will be released under an open source license.