Information Retrieval
Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review
Sheikhalishahi, Seyedmostafa, Miotto, Riccardo, Dudley, Joel T, Lavelli, Alberto, Rinaldi, Fabio, Osmani, Venet
Of the 2652 articles considered, 106 met the inclusion criteria. Review of the included papers identified 43 chronic diseases, which were further classified into 10 disease categories using ICD-10. The majority of studies focused on diseases of the circulatory system (n=38), while endocrine and metabolic diseases were the least studied (n=14). This was due to the structure of clinical records: records related to metabolic diseases typically contain much more structured data, whereas medical records for diseases of the circulatory system rely more on unstructured data and have consequently attracted more NLP research. The review has shown a significant increase in the use of machine learning methods compared with rule-based approaches; however, deep learning methods remain emergent (n=3). Consequently, the majority of works focus on classification of disease phenotype, with only a handful of papers addressing extraction of comorbidities from free text or integration of clinical notes with structured data. Relatively simple methods, such as shallow classifiers (alone or combined with rule-based methods), are notably common because their predictions are interpretable, and interpretability still represents a significant issue for more complex methods. Finally, scarcity of publicly available data may also have contributed to insufficient development of more advanced methods, such as extraction of word embeddings from clinical notes. Further efforts are still required to improve (1) progression of clinical NLP methods from extraction toward understanding; (2) recognition of relations among entities rather than entities in isolation; (3) temporal extraction to understand past, current, and future clinical events; (4) exploitation of alternative sources of clinical knowledge; and (5) availability of large-scale, de-identified clinical corpora.
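The rule-based phenotype classification that the review contrasts with machine learning can be illustrated with a toy sketch. The term lists, negation cues and function names below are hypothetical and far simpler than the curated vocabularies and negation algorithms used in practice; the sketch only shows keyword matching combined with a NegEx-style negation window.

```python
import re

# Hypothetical toy lexicons; real systems map mentions to curated vocabularies.
PHENOTYPE_TERMS = {
    "type 2 diabetes": ["type 2 diabetes", "t2dm", "diabetes mellitus type 2"],
    "heart failure": ["heart failure", "chf"],
}
# Hypothetical negation cues, checked in a short window before each mention.
NEGATION_CUES = ["no", "denies", "negative for", "without"]


def split_sentences(note: str) -> list[str]:
    # Naive splitter on periods, semicolons and newlines.
    return [s.strip() for s in re.split(r"[.;\n]", note) if s.strip()]


def is_negated(sentence: str, start: int, window: int = 5) -> bool:
    # NegEx-style heuristic: a cue among the `window` tokens before the mention.
    preceding = " ".join(sentence[:start].split()[-window:])
    return any(re.search(rf"\b{re.escape(cue)}\b", preceding) for cue in NEGATION_CUES)


def classify_note(note: str) -> dict[str, bool]:
    """Flag a phenotype as present if it has at least one non-negated mention."""
    result = {phenotype: False for phenotype in PHENOTYPE_TERMS}
    for sentence in split_sentences(note.lower()):
        for phenotype, synonyms in PHENOTYPE_TERMS.items():
            for synonym in synonyms:
                for match in re.finditer(rf"\b{re.escape(synonym)}\b", sentence):
                    if not is_negated(sentence, match.start()):
                        result[phenotype] = True
    return result


note = "Patient has T2DM, controlled on metformin. Denies chest pain; no signs of heart failure."
print(classify_note(note))  # {'type 2 diabetes': True, 'heart failure': False}
```

Hand-written rules like these are transparent, which is part of why the review finds them still combined with shallow classifiers, but every synonym and negation pattern must be anticipated by hand.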
Reasoning-Driven Question-Answering for Natural Language Understanding
Natural language understanding (NLU) of text is a fundamental challenge in AI, and it has received significant attention throughout the history of NLP research. This goal has been studied under different tasks, such as Question Answering (QA) and Textual Entailment (TE). In this thesis, we investigate the NLU problem through the QA task and focus on the aspects that make it challenging for current state-of-the-art technology. The thesis is organized into three main parts. In the first part, we explore multiple formalisms to improve existing machine comprehension systems. We propose a formulation for abductive reasoning in natural language and show its effectiveness, especially in domains with limited training data. Additionally, to help reasoning systems cope with irrelevant or redundant information, we create a supervised approach to learn and detect the essential terms in questions. In the second part, we propose two new challenge datasets of natural language questions: (i) the first requires reasoning over multiple sentences; (ii) the second requires temporal common sense reasoning. We hope that these two datasets will motivate the field to address more complex problems. In the final part, we present the first formal framework for multi-step reasoning algorithms in the presence of a few important properties of language use, such as incompleteness and ambiguity. We apply this framework to prove fundamental limitations of reasoning algorithms. These theoretical results provide additional insight into the existing empirical evidence in the field.
A Survey of Cross-lingual Word Embedding Models
Ruder, Sebastian, Vulić, Ivan, Søgaard, Anders
Cross-lingual representations of words enable us to reason about word meaning in multilingual contexts and are a key facilitator of cross-lingual transfer when developing natural language processing models for low-resource languages. In this survey, we provide a comprehensive typology of cross-lingual word embedding models. We compare their data requirements and objective functions. The recurring theme of the survey is that many of the models presented in the literature optimize for the same objectives, and that seemingly different models are often equivalent, modulo optimization strategies, hyper-parameters, and similar choices. We also discuss the different ways cross-lingual word embeddings are evaluated, as well as future challenges and research horizons.
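Many cross-lingual models place both languages in a shared space, and bilingual lexicon induction is a common way to evaluate the result. The sketch below assumes a mapping-based setup: a linear map `W` (given here; in practice it would be learned, for example with an orthogonal Procrustes solution on a seed dictionary) sends source vectors into the target space, and precision@1 counts how often the nearest target word by cosine similarity is the gold translation. The two-dimensional vectors and word lists are invented toy data, and the function names are hypothetical.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def apply_map(W: Matrix, x: Vector) -> Vector:
    # Matrix-vector product y = W x, with W stored row by row.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def translate(word: str, src: dict[str, Vector], tgt: dict[str, Vector], W: Matrix) -> str:
    # Nearest target word (by cosine) to the mapped source vector.
    mapped = apply_map(W, src[word])
    return max(tgt, key=lambda t: cosine(mapped, tgt[t]))


def precision_at_1(gold: dict[str, str], src, tgt, W: Matrix) -> float:
    hits = sum(translate(s, src, tgt, W) == t for s, t in gold.items())
    return hits / len(gold)


# Toy data: the German vectors are the English ones rotated by 90 degrees.
english = {"dog": [1.0, 0.0], "cat": [0.8, 0.6], "house": [0.0, 1.0]}
german = {"hund": [0.0, 1.0], "katze": [-0.6, 0.8], "haus": [-1.0, 0.0]}
gold = {"dog": "hund", "cat": "katze", "house": "haus"}

identity = [[1.0, 0.0], [0.0, 1.0]]
rotation = [[0.0, -1.0], [1.0, 0.0]]  # rotates by +90 degrees
print(precision_at_1(gold, english, german, identity))  # 0.3333333333333333
print(precision_at_1(gold, english, german, rotation))  # 1.0
```

Without an alignment the two monolingual spaces are unrelated, so nearest neighbours are mostly wrong; once the (here known) rotation is applied, every gold translation is recovered.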
Nonconvex Zeroth-Order Stochastic ADMM Methods with Lower Function Query Complexity
Huang, Feihu, Gao, Shangqian, Pei, Jian, Huang, Heng
Zeroth-order (gradient-free) methods are a powerful class of optimization tools for many machine learning problems because they only need function values (not gradients) during optimization. In particular, zeroth-order methods are well suited to complex problems such as black-box attacks and bandit feedback, where explicit gradients are difficult or infeasible to obtain. Although many zeroth-order methods have recently been developed, these approaches still have two main drawbacks: 1) high function query complexity; 2) poor suitability for problems with complex penalties and constraints. To address these drawbacks, in this paper, we propose a novel fast zeroth-order stochastic alternating direction method of multipliers (ADMM) (\emph{i.e.}, ZO-SPIDER-ADMM) with lower function query complexity for solving nonconvex problems with multiple nonsmooth penalties. Moreover, we prove that our ZO-SPIDER-ADMM has the optimal function query complexity of $O(dn + dn^{\frac{1}{2}}\epsilon^{-1})$ for finding an $\epsilon$-approximate local solution, where $n$ and $d$ denote the sample size and dimension of data, respectively. In particular, ZO-SPIDER-ADMM improves on the existing best nonconvex zeroth-order ADMM methods by a factor of $O(d^{\frac{1}{3}}n^{\frac{1}{6}})$. We also propose a fast online ZO-SPIDER-ADMM (\emph{i.e.,} ZOO-SPIDER-ADMM). Our theoretical analysis shows that ZOO-SPIDER-ADMM has a function query complexity of $O(d\epsilon^{-\frac{3}{2}})$, which improves the existing best result by a factor of $O(\epsilon^{-\frac{1}{2}})$. Finally, we use a structured adversarial attack on black-box deep neural networks to demonstrate the efficiency of our algorithms.
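The core zeroth-order idea, estimating a gradient purely from function values, can be sketched with a coordinate-wise central-difference estimator. This is a generic illustration and not ZO-SPIDER-ADMM itself: it omits the ADMM splitting, the nonsmooth penalties and the SPIDER variance-reduced estimator. It does show why query counts scale with the dimension $d$, since each full estimate costs $2d$ function queries. The objective `black_box` and the parameters `mu`, `step` and `iters` are hypothetical choices for this example.

```python
from typing import Callable

Vector = list[float]


def zo_coordinate_gradient(f: Callable[[Vector], float], x: Vector, mu: float = 1e-4) -> Vector:
    """Central-difference gradient estimate; costs 2 * len(x) queries of f."""
    grad = []
    for j in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[j] += mu
        x_minus[j] -= mu
        grad.append((f(x_plus) - f(x_minus)) / (2 * mu))
    return grad


def zo_gradient_descent(f, x0: Vector, step: float = 0.1, iters: int = 200, mu: float = 1e-4):
    """Plain gradient descent driven only by function values.
    Returns the final iterate and the total number of function queries."""
    x, queries = list(x0), 0
    for _ in range(iters):
        g = zo_coordinate_gradient(f, x, mu)
        queries += 2 * len(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x, queries


def black_box(x: Vector) -> float:
    # Treated as a black box: only its values are used, never its gradient.
    return (x[0] - 3.0) ** 2 + 2.0 * (x[1] + 1.0) ** 2


x_star, n_queries = zo_gradient_descent(black_box, [0.0, 0.0])
print([round(v, 6) for v in x_star], n_queries)  # [3.0, -1.0] 800
```

With $d=2$ and 200 iterations the optimizer spends $200 \times 2d = 800$ queries; methods such as the one described above aim to reduce exactly this kind of query budget.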
5 Ways AI Has Changed Ecommerce - Search Engine Journal
This is a sponsored post written by Atomic Reach. The opinions expressed in this article are the sponsor's own. When it comes to shopping, many customers have decided to take their business online. Statista has estimated that 1.92 billion buyers worldwide will participate in ecommerce activities in 2019, a number expected to rise to more than 2 billion by 2021. This demand for online goods has pushed companies to be more creative in how they reach audiences online.
The Green Google: Berlin Search Engine Uses Profits to Plant Trees
At first glance, the Berlin startup doesn't seem so different from others: a factory floor in the rear courtyard of a building in the city's Neukölln district, stacked preserving jars filled with muesli in the kitchen, a discarded ping-pong table top repurposed as a conference table. The employees are young, relaxed and very international. The company's head and founder, Christian Kroll, is 35 years old, the same age as Mark Zuckerberg. The two men also share a quirk: to avoid wasting time choosing an outfit in the mornings, each always wears the same thing. In Kroll's case, that means plain white T-shirts made from organic cotton. Zuckerberg's favorite color, by contrast, is gray.
Production Ranking Systems: A Review
Iqbal, Murium, Subedi, Nishan, Aryafar, Kamelia
Ranking is a multi-billion-dollar problem. In this paper, we present an overview of several production-quality ranking systems. We show that, because of the conflicting goals of employing the most effective machine learning models and responding to users in real time, ranking systems have evolved into a system of systems, where each subsystem can be viewed as a component layer. We view these layers as data processing, representation learning, candidate selection, and online inference. Each layer employs different algorithms and tools, and every end-to-end ranking system spans multiple architectures. Our goal is to give a general audience a working knowledge of ranking at scale, the tools and algorithms employed, and the challenges introduced by adopting a layered approach.
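The layered design can be illustrated with a minimal two-stage pipeline in which a cheap candidate-selection step narrows the catalog before a costlier scorer runs at query time. Everything here is a hypothetical stand-in: item popularity and embeddings are assumed to come from offline data-processing and representation-learning layers, and the online scorer is a toy linear combination rather than a production model.

```python
from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    title: str
    popularity: float        # assumed precomputed offline (data processing layer)
    embedding: list[float]   # assumed precomputed offline (representation learning layer)


def select_candidates(query: str, catalog: list[Item], k: int) -> list[Item]:
    """Cheap layer: keep items sharing a token with the query, then the k most popular."""
    q_tokens = set(query.lower().split())
    matches = [it for it in catalog if q_tokens & set(it.title.lower().split())]
    return sorted(matches, key=lambda it: it.popularity, reverse=True)[:k]


def online_inference(query_vec: list[float], candidates: list[Item],
                     w_sim: float = 1.0, w_pop: float = 0.1) -> list[Item]:
    """Costlier layer, run only on the candidates: a toy linear scorer."""
    def score(it: Item) -> float:
        similarity = sum(q * e for q, e in zip(query_vec, it.embedding))
        return w_sim * similarity + w_pop * it.popularity
    return sorted(candidates, key=score, reverse=True)


def rank(query: str, query_vec: list[float], catalog: list[Item], k: int = 100) -> list[Item]:
    return online_inference(query_vec, select_candidates(query, catalog, k))


catalog = [
    Item("a", "red running shoes", 5.0, [1.0, 1.0]),
    Item("b", "blue running shoes", 9.0, [0.0, 1.0]),
    Item("c", "red dress", 8.0, [1.0, 0.0]),
    Item("d", "garden hose", 10.0, [0.0, 0.0]),
]
for k in (2, 3):
    print(k, [it.item_id for it in rank("red shoes", [1.0, 1.0], catalog, k)])
# 2 ['b', 'c']
# 3 ['a', 'b', 'c']
```

The cutoff `k` is where the layers interact: with `k=2` the best match "a" never reaches the online model, while `k=3` lets it be ranked first at the cost of scoring more items in real time.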
Is Google creating a voice-activated search engine for TODDLERS?
Google is potentially creating a search engine for toddlers, despite recent privacy scandals. The tech giant has filed a European patent, entitled Gamifying Voice Search Experience for Children, which gives it exclusive rights to develop the concept. Aimed at nursery-age youngsters, the prospective product would use a child-friendly bubble interface to engage young children. It would be separate from Google Assistant, which already allows people to conduct voice-activated searches on their devices. However, education experts have raised concerns about potential privacy violations, such as those associated with Amazon's Echo device, as well as the danger of making children addicted to technology.
Benefits of Enabling Enterprise Search in Your Digital Workplace - eXo
A disconnected or disengaged workforce, broken business processes and an overall decrease in efficiency are among the most common challenges facing organizations today. As a result, digital workplace solutions have grown in popularity, as they offer a holistic solution capable of integrating different tools and applications. A typical digital workplace includes a knowledge management system (KMS), an enterprise social network (ESN), an intranet portal, instant messaging and more. It also integrates the various third-party software used internally, from CRM to Human Resources Information Systems (HRIS). For better usage and efficiency, a digital workplace needs to collect data from all these sources and make it widely accessible to users in a centralized place. This is why an enterprise search engine is so important.
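Centralizing search typically means indexing content from every connected system into one place. A minimal sketch, assuming plain-text documents and AND semantics for multi-word queries, is a single inverted index whose postings remember which source system each document came from. The class name, source names and documents below are invented for illustration.

```python
import re
from collections import defaultdict

DocKey = tuple[str, str]  # (source system, document id)


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


class UnifiedIndex:
    """Single inverted index over documents pulled from several systems."""

    def __init__(self) -> None:
        self.postings = defaultdict(set)  # term -> set of (source, doc_id)

    def add(self, source: str, doc_id: str, text: str) -> None:
        for term in tokenize(text):
            self.postings[term].add((source, doc_id))

    def search(self, query: str) -> list[DocKey]:
        # AND semantics: a document must contain every query term.
        terms = tokenize(query)
        if not terms:
            return []
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)


index = UnifiedIndex()
index.add("crm", "opp-42", "ACME renewal invoice")
index.add("kms", "howto-7", "How to renew the ACME contract")
index.add("hris", "policy-3", "Vacation policy for all staff")
print(index.search("acme"))          # [('crm', 'opp-42'), ('kms', 'howto-7')]
print(index.search("acme renewal"))  # [('crm', 'opp-42')]
```

A single query now reaches CRM records and knowledge-base articles alike; a real deployment would add connectors, access control and relevance ranking on top of this basic structure.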
Learning to Rank Broad and Narrow Queries in E-Commerce
Devapujula, Siddhartha, Arora, Sagar, Borar, Sumit
Search is a prominent channel for discovering products on an e-commerce platform. Ranking the products retrieved by search is crucial for addressing customers' needs and optimizing business metrics. While learning-to-rank (LETOR) models have been extensively studied and have demonstrated efficacy in web search, their use in e-commerce is a relatively new and underexplored research area. In this paper, we present a framework for building a LETOR model for an e-commerce platform. We analyze user queries and propose a mechanism to segment queries into broad and narrow ones based on user intent. We discuss three types of features (query, product, and query-product) and the challenges in using them. We show that sparsity in product features can be tackled with a denoising auto-encoder, while skip-gram-based word embeddings help solve query-product sparsity issues. We also present various target metrics that can be employed for evaluating search results and compare their robustness. Further, we build and compare the performance of pointwise and pairwise LETOR models on a fashion-category dataset. We also build and compare distinct models for broad and narrow queries, analyze feature importance across them, and show that these specialized models perform better than a combined model in the fashion domain.
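The difference between the pointwise and pairwise LETOR formulations compared above shows up in their loss functions. The sketch uses two common textbook choices, squared error for pointwise and a RankNet-style logistic loss over ordered pairs for pairwise; these are assumptions, not necessarily the paper's exact losses, and the graded labels (2 = purchased, 1 = clicked, 0 = ignored) are a hypothetical labeling scheme.

```python
import math


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def pointwise_loss(scores: list[float], labels: list[float]) -> float:
    """Mean squared error between each product's score and its relevance label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)


def pairwise_loss(scores: list[float], labels: list[float]) -> float:
    """Mean logistic loss over pairs where product i is more relevant than product j."""
    total, pairs = 0.0, 0
    for s_i, y_i in zip(scores, labels):
        for s_j, y_j in zip(scores, labels):
            if y_i > y_j:
                total += softplus(-(s_i - s_j))  # small when s_i is well above s_j
                pairs += 1
    return total / pairs if pairs else 0.0


labels = [2.0, 1.0, 0.0]  # one query's products: purchased, clicked, ignored
scores = [2.0, 1.0, 0.0]
shifted = [s + 10.0 for s in scores]
print(pointwise_loss(scores, labels), pointwise_loss(shifted, labels))  # 0.0 100.0
print(pairwise_loss(scores, labels) == pairwise_loss(shifted, labels))  # True
```

Shifting every score by a constant leaves the ranking, and therefore the pairwise loss, unchanged, while the pointwise loss penalizes it heavily: the pairwise objective cares only about relative order within a query.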