Information Retrieval
Data Infrastructure and Approaches for Ontology-Based Drug Repurposing
Boyer, Stephen, Griffin, Thomas, Swaminathan, Sarath, Clarkson, Kenneth L., Zubarev, Dmitry
IBM Almaden Research Center, 650 Harry Road, San Jose, California 95136
Abstract
We report development of a data infrastructure for drug repurposing that takes advantage of two currently available chemical ontologies. The data infrastructure includes a database of compound-target associations augmented with molecular ontological labels. It also contains two computational tools for prediction of new associations. We describe two drug-repurposing systems: one, Nascent Ontological Information Retrieval for Drug Repurposing (NOIR-DR), based on an information retrieval strategy, and another, based on nonnegative matrix factorization together with compound similarity, that was inspired by recommender systems. We report the performance of both tools on a drug-repurposing task.
1 Introduction
Drug repurposing is an efficient strategy for drug discovery, where new targets or activities are found for known drugs [1-5]. Drug repurposing requires the efficient representation of existing information about the activity of chemical compounds as drugs, and the development of algorithms that leverage such information and propose new indications.
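The recommender-style approach mentioned in the abstract can be illustrated with a minimal sketch of nonnegative matrix factorization on a toy compound-target matrix. This is not the authors' system: the matrix `v`, the rank, and all function names are hypothetical, the update rule is the standard Lee-Seung multiplicative update for squared error, and the compound-similarity term the abstract mentions is left out entirely.

```python
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Approximate nonnegative v (compounds x targets) as w @ h
    using Lee-Seung multiplicative updates for squared error."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        # h <- h * (w^T v) / (w^T w h), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # w <- w * (v h^T) / (w h h^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h

# Hypothetical toy data: rows are compounds, columns are targets,
# 1 marks a known association and 0 an unobserved pair.
v = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

w, h = nmf(v, rank=2)
scores = matmul(w, h)

# Rank the unobserved pairs by reconstructed score; high scores are
# candidate new associations under this toy model.
candidates = sorted(
    ((scores[i][j], i, j) for i in range(4) for j in range(4) if v[i][j] == 0),
    reverse=True,
)
for score, i, j in candidates[:3]:
    print(f"compound {i}, target {j}: score {score:.3f}")
```

Treating zeros as observed negatives, as this sketch does, is a simplification; a realistic pipeline would need to handle unobserved pairs and held-out evaluation more carefully.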
Natural Language Processing for Information Extraction
With the rise of the digital age, there has been an explosion of information in the form of news, articles, social media, and so on. Much of this data is unstructured, and managing and making effective use of it manually is tedious and labor intensive. This explosion of information and the need for more sophisticated and efficient information handling tools have given rise to Information Extraction (IE) and Information Retrieval (IR) technology. Information Extraction systems take natural language text as input and produce structured information, specified by certain criteria, that is relevant to a particular application. Various sub-tasks of IE, such as Named Entity Recognition, Coreference Resolution, Named Entity Linking, Relation Extraction, and Knowledge Base reasoning, form the building blocks of high-end Natural Language Processing (NLP) tasks such as Machine Translation, Question Answering, Natural Language Understanding, Text Summarization, and digital assistants like Siri, Cortana and Google Now. This paper introduces Information Extraction technology and its sub-tasks, and highlights state-of-the-art research in various IE sub-tasks, current challenges and future research directions.
Machine Learning Sifts & Searches Complex Scientific Data
As scientific datasets increase in both size and complexity, the ability to label, filter and search this deluge of information has become a laborious, time-consuming and sometimes impossible task, without the help of automated tools enabled by machine learning. With this in mind, a team of researchers from the Department of Energy's Lawrence Berkeley National Laboratory (Berkeley Lab) and UC Berkeley are developing innovative machine learning tools to pull contextual information from scientific datasets and automatically generate metadata tags for each file. Scientists can then search these files via a web-based search engine for scientific data, called Science Search, that the Berkeley team is building. As a proof-of-concept, the team is working with staff at Berkeley Lab's Molecular Foundry, to demonstrate the concepts of Science Search on the images captured by the facility's instruments. A beta version of the platform has been made available to Foundry researchers.
Search in Pics: Google ice cream pool, AI powered piano & watching the World Cup - Search Engine Land
In this week's Search In Pictures, here are the latest images culled from the web, showing what people eat at the search engine companies, how they play, who they meet, where they speak, what toys they have and more.
Doctrine raises $11.6 million for its legal search engine
French startup Doctrine is raising an $11.6 million funding round (€10 million) from existing investors Otium Venture and Xavier Niel. Doctrine is building a search engine for court decisions and other legal texts. This is a key tool if you're a lawyer or you work in the legal industry in general. There are now a thousand companies using the service. It currently costs around €129 per user per month.
Record Linkage to Match Customer Names: A Probabilistic Approach
Fatemi, Bahare, Kazemi, Seyed Mehran, Poole, David
Consider the following problem: given a database of records indexed by names (e.g., names of companies, restaurants, businesses, or universities) and a new name, determine whether the new name is in the database, and if so, which record it refers to. This problem is an instance of the record linkage problem and is challenging because people do not consistently use the official name, but use abbreviations, synonyms, different orderings of terms, different spellings of terms, and short forms of terms, and the name can contain typos or spacing issues. We provide a probabilistic model using relational logistic regression to find the probability of each record in the database being the desired record for a given query and find the best record(s) with respect to the probabilities. Building on term-matching and translational approaches for search, our model addresses many of the aforementioned challenges and provides good results when existing baselines fail. Using the probabilities output by the model, we can automate the search process for the portion of queries whose desired documents get a probability higher than a trust threshold. We evaluate our model on a large real-world dataset from a telecommunications company and compare it to several state-of-the-art baselines. The obtained results show that our model is a promising probabilistic model for record linkage for names. We also test whether the knowledge learned by our model on one domain can be effectively transferred to a new domain. For this purpose, we test our model on an unseen test set from the business names of the secondString dataset. Promising results show that our model can be effectively applied to unseen datasets. Finally, we study the sensitivity of our model to the statistics of datasets.
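The idea of scoring each record with a probability and automating only the matches above a trust threshold can be sketched as follows. This is an illustration, not the authors' model: it uses a plain logistic function over three hand-made features instead of relational logistic regression, and the feature set, `WEIGHTS`, `BIAS`, stop-word list, and record names are all hypothetical values chosen for the example, not learned from data.

```python
import math

STOP_WORDS = {"of", "the", "and"}  # hypothetical list, ignored when building acronyms

def normalize(name):
    """Lowercase a name and strip simple punctuation."""
    return name.lower().replace(".", "").replace(",", "")

def features(query, record):
    q_terms = set(normalize(query).split())
    r_words = normalize(record).split()
    r_terms = set(r_words)
    shared = len(q_terms & r_terms)
    acronym = "".join(w[0] for w in r_words if w not in STOP_WORDS)
    return [
        shared / max(len(q_terms), 1),   # fraction of query terms found in the record
        shared / max(len(r_terms), 1),   # fraction of record terms covered by the query
        1.0 if normalize(query) == acronym else 0.0,  # query equals the record's acronym
    ]

# Hypothetical hand-picked parameters; a real model would learn them.
WEIGHTS = [3.0, 2.0, 5.0]
BIAS = -3.0

def match_probability(query, record):
    """Logistic score that the query refers to the record."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, features(query, record)))
    return 1.0 / (1.0 + math.exp(-z))

def link(query, records, trust=0.8):
    """Return (best_record, probability), or (None, probability) when the
    best match falls below the trust threshold and needs manual review."""
    best = max(records, key=lambda r: match_probability(query, r))
    p = match_probability(query, best)
    return (best, p) if p >= trust else (None, p)

RECORDS = [
    "International Business Machines",
    "University of British Columbia",
    "Pacific Coffee Company",
]

for query in ["UBC", "British Columbia University", "Acme Hardware"]:
    record, p = link(query, RECORDS)
    print(f"{query!r} -> {record!r} (p={p:.2f})")
```

The set-based term overlap makes the score insensitive to word order, and the acronym feature covers one kind of abbreviation; typos, synonyms, and spacing issues named in the abstract are not handled by this sketch.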
Senzing's Software for Real-Time AI for Entity Resolution to Fight Financial Crime - insideBIGDATA
Senzing, a new artificial intelligence-based (AI) software company, announced its Senzing software product to address the $14.37 billion financial fraud market. Senzing is an IBM spinout that has reinvented entity resolution, which senses who is who in real time across multiple big data sources. Senzing is disrupting the fraud solutions market by offering the first real-time, plug-and-play, AI entity resolution software product for fraud detection, insider threats and more. Now, any company can deploy Senzing to quickly and effectively detect bad actors in their big data. Senzing uses entity-centric learning and other unique techniques to pierce through falsified identities and networks to find criminals.
Metadata Enrichment of Multi-Disciplinary Digital Library: A Semantic-based Approach
Al-Natsheh, Hussein T., Martinet, Lucie, Muhlenbach, Fabrice, Rico, Fabien, Zighed, Djamel A.
In scientific digital libraries, papers from different research communities can be described by community-dependent keywords even if they share a semantically similar topic. Articles that are not tagged with enough keyword variations are poorly indexed in any information retrieval system, which limits potentially fruitful exchanges between scientific disciplines. In this paper, we introduce a novel experimentally designed pipeline for multi-label semantic-based tagging developed for open-access metadata digital libraries. The approach starts by learning from a standard scientific categorization and a sample of topic-tagged articles to find semantically relevant articles and enrich their metadata accordingly. Our proposed pipeline aims to enable researchers to reach articles from various disciplines that tend to use different terminologies. It allows retrieving semantically relevant articles given a limited known variation of search terms. In addition to achieving an accuracy that is higher than an expanded-query-based method using a topic synonym set extracted from a semantic network, our experiments also show higher computational scalability than other comparable techniques. We created a new benchmark extracted from the open-access metadata of a scientific digital library and published it along with the experiment code to allow further research on the topic.