Wikipedia-based Semantic Interpretation for Natural Language Processing

AAAI Conferences

Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text.


Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images

Associations are shown with cluster indices, which summarize properties of the clusters obtained by affinity propagation clustering of the TIL map; these properties describe local structure beyond simple densities. The Ball-Hall index is a clustering index that takes the mean, across all clusters, of each cluster's mean dispersion, where a cluster's mean dispersion is the mean of the squared distances of its points from its center. In our data, the Ball-Hall index is correlated with the mean cluster extent, CE (Spearman's ρ = 0.95). The p value of the significance test is shown in the lower left. The Banfield-Raftery index is the sum of the logarithms of the clusters' mean dispersions, each weighted by the number of points in the cluster, and in our data it often correlates with the number of clusters.
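The two cluster indices described above can be computed directly from the clustered points. The sketch below assumes the clusters are already given as lists of 2D positions (for example, produced by affinity propagation, which is not reimplemented here); the function names and toy points are hypothetical and are not taken from the study's code or data.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a cluster's points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_dispersions(clusters: Sequence[Sequence[Point]]) -> List[float]:
    """Mean squared distance of each cluster's points to the cluster center."""
    result = []
    for points in clusters:
        center = centroid(points)
        result.append(sum(squared_distance(p, center) for p in points) / len(points))
    return result


def ball_hall(clusters: Sequence[Sequence[Point]]) -> float:
    """Mean, across clusters, of the per-cluster mean dispersion."""
    dispersions = mean_dispersions(clusters)
    return sum(dispersions) / len(dispersions)


def banfield_raftery(clusters: Sequence[Sequence[Point]]) -> float:
    """Sum over clusters of n_k * log(mean dispersion of cluster k)."""
    total = 0.0
    for points, d in zip(clusters, mean_dispersions(clusters)):
        if d <= 0:
            # A single-point (or fully collapsed) cluster makes the log undefined.
            raise ValueError("log undefined for a cluster with zero dispersion")
        total += len(points) * math.log(d)
    return total


# Two toy clusters of 2D positions (hypothetical, not TIL data).
clusters = [[(0.0, 0.0), (2.0, 0.0)], [(0.0, 0.0), (0.0, 4.0)]]
print(ball_hall(clusters))         # 2.5
print(banfield_raftery(clusters))  # 2*log(1) + 2*log(4)
```

Because the Ball-Hall index averages squared spreads, it grows with cluster size in space, which is consistent with its strong correlation with mean cluster extent; the Banfield-Raftery index sums over clusters, so it naturally scales with how many clusters there are.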


Automated Blood Cell Detection and Counting via Deep Learning for Microfluidic Point-of-Care Medical Devices

arXiv.org Artificial Intelligence

Automated in-vitro cell detection and counting have been a key theme for intelligent biological analysis such as biopsy, drug analysis, and disease diagnosis. With the rapid development of microfluidics and lab-on-chip technologies, in-vitro live cell analysis has become one of the critical tasks for both research and industry communities. However, extracting and predicting precise information about live cells from large numbers of microscopic videos and images remains a great challenge. In this paper, we investigated in-vitro detection of white blood cells using deep neural networks and discussed how state-of-the-art machine learning techniques could fulfil the needs of medical diagnosis. Our approach was based on Faster Region-based Convolutional Neural Networks (Faster R-CNNs), and transfer learning was used to adapt this technique to the microscopic detection of blood cells. Our experimental results demonstrated that fast and efficient analysis of blood cells via automated microscopic imaging can achieve much higher accuracy and speed than conventional methods, suggesting a promising future for this technology in microfluidic point-of-care medical devices.
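A detector such as Faster R-CNN outputs scored bounding boxes, so one plausible way to turn its output into a cell count is to drop low-confidence boxes and then apply greedy non-maximum suppression (NMS) so that overlapping boxes on the same cell are counted once. This is a minimal sketch under that assumption; the abstract does not state the exact counting procedure, and the thresholds and function names here are hypothetical. In a PyTorch pipeline, a library routine such as `torchvision.ops.nms` would normally replace the hand-written loop.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_cells(boxes: Sequence[Box], scores: Sequence[float],
                score_thresh: float = 0.5, iou_thresh: float = 0.5) -> int:
    """Count detections that survive a score threshold and greedy NMS."""
    # Visit confident detections from highest to lowest score.
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with an already kept box.
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in kept):
            kept.append(i)
    return len(kept)


# Hypothetical detections: boxes 0 and 1 cover the same cell; box 3 is low-confidence.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (40, 40, 50, 50)]
scores = [0.9, 0.8, 0.7, 0.3]
print(count_cells(boxes, scores))  # 2
```

Raising `iou_thresh` makes suppression more permissive (touching cells are less likely to be merged), while lowering `score_thresh` admits more uncertain detections; both would need tuning on annotated images.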


Wikipedia-based Semantic Interpretation for Natural Language Processing

Journal of Artificial Intelligence Research

Adequate representation of natural language semantics requires access to vast amounts of common sense and domain-specific world knowledge. Prior work in the field was based on purely statistical techniques that did not make use of background knowledge, on limited lexicographic knowledge bases such as WordNet, or on huge manual efforts such as the CYC project. Here we propose a novel method, called Explicit Semantic Analysis (ESA), for fine-grained semantic interpretation of unrestricted natural language texts. Our method represents meaning in a high-dimensional space of concepts derived from Wikipedia, the largest encyclopedia in existence. We explicitly represent the meaning of any text in terms of Wikipedia-based concepts. We evaluate the effectiveness of our method on text categorization and on computing the degree of semantic relatedness between fragments of natural language text. Using ESA results in significant improvements over the previous state of the art in both tasks. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.
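The core ESA idea, representing a text as a weighted vector of Wikipedia concepts and comparing texts by the cosine of those vectors, can be illustrated on a toy corpus. This is a simplified sketch, assuming each "concept" is one article, word-to-concept weights are plain TF-IDF scores from an inverted index, and the text's own words are weighted by raw counts; the tokenizer, the three toy articles, and all function names are hypothetical, and the pruning and preprocessing used for real Wikipedia dumps are omitted.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def build_index(concepts: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Inverted index: word -> {concept: TF-IDF weight of the word in that article}."""
    n = len(concepts)
    tfs = {name: Counter(tokenize(body)) for name, body in concepts.items()}
    df = Counter(word for tf in tfs.values() for word in tf)
    index: Dict[str, Dict[str, float]] = defaultdict(dict)
    for name, tf in tfs.items():
        for word, count in tf.items():
            idf = math.log(n / df[word])
            if idf > 0:  # words found in every article carry no signal
                index[word][name] = count * idf
    return dict(index)


def interpret(text: str, index: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Concept vector of a text: sum of the index entries of its words."""
    vector: Dict[str, float] = defaultdict(float)
    for word, count in Counter(tokenize(text)).items():
        for name, weight in index.get(word, {}).items():
            vector[name] += count * weight
    return dict(vector)


def relatedness(a: str, b: str, index: Dict[str, Dict[str, float]]) -> float:
    """Cosine similarity of the two texts' concept vectors."""
    va, vb = interpret(a, index), interpret(b, index)
    dot = sum(w * vb.get(name, 0.0) for name, w in va.items())
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0


# Toy "Wikipedia" with three articles (hypothetical text, not real articles).
concepts = {
    "Jaguar (animal)": "jaguar big cat predator jungle",
    "Jaguar Cars": "jaguar car engine luxury",
    "Cat": "cat pet feline predator",
}
index = build_index(concepts)
print(relatedness("big cat", "feline predator", index))  # positive
print(relatedness("big cat", "car engine", index))       # 0.0
```

Because each vector dimension is a named article, the strongest concepts for a text can be listed directly (here "big cat" is dominated by "Jaguar (animal)"), which is what makes this kind of representation easy to explain to users.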


Transcriptome and epigenome landscape of human cortical development modeled in organoids

Science

The human cerebral cortex has undergone an extraordinary increase in size and complexity during mammalian evolution. Cortical cell lineages are specified in the embryo, and genetic and epidemiological evidence implicates early cortical development in the etiology of neuropsychiatric disorders such as autism spectrum disorder (ASD), intellectual disabilities, and schizophrenia. Most of the disease-implicated genomic variants are located outside of genes, and the interpretation of noncoding mutations lags behind owing to limited annotation of functional elements in the noncoding genome. We set out to discover gene-regulatory elements and chart their dynamic activity during prenatal human cortical development, focusing on enhancers, which carry most of the weight in regulating gene expression. We longitudinally modeled human brain development using human induced pluripotent stem cell (hiPSC)–derived cortical organoids and compared organoids to isogenic fetal brain tissue. Fetal fibroblast–derived hiPSC lines were used to generate cortically patterned organoids and to compare the organoids' epigenome and transcriptome with those of isogenic fetal brains and external datasets. Organoids model cortical development between 5 and 16 postconception weeks, enabling us to study transitions from cortical stem cells to progenitors to early neurons. The greatest changes occur at the transition from stem cells to progenitors. The regulatory landscape encompasses a total of 96,375 enhancers linked to target genes, with 49,640 enhancers active in organoids but not in mid-fetal brain, suggesting major roles in cortical neuron specification. Enhancers that gained activity in the human lineage are active in the earliest stages of organoid development, when they target genes that regulate the growth of radial glial cells.
Parallel weighted gene coexpression network analysis (WGCNA) of transcriptome and enhancer activities defined a number of modules of coexpressed genes and coactive enhancers. These modules follow just six and four global temporal patterns, which we refer to as supermodules and which likely reflect fundamental programs in the embryonic and fetal brain. Correlations between gene expression and enhancer activity allowed us to stratify enhancers into two categories: activating regulators (A-regs) and repressive regulators (R-regs).
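The A-reg/R-reg split rests on the sign of the correlation between an enhancer's activity and its target gene's expression over time. The sketch below illustrates that idea, assuming Pearson correlation over matched time points, a hypothetical cutoff `r_thresh`, and one label per enhancer-gene link; the study's actual statistic, thresholds, and significance testing may differ, and WGCNA module detection itself is not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def stratify(activity: Dict[str, List[float]], expression: Dict[str, List[float]],
             links: Sequence[Tuple[str, str]],
             r_thresh: float = 0.5) -> Dict[Tuple[str, str], str]:
    """Label each enhancer-gene link as A-reg (positive r) or R-reg (negative r)."""
    labels: Dict[Tuple[str, str], str] = {}
    for enhancer, gene in links:
        r = pearson(activity[enhancer], expression[gene])
        if r >= r_thresh:
            labels[(enhancer, gene)] = "A-reg"
        elif r <= -r_thresh:
            labels[(enhancer, gene)] = "R-reg"
        # Links with |r| < r_thresh are left unlabeled.
    return labels


# Hypothetical profiles over four time points.
activity = {"e1": [1, 2, 3, 4], "e2": [1, 2, 3, 4], "e3": [1, 2, 1, 2]}
expression = {"g1": [2, 4, 6, 8], "g2": [8, 6, 4, 2]}
links = [("e1", "g1"), ("e2", "g2"), ("e3", "g1")]
print(stratify(activity, expression, links))
# {('e1', 'g1'): 'A-reg', ('e2', 'g2'): 'R-reg'}
```

Labeling links rather than enhancers avoids having to decide how to summarize an enhancer that targets several genes with opposite trends; collapsing to one label per enhancer would require an extra rule.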