Sparse Transformers


Originally published on Towards AI. If you want to analyze how fast 19 sparse BERT models perform inference, you'll only need a YAML file and 16 GB of RAM to find out.

Language AI is really heating up


In just a few years, deep learning algorithms have evolved to beat the world's best players at board games and to recognize faces as accurately as a human (or perhaps even better). But mastering the unique and far-reaching complexities of human language has proven to be one of AI's toughest challenges. Could that be about to change? The ability of computers to effectively understand all human language would completely transform how we engage with brands, businesses, and organizations across the world.

Analysis of Transformer, Attention Mechanism and BERT


Previously dominant sequence transduction models are based on complex recurrent or convolutional neural networks comprising an encoder and a decoder. Given an input sequence, RNN-based models process it word by word, generating a sequence of hidden states, each computed as a function of the previous hidden state and the input at position t. This inherently sequential nature precludes parallelization within training examples, which becomes critical at longer sequence lengths, as memory constraints limit batching across examples. The Transformer, by contrast, is based solely on attention mechanisms, dispensing with recurrence and convolutions entirely; it draws global dependencies between input and output, which also allows far more parallelization. Experiments on two machine translation tasks in the paper show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.
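The global-dependency claim is easiest to see in the scaled dot-product attention at the Transformer's core: every position attends to every other in a single matrix product, rather than through t sequential steps. A minimal NumPy sketch (toy shapes, single head, no masking):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # pairwise similarity of queries and keys
    weights = softmax(scores)         # each query's distribution over all positions
    return weights @ V, weights

# Toy example: 4 positions, d_k = 8. The weight matrix is 4x4, so any token
# can draw on any other regardless of distance in the sequence.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

Because the matrix products are independent of sequence order, all positions are computed in parallel, which is what the paragraph above means by "more parallelization."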

Word2vec vs BERT


Both word2vec and BERT are popular recent NLP methods for generating vector representations of words, essentially replacing word-index dictionaries and one-hot encoded vectors as ways to represent text. Neither word indices nor one-hot encodings capture the semantic sense of language, and one-hot encoding becomes computationally infeasible when the vocabulary is large. Word2vec [1] is a neural network approach that learns distributed word vectors such that words used in similar syntactic or semantic contexts lie closer to each other in the distributed vector space.
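The contrast is easy to demonstrate. In the sketch below, the one-hot vectors make every pair of distinct words orthogonal, while hypothetical dense vectors (values invented purely for illustration, of the kind word2vec learns) place related words closer together:

```python
import numpy as np

vocab = ["cat", "dog", "car", "truck"]  # toy vocabulary; real ones hold 10^5+ words

# One-hot: one |V|-dimensional vector per word. All distinct words are
# orthogonal, so no similarity is captured, and size grows with |V|.
one_hot = {w: v for w, v in zip(vocab, np.eye(len(vocab)))}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical dense embeddings (illustrative values only): semantically
# related words end up near each other in the vector space.
emb = {"cat": np.array([0.9, 0.1, 0.0]),
       "dog": np.array([0.8, 0.2, 0.1]),
       "car": np.array([0.1, 0.9, 0.2])}

print(cosine(one_hot["cat"], one_hot["dog"]))  # 0.0 -- no semantics in one-hot
print(cosine(emb["cat"], emb["dog"]) > cosine(emb["cat"], emb["car"]))  # True
```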

A Complete Guide to ktrain: A Wrapper for TensorFlow Keras


To make predictive models more robust and better-performing, we need modules and processes that are lightweight and fast. ktrain is a lightweight Python wrapper that provides such features to a large extent: it wraps the deep learning library TensorFlow Keras and helps in building, training, and deploying neural networks and other machine learning models. In this article, we discuss the ktrain package in detail, going through its important features and the pre-trained models available with it.
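As a taste of what the wrapper looks like, here is a minimal sketch of ktrain's high-level text-classification workflow based on its documented API. The data, class names, and hyperparameters are hypothetical, and ktrain must be installed for the function to run end to end:

```python
def train_with_ktrain(x_train, y_train, x_val, y_val, class_names):
    """Build, train, and wrap a BERT text classifier with ktrain
    (hypothetical data; requires ktrain to be installed)."""
    import ktrain
    from ktrain import text

    # Preprocess raw texts into the format the chosen model expects.
    trn, val, preproc = text.texts_from_array(
        x_train=x_train, y_train=y_train,
        x_test=x_val, y_test=y_val,
        class_names=class_names,
        preprocess_mode='bert', maxlen=128)

    # Instantiate a pre-trained model and a learner object.
    model = text.text_classifier('bert', train_data=trn, preproc=preproc)
    learner = ktrain.get_learner(model, train_data=trn, val_data=val,
                                 batch_size=6)

    # Train with the one-cycle learning-rate policy, then wrap for deployment.
    learner.fit_onecycle(2e-5, 1)
    return ktrain.get_predictor(learner.model, preproc)
```

The returned predictor object can classify raw strings directly, which is what makes the wrapper convenient for deployment.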



"Just Accepted" papers have undergone full peer review and have been accepted for publication in Radiology: Artificial Intelligence. This article will undergo copyediting, layout, and proof review before it is published in its final version. Please note that during production of the final copyedited article, errors may be discovered which could affect the content. To assess if semisupervised natural language processing (NLP) of text clinical radiology reports could provide useful automated diagnosis categorization for ground truth labeling to overcome manual labeling bottlenecks in the machine learning pipeline. In this retrospective study, 1503 text cardiac MRI reports (from between 2016 and 2019) were manually annotated for five diagnoses by clinicians: normal, dilated cardiomyopathy (DCM), hypertrophic cardiomyopathy (HCM), myocardial infarction (MI), and myocarditis.

Does constituency analysis enhance domain-specific pre-trained BERT models for relation extraction? Artificial Intelligence

Many studies have recently been conducted on relation extraction. The DrugProt track at BioCreative VII provides a manually annotated corpus for developing and evaluating relation extraction systems that study interactions between chemicals and genes. We describe the ensemble system used for our submission, which combines the predictions of fine-tuned bioBERT, sciBERT, and const-bioBERT models by majority voting. We specifically tested the contribution of syntactic information to relation extraction with BERT. We observed that adding constituent-based syntactic information to BERT improved precision but decreased recall, since relations rarely seen in the training set were less likely to be predicted by the BERT models into which the syntactic information was infused. Our code is available online.
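The majority-voting step itself is simple to sketch. Below, the three model names and relation labels are hypothetical placeholders for the kind of per-example predictions the ensemble combines; ties fall back to the first model's label, since `Counter.most_common` orders equal counts by first appearance:

```python
from collections import Counter

def majority_vote(*prediction_lists):
    """Combine per-example label predictions from several models by majority
    vote. On ties, the first model's label wins (Counter.most_common orders
    equal counts by first appearance)."""
    combined = []
    for labels in zip(*prediction_lists):
        top_label, _count = Counter(labels).most_common(1)[0]
        combined.append(top_label)
    return combined

# Hypothetical relation labels from three fine-tuned models on three examples
biobert_preds   = ["INHIBITOR", "NONE",      "AGONIST"]
scibert_preds   = ["INHIBITOR", "ACTIVATOR", "AGONIST"]
constbert_preds = ["NONE",      "ACTIVATOR", "AGONIST"]

print(majority_vote(biobert_preds, scibert_preds, constbert_preds))
# -> ['INHIBITOR', 'ACTIVATOR', 'AGONIST']
```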

Rationale production to support clinical decision-making Artificial Intelligence

The development of neural networks for clinical artificial intelligence (AI) is reliant on interpretability, transparency, and performance. The need to delve into the black-box neural network and derive interpretable explanations of model output is paramount. A task of high clinical importance is predicting the likelihood of a patient being readmitted to hospital in the near future to enable efficient triage. With the increasing adoption of electronic health records (EHRs), there is great interest in applications of natural language processing (NLP) to clinical free-text contained within EHRs. In this work, we apply InfoCal, the current state-of-the-art model that produces extractive rationales for its predictions, to the task of predicting hospital readmission using hospital discharge notes. We compare the extractive rationales produced by InfoCal to those of competitive transformer-based models pretrained on clinical text data, for which the attention mechanism can be used for interpretation. We find that each presented model, with its selected interpretability or feature-importance methods, yields varying results, with clinical-language domain expertise and pretraining critical to performance and subsequent interpretability.

SocialBERT -- Transformers for Online Social Network Language Modelling Artificial Intelligence

The ubiquity of contemporary language understanding tasks makes it important to develop generalized yet highly efficient models that utilize all the knowledge provided by the data source. In this work, we present SocialBERT, the first model that uses knowledge about the author's position in the network during text analysis. We investigate possible models for learning social network information and successfully inject it into the baseline BERT model. Evaluation shows that embedding this information maintains good generalization, while increasing the quality of the probabilistic model for a given author by up to 7.5%. The proposed model has been trained on the majority of groups in the chosen social network, and is still able to work with previously unknown groups. The model, along with the code of our experiments, is available for download and use in applied tasks.