Instructional Material
KG-MTT-BERT: Knowledge Graph Enhanced BERT for Multi-Type Medical Text Classification
He, Yong, Wang, Cheng, Zhang, Shun, Li, Nan, Li, Zhaorong, Zeng, Zhenyu
Medical text learning has recently emerged as a promising area for improving healthcare, owing to the wide adoption of electronic health record (EHR) systems. The complexity of medical text, such as its diverse lengths, mixed text types, and abundant medical jargon, poses a great challenge for developing effective deep learning models. BERT has achieved state-of-the-art results in many NLP tasks, such as text classification and question answering. However, the standalone BERT model cannot deal with the complexity of medical text, especially lengthy clinical notes. Herein, we develop a new model called KG-MTT-BERT (Knowledge Graph Enhanced Multi-Type Text BERT), which extends the BERT model to long and multi-type text and integrates a medical knowledge graph. Our model outperforms all baselines and other state-of-the-art models in diagnosis-related group (DRG) classification, which requires comprehensive medical text for accurate classification. We also demonstrate that our model can effectively handle multi-type text and that integrating the medical knowledge graph significantly improves performance.
The Role of Coverage in Online Reinforcement Learning
Xie, Tengyang, Foster, Dylan J., Bai, Yu, Jiang, Nan, Kakade, Sham M.
The last decade has seen the development of reinforcement learning algorithms with strong empirical performance in domains including robotics (Kober et al., 2013; Lillicrap et al., 2015), dialogue systems (Li et al., 2016), and personalization (Agarwal et al., 2016; Tewari and Murphy, 2017). While there is great interest in applying these techniques to real-world decision-making applications, the number of samples (steps of interaction) required to do so is often prohibitive, with state-of-the-art algorithms requiring millions of samples to reach human-level performance in challenging domains. Developing algorithms with improved sample efficiency, which entails efficiently generalizing across high-dimensional states and actions while taking advantage of problem structure as modeled by practitioners, remains a major challenge. Investigation into the design and analysis of algorithms for sample-efficient reinforcement learning has largely focused on two distinct problem formulations: (1) online reinforcement learning, where the learner can repeatedly interact with the environment by executing a policy and observing the resulting trajectory; and (2) offline reinforcement learning, where the learner has access to logged transitions and rewards gathered from a fixed behavioral policy (e.g., historical data or expert demonstrations) but cannot directly interact with the underlying environment. While these formulations share a common goal (learning a near-optimal policy), the algorithms used to achieve this goal and the conditions under which it can be achieved are seemingly quite different.
Providing Insights for Open-Response Surveys via End-to-End Context-Aware Clustering
Esmaeilzadeh, Soheil, Williams, Brian, Shamsi, Davood, Vikingstad, Onar
Teachers often conduct surveys to collect data from a predefined group of students and gain insights into topics of interest. When surveys contain open-ended textual responses, manually processing all the responses into an insightful and comprehensive report is extremely time-consuming, labor-intensive, and difficult. Traditionally, in the analysis step, the teacher has to read each response and decide how to group the responses in order to extract insightful information. Although the responses could be grouped using certain keywords alone, such an approach is limited: it fails to account for embedded context, and it cannot detect polysemous words or phrases, or semantics that are not expressible in single words. In this work, we present a novel end-to-end context-aware framework that extracts, aggregates, and abbreviates embedded semantic patterns in open-response survey data. Our framework relies on a pre-trained natural language model to encode the textual data into semantic vectors. The encoded vectors are then clustered either into an optimally tuned number of groups or into a set of groups with pre-specified titles. In the former case, the clusters are further analyzed to extract a representative set of keywords or summary sentences that serve as the cluster labels. Finally, for the designated clusters, our framework provides context-aware word clouds that show the semantically prominent keywords within each group. To honor user privacy, we have built an on-device implementation of our framework suitable for real-time analysis on mobile devices and tested it on a synthetic dataset. Our framework reduces costs at scale by automating the extraction of the most insightful information from survey data.
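The clustering step, with the number of groups tuned automatically, can be illustrated with a minimal pure-Python sketch. It assumes the semantic vectors have already been produced by some pre-trained encoder (not reproduced here) and, as an assumption of this sketch only, uses k-means with farthest-first initialization plus the mean silhouette coefficient to pick the number of groups; the abstract does not say which clustering algorithm or selection criterion the framework actually uses. The names `kmeans`, `silhouette`, and `choose_k` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points: List[Vector], k: int, iters: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from all centers chosen so far.
        nxt = max(points, key=lambda p: min(dist(p, c) for c in centers))
        centers.append(list(nxt))
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        if new == labels:
            break  # assignments stopped changing: converged
        labels = new
        # Update step: move each non-empty cluster's center to its mean.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def _mean_dist(points: List[Vector], labels: List[int], i: int, c: int) -> float:
    """Mean distance from point i to the other members of cluster c."""
    ds = [dist(points[i], q) for j, q in enumerate(points) if labels[j] == c and j != i]
    return sum(ds) / len(ds) if ds else 0.0


def silhouette(points: List[Vector], labels: List[int]) -> float:
    """Mean silhouette coefficient in [-1, 1]; higher means better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return -1.0  # undefined for a single cluster; treat as worst
    scores = []
    for i in range(len(points)):
        own = labels[i]
        if labels.count(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = _mean_dist(points, labels, i, own)
        b = min(_mean_dist(points, labels, i, c) for c in clusters if c != own)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def choose_k(points: List[Vector], candidates: Sequence[int] = range(2, 6)) -> Tuple[int, List[int]]:
    """Try each candidate k and keep the clustering with the best silhouette."""
    best: Optional[Tuple[float, int, List[int]]] = None
    for k in candidates:
        if k >= len(points):
            break
        labels = kmeans(points, k)
        score = silhouette(points, labels)
        if best is None or score > best[0]:
            best = (score, k, labels)
    assert best is not None, "need more points than the smallest candidate k"
    return best[1], best[2]


# Toy 2-D stand-ins for response embeddings, forming three well-separated groups.
emb = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, labels = choose_k(emb)
```

Real sentence embeddings have hundreds of dimensions and would normally be clustered with an optimized library; the brute-force loops here only keep the idea visible.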
Easily Identifying Plant Diseases with Object Detection
As part of our Object Detection release posts, in this post we would like to showcase the entire application development process, from problem identification to model deployment, a seemingly ambitious undertaking. Let me tell you the story of how (and why) I built a plant disease detector web application. You too can build similar applications that will help you in your daily life in just a few hours. If you would like to play with the app, you can find it here, and the source code is also available in this repository. A few days ago, I moved to a new home.
Implementing the Transformer Decoder From Scratch in TensorFlow and Keras
There are many similarities between the Transformer encoder and decoder, such as their implementation of multi-head attention, layer normalization, and a fully connected feed-forward network as their final sub-layer. Having implemented the Transformer encoder, we will now apply our knowledge to implementing the Transformer decoder, as a further step toward implementing the complete Transformer model. Our end goal remains the application of the complete model to Natural Language Processing (NLP). In this tutorial, you will discover how to implement the Transformer decoder from scratch in TensorFlow and Keras.
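One detail that separates the decoder from the encoder is that its first multi-head attention sub-layer is masked, so each position can attend only to itself and to earlier positions (Vaswani et al., 2017). The sketch below shows single-head scaled dot-product attention with such a look-ahead mask. It is written in plain NumPy as a framework-agnostic illustration, not the tutorial's TensorFlow/Keras code; the names `lookahead_mask` and `masked_attention` are hypothetical, and it follows the common convention in which a mask value of 1 marks a position to hide.

```python
import numpy as np


def lookahead_mask(n: int) -> np.ndarray:
    """1 above the diagonal marks 'future' positions that must be hidden."""
    return np.triu(np.ones((n, n)), k=1)


def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray, mask: np.ndarray):
    """Single-head scaled dot-product attention; q, k, v have shape (seq_len, d)."""
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    # A large negative score drives the softmax weight of hidden positions to ~0.
    scores = np.where(mask == 1, -1e9, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))  # 4 target tokens, model dimension 8 (toy values)
out, weights = masked_attention(x, x, x, lookahead_mask(4))
```

Because row 0 can see only position 0, its output is exactly the first value vector. In a full decoder layer, the output of this masked self-attention (after the residual connection and layer normalization) supplies the queries for the second attention sub-layer, whose keys and values come from the encoder output.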
The Vision Transformer Model
With the Transformer architecture revolutionizing the implementation of attention and achieving very promising results in the natural language processing domain, it was only a matter of time before we would see its application in the computer vision domain too. This was eventually achieved with the Vision Transformer (ViT). In this tutorial, you will discover the architecture of the Vision Transformer model and its application to the task of image classification. We have seen how the emergence of the Transformer architecture of Vaswani et al. (2017) revolutionized the use of attention by dispensing with the recurrence and convolutions on which earlier attention models had relied.
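The central idea of ViT (Dosovitskiy et al., 2020) is to treat an image as a sequence of tokens: the image is cut into non-overlapping P × P patches, each patch is flattened and linearly projected, a learnable class token is prepended, and position embeddings are added. A minimal NumPy sketch of this input pipeline follows; the random projection, zero class token, and random position embeddings are placeholders for parameters that ViT learns, and the function names are hypothetical.

```python
import numpy as np


def patchify(image: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) image into N = HW/p^2 flattened patches of length p*p*C."""
    h, w, c = image.shape
    if h % p or w % p:
        raise ValueError("image height and width must be divisible by the patch size")
    x = image.reshape(h // p, p, w // p, p, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (patch rows, patch cols, p, p, C)
    return x.reshape(-1, p * p * c)  # patches in row-major order


def embed(image: np.ndarray, p: int, d_model: int, rng: np.random.Generator) -> np.ndarray:
    """Token sequence for the Transformer encoder: [class token; projected patches] + positions."""
    patches = patchify(image, p)
    # Random placeholders for parameters that ViT learns during training.
    w_proj = rng.normal(scale=0.02, size=(patches.shape[1], d_model))
    cls_token = np.zeros((1, d_model))
    pos_emb = rng.normal(scale=0.02, size=(patches.shape[0] + 1, d_model))
    tokens = np.concatenate([cls_token, patches @ w_proj], axis=0)
    return tokens + pos_emb


rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
seq = embed(img, p=16, d_model=768, rng=rng)  # (224/16)^2 = 196 patches + 1 class token
```

With 224 × 224 images and 16 × 16 patches, as in the ViT-Base configuration, the encoder sees 197 tokens, and the classification head reads the final hidden state of the class token.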
1st ICLR International Workshop on Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data (PAIR^2Struct)
Wang, Hao, Lin, Wanyu, He, Hao, Wang, Di, Mao, Chengzhi, Zhang, Muhan
Recent years have seen principles and guidance for the accountable and ethical use of artificial intelligence (AI) spring up around the globe. Specifically, Data Privacy, Accountability, Interpretability, Robustness, and Reasoning have been broadly recognized as fundamental principles for using machine learning (ML) technologies in decision-critical and/or privacy-sensitive applications. On the other hand, in a tremendous number of real-world applications, the data itself can be well represented in various structured formalisms, such as graph-structured data (e.g., networks), grid-structured data (e.g., images), and sequential data (e.g., text). By exploiting this inherently structured knowledge, one can design plausible approaches to identify and use more relevant variables to make reliable decisions, thereby facilitating real-world deployments.
Joint Entropy Search for Multi-objective Bayesian Optimization
Tu, Ben, Gandy, Axel, Kantas, Nikolas, Shafei, Behrang
Many real-world problems can be phrased as multi-objective optimization problems, where the goal is to identify the best set of compromises between competing objectives. Multi-objective Bayesian optimization (BO) is a sample-efficient strategy for solving such vector-valued optimization problems when only a limited number of noisy objective function evaluations is available. In this paper, we propose a novel information-theoretic acquisition function for BO called Joint Entropy Search (JES), which considers the joint information gain for the optimal set of inputs and outputs. We present several analytical approximations to the JES acquisition function and also introduce an extension to the batch setting.
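The "best set of compromises" is the Pareto front: the outcomes that no other outcome dominates. The pure-Python sketch below, assuming every objective is to be maximized, filters a finite set of evaluated outcomes down to its non-dominated subset. It illustrates the target that multi-objective BO searches for, not the JES acquisition function itself; the names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Sequence, Tuple

Outcome = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(outcomes: List[Outcome]) -> List[Outcome]:
    """Keep the outcomes that no other outcome dominates (O(n^2) brute force)."""
    return [p for p in outcomes if not any(dominates(q, p) for q in outcomes)]


# Toy evaluations of two competing objectives.
observed = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (2.0, 2.0), (1.0, 1.0), (4.0, 1.0)]
front = pareto_front(observed)
```

Here (2.0, 2.0) and (1.0, 1.0) drop out because (2.0, 4.0) is at least as good on both objectives and strictly better on one; each remaining point trades one objective off against the other.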
Dominance-based Rough Set Approach, basic ideas and main trends
Błaszczyński, Jerzy, Greco, Salvatore, Matarazzo, Benedetto, Szeląg, Marcin
Among the many merits of Roman Słowiński in his long and rich scientific career, we must count his pioneering use of artificial intelligence methodologies in decision support and, in particular, in Multiple Criteria Decision Aiding (MCDA) (for an updated state of the art, see [48]). In this perspective, the proposal and development of the Dominance-based Rough Set Approach (DRSA) is a cornerstone of the domain. The basic DRSA idea, a decision support procedure based on a decision model expressed in natural language and obtained from simple preference information in the form of exemplary decisions, has attracted the interest of experts, and DRSA is now considered one of the three main approaches to MCDA, together with the classical Multiple Attribute Utility Theory (MAUT) [58] and the outranking approach [75]. In fact, DRSA is not a mere application to MCDA of concepts and tools already proposed and developed in artificial intelligence, knowledge discovery, data mining, and machine learning. Indeed, taking into account the preference orders typical of MCDA problems required a reformulation of many important concepts and methodologies, so that DRSA became a viable and interesting methodology in its own right in these domains as well. Consequently, roughly 25 years after DRSA was proposed, we attempt a first assessment of its basic ideas and main developments.
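To make the "dominance" in DRSA concrete: as DRSA is usually formulated, object x dominates object y when x is at least as good as y on every criterion. The upward union of classes "class ≥ t" is then approximated from below by the objects whose dominating set D+(x) lies entirely inside the union, and from above by the objects whose dominated set D-(x) intersects it. The pure-Python sketch below assumes gain-type criteria (higher is better) and integer class labels; the toy data and function names are hypothetical.

```python
from typing import Dict, Set, Tuple

# object id -> (evaluations on gain-type criteria, ordinal decision class)
Table = Dict[str, Tuple[Tuple[float, ...], int]]


def weakly_dominates(x: Tuple[float, ...], y: Tuple[float, ...]) -> bool:
    """x is at least as good as y on every criterion."""
    return all(a >= b for a, b in zip(x, y))


def upward_approximations(table: Table, t: int) -> Tuple[Set[str], Set[str]]:
    """Lower and upper approximations of the upward union 'class >= t'."""
    union = {o for o, (_, cls) in table.items() if cls >= t}

    def d_plus(o: str) -> Set[str]:  # objects dominating o (includes o itself)
        return {j for j, (ev, _) in table.items() if weakly_dominates(ev, table[o][0])}

    def d_minus(o: str) -> Set[str]:  # objects dominated by o (includes o itself)
        return {j for j, (ev, _) in table.items() if weakly_dominates(table[o][0], ev)}

    lower = {o for o in table if d_plus(o) <= union}
    upper = {o for o in table if d_minus(o) & union}
    return lower, upper


# Toy exemplary decisions: (math, physics) scores and overall class 1 < 2 < 3.
# c and e have identical scores but different classes: an inconsistency.
decisions: Table = {
    "a": ((3, 3), 3),
    "b": ((2, 3), 2),
    "c": ((2, 2), 2),
    "d": ((1, 1), 1),
    "e": ((2, 2), 1),
}
lower, upper = upward_approximations(decisions, t=2)
boundary = upper - lower  # objects whose membership in 'class >= 2' is doubtful
```

The lower approximation here is {a, b} and the boundary is {c, e}, so the inconsistent pair is isolated rather than discarded. Objects in the lower approximation support rules that can be read in natural language, such as "if math ≥ 2 and physics ≥ 3, then class ≥ 2".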
Knowledge Tracing for Complex Problem Solving: Granular Rank-Based Tensor Factorization
Wang, Chunpai, Sahebi, Shaghayegh, Zhao, Siqian, Brusilovsky, Peter, Moraes, Laura O.
Knowledge Tracing (KT), which aims to model students' knowledge levels and predict their performance, is one of the most important applications of user modeling. Modern KT approaches model and maintain an up-to-date state of student knowledge over a set of course concepts according to students' historical performance in attempting the problems. However, KT approaches were designed to model knowledge by observing relatively small problem-solving steps in Intelligent Tutoring Systems. While these approaches have been applied successfully to model student knowledge by observing student solutions to simple problems, they do not perform well at modeling students' complex problem solving. Most importantly, current models assume that all problem attempts are equally valuable in quantifying current student knowledge. However, for complex problems that involve many concepts at the same time, this assumption is deficient. In this paper, we argue that not all attempts are equally important in discovering students' knowledge state, and that some attempts can be summarized together to better represent student performance. We propose a novel student knowledge tracing approach, Granular RAnk based TEnsor factorization (GRATE), that dynamically selects student attempts that can be aggregated while predicting students' performance on problems and discovering the concepts presented in them. Our experiments on three real-world datasets demonstrate the improved performance of GRATE, compared to state-of-the-art baselines, in the task of student performance prediction. Our further analysis shows that attempt aggregation eliminates unnecessary fluctuations from students' discovered knowledge states and helps discover complex latent concepts in the problems.