Country
ZiMM: a deep learning model for long term adverse events with non-clinical claims data
Bacry, Emmanuel, Gaïffas, Stéphane, Kabeshova, Anastasiia, Yu, Yiyang
This paper considers the problem of modeling long-term adverse events following prostatic surgery performed on patients with urination problems, using the French national health insurance database (SNIIRAM), which is a non-clinical claims database built around healthcare reimbursements of more than 65 million people. This makes the problem particularly challenging compared to what could be done using clinical hospital data, albeit a much smaller sample, while we exploit here the claims of almost all French citizens diagnosed with prostatic problems (with between 1.5 and 5 years of history). We introduce a new model, called ZiMM (Zero-inflated Mixture of Multinomial distributions) to capture such long-term adverse events, and we build a deep-learning architecture on top of it to deal with the complex, highly heterogeneous and sparse patterns observable in such a large claims database. This architecture combines several ingredients: embedding layers for drugs, medical procedures, and diagnosis codes; embeddings aggregation through a self-attention mechanism; recurrent layers to encode the health pathways of patients before their surgery and a final decoder layer which outputs the ZiMM's parameters.
Convergence to minima for the continuous version of Backtracking Gradient Descent
The main result of this paper is: {\bf Theorem.} Let $f:\mathbb{R}^k\rightarrow \mathbb{R}$ be a $C^{1}$ function, so that $\nabla f$ is locally Lipschitz continuous. Assume moreover that $f$ is $C^2$ near its generalised saddle points. Fix real numbers $\delta_0>0$ and $0<\alpha <1$. Then there is a smooth function $h:\mathbb{R}^k\rightarrow (0,\delta_0]$ so that the map $H:\mathbb{R}^k\rightarrow \mathbb{R}^k$ defined by $H(x)=x-h(x)\nabla f(x)$ has the following property: (i) For all $x\in \mathbb{R}^k$, we have $f(H(x)))-f(x)\leq -\alpha h(x)||\nabla f(x)||^2$. (ii) For every $x_0\in \mathbb{R}^k$, the sequence $x_{n+1}=H(x_n)$ either satisfies $\lim_{n\rightarrow\infty}||x_{n+1}-x_n||=0$ or $ \lim_{n\rightarrow\infty}||x_n||=\infty$. Each cluster point of $\{x_n\}$ is a critical point of $f$. If moreover $f$ has at most countably many critical points, then $\{x_n\}$ either converges to a critical point of $f$ or $\lim_{n\rightarrow\infty}||x_n||=\infty$. (iii) There is a set $\mathcal{E}_1\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_1$, the sequence $x_{n+1}=H(x_n)$, {\bf if converges}, cannot converge to a {\bf generalised} saddle point. (iv) There is a set $\mathcal{E}_2\subset \mathbb{R}^k$ of Lebesgue measure $0$ so that for all $x_0\in \mathbb{R}^k\backslash \mathcal{E}_2$, any cluster point of the sequence $x_{n+1}=H(x_n)$ is not a saddle point, and more generally cannot be an isolated generalised saddle point. Some other results are proven.
Towards the Use of Neural Networks for Influenza Prediction at Multiple Spatial Resolutions
Aiken, Emily L., Nguyen, Andre T., Santillana, Mauricio
We introduce the use of a Gated Recurrent Unit (GRU) for influenza prediction at the state- and city-level in the US, and experiment with the inclusion of real-time flu-related Internet search data. We find that a GRU has lower prediction error than current state-of-the-art methods for data-driven influenza prediction at time horizons of over two weeks. In contrast with other machine learning approaches, the inclusion of real-time Internet search data does not improve GRU predictions.
Semi-Supervised Natural Language Approach for Fine-Grained Classification of Medical Reports
Deshmukh, Neil, Gumustop, Selin, Gauriau, Romane, Buch, Varun, Wright, Bradley, Bridge, Christopher, Naidu, Ram, Andriole, Katherine, Bizzo, Bernardo
Although machine learning has become a powerful tool to augment doctors in clinical analysis, the immense amount of labeled data that is necessary to train supervised learning approaches burdens each development task as time and resource intensive. The vast majority of dense clinical information is stored in written reports, detailing pertinent patient information. The challenge with utilizing natural language data for standard model development is due to the complex nature of the modality. In this research, a model pipeline was developed to utilize an unsupervised approach to train an encoder-language model, a recurrent network, to generate document encodings; which then can be used as features passed into a decoder-classifier model that requires magnitudes less labeled data than previous approaches to differentiate between fine-grained disease classes accurately. The language model was trained on unlabeled radiology reports from the Massachusetts General Hospital Radiology Department (n=218,159) and terminated with a loss of 1.62. The classification models were trained on three labeled datasets of head CT studies of reported patients, presenting large vessel occlusion (n=1403), acute ischemic strokes (n=331), and intracranial hemorrhage (n=4350), to identify a variety of different findings directly from the radiology report data; resulting in AUCs of 0.98, 0.95, and 0.99, respectively, for the large vessel occlusion, acute ischemic stroke, and intracranial hemorrhage datasets. The output encodings are able to be used in conjunction with imaging data, to create models that can process a multitude of different modalities. The ability to automatically extract relevant features from textual data allows for faster model development and integration of textual modality, overall, allowing clinical reports to become a more viable input for more encompassing and accurate deep learning models.
Zoea -- Composable Inductive Programming Without Limits
The abstraction levels represent a general progression from the test cases through available and derived values to partial and complete solutions. The abstraction levels include: - test cases; - input and output elements; - derived values (symbolic and numeric); - code fragments; - target values; - case solutions; - case set solutions; - program solutions; - solution code. The data on the blackboard represents a set of more or less promising solution fragments at different stages of identification, characterisation and elaboration. It is worth noting that progression from test cases to solution code is not a strictly linear process. Instead knowledge sources respond to changes at one or more specific abstraction levels to produce, enhance or remove elements on different levels. The blackboard model allows this to happen in more or less any order.
Deception through Half-Truths
Estornell, Andrew, Das, Sanmay, Vorobeychik, Yevgeniy
Deception is a fundamental issue across a diverse array of settings, from cybersecurity, where decoys (e.g., honeypots) are an important tool, to politics that can feature politically motivated "leaks" and fake news about candidates.Typical considerations of deception view it as providing false information.However, just as important but less frequently studied is a more tacit form where information is strategically hidden or leaked.We consider the problem of how much an adversary can affect a principal's decision by "half-truths", that is, by masking or hiding bits of information, when the principal is oblivious to the presence of the adversary. The principal's problem can be modeled as one of predicting future states of variables in a dynamic Bayes network, and we show that, while theoretically the principal's decisions can be made arbitrarily bad, the optimal attack is NP-hard to approximate, even under strong assumptions favoring the attacker. However, we also describe an important special case where the dependency of future states on past states is additive, in which we can efficiently compute an approximately optimal attack. Moreover, in networks with a linear transition function we can solve the problem optimally in polynomial time.
Partial-Order, Partially-Seen Observations of Fluents or Actions for Plan Recognition as Planning
Nelson, Jennifer M., Cardona-Rivera, Rogelio E.
This work aims to make plan recognition as planning more ready for real-world scenarios by adapting previous compilations to work with partial-order, half-seen observations of both fluents and actions. We first redefine what observations can be and what it means to satisfy each kind. We then provide a compilation from plan recognition problem to classical planning problem, similar to original work by Ramirez and Geffner, but accommodating these more complex observation types. This compilation can be adapted towards other planning-based plan recognition techniques. Lastly we evaluate this method against an "ignore complexity" strategy that uses the original method by Ramirez and Geffner. Our experimental results suggest that, while slower, our method is equally or more accurate than baseline methods; our technique sometimes significantly reduces the size of the solution to the plan recognition problem, i.e, the size of the optimal goal set. We discuss these findings in the context of plan recognition problem difficulty and present an avenue for future work.
Motion Reasoning for Goal-Based Imitation Learning
Huang, De-An, Chao, Yu-Wei, Paxton, Chris, Deng, Xinke, Fei-Fei, Li, Niebles, Juan Carlos, Garg, Animesh, Fox, Dieter
De-An Huang 1, 2, Y u-Wei Chao, 2, Chris Paxton, 2, Xinke Deng 2, 3, Li Fei-Fei 1, Juan Carlos Niebles 1, Animesh Garg 2, 4, Dieter Fox 2, 5 Abstract -- We address goal-based imitation learning, where the aim is to output the symbolic goal from a third-person video demonstration. This enables the robot to plan for execution and reproduce the same goal in a completely different environment. The key challenge is that the goal of a video demonstration is often ambiguous at the level of semantic actions. The human demonstrators might unintentionally achieve certain subgoals in the demonstrations with their actions. Our main contribution is to propose a motion reasoning framework that combines task and motion planning to disambiguate the true intention of the demonstrator in the video demonstration. This allows us to robustly recognize the goals that cannot be disambiguated by previous action-based approaches. We evaluate our approach by collecting a dataset of 96 video demonstrations in a mockup kitchen environment. We show that our motion reasoning plays an important role in recognizing the actual goal of the demonstrator and improves the success rate by over 20%. We further show that by using the automatically inferred goal from the video demonstration, our robot is able to reproduce the same task in a real kitchen environment. I NTRODUCTION We are interested in allowing robots to learn new tasks from video demonstrations. Recently, there has been rapid progress in imitation learning [1-4], which even enables learning a new task from a single demonstration of the task [5-7]. By leveraging meta-learning [8], the robot learns to follow the actions in the demonstration.
HDDL -- A Language to Describe Hierarchical Planning Problems
Höller, D., Behnke, G., Bercher, P., Biundo, S., Fiorino, H., Pellier, D., Alford, R.
The research in hierarchical planning has made considerable progress in the last few years. Many recent systems do not rely on hand-tailored advice anymore to find solutions, but are supposed to be domain-independent systems that come with sophisticated solving techniques. In principle, this development would make the comparison between systems easier (because the domains are not tailored to a single system anymore) and -- much more important -- also the integration into other systems, because the modeling process is less tedious (due to the lack of advice) and there is no (or less) commitment to a certain planning system the model is created for. However, these advantages are destroyed by the lack of a common input language and feature set supported by the different systems. In this paper, we propose an extension to PDDL, the description language used in non-hierarchical planning, to the needs of hierarchical planning systems. We restrict our language to a basic feature set shared by many recent systems, give an extension of PDDL's EBNF syntax definition, and discuss our extensions with respect to several planner-specific input languages from related work.
SAVEHR: Self Attention Vector Representations for EHR based Personalized Chronic Disease Onset Prediction and Interpretability
Mallya, Sunil, Overhage, Marc, Bodapati, Sravan, Srivastava, Navneet, Genc, Sahika
Chronic disease progression is emerging as an important area of investment for healthcare providers. As the quantity and richness of available clinical data continue to increase along with advances in machine learning, there is great potential to advance our approaches to caring for patients. An ideal approach to this problem should generate good performance on at least three axes namely, a) perform across many clinical conditions without requiring deep clinical expertise or extensive data scientist effort, b) generalization across populations, and c) be explainable (model interpretability). We present SAVEHR, a self-attention based architecture on heterogeneous structured EHR data that achieves $>$ 0.51 AUC-PR and $>$ 0.87 AUC-ROC gains on predicting the onset of four clinical conditions (CHF, Kidney Failure, Diabetes and COPD) 15-months in advance, and transfers with high performance onto a new population. We demonstrate that SAVEHR model performs superior to ten baselines on all three axes stated formerly.