sequence-to-sequence architecture
DRTCI: Learning Disentangled Representations for Temporal Causal Inference
Gupta, Garima, Vig, Lovekesh, Shroff, Gautam
Medical professionals evaluating alternative treatment plans for a patient often encounter time-varying confounders, or covariates that affect both the future treatment assignment and the patient outcome. The recently proposed Counterfactual Recurrent Network (CRN) accounts for time-varying confounders by using adversarial training to balance recurrent historical representations of patient data. However, this work assumes that all time-varying covariates are confounding and thus attempts to balance the full state representation. Given that the actual subset of covariates that are in fact confounding is generally unknown, recent work on counterfactual evaluation in the static, non-temporal setting has suggested disentangling the covariate representation into separate factors, each of which influences treatment selection, patient outcome, or both. This can help isolate selection bias and restrict balancing efforts to the factors that influence outcome, leaving the remaining factors, which only predict treatment, free from needless balancing.
Deep Learning for Natural Language Processing (NLP) -- using RNNs & CNNs
Wouldn't it be cool if a computer could understand the actual human sentiment behind sarcastic texts that can sometimes stump even actual humans? Or what if a computer could understand a human language so well that it could estimate how likely you are to encounter any random sentence that you give it? Or maybe it could generate completely fake code snippets of the Linux kernel that look so authentic that they are just as intimidating as the actual source code (well, unless you are a kernel programmer yourself)? What if computers could immaculately translate English to French, or between over 100 languages from all over the world? Or "see" an image and describe the items found in the photo?
Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models
Lin, Sheng-Chieh, Yang, Jheng-Hong, Nogueira, Rodrigo, Tsai, Ming-Feng, Wang, Chuan-Ju, Lin, Jimmy
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs). We leverage PLMs to address the strong token-to-token independence assumption made in the common objective, maximum likelihood estimation, for the CQR task. In CQR benchmarks of task-oriented dialogue systems, we evaluate fine-tuned PLMs on the recently introduced CANARD dataset as an in-domain task and validate the models using data from the TREC 2019 CAsT Track as an out-of-domain task. Examining a variety of architectures with different numbers of parameters, we demonstrate that the recent text-to-text transfer transformer (T5) achieves the best results on both CANARD and CAsT with fewer parameters than similar transformer architectures.
Forecasting Treatment Responses Over Time Using Recurrent Marginal Structural Networks
Electronic health records provide a rich source of data for machine learning methods to learn dynamic treatment responses over time. However, any direct estimation is hampered by the presence of time-dependent confounding, where actions taken are dependent on time-varying variables related to the outcome of interest. Drawing inspiration from marginal structural models, a class of methods in epidemiology which use propensity weighting to adjust for time-dependent confounders, we introduce the Recurrent Marginal Structural Network - a sequence-to-sequence architecture for forecasting a patient's expected response to a series of planned treatments.
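The propensity weighting mentioned above can be illustrated with the stabilized inverse probability of treatment weights used in marginal structural models. This is a minimal sketch, not the paper's implementation: the function name `stabilized_weights` and its inputs are hypothetical, and the per-step treatment probabilities are assumed to be given already, whereas in practice they would be estimated by separate models.

```python
def stabilized_weights(marginal_probs, conditional_probs):
    """Cumulative stabilized weights for one patient trajectory.

    marginal_probs[t]    -- probability of the treatment actually received
                            at step t given past treatments only (numerator).
    conditional_probs[t] -- the same probability given past treatments AND
                            time-varying covariates (denominator).
    Both lists are assumed inputs for this illustration.
    """
    if len(marginal_probs) != len(conditional_probs):
        raise ValueError("both probability lists must have the same length")

    weights = []
    running = 1.0
    for numerator, denominator in zip(marginal_probs, conditional_probs):
        if denominator <= 0.0:
            raise ValueError("conditional probabilities must be positive")
        # The weight at step t is the product of the ratios up to step t.
        running *= numerator / denominator
        weights.append(running)
    return weights


# A treatment that was unlikely given the covariates at the first step
# (0.25) is up-weighted relative to its marginal probability (0.5).
example_weights = stabilized_weights([0.5, 0.5], [0.25, 0.5])
```

Weighting each observation this way is meant to reduce the influence of trajectories whose treatments were strongly driven by the covariates, which is how marginal structural models adjust for time-dependent confounding.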
Sequence-to-Sequence
In this way the network architecture is able to respond to an utterance with a response. Last year this concept was generalized to include a dialog encoder layer on top of the standard encoder. This might further enhance the architecture so that it keeps track of previous utterances in a full dialog. Like every machine learning system, a sequence-to-sequence architecture has to undergo a training process. Here, the encoder and the decoder are trained together by presenting corresponding sequence pairs to them.
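To make "presenting corresponding sequence pairs" concrete, here is a minimal sketch of how one (utterance, response) pair might be turned into encoder and decoder sequences. It assumes teacher forcing, whitespace tokenization, and the special tokens `<sos>`, `<eos>`, and `<pad>`; none of these is specified in the text above, and the function name `make_training_example` is hypothetical.

```python
# Hypothetical special tokens; the source text does not name any.
SOS, EOS, PAD = "<sos>", "<eos>", "<pad>"


def make_training_example(utterance, response, max_len):
    """Build the three token sequences used for one training pair.

    The decoder input is the response shifted right by one position
    (teacher forcing), so at every step the decoder is asked to predict
    the next token of the response.
    """
    source_tokens = utterance.split()
    target_tokens = response.split()

    def pad(tokens):
        if len(tokens) > max_len:
            raise ValueError("sequence is longer than max_len")
        return tokens + [PAD] * (max_len - len(tokens))

    return {
        "encoder_input": pad(source_tokens + [EOS]),
        "decoder_input": pad([SOS] + target_tokens),
        "decoder_target": pad(target_tokens + [EOS]),
    }


example = make_training_example("how are you", "fine thanks", max_len=5)
# example["encoder_input"]  is ['how', 'are', 'you', '<eos>', '<pad>']
# example["decoder_input"]  is ['<sos>', 'fine', 'thanks', '<pad>', '<pad>']
# example["decoder_target"] is ['fine', 'thanks', '<eos>', '<pad>', '<pad>']
```

During training, the encoder would read `encoder_input`, the decoder would read `decoder_input`, and the loss would compare the decoder's predictions with `decoder_target`, so both components are updated from the same pair.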