data method
Causal Effect Estimation with TMLE: Handling Missing Data and Near-Violations of Positivity
Wiederkehr, Christoph, Heumann, Christian, Schomaker, Michael
We evaluate the performance of targeted maximum likelihood estimation (TMLE) for estimating the average treatment effect in missing data scenarios under varying levels of positivity violations. We employ model- and design-based simulations, with the latter using undersmoothed highly adaptive lasso on the 'WASH Benefits Bangladesh' dataset to mimic real-world complexities. Five missingness-directed acyclic graphs are considered, capturing common missing data mechanisms in epidemiological research, particularly in one-point exposure studies. These mechanisms also include not-at-random missingness in the exposure, outcome, and confounders. We compare eight missing data methods in conjunction with TMLE as the analysis method, distinguishing between non-multiple imputation (non-MI) and multiple imputation (MI) approaches. The MI approaches use both parametric and machine-learning models. Results show that non-MI methods, particularly complete cases with TMLE incorporating an outcome-missingness model, exhibit lower bias than all other evaluated missing data methods and greater robustness against positivity violations. In comparison, MI with classification and regression trees (CART) achieves lower root mean squared error while often maintaining nominal coverage rates. Our findings highlight the trade-offs between bias and coverage, and we recommend using complete cases with TMLE incorporating an outcome-missingness model for bias reduction and MI CART when accurate confidence intervals are the priority.
DATa: Domain Adaptation-Aided Deep Table Detection Using Visual-Lexical Representations
Kwon, Hyebin, An, Joungbin, Lee, Dongwoo, Shin, Won-Yong
Considerable research attention has been paid to table detection by developing not only rule-based approaches reliant on hand-crafted heuristics but also deep learning approaches. Although recent studies successfully perform table detection with enhanced results, they often experience performance degradation when they are applied to transferred domains whose table layout features might differ from those of the source domain in which the underlying model was trained. To overcome this problem, we present DATa, a novel Domain Adaptation-aided deep Table detection method that guarantees satisfactory performance in a specific target domain where few trusted labels are available. To this end, we design new lexical features and an augmented model used for re-training. More specifically, after pre-training one of the state-of-the-art vision-based models as our backbone network, we re-train our augmented model, consisting of the vision-based model and a multilayer perceptron (MLP) architecture. Using new confidence scores acquired from the trained MLP architecture, together with an initial prediction of bounding boxes and their confidence scores, we calculate each confidence score more accurately. To validate the superiority of DATa, we perform experimental evaluations by adopting a real-world benchmark dataset in a source domain and another dataset in our target domain consisting of materials science articles. Experimental results demonstrate that the proposed DATa method substantially outperforms competing methods that only utilize visual representations in the target domain. Such gains are possible owing to the capability of eliminating large numbers of false positives or false negatives according to the setting of a confidence score threshold.
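The idea of refining a detector's confidence with a second, lexical score and then applying a threshold can be illustrated with a minimal sketch. The abstract does not state how DATa combines the two scores, so the weighted average below, the `alpha` weight, and the names `Detection`, `fused_score`, and `filter_detections` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
    visual_score: float   # confidence from the vision-based backbone
    lexical_score: float  # confidence from an MLP on lexical features


def fused_score(det: Detection, alpha: float = 0.5) -> float:
    """Combine both confidences. A weighted average is an assumption,
    not the combination rule used by DATa."""
    return alpha * det.visual_score + (1.0 - alpha) * det.lexical_score


def filter_detections(dets: List[Detection],
                      threshold: float = 0.5,
                      alpha: float = 0.5) -> List[Detection]:
    """Keep only candidates whose fused confidence reaches the threshold."""
    return [d for d in dets if fused_score(d, alpha) >= threshold]


# Made-up candidates: the second looks table-like visually but has
# weak lexical support, so its fused score falls below the threshold.
candidates = [
    Detection((10, 10, 200, 120), visual_score=0.9, lexical_score=0.8),
    Detection((15, 300, 210, 380), visual_score=0.7, lexical_score=0.1),
]
kept = filter_detections(candidates, threshold=0.5)
```

Raising `threshold` trades false positives for false negatives, which is the kind of trade-off the abstract attributes to the confidence score setting.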
Build a Movie Recommendation Engine frontend using Vue.js (Part 4)
This is the final part of our 4-part series! In the previous three parts, we covered the theory of collaborative filtering, how to build a Flask API, and how to deploy the API on AWS ECS. In this post, we'll build a simple Vue.js frontend that aims to simplify movie recommendations as much as possible. Hence, we'll only ask the user to enter their favourite movie and recommend movies that are similar to it. Key functionalities of this project are the auto-search function for finding movie titles from the MovieLens dataset and the use of open-source scraped movie posters to display the movies recommended by the backend API.
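The title-matching logic behind an auto-search box like this can be sketched independently of the framework. The example below uses Python for brevity; the post's actual Vue.js component may work differently, and the function name `suggest_titles`, the prefix-first ranking, and the sample titles are assumptions made for illustration.

```python
from typing import List


def suggest_titles(query: str, titles: List[str], limit: int = 5) -> List[str]:
    """Return up to `limit` titles matching `query`, case-insensitively.

    Titles that start with the query are listed before titles that
    merely contain it (a ranking choice assumed for this sketch).
    """
    q = query.strip().lower()
    if not q:
        return []
    prefix_hits = [t for t in titles if t.lower().startswith(q)]
    other_hits = [t for t in titles
                  if q in t.lower() and not t.lower().startswith(q)]
    return (prefix_hits + other_hits)[:limit]


# A handful of sample titles in the "Title (Year)" style.
sample_titles = [
    "Toy Story (1995)",
    "Jumanji (1995)",
    "Toy Story 2 (1999)",
    "Story of Us, The (1999)",
]

toy_matches = suggest_titles("toy", sample_titles)
story_matches = suggest_titles("Story", sample_titles)
```

In a frontend, the same filter would typically run on each keystroke against a title list loaded once, with the selected title then sent to the recommendation API.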
Semiconductor Engineering .:. Big Data Meets Chip Design
The amount of data being handled in chip design is growing significantly at each new node, prompting chipmakers to begin using some of the same concepts, technologies and algorithms used in data centers at companies such as Google, Facebook and GE. While the total data sizes in chip design are still relatively small compared with cloud operations--terabytes per year versus petabytes and exabytes--it's too much to sort through using existing equipment and approaches. "You can take many big data approaches to handle this, but there may be a business problem if you do," said Leon Stok, vice president of EDA at IBM. He said EDA doesn't have the kind of concentrated volume necessary to drive these kinds of techniques, and typically that problem is made worse because the data is often different between design and manufacturing. But for those working on designs, the amount has grown significantly at a time when extracting key data in various parts of the design flow is crucial.