Collaborating Authors

 Ghosal, Angikar


dame-flame: A Python Library Providing Fast Interpretable Matching for Causal Inference

arXiv.org Artificial Intelligence

The dame-flame Python package is the first major implementation of two algorithms: the dynamic almost matching exactly (DAME) algorithm (Dieng, Liu, Roy, Rudin, and Volfovsky 2019, published in AISTATS'19) and the fast, large-scale almost matching exactly (FLAME) algorithm (Wang, Morucci, Awan, Liu, Roy, Rudin, and Volfovsky 2019, published in JMLR'21). Both provide almost exact matching of treatment and control units in discrete observational data for causal analysis. As discussed in Dieng et al. (2019) and Wang et al. (2019), the two algorithms produce high-quality, interpretable matched groups by using machine learning on a holdout training set to learn distance metrics. DAME solves an optimization problem that matches units on as many covariates as possible, prioritizing matches on important covariates. FLAME approximates the solution found by DAME via a much faster backward feature selection procedure. The DAME and FLAME algorithms are discussed in the remainder of this section, along with testing and installation details. In Section 2, we discuss the class structure of the dame-flame package, detail its special features, and compare it to other matching packages. In Section 3, we offer examples and a user guide.
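
The idea of matching on as many covariates as possible, then dropping covariates backward, can be illustrated with a minimal sketch. This is not the dame-flame API and not the published FLAME procedure: the function name `match_with_backward_dropping`, the toy data, and the fixed `drop_order` are all assumptions made for illustration. In particular, the real algorithms learn which covariate to drop next from a holdout set, whereas this sketch takes the order as given.

```python
from collections import defaultdict


def match_with_backward_dropping(units, covariates, drop_order):
    """Toy almost-exact matching (hypothetical helper, not the dame-flame API).

    units:      {unit_id: (covariate_dict, treated_flag)}
    covariates: covariate names to match on initially
    drop_order: covariates to drop, least important first (assumed given;
                the real algorithms learn this from holdout data)
    Returns {unit_id: tuple of covariates the unit was matched on}.
    """
    remaining = dict(units)
    active = list(covariates)
    to_drop = list(drop_order)
    matched_on = {}

    while remaining and active:
        # Group the still-unmatched units by exact values on active covariates.
        groups = defaultdict(list)
        for uid, (x, treated) in remaining.items():
            groups[tuple(x[c] for c in active)].append((uid, treated))

        # A group is valid only if it holds both a treated and a control unit.
        for members in groups.values():
            if {treated for _, treated in members} == {True, False}:
                for uid, _ in members:
                    matched_on[uid] = tuple(active)
                    del remaining[uid]

        if not to_drop:
            break
        # Backward step: stop matching on the next least important covariate.
        active.remove(to_drop.pop(0))

    return matched_on


# Toy discrete data: (covariates, treated?)
units = {
    "a": ({"age": 1, "sex": 0, "smoker": 1}, True),
    "b": ({"age": 1, "sex": 0, "smoker": 1}, False),
    "c": ({"age": 2, "sex": 1, "smoker": 0}, True),
    "d": ({"age": 2, "sex": 1, "smoker": 1}, False),
    "e": ({"age": 3, "sex": 0, "smoker": 0}, True),
}
result = match_with_backward_dropping(
    units, ["age", "sex", "smoker"], drop_order=["smoker", "sex"]
)
```

In this toy run, units `a` and `b` match exactly on all three covariates, `c` and `d` match only after `smoker` is dropped, and `e` stays unmatched because no control unit shares its remaining covariate values. Recording the covariate set each unit was matched on is what makes the resulting groups easy to inspect.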


Multitask Learning for Citation Purpose Classification

arXiv.org Machine Learning

We present our entry into the 2021 3C Shared Task Citation Context Classification based on Purpose competition. The goal of the competition is to classify a citation in a scientific article based on its purpose. This task is important because it could lead to more comprehensive ways of summarizing the purpose and uses of scientific articles. It is also difficult, mainly because of the limited amount of available training data in which the purpose of each citation has been hand-labeled, and because of the subjectivity of these labels. Our entry in the competition is a multi-task model that combines multiple modules designed to handle the problem from different perspectives, including hand-generated linguistic features, TF-IDF features, and an LSTM-with-attention model. We also provide an ablation study and feature analysis whose insights could lead to future work.