distiller
Detecting and Mitigating Treatment Leakage in Text-Based Causal Inference: Distillation and Sensitivity Analysis
Daoud, Adel, Johansson, Richard, Jerzak, Connor T.
Text-based causal inference increasingly employs textual data as proxies for unobserved confounders, yet this approach introduces a previously undertheorized source of bias: treatment leakage. Treatment leakage occurs when text intended to capture confounding information also contains signals predictive of treatment status, thereby inducing post-treatment bias in causal estimates. Critically, this problem can arise even when documents precede treatment assignment, as authors may employ future-referencing language that anticipates subsequent interventions. Despite growing recognition of this issue, no systematic methods exist for identifying and mitigating treatment leakage in text-as-confounder applications. This paper addresses this gap through three contributions. First, we provide formal statistical and set-theoretic definitions of treatment leakage that clarify when and why bias occurs. Second, we propose four text distillation methods -- similarity-based passage removal, distant supervision classification, salient feature removal, and iterative nullspace projection -- designed to eliminate treatment-predictive content while preserving confounder information. Third, we validate these methods through simulations using synthetic text and an empirical application examining International Monetary Fund structural adjustment programs and child mortality. Our findings indicate that moderate distillation optimally balances bias reduction against confounder retention, whereas overly stringent approaches degrade estimate precision.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > India (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Government (1.00)
- Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (0.49)
- Health & Medicine > Therapeutic Area > Immunology (0.46)
- Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)
KD-Zero: Evolving Knowledge Distiller for Any Teacher-Student Pairs
Knowledge distillation (KD) has emerged as an effective technique for model compression that can enhance lightweight models. Conventional KD methods propose various designs to allow the student model to imitate the teacher better. However, these handcrafted KD designs rely heavily on expert knowledge and may be sub-optimal for various teacher-student pairs. In this paper, we present a novel framework, KD-Zero, which uses evolutionary search to automatically discover a promising distiller from scratch for any teacher-student architecture pair. Then, we construct our distiller search space by selecting advanced operations for these three components.
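For context on what KD-Zero searches over, here is a minimal sketch of the conventional handcrafted distillation objective that such frameworks aim to replace: cross-entropy between the teacher's temperature-softened output distribution and the student's. This is the standard soft-target KD loss, not KD-Zero's discovered distiller; function names are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax; a higher temperature softens the
    # distribution, exposing the teacher's "dark knowledge".
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    # Cross-entropy between teacher and student softened distributions,
    # scaled by T^2 as is standard so gradients stay comparable across T.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    ce = -sum(pi * math.log(qi) for pi, qi in zip(p, q))
    return temperature ** 2 * ce
```

A handcrafted choice like this fixes the distance function, the transformation of the logits, and the loss weighting in advance; KD-Zero's premise is that evolutionary search over such components can find better combinations per teacher-student pair.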
WIDER & CLOSER: Mixture of Short-channel Distillers for Zero-shot Cross-lingual Named Entity Recognition
Ma, Jun-Yu, Chen, Beiduo, Gu, Jia-Chen, Ling, Zhen-Hua, Guo, Wu, Liu, Quan, Chen, Zhigang, Liu, Cong
Zero-shot cross-lingual named entity recognition (NER) aims at transferring knowledge from annotated, rich-resource data in source languages to unlabeled, lean-resource data in target languages. Existing mainstream methods based on the teacher-student distillation framework ignore the rich and complementary information lying in the intermediate layers of pre-trained language models, and domain-invariant information is easily lost during transfer. In this study, a mixture of short-channel distillers (MSD) method is proposed to fully exploit the rich hierarchical information in the teacher model and to transfer knowledge to the student model sufficiently and efficiently. Concretely, a multi-channel distillation framework is designed for sufficient information transfer by aggregating multiple distillers as a mixture. In addition, an unsupervised method adopting parallel domain adaptation is proposed to shorten the channels between the teacher and student models and preserve domain-invariant features. Experiments on four datasets across nine languages demonstrate that the proposed method achieves new state-of-the-art performance on zero-shot cross-lingual NER and shows strong generalization and compatibility across languages and fields.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Hong Kong (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Research Report > New Finding (0.66)
- Research Report > Experimental Study (0.46)
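The aggregation idea behind the mixture of distillers can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's method: each "short channel" is reduced to a mean-squared distance between one teacher layer and one student layer, and the mixture is a weighted sum; all names and the choice of MSE are hypothetical.

```python
def mse(a, b):
    # Mean squared distance between two hidden-state vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def mixture_distillation_loss(teacher_layers, student_layers, weights=None):
    # Aggregate per-layer distillers into a mixture: each channel matches
    # one teacher layer to one student layer, then losses are combined.
    n = len(teacher_layers)
    weights = weights or [1.0 / n] * n
    return sum(w * mse(t, s)
               for w, t, s in zip(weights, teacher_layers, student_layers))
```

The point of the sketch is the structure: instead of distilling only from the teacher's final layer, every intermediate layer contributes its own term, so hierarchical information is not discarded.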
10 Parallels Between Whiskey Tasting and Artificial Intelligence
In today's world, the power of artificial intelligence is everywhere. From agriculture to healthcare, from shopping to dating, from the vehicles we drive to the way we do business, our experiences are increasingly shaped by AI. This is true even when it comes to whiskey tasting, although in this case the intelligence is driven by our senses and our reasoning rather than sophisticated algorithms. This is a topic that is close to my heart, given that I'm a director of AI, data analytics and high performance computing sales who moonlights as a whiskey sommelier. I often have occasion to reflect on the amazing parallels between the principles of AI and the process of tasting whiskey.
- North America > United States > Pennsylvania (0.05)
- Europe > United Kingdom > Scotland (0.05)
- Europe > United Kingdom > England (0.05)
airaria/TextBrewer
TextBrewer is a PyTorch-based toolkit for the distillation of NLP models. It includes various distillation techniques from both NLP and CV and provides an easy-to-use distillation framework that allows users to quickly experiment with state-of-the-art distillation methods to compress a model with a relatively small sacrifice in performance, increase inference speed, and reduce memory usage. TextBrewer has achieved impressive results on several typical NLP tasks.
AI designed this gin, but would you drink it?
Two years ago, I owned a bar with over 300 different gins -- one of the largest collections of juniper-flavored spirits at any establishment in the United States. None of those gins was designed by a computer, a fact that my bartenders would likely have explained was for the best: A gin's non-juniper botanicals are what make it distinctive, and the most popular recipes have traditionally come from experienced distillers. But now that we're in the AI-as-possible-gourmand era, what if a trained AI system took over the process of formulating, naming, labeling, and even marketing a new type of gin? Could artificial intelligence -- aided somewhat by humans -- create a viable product? Somewhat surprisingly, the answer is "yes." This weekend, Bristol, UK-based Circumstance Distillery and creative technologists Tiny Giant debuted Monker's Garkel as "the world's first gin created by artificial intelligence," and though I was skeptical about AI's actual role in the project, machine learning had a greater influence on the outcome than might be imagined.
- North America > United States (0.25)
- Europe > United Kingdom > England > Bristol (0.25)
Neural Network Distiller: A Python Package For DNN Compression Research
Zmora, Neta, Jacob, Guy, Zlotnik, Lev, Elharar, Bar, Novik, Gal
This paper presents the philosophy, design, and feature set of Neural Network Distiller, an open-source Python package for DNN compression research. Distiller is a library of DNN compression algorithm implementations, with tools, tutorials, and sample applications for various learning tasks. Its target users are both engineers and researchers, and the rich content is complemented by a design for extensibility to facilitate new research. Distiller is open-source and is available on GitHub at https://github.com/NervanaSystems/distiller.