BiT: Robustly Binarized Multi-distilled Transformer

Aug-15-2025, 03:07:39 GMT–Neural Information Processing Systems

Inspired by the learnable bias proposed in ReActNet (Liu et al., 2020), we further propose elastic ...

In contrast to Bi-Attention proposed in BiBERT (Qin et al., 2021) that removes ...

We conduct meticulous experiments to compare these choices.

The binary convolution between the weights and activations that are both binarized to {-1, 1} (i.e. ...

The GLUE benchmark (Wang et al., 2019) includes the following datasets:
- MNLI: Multi-Genre Natural Language Inference is an entailment classification task (Williams et al., ...
- QQP: Quora Question Pairs is a paraphrase detection task.
- QNLI: Question Natural Language Inference (Wang et al., 2019) is a binary classification task ...
- STS-B: The Semantic Textual Similarity Benchmark is a sentence pair classification task.

The sentence pairs are sourced from online news sources (Dolan & Brockett, 2005).

activation, distillation, robustly binarized multi-distilled transformer


