
Sanity-Checking Pruning Methods: Random Tickets can Win the Jackpot
Yihang Chen, Tianhao Wu

Neural Information Processing Systems

Network pruning is a method for reducing test-time computational resource requirements with minimal performance degradation. Conventional wisdom of pruning algorithms suggests that: (1) Pruning methods exploit information from training data to find good subnetworks; (2) The architecture of the pruned network is crucial for good performance. In this paper, we conduct sanity checks for the above beliefs on several recent unstructured pruning methods and surprisingly find that: (1) A set of methods which aims to find good subnetworks of the randomly-initialized network (which we call "initial tickets"), hardly exploits any information from the training data; (2) For the pruned networks obtained by these methods, randomly changing the preserved weights in each layer, while keeping the total number of preserved weights unchanged per layer, does not affect the final performance. These findings inspire us to choose a series of simple data-independent prune ratios for each layer, and randomly prune each layer accordingly to get a subnetwork (which we call "random tickets"). Experimental results show that our zero-shot random tickets outperform or attain a similar performance compared to existing "initial tickets". In addition, we identify one existing pruning method that passes our sanity checks. We hybridize the ratios in our random ticket with this method and propose a new method called "hybrid tickets", which achieves further improvement.
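To make the "random ticket" recipe concrete, here is a minimal sketch, assuming PyTorch, of layer-wise random pruning with data-independent keep ratios; the layer names and the specific ratios below are illustrative assumptions, not the schedule proposed in the paper.

```python
# Minimal sketch of a "random ticket": prune each layer with a fixed,
# data-independent keep ratio, choosing the surviving weights uniformly at
# random. Layer names and ratios below are illustrative assumptions.
import torch
import torch.nn as nn

def random_ticket_masks(model: nn.Module, keep_ratios: dict) -> dict:
    """Return a {param_name: 0/1 mask} dict with a random per-layer mask."""
    masks = {}
    for name, param in model.named_parameters():
        if name not in keep_ratios:
            continue  # layers without a specified ratio stay dense
        n = param.numel()
        k = max(1, int(keep_ratios[name] * n))   # weights to keep in this layer
        mask = torch.zeros(n)
        mask[torch.randperm(n)[:k]] = 1.0        # uniform random selection
        masks[name] = mask.view_as(param)
    return masks

# Usage on a toy MLP with hypothetical per-layer keep ratios.
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
masks = random_ticket_masks(model, {"0.weight": 0.2, "2.weight": 0.5})
with torch.no_grad():
    for name, param in model.named_parameters():
        if name in masks:
            param.mul_(masks[name])              # apply the sparsity pattern
```

The only ingredient the sketch needs is the per-layer keep ratio; the sparsity pattern within each layer is sampled uniformly at random, which is exactly what makes the ticket data-independent.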


Author Feedback (eae27d77ca20db309e056e3d2dcd7d69-AuthorFeedback.pdf)

Neural Information Processing Systems

We thank all reviewers for taking the time to read the paper and for providing insightful comments and suggestions! To R1: Thank you for appreciating our work! Regarding "results from a fixed learning rate": there may be some misunderstanding. Here are our responses to your questions. We will add these discussions in the next version.



A Experimental Details
Datasets

Neural Information Processing Systems

For the standard EBM, we train on 300,000 simulated QCD jets. For the hybrid model EBM-CLF, we train on 300,000 simulated Standard Model jets (100,000 QCD jets, 100,000 boosted jets originating from the W boson, and 100,000 boosted jets originating from the top quark). For OOD detection test sets, we employ a hypothetical Higgs boson with a mass of 174 GeV (in the decay mode H → hh → (bb̄)(bb̄)), which decays into two lighter Higgs bosons of 80 GeV. All the jet samples are generated with a pipeline of physics simulators. QCD jets are extracted from QCD di-jet events that are generated with MadGraph [4] for LHC 13 TeV, followed by Pythia8 [61] and Delphes [18] for parton shower and fast detector simulation.



A Appendices

Neural Information Processing Systems

A.1 Guarantees on the decrease of the training loss
As the scores are updated, the relative order of the importances is likely to be shuffled, and some connections will be replaced by more important ones. Under certain conditions, we can formally prove that as these replacements happen, the training loss is guaranteed to decrease. Our proof is adapted from [Ramanujan et al., 2020] to consider the case of a fine-tunable W. We suppose that (a) the training loss L is smooth and admits a first-order Taylor development everywhere it is defined, and (b) the learning rate α_W of W is small. We first consider the case where k = 1 in the TopK masking, meaning that only one connection remains (and the other weights are deactivated/masked). The first term is null because of inequalities (6), and the second term is negative because of inequality (7). We note that this proof is not specific to the TopK masking function.
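For concreteness, the following is a minimal sketch, assuming a PyTorch-style implementation, of score-based TopK masking with a straight-through gradient to the scores; the class and variable names are illustrative assumptions, not the authors' code.

```python
# Sketch of TopK masking over learned importance scores, with a
# straight-through estimator so gradients reach the scores. Illustrative only.
import torch
import torch.nn as nn

class TopKMask(torch.autograd.Function):
    @staticmethod
    def forward(ctx, scores, k):
        mask = torch.zeros_like(scores)
        idx = torch.topk(scores.flatten(), k).indices
        mask.view(-1)[idx] = 1.0                 # keep the k highest scores
        return mask

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None                 # straight-through to the scores

class MaskedLinear(nn.Module):
    def __init__(self, in_features, out_features, k):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(out_features, in_features))
        self.scores = nn.Parameter(torch.rand(out_features, in_features))
        self.k = k

    def forward(self, x):
        mask = TopKMask.apply(self.scores, self.k)
        return x @ (self.weight * mask).t()      # only the top-k connections fire

# k = 1 keeps a single connection, the first case considered in the proof.
layer = MaskedLinear(4, 3, k=1)
layer(torch.randn(2, 4)).sum().backward()        # gradients reach weight and scores
```

With a fine-tunable W, both the surviving weights and the scores receive gradients, which is the setting the adapted proof covers.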




Instruction Tuning Large Language Models to Understand Electronic Health Records

Neural Information Processing Systems

Large language models (LLMs) have shown impressive capabilities in solving a wide range of tasks based on human instructions. However, developing a conversational AI assistant for electronic health record (EHR) data remains challenging due to (1) the lack of large-scale instruction-following datasets and (2) the limitations of existing model architectures in handling complex and heterogeneous EHR data. In this paper, we introduce MIMIC-Instr, a dataset comprising over 400K open-ended instruction-following examples derived from the MIMIC-IV EHR database. This dataset covers various topics and is suitable for instruction-tuning general-purpose LLMs for diverse clinical use cases. Additionally, we propose Llemr, a general framework that enables LLMs to process and interpret EHRs with complex data structures. Llemr demonstrates competitive performance in answering a wide range of patient-related questions based on EHR data. Furthermore, our evaluations on clinical predictive modeling benchmarks reveal that the fine-tuned Llemr achieves performance comparable to state-of-the-art (SOTA) baselines using curated features. The dataset and code are available at https://github.com/zzachw/llemr.

