argmax
Enhancing Knowledge Transfer for Task Incremental Learning with Data-free Subnetwork
Qiang Gao
DSN primarily seeks to transfer knowledge from previously learned tasks to each newly arriving task by selecting the affiliated weights of a small set of neurons to activate, including neurons reused from prior tasks via neuron-wise masks. It also transfers potentially valuable knowledge back to earlier tasks via data-free replay.
- Asia > China > Sichuan Province > Chengdu (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Leisure & Entertainment (0.47)
- Information Technology > Security & Privacy (0.46)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Sweden > Stockholm > Stockholm (0.04)
- Europe > Portugal > Porto > Porto (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Data Science > Data Mining (0.92)
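The neuron-wise masking idea behind DSN can be sketched in NumPy. This is a toy illustration only: the layer shapes, the specific masks, and `masked_forward` are hypothetical assumptions for exposition, not DSN's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer: 8 input features -> 6 hidden neurons.
W = rng.normal(size=(6, 8))

# Hypothetical neuron-wise binary masks per task: task 0 claimed
# neurons {0, 1, 2}; the new task 1 activates a small set that reuses
# neuron 1 from task 0 and adds fresh neurons {3, 4}.
mask_task0 = np.array([1, 1, 1, 0, 0, 0], dtype=float)
mask_task1 = np.array([0, 1, 0, 1, 1, 0], dtype=float)

def masked_forward(x, W, mask):
    """Activate only the affiliated weights of the masked neurons."""
    # Zero out the weight rows of unselected neurons, then apply the layer.
    return (mask[:, None] * W) @ x

x = rng.normal(size=8)
h = masked_forward(x, W, mask_task1)
# Neurons outside task 1's mask contribute nothing to the forward pass.
assert np.allclose(h[[0, 2, 5]], 0.0)
```

Selecting a subnetwork this way lets the new task reuse knowledge stored in shared neurons (here, neuron 1) while leaving the weights of other tasks' exclusive neurons untouched.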
Building a stable classifier with the inflated argmax
We propose a new framework for algorithmic stability in the context of multiclass classification. In practice, classification algorithms often operate by first assigning a continuous score (for instance, an estimated probability) to each possible label, then taking the maximizer---i.e., selecting the class that has the highest score. A drawback of this type of approach is that it is inherently unstable, meaning that it is very sensitive to slight perturbations of the training data, since taking the maximizer is discontinuous. Motivated by this challenge, we propose a pipeline for constructing stable classifiers from data, using bagging (i.e., resampling and averaging) to produce stable continuous scores, and then using a stable relaxation of argmax, which we call the inflated argmax, to convert these scores to a set of candidate labels. The resulting stability guarantee places no distributional assumptions on the data, does not depend on the number of classes or dimensionality of the covariates, and holds for any base classifier. Using a common benchmark data set, we demonstrate that the inflated argmax provides necessary protection against unstable classifiers, without loss of accuracy.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Finland > Uusimaa > Helsinki (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Data Science (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
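The instability of plain argmax, and the set-valued fix, can be illustrated with a deliberately simplified stand-in: return every label whose score is within `eps` of the maximum. Both `eps` and this exact rule are illustrative assumptions, not the paper's definition of the inflated argmax.

```python
import numpy as np

def inflated_argmax_toy(scores, eps=0.05):
    """Toy set-valued relaxation of argmax.

    Returns the set of labels whose score is within eps of the maximum,
    so near-ties yield a candidate *set* instead of an unstable single
    label. (This eps-threshold rule is a simplified stand-in for the
    paper's inflated argmax, chosen for illustration.)
    """
    scores = np.asarray(scores, dtype=float)
    return set(np.flatnonzero(scores >= scores.max() - eps).tolist())

# Two nearly identical score vectors, as might arise from slightly
# perturbed training data: plain argmax flips between labels 1 and 2,
# while the inflated set stays the same.
s1 = [0.2, 0.39, 0.41]
s2 = [0.2, 0.41, 0.39]
print(inflated_argmax_toy(s1))  # {1, 2}
print(inflated_argmax_toy(s2))  # {1, 2}
```

A clear winner still yields a singleton set, so decisiveness is only sacrificed where the scores genuinely fail to separate the classes.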
Efficient Decoding Methods for Language Models on Encrypted Data
Matan Avitan, Moran Baruch, Nir Drucker, Itamar Zimerman, Yoav Goldberg
Large language models (LLMs) power modern AI applications, but processing sensitive data on untrusted servers raises privacy concerns. Homomorphic encryption (HE) enables computation on encrypted data for secure inference. However, neural text generation requires decoding methods like argmax and sampling, which are non-polynomial and thus computationally expensive under encryption, creating a significant performance bottleneck. We introduce cutmax, an HE-friendly argmax algorithm that reduces ciphertext operations compared to prior methods, enabling practical greedy decoding under encryption. We also propose the first HE-compatible nucleus (top-p) sampling method, leveraging cutmax for efficient stochastic decoding with provable privacy guarantees. Both techniques are polynomial, supporting efficient inference in privacy-preserving settings. Moreover, their differentiability facilitates gradient-based sequence-level optimization as a polynomial alternative to straight-through estimators. We further provide strong theoretical guarantees for cutmax, proving its convergence via exponential amplification of the gap ratio between the maximum and runner-up elements. Evaluations on realistic LLM outputs show latency reductions of 24x-35x over baselines, advancing secure text generation.
- Europe > Austria > Vienna (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Massachusetts (0.04)
- (3 more...)
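The gap-amplification idea can be illustrated in plaintext with a square-and-normalize iteration: squaring is a polynomial operation (hence HE-friendly), and each round squares the ratio between the maximum and the runner-up, so after k rounds that ratio grows as (max/runner-up)^(2^k). This is a hypothetical simplification for intuition, not the cutmax algorithm itself, which operates on ciphertexts.

```python
import numpy as np

def amplify(scores, iters=8):
    """Plaintext sketch of exponential gap amplification.

    Repeated squaring-and-normalizing uses only polynomial operations,
    yet drives the score vector toward a one-hot indicator of the argmax:
    after k rounds the max/runner-up ratio is raised to the power 2**k.
    (Illustrative stand-in; not the paper's cutmax implementation.)
    """
    v = np.asarray(scores, dtype=float)
    v = v - v.min() + 1.0        # shift to positive values (illustrative)
    for _ in range(iters):
        v = v * v                # polynomial step: square every entry
        v = v / v.sum()          # normalize to keep values bounded
    return v                     # approaches a one-hot vector at the argmax

w = amplify([1.3, 2.1, 0.4, 2.0])
# The mass concentrates on index 1, so greedy decoding can read off the
# argmax without any non-polynomial comparison.
print(np.round(w, 3))
```

Because every step is polynomial, an analogous computation can in principle be evaluated under homomorphic encryption, which is exactly why a non-polynomial hard argmax is the bottleneck the abstract describes.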
Checklist

1. For all authors
   (a) Do the main claims made in the abstract and introduction accurately reflect the paper's contributions and scope? [Yes]

If you ran experiments...
   (a) Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [Yes]
   (b) Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)?
   (c) Did you report error bars (e.g., with respect to the random seed after running experiments multiple times)?
   (d) Did you include the total amount of compute and the type of resources used (e.g., type of GPUs)?

If you are using existing or new assets...
   (a) Did you include any new assets either in the supplemental material or as a URL? [Yes]
   (b) Did you discuss whether and how consent was obtained from people whose data you're using/curating?

If you used crowdsourcing or conducted research with human subjects...

Table 2: Hyper-parameters of QMIX in the Tiger-Trampoline Experiment

Hyper-parameter        Values
learning rate          0.0005, 0.0001
batch size             16, 32
ε annealing period     20000, 10000
RNN hidden dimension   64, 32, 16

In Section 5.1, we show the results of MAPPO and QMIX on the Tiger-Trampoline game. In the Hanabi experiments, we implement IMPROVISED as follows (better viewed together with the pseudocode). Player 1 and player 2 do not share the random seed beforehand. We do not anticipate any immediate negative impact from this work.
- Europe > Italy > Lombardy > Milan (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Singapore > Central Region > Singapore (0.04)
We thank all reviewers for their detailed, constructive feedback and suggestions. Table B (below) demonstrates this empirically: Gumbel-CRF achieves this with significantly less training time and resource consumption than Gumbel-Softmax. These experiments show that when trained with Gumbel-CRF, the AR decoder outperforms REINFORCE. We will clarify this in the paper.
A List of Notations

Table 1: Notations and their meanings

Notation    Meaning
C = { C
Based on Minkowski's inequality for sums [2] with order 2, and using Eq. 1 and Eq. 3, Eq. 4 can be proved. Using Eq. 3, 10, and 2, we obtain the following distance bound. Similar to the proof in C, Theorem 4 can be proved. Using Eq. 11 and Eq. 3, Eq. 4 can be proved, so the left inequality holds. The above proof shows that better ensemble accuracy can be achieved by combining components whose individual accuracy is at least the average accuracy of the components.
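For reference, Minkowski's inequality for sums with order 2, as invoked above, states:

```latex
\left( \sum_{i=1}^{n} |a_i + b_i|^2 \right)^{1/2}
\le \left( \sum_{i=1}^{n} |a_i|^2 \right)^{1/2}
+ \left( \sum_{i=1}^{n} |b_i|^2 \right)^{1/2}
```

This is the triangle inequality for the Euclidean (ℓ2) norm applied to the sequences (a_i) and (b_i).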