rap
A Adaptive Measurements
(Definition 1). In Appendix D.4, we show that using this marginal trick significantly improves the performance of […] A.3 MWEM update. Given the loss function L […] The x-axis uses a logarithmic scale. We leave further investigation to future work. In this section we derive the update rule in Algorithm 4. Recall that the ultimate goal is to solve […] In this section we assume that γ = 0. We present the hyperparameters used for the methods across all experiments in Tables 1, 2, 3, 4, and 5. To limit the runtime of […] In Figures 5, 6, and 7, we present the same results for the experiments described in Section 7.1 (Figures 1 and 2), adding plots for mean error and root mean squared error (RMSE).
- North America > United States > Ohio (0.04)
- North America > United States > Pennsylvania (0.04)
- North America > United States > California (0.04)
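The retrieved chunk above references an MWEM-style update rule. As a point of reference only, the textbook MWEM (Multiplicative Weights Exponential Mechanism) step can be sketched as follows; this is the standard rule, not necessarily the paper's Algorithm 4, and the function and argument names are illustrative:

```python
import numpy as np

def mwem_update(p, query, measured_answer, eta=0.5):
    """Standard MWEM multiplicative-weights step (sketch).

    p:                current synthetic distribution over the data domain
    query:            {0,1} indicator vector over the domain
    measured_answer:  (noisy) answer to the query on the true data

    Domain elements where the measurement exceeds the synthetic answer
    are upweighted, and vice versa; the result is renormalized.
    """
    synthetic_answer = p @ query
    p = p * np.exp(eta * query * (measured_answer - synthetic_answer))
    return p / p.sum()
```

Repeating this step over a sequence of measured queries drives the synthetic distribution toward agreement with the measurements.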
Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation
Deep neural networks (DNNs) have been shown to be vulnerable to adversarial examples, which produce erroneous predictions by injecting imperceptible perturbations. In this work, we study the transferability of adversarial examples, which is significant because of its threat to real-world applications where the target model's architecture and parameters are usually unknown. Many existing works reveal that adversarial examples tend to overfit the surrogate model they are generated from, limiting their transfer attack performance against different target models. To mitigate this overfitting of the surrogate model, we propose a novel attack method, dubbed reverse adversarial perturbation (RAP). Specifically, instead of minimizing the loss at a single adversarial point, we advocate seeking adversarial examples located in a region of uniformly low loss, by injecting the worst-case perturbation (the reverse adversarial perturbation) at each step of the optimization procedure.
- Information Technology > Security & Privacy (0.45)
- Government > Military (0.45)
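The min-max idea in this abstract (outer minimization of the attack loss at the worst-case point of an inner neighborhood) can be sketched on a toy one-dimensional loss; the function names, step sizes, and radii below are illustrative, not the authors' implementation:

```python
import numpy as np

def loss(z, target=3.0):
    # toy surrogate "attack loss": low when z is near the target
    return (z - target) ** 2

def grad(z, target=3.0):
    return 2.0 * (z - target)

def rap_attack(x, eps=2.0, eps_n=0.5, steps=50, inner_steps=5,
               lr=0.1, lr_n=0.1):
    """Sketch of the RAP-style objective: min_delta max_n loss(x + delta + n),
    with |delta| <= eps (attack budget) and |n| <= eps_n (inner neighborhood)."""
    delta = 0.0
    for _ in range(steps):
        # inner maximization: the worst-case (reverse) perturbation n
        n = 0.0
        for _ in range(inner_steps):
            n += lr_n * grad(x + delta + n)
            n = np.clip(n, -eps_n, eps_n)
        # outer minimization: update delta at the worst-case point
        delta -= lr * grad(x + delta + n)
        delta = np.clip(delta, -eps, eps)
    return delta
```

Because the gradient is taken at the shifted point `x + delta + n`, the outer step prefers perturbations whose whole neighborhood has low loss, rather than a single sharp minimum.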
Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search
Test-time scaling improves large language models (LLMs) on long-horizon reasoning tasks by allocating more compute at inference. LLM Inference via Tree Search (LITS) methods achieve strong performance but are highly inefficient, often running an order of magnitude slower than iterative approaches. We propose Chain-in-Tree (CiT), a plug-in framework that decides when to branch during search rather than expanding at every step. CiT introduces lightweight Branching Necessity (BN) evaluations: BN-DP (Direct Prompting), where an auxiliary LLM judges branching needs, and BN-SC (Self-Consistency), which clusters candidate actions to assess agreement. Integrated into Tree of Thoughts, ReST-MCTS, and RAP, BN-DP achieves 75-85% reductions in token generation, model calls, and runtime on GSM8K and Math500, often with negligible or no accuracy loss. BN-SC typically yields substantial savings (up to 80%) but shows instability in 1-4 of the 14 settings, caused by a small subset of examples that produce extremely long reasoning steps. We theoretically prove that BN-DP never increases policy invocations, and we release both modular LITS implementations and a lightweight CiT function applicable across all LITS variants. The full codebase is publicly available at https://github.com/xinzhel/chain_in_tree.
- North America > United States (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > Singapore (0.04)
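The BN-SC check described in this abstract (cluster sampled candidate actions and branch only when they disagree) can be sketched as follows; clustering by normalized string equality is a simplification for illustration, and the function name and threshold are hypothetical:

```python
from collections import Counter

def needs_branching(candidate_actions, agreement_threshold=0.6):
    """BN-SC-style check (sketch): given several sampled candidate next
    steps, cluster identical (normalized) candidates and branch only when
    the largest cluster's share of samples falls below the threshold."""
    normalized = [a.strip().lower() for a in candidate_actions]
    counts = Counter(normalized)
    top_share = counts.most_common(1)[0][1] / len(normalized)
    return top_share < agreement_threshold
```

When agreement is high, the search keeps a single chain (one child node) and saves the tokens and model calls that full expansion would cost; only disagreement triggers branching.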
Runtime Adaptive Pruning for LLM Inference
Liu, Huanrong, Tian, Chunlin, Wei, Xuyang, Li, Qingbiao, Li, Li
Large language models (LLMs) excel at language understanding and generation, but their enormous computational and memory requirements hinder deployment. Compression offers a potential solution to mitigate these constraints. However, most existing methods rely on fixed heuristics and thus fail to adapt to runtime memory variations or the heterogeneous KV-cache demands arising from diverse user requests. To address these limitations, we propose RAP, an elastic pruning framework driven by reinforcement learning (RL) that dynamically adjusts compression strategies in a runtime-aware manner. Specifically, RAP dynamically tracks the evolving ratio between model parameters and the KV-cache during practical execution. Recognizing that FFNs house most parameters, whereas parameter-light attention layers dominate KV-cache formation, the RL agent retains only those components that maximize utility within the current memory budget, conditioned on the instantaneous workload and device state. Extensive experimental results demonstrate that RAP outperforms state-of-the-art baselines, marking the first approach to jointly consider model weights and the KV-cache on the fly.
- North America > United States (0.14)
- Asia > Macao (0.04)
- Asia > China (0.04)
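The selection step this abstract describes (keep the components that maximize utility under the current memory budget) can be sketched as a greedy utility-per-byte choice; in the paper an RL agent scores utility from the live workload and device state, whereas here the scores and names are illustrative inputs:

```python
def select_components(components, mem_budget):
    """Greedy sketch of runtime-aware component selection.

    components: list of (name, utility, mem_cost) tuples, e.g. FFN blocks
                (parameter-heavy) and attention layers (KV-cache-heavy)
    mem_budget: memory currently available on the device

    Keeps components in order of utility per unit of memory until the
    budget is exhausted; everything else is pruned for this request.
    """
    kept, used = [], 0.0
    for name, utility, cost in sorted(
            components, key=lambda c: c[1] / c[2], reverse=True):
        if used + cost <= mem_budget:
            kept.append(name)
            used += cost
    return kept
```

Re-running the selection as the budget or workload changes is what makes the pruning elastic rather than fixed at compression time.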
A Additional Experimental Results
Robot action primitives are agnostic to the exact geometry of the underlying robot, provided the robot is a manipulator arm. As noted in the related-work section, Dynamic Movement Primitives (DMPs) are an alternative skill formulation that is common in the robotics literature. Each primitive ran for 200 low-level actions, with a high-level path length of five, while the low-level path length was 500. With raw actions, each episode took 16.49 […] We run an ablation to measure how often RAPS uses each primitive.