allocation strategy
VLM-SFD: VLM-Assisted Siamese Flow Diffusion Framework for Dual-Arm Cooperative Manipulation
Chen, Jiaming, Jiang, Yiyu, Huang, Aoshen, Li, Yang, Pan, Wei
Dual-arm cooperative manipulation holds great promise for tackling complex real-world tasks that demand seamless coordination and adaptive dynamics. Despite substantial progress in learning-based motion planning, most approaches struggle to generalize across diverse manipulation tasks and adapt to dynamic, unstructured environments, particularly in scenarios involving interactions between two objects such as assembly, tool use, and bimanual grasping. To address these challenges, we introduce a novel VLM-Assisted Siamese Flow Diffusion (VLM-SFD) framework for efficient imitation learning in dual-arm cooperative manipulation. The proposed VLM-SFD framework exhibits outstanding adaptability, significantly enhancing the ability to rapidly adapt and generalize to diverse real-world tasks from only a minimal number of human demonstrations. Specifically, we propose a Siamese Flow Diffusion Network (SFDNet) employs a dual-encoder-decoder Siamese architecture to embed two target objects into a shared latent space, while a diffusion-based conditioning process - conditioned by task instructions - generates two-stream object-centric motion flows that guide dual-arm coordination. We further design a dynamic task assignment strategy that seamlessly maps the predicted 2D motion flows into 3D space and incorporates a pre-trained vision-language model (VLM) to adaptively assign the optimal motion to each robotic arm over time. Experiments validate the effectiveness of the proposed method, demonstrating its ability to generalize to diverse manipulation tasks while maintaining high efficiency and adaptability. The code and demo videos are publicly available on our project website https://sites.google.com/view/vlm-sfd/.
- Europe > United Kingdom (0.14)
- Asia > China (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Non-Uniform Class-Wise Coreset Selection for Vision Model Fine-tuning
Zhang, Hanyu, Xing, Zhen, He, Ruian, Yang, Wenxuan, Ma, Chenxi, Tan, Weimin, Yan, Bo
Coreset selection aims to identify a small yet highly informative subset of data, thereby enabling more efficient model training while reducing storage overhead. Recently, this capability has been leveraged to tackle the challenges of fine-tuning large foundation models, offering a direct pathway to their efficient and practical deployment. However, most existing methods are class-agnostic, causing them to overlook significant difficulty variations among classes. This leads them to disproportionately prune samples from either overly easy or hard classes, resulting in a suboptimal allocation of the data budget that ultimately degrades the final coreset performance. T o address this limitation, we propose Non-Uniform Class-Wise Coreset Selection (NUCS), a novel framework that both integrates class-level and sample-level difficulty. W e propose a robust metric for global class difficulty, quantified as the winsorized average of per-sample difficulty scores. Guided by this metric, our method performs a theoretically-grounded, nonuniform allocation of data selection budgets inter-class, while adaptively selecting samples intra-class with optimal difficulty ranges. Extensive experiments on a wide range of visual classification tasks demonstrate that NUCS consistently outperforms state-of-the-art methods across 10 diverse datasets and pre-trained models, achieving both superior accuracy and computational efficiency, highlighting the promise of non-uniform class-wise selection strategy for advancing the efficient fine-tuning of large foundation models.
- Asia > China > Shanghai > Shanghai (0.40)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Key Decision-Makers in Multi-Agent Debates: Who Holds the Power?
Zhang, Qian, Zheng, Yan, Liu, Jinyi, Liang, Hebin, Wang, Lanjun
Recent studies on LLM agent scaling have highlighted the potential of Multi-Agent Debate (MAD) to enhance reasoning abilities. However, the critical aspect of role allocation strategies remains underexplored. In this study, we demonstrate that allocating roles with differing viewpoints to specific positions significantly impacts MAD's performance in reasoning tasks. Specifically, we find a novel role allocation strategy, "Truth Last", which can improve MAD performance by up to 22% in reasoning tasks. To address the issue of unknown truth in practical applications, we propose the Multi-Agent Debate Consistency (MADC) strategy, which systematically simulates and optimizes its core mechanisms. MADC incorporates path consistency to assess agreement among independent roles, simulating the role with the highest consistency score as the truth. We validated MADC across a range of LLMs (9 models), including the DeepSeek-R1 Distilled Models, on challenging reasoning tasks. MADC consistently demonstrated advanced performance, effectively overcoming MAD's performance bottlenecks and providing a crucial pathway for further improvements in LLM agent scaling.
- Europe > Austria > Vienna (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.46)
SDQ-LLM: Sigma-Delta Quantization for 1-bit LLMs of any size
Xia, Junhao, Zhao, Ming, Xiao, Limin, Zhang, Xiujun
Large language models (LLMs) face significant computational and memory challenges, making extremely low-bit quantization crucial for their efficient deployment. In this work, we introduce SDQ-LLM: Sigma-Delta Quantization for 1-bit LLMs of any size, a novel framework that enables extremely low-bit quantization of LLMs while preserving their linguistic reasoning capabilities. A distinctive feature of SDQ-LLM is the continuous adjustability of the Over-Sampling Ratio (OSR), enabling dynamic adaptation to memory or VRAM constraints by selecting fractional OSR (e.g. 2.5 times) for an optimal trade-off between model size and accuracy. SDQ-LLM uses upsampling combined with Sigma-Delta Quantizer to binarize or ternarize LLMs weights, encoding high-precision parameters into 1-bit or 1.58-bit representations, replacing the multiplication operations within linear layers with addition. This approach significantly enhances inference efficiency under extremely low-bit quantization. To further reduce the loss of quantization precision, we incorporate Hadamard-based weight smoothing prior to quantization, improving the stability and robustness of the weight representations. Furthermore, to fully leverage the continuity of the OSR and reduce precision loss, recognizing the correlation between quantization sensitivity and weight variance, we propose a fine-grained, layer- and linear-wise OSR allocation strategy, MultiOSR. This strategy distributes OSR both across layers and within each layer, based on weight variance and parameter scale. Finally, extensive experiments on OPT and LLaMA model families demonstrate that SDQ-LLM achieves a more efficient and high-precision performance even under highly aggressive low-OSR settings. Our code is available at https://github.com/Dreamlittlecat/LLM-Quant-Factory.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. In this problem there is a hidden unknown parameter \theta*\in R^d and a finite set of arms X\subseteq R^d. When an arm is pulled you observe a reward x^T\theta*+epsilon where epsilon is a zero mean i.i.d noise with bounded range. The goal is to identify argmax_{x\in X} x^T\theta* with the least number of samples. Results: The paper characterizes the sample complexity of static and dynamic allocation strategies to identify the best arm.
Learning-Based Resource Management in Integrated Sensing and Communication Systems
Lu, Ziyang, Gursoy, M. Cenk, Mohan, Chilukuri K., Varshney, Pramod K.
-- In this paper, we tackle the task of adaptive time allocation in integrated sensing and communication systems equipped with radar and communication units. The dual-functional radar-communication system's task involves allocating dwell times for tracking multiple targets and utilizing the remaining time for data transmission towards estimated target locations. We introduce a novel constrained deep reinforcement learning (CDRL) approach, designed to optimize resource allocation between tracking and communication under time budget constraints, thereby enhancing target communication quality. Our numerical results demonstrate the efficiency of our proposed CDRL framework, confirming its ability to maximize communication quality in highly dynamic environments while adhering to time constraints. A. Background 1) Cognitive Radar: Radar technology, integral to various applications in environmental sensing, space exploration, navigation, and traffic control, has become increasingly important with the emergence of autonomous vehicles and drones.
- Information Technology > Communications > Networks (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.89)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
ROS 2 Agnocast: Supporting Unsized Message Types for True Zero-Copy Publish/Subscribe IPC
Ishikawa-Aso, Takahiro, Kato, Shinpei
Robot applications, comprising independent components that mutually publish/subscribe messages, are built on inter-process communication (IPC) middleware such as Robot Operating System 2 (ROS 2). In large-scale ROS 2 systems like autonomous driving platforms, true zero-copy communication -- eliminating serialization and deserialization -- is crucial for efficiency and real-time performance. However, existing true zero-copy middleware solutions lack widespread adoption as they fail to meet three essential requirements: 1) Support for all ROS 2 message types including unsized ones; 2) Minimal modifications to existing application code; 3) Selective implementation of zero-copy communication between specific nodes while maintaining conventional communication mechanisms for other inter-node communications including inter-host node communications. This first requirement is critical, as production-grade ROS 2 projects like Autoware rely heavily on unsized message types throughout their codebase to handle diverse use cases (e.g., various sensors), and depend on the broader ROS 2 ecosystem, where unsized message types are pervasive in libraries. The remaining requirements facilitate seamless integration with existing projects. While IceOryx middleware, a practical true zero-copy solution, meets all but the first requirement, other studies achieving the first requirement fail to satisfy the remaining criteria. This paper presents Agnocast, a true zero-copy IPC framework applicable to ROS 2 C++ on Linux that fulfills all these requirements. Our evaluation demonstrates that Agnocast maintains constant IPC overhead regardless of message size, even for unsized message types. In Autoware PointCloud Preprocessing, Agnocast achieves a 16% improvement in average response time and a 25% improvement in worst-case response time.
- Information Technology > Software (1.00)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.35)
Optimizing Latent Dimension Allocation in Hierarchical VAEs: Balancing Attenuation and Information Retention for OOD Detection
Williamson, Dane, Ji, Yangfeng, Dwyer, Matthew
Out-of-distribution (OOD) detection is a critical task in machine learning, particularly for safety-critical applications where unexpected inputs must be reliably flagged. While hierarchical variational autoencoders (HVAEs) offer improved representational capacity over traditional VAEs, their performance is highly sensitive to how latent dimensions are distributed across layers. Existing approaches often allocate latent capacity arbitrarily, leading to ineffective representations or posterior collapse. In this work, we introduce a theoretically grounded framework for optimizing latent dimension allocation in HVAEs, drawing on principles from information theory to formalize the trade-off between information loss and representational attenuation. We prove the existence of an optimal allocation ratio $r^{\ast}$ under a fixed latent budget, and empirically show that tuning this ratio consistently improves OOD detection performance across datasets and architectures. Our approach outperforms baseline HVAE configurations and provides practical guidance for principled latent structure design, leading to more robust OOD detection with deep generative models.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Virginia (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Puerto Rico > San Juan > San Juan (0.04)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
Optimized Cloud Resource Allocation Using Genetic Algorithms for Energy Efficiency and QoS Assurance
Panggabean, Caroline, C, Devaraj Verma, Gogoi, Bhagyashree, Limbu, Ranju, Sarker, Rhythm
Caroline Panggabean Department of CSE (AI) JAIN (Deemed - to - be University) Bangalore, Karnataka carolinepgabean@gmail.com ORCID: https://orcid.org/0009 - 0004 - 9964 - 7986 Ranju Limbu Department of CSE (AIM) JAIN (Deemed - to - be University) Bangalore, Karnataka 21btlca002 @jainuniversity.ac.in Dr. Devaraj Verma C Department of CSE (AI) JAIN (Deemed - to - be University) Bangalore, Karnataka c.devaraj@jainuniversity.ac.in ORCID: https://orcid.org/0000 - 0002 - 1504 - 4263 Rhythm Sarker Department of CSE (AIML) JAIN (Deemed - to - be University) Bangalore, Karnataka 21btrca065 @jainuniversity.ac.in Bhagyashree Gogoi Department of CSE (AI) JAIN (Deemed - to - be University) Bangalore, Karnataka 21btlca001 @ jainuniver s ity.ac.in Abstract -- Cloud computing environments demand dynamic and efficient resource management to ensure optimal performance, reduced energy consumption, and adherence to Service Level Agreements (SLAs). This paper presents a Genetic Algorithm (GA) - based approach for Virtual Machine (VM) placement and consol idation, aiming to minimize power usage while maintaining QoS constraints. The proposed method dynamically adjusts VM allocation based on real - time workload variations, outperforming traditional heuristics such as First Fit Decreasing (FFD) and Best Fit De creasing (BFD). Experimental results show notable reductions in energy consumption, VM migrations, SLA violation rates, and execution time.
- Energy (1.00)
- Information Technology > Services (0.95)
- Information Technology > Cloud Computing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
CAKE: Cascading and Adaptive KV Cache Eviction with Layer Preferences
Qin, Ziran, Cao, Yuchen, Lin, Mingbao, Hu, Wen, Fan, Shixuan, Cheng, Ke, Lin, Weiyao, Li, Jianguo
Large language models (LLMs) excel at processing long sequences, boosting demand for key-value (KV) caching. While recent efforts to evict KV cache have alleviated the inference burden, they often fail to allocate resources rationally across layers with different attention patterns. In this paper, we introduce Cascading and Adaptive KV cache Eviction (CAKE), a novel approach that frames KV cache eviction as a "cake-slicing problem." CAKE assesses layer-specific preferences by considering attention dynamics in both spatial and temporal dimensions, allocates rational cache size for layers accordingly, and manages memory constraints in a cascading manner. This approach enables a global view of cache allocation, adaptively distributing resources across diverse attention mechanisms while maintaining memory budgets. CAKE also employs a new eviction indicator that considers the shifting importance of tokens over time, addressing limitations in existing methods that overlook temporal dynamics. Comprehensive experiments on LongBench and NeedleBench show that CAKE maintains model performance with only 3.2% of the KV cache and consistently outperforms current baselines across various models and memory constraints, particularly in low-memory settings. Additionally, CAKE achieves over 10 speedup in decoding latency compared to full cache when processing contexts of 128K tokens with FlashAttention-2. New models such as GPT-4 (Achiam et al., 2023), Claude 3.5 (Anthropic, 2024), LLaMA 3.1 (Dubey et al., 2024) and Mistral Large 2 (AI, 2024) have extended token processing capacities beyond 128K. Shazeer (2019); Ainslie et al. (2023) partially address this issue by merging key-value heads during the training phase. However, optimizing key-value cache without additional training is crucial for efficient inference of long contexts under memory constraints, particularly in typical deployment scenarios where the model structure is fixed. One way to maintain a manageable KV cache size on the fly is to remove some KV pairs (Xiao et al., 2023; Zhang et al., 2024b; Li et al., 2024b). The idea is to eliminate less important KV pairs based on certain rules. Although recent methods have enhanced pair selection for removal, they typically assign uniform cache sizes across layers, disregarding layer-specific requirements.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Research Report > Promising Solution (0.48)
- Research Report > New Finding (0.46)