bitstream
DISCA: A Digital In-memory Stochastic Computing Architecture Using A Compressed Bent-Pyramid Format
Agwa, Shady, Shen, Yikang, Wang, Shiwei, Prodromakis, Themis
Nowadays, we are witnessing an Artificial Intelligence revolution that dominates the technology landscape in various application domains, such as healthcare, robotics, automotive, security, and defense. Massive-scale AI models, which mimic the human brain's functionality, typically feature millions and even billions of parameters through data-intensive matrix multiplication tasks. While conventional Von-Neumann architectures struggle with the memory wall and the end of Moore's Law, these AI applications are migrating rapidly towards the edge, such as in robotics and unmanned aerial vehicles for surveillance, thereby adding more constraints to the hardware budget of AI architectures at the edge. Although in-memory computing has been proposed as a promising solution for the memory wall, both analog and digital in-memory computing architectures suffer from substantial degradation of the proposed benefits due to various design limitations. We propose a new digital in-memory stochastic computing architecture, DISCA, utilizing a compressed version of the quasi-stochastic Bent-Pyramid data format. DISCA inherits the same computational simplicity of analog computing, while preserving the same scalability, productivity, and reliability of digital systems. Post-layout modeling results of DISCA show an energy efficiency of 3.59 TOPS/W per bit at 500 MHz using a commercial 180nm CMOS technology. Therefore, DISCA significantly improves the energy efficiency for matrix multiplication workloads by orders of magnitude if scaled and compared to its counterpart architectures.
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)
- Europe > Spain > Andalusia > Seville Province > Seville (0.04)
- Semiconductors & Electronics (0.88)
- Health & Medicine (0.74)
- Information Technology > Robotics & Automation (0.54)
Range Asymmetric Numeral Systems-Based Lightweight Intermediate Feature Compression for Split Computing of Deep Neural Networks
Sung, Mingyu, Im, Suhwan, Palakonda, Vikas, Kang, Jae-Mo
Split computing distributes deep neural network inference between resource-constrained edge devices and cloud servers but faces significant communication bottlenecks when transmitting intermediate features. To this end, in this paper, we propose a novel lightweight compression framework that leverages Range Asymmetric Numeral Systems (rANS) encoding with asymmetric integer quantization and sparse tensor representation to reduce transmission overhead dramatically. Specifically, our approach combines asymmetric integer quantization with a sparse representation technique, eliminating the need for complex probability modeling or network modifications. The key contributions include: (1) a distribution-agnostic compression pipeline that exploits inherent tensor sparsity to achieve bandwidth reduction with minimal computational overhead; (2) an approximate theoretical model that optimizes tensor reshaping dimensions to maximize compression efficiency; and (3) a GPU-accelerated implementation with sub-millisecond encoding/decoding latency. Extensive evaluations across diverse neural architectures (ResNet, VGG16, MobileNetV2, SwinT, DenseNet121, EfficientNetB0) demonstrate that the proposed framework consistently maintains near-baseline accuracy across CIFAR100 and ImageNet benchmarks. Moreover, we validated the framework's effectiveness on advanced natural language processing tasks by employing Llama2 7B and 13B on standard benchmarks such as MMLU, HellaSwag, ARC, PIQA, Winogrande, BoolQ, and OpenBookQA, demonstrating its broad applicability beyond computer vision. Furthermore, this method addresses a fundamental bottleneck in deploying sophisticated artificial intelligence systems in bandwidth-constrained environments without compromising model performance.
- North America > Canada > Ontario > Toronto (0.14)
- Asia > South Korea > Daegu > Daegu (0.04)
- Research Report (0.82)
- Overview (0.67)
- Asia > Singapore (0.14)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Asia > China > Hong Kong (0.04)
Real-time ML-based Defense Against Malicious Payload in Reconfigurable Embedded Systems
Stahle-Smith, Rye, Karakchi, Rasha
The growing use of FPGAs in reconfigurable systems introducessecurity risks through malicious bitstreams that could cause denial-of-service (DoS), data leakage, or covert attacks. We investigated chip-level hardware malicious payload in embedded systems and proposed a supervised machine learning method to detect malicious bitstreams via static byte-level features. Our approach diverges from existing methods by analyzing bitstreams directly at the binary level, enabling real-time detection without requiring access to source code or netlists. Bitstreams were sourced from state-of-the-art (SOTA) benchmarks and re-engineered to target the Xilinx PYNQ-Z1 FPGA Development Board. Our dataset included 122 samples of benign and malicious configurations. The data were vectorized using byte frequency analysis, compressed using TSVD, and balanced using SMOTE to address class imbalance. The evaluated classifiers demonstrated that Random Forest achieved a macro F1-score of 0.97, underscoring the viability of real-time Trojan detection on resource-constrained systems. The final model was serialized and successfully deployed via PYNQ to enable integrated bitstream analysis.
- North America > United States > South Carolina > Richland County > Columbia (0.16)
- North America > United States > Missouri > St. Louis County > St. Louis (0.06)
- Asia > Middle East > Iran > Tehran Province > Tehran (0.05)
- North America > United States > New York > New York County > New York City (0.05)
LINR-PCGC: Lossless Implicit Neural Representations for Point Cloud Geometry Compression
Huang, Wenjie, Yang, Qi, Xia, Shuting, Huang, He, Li, Zhu, Xu, Yiling
Existing AI-based point cloud compression methods struggle with dependence on specific training data distributions, which limits their real-world deployment. Implicit Neural Representation (INR) methods solve the above problem by encoding overfitted network parameters to the bitstream, resulting in more distribution-agnostic results. However, due to the limitation of encoding time and decoder size, current INR based methods only consider lossy geometry compression. In this paper, we propose the first INR based lossless point cloud geometry compression method called Lossless Implicit Neural Representations for Point Cloud Geometry Compression (LINR-PCGC). To accelerate encoding speed, we design a group of point clouds level coding framework with an effective network initialization strategy, which can reduce around 60% encoding time. A lightweight coding network based on multiscale SparseConv, consisting of scale context extraction, child node prediction, and model compression modules, is proposed to realize fast inference and compact decoder size. Experimental results show that our method consistently outperforms traditional and AI-based methods: for example, with the convergence time in the MVUB dataset, our method reduces the bitstream by approximately 21.21% compared to G-PCC TMC13v23 and 21.95% compared to SparsePCGC. Our project can be seen on https://huangwenjie2023.github.io/LINR-PCGC/.
- North America > United States > Missouri > Jackson County > Kansas City (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
Free Privacy Protection for Wireless Federated Learning: Enjoy It or Suffer from It?
Li, Weicai, Lv, Tiejun, Zhao, Xiyu, Yuan, Xin, Ni, Wei
--Inherent communication noises have the potential to preserve privacy for wireless federated learning (WFL) but have been overlooked in digital communication systems predominantly using floating-point number standards, e.g., IEEE 754, for data storage and transmission. This is due to the potentially catastrophic consequences of bit errors in floating-point numbers, e.g., on the sign or exponent bits. This paper presents a novel channel-native bit-flipping differential privacy (DP) mechanism tailored for WFL, where transmit bits are randomly flipped and communication noises are leveraged, to collectively preserve the privacy of WFL in digital communication systems. The key idea is to interpret the bit perturbation at the transmitter and bit errors caused by communication noises as a bit-flipping DP process. This is achieved by designing a new floating-point-to-fixed-point conversion method that only transmits the bits in the fraction part of model parameters, hence eliminating the need for transmitting the sign and exponent bits and preventing the catastrophic consequence of bit errors. We analyze a new metric to measure the bit-level distance of the model parameters and prove that the proposed mechanism satisfies ( λ, ϵ) -Rényi DP and does not violate the WFL convergence. Experiments validate privacy and convergence analysis of the proposed mechanism and demonstrate its superiority to the state-of-the-art Gaussian mechanisms that are channel-agnostic and add Gaussian noise for privacy protection. Privacy-preserving federated learning (FL) integrates privacy models into a distributed machine learning (ML) framework, offering provable privacy assurances [1]-[5]. Combining DP with FL permits clients to train their local models within specified privacy protection levels [11], [12]. This paper was supported in part by the National Natural Science Foundation of China under No. 62271068, and the Beijing Natural Science Foundation under Grant No. L222046.
- Asia > China > Beijing > Beijing (0.24)
- Oceania > Australia > New South Wales > Sydney (0.14)
- North America > United States > California > Santa Clara County > Stanford (0.04)
- (4 more...)
Bitstream Collisions in Neural Image Compression via Adversarial Perturbations
Madden, Jordan, Dorje, Lhamo, Li, Xiaohua
Neural image compression (NIC) has emerged as a promising alternative to classical compression techniques, offering improved compression ratios. Despite its progress towards standardization and practical deployment, there has been minimal exploration into it's robustness and security. This study reveals an unexpected vulnerability in NIC - bitstream collisions - where semantically different images produce identical compressed bitstreams. Utilizing a novel whitebox adversarial attack algorithm, this paper demonstrates that adding carefully crafted perturbations to semantically different images can cause their compressed bitstreams to collide exactly. The collision vulnerability poses a threat to the practical usability of NIC, particularly in security-critical applications. The cause of the collision is analyzed, and a simple yet effective mitigation method is presented.
- North America > United States > New York > Broome County > Binghamton (0.05)
- Asia > Middle East > Jordan (0.04)
Getting Free Bits Back from Rotational Symmetries in LLMs
He, Jiajun, Flamich, Gergely, Hernández-Lobato, José Miguel
Current methods for compressing neural network weights, such as decomposition, pruning, quantization, and channel simulation, often overlook the inherent symmetries within these networks and thus waste bits on encoding redundant information. In this paper, we propose a format based on bits-back coding for storing rotationally symmetric Transformer weights more efficiently than the usual array layout at the same floating-point precision. We evaluate our method on Large Language Models (LLMs) pruned by SliceGPT (Ashkboos et al., 2024) and achieve a 3-5% reduction in total bit usage for free across different model sizes and architectures without impacting model performance within a certain numerical precision. Modern neural networks, particularly Large Language Models (LLMs), typically contain billions of parameters. Therefore, encoding and transmitting these models efficiently is gaining widespread interest. However, these techniques ignore the fact that neural networks typically exhibit symmetries in their weight space. For example, in feedforward networks, applying a random permutation to the neurons in one layer and its inverse to the weights in the subsequent layer leaves the output unchanged. Encoding weights without accounting for these symmetries will lead to suboptimal codelength.
DeepHQ: Learned Hierarchical Quantizer for Progressive Deep Image Coding
Lee, Jooyoung, Jeong, Se Yoon, Kim, Munchurl
Unlike fixed- or variable-rate image coding, progressive image coding (PIC) aims to compress various qualities of images into a single bitstream, increasing the versatility of bitstream utilization and providing high compression efficiency compared to simulcast compression. Research on neural network (NN)-based PIC is in its early stages, mainly focusing on applying varying quantization step sizes to the transformed latent representations in a hierarchical manner. These approaches are designed to compress only the progressively added information as the quality improves, considering that a wider quantization interval for lower-quality compression includes multiple narrower sub-intervals for higher-quality compression. However, the existing methods are based on handcrafted quantization hierarchies, resulting in sub-optimal compression efficiency. In this paper, we propose an NN-based progressive coding method that firstly utilizes learned quantization step sizes via learning for each quantization layer. We also incorporate selective compression with which only the essential representation components are compressed for each quantization layer. We demonstrate that our method achieves significantly higher coding efficiency than the existing approaches with decreased decoding time and reduced model size.
- Asia > South Korea > Daejeon > Daejeon (0.04)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Plymouth County > Norwell (0.04)
- North America > Puerto Rico > San Juan > San Juan (0.04)