superblock
UniAttn: Reducing Inference Costs via Softmax Unification for Post-Training LLMs
Xiong, Yizhe, Huang, Wei, Ye, Xin, Chen, Hui, Lin, Zijia, Lian, Haoran, Su, Zhenpeng, Han, Jungong, Ding, Guiguang
Post-training is essential for adapting Large Language Models (LLMs) to real-world applications. Deploying post-trained models faces significant challenges due to substantial memory overhead and noticeable inference latency. Existing work has identified substantial redundancies in LLMs and proposed efficient architectures, namely intra-layer KV sharing and cross-layer KV sharing. However, intra-layer KV sharing still results in high inference costs, while cross-layer KV sharing leads to significant performance degradation. As a result, both methods remain suboptimal for post-training pre-trained LLMs. In this paper, we identify that the Softmax operation is a primary bottleneck for LLM inference and discover that it is actually highly redundant during post-training. We propose Softmax Unification in Attention (UniAttn), a novel post-training method that unifies Softmax activations across transformer blocks to reduce LLM inference costs. Additionally, UniAttn adopts a linear projection to compensate for the errors induced by Softmax unification. Experiments show that UniAttn matches the performance of standard post-training while significantly reducing inference costs, outperforming existing efficient architectures during post-training. Our code will be available at https://github.com/Bostoncake/UniAttn.
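To make "unifying Softmax activations across transformer blocks" concrete, here is a minimal pure-Python sketch, assuming a group of consecutive layers in which only the first ("anchor") layer computes the Softmax attention probabilities and the later layers reuse them with their own value matrices. The function names, the grouping, and the placement of the compensating linear projection (applied to each reusing layer's attention output) are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(r) for r in zip(*m)]


def softmax_rows(scores: Matrix) -> Matrix:
    """Numerically stable row-wise Softmax."""
    out = []
    for row in scores:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention_probs(q: Matrix, k: Matrix) -> Matrix:
    """Softmax(Q K^T / sqrt(d)): the step whose result is shared across layers."""
    scale = 1.0 / math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    return softmax_rows([[s * scale for s in row] for row in scores])


def unified_group_forward(
    q_anchor: Matrix,
    k_anchor: Matrix,
    values: List[Matrix],
    compensation: Optional[List[Matrix]] = None,
) -> List[Matrix]:
    """Attention outputs for a group of layers sharing one Softmax.

    values[i] is layer i's V matrix. compensation (hypothetical) holds one
    projection matrix per reusing layer, i.e. for layers 1..len(values)-1.
    """
    probs = attention_probs(q_anchor, k_anchor)  # Softmax computed once
    outputs = []
    for i, v in enumerate(values):
        out = matmul(probs, v)  # every layer reuses the anchor's probabilities
        if i > 0 and compensation is not None:
            out = matmul(out, compensation[i - 1])  # assumed error-compensation step
        outputs.append(out)
    return outputs
```

With all-zero queries and keys the shared probabilities are uniform, so each layer simply averages its own value rows; the saving comes from skipping the Q/K products and the Softmax in every layer after the anchor.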
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
DISGO: Automatic End-to-End Evaluation for Scene Text OCR
Hwang, Mei-Yuh, Shi, Yangyang, Ramchandani, Ankit, Pang, Guan, Krishnan, Praveen, Kabela, Lucas, Seide, Frank, Datta, Samyak, Liu, Jun
This paper discusses the challenges of optical character recognition (OCR) on natural scenes, which is harder than OCR on documents because of unconstrained content and varied image backgrounds. We propose to uniformly use word error rate (WER) as a new measurement for evaluating scene-text OCR, covering both end-to-end (e2e) performance and the performance of individual system components. We call the e2e metric DISGO WER because it considers Deletion, Insertion, Substitution, and Grouping/Ordering errors. Finally, we propose to utilize the concept of super blocks to automatically compute BLEU scores for e2e OCR machine translation. We use the small public SCUT test set to demonstrate the WER performance of a modularized OCR system.
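As a reference point for the metric, standard WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below computes only that standard quantity; it does not model the Grouping/Ordering errors that DISGO WER adds for scene text, whose exact definition is not given here.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: (S + D + I) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, reading "open late on sunday" as "open late sunday" is one deletion over four reference words, a WER of 0.25. For scene text, the reading order of separate text regions is often ambiguous, which is the gap the Grouping/Ordering component is meant to address.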
- Europe > Belgium (0.04)
- Asia > Middle East > Israel (0.04)
- Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Improving Probabilistic Bisimulation for MDPs Using Machine Learning
Mohaghegh, Mohammadsadegh, Salehi, Khayyam
Model checking has been suggested as a formal verification technique for analyzing critical systems. However, the primary challenge in applying it to complex systems is the state space explosion problem. To address this issue, bisimulation minimization has emerged as a prominent method for reducing the number of states in a labeled transition system. For systems exhibiting stochastic behaviors, probabilistic bisimulation is employed to minimize a given model, obtaining an equivalent model with fewer states. Recently, various techniques have been introduced to decrease the time complexity of the iterative methods used to compute probabilistic bisimulation for stochastic systems that display nondeterministic behaviors. In this paper, we propose a new technique to partition the state space of a given probabilistic model into its bisimulation classes. The technique uses the PRISM program of a given model and constructs several small versions of the model to train a classifier. It then applies machine learning classification techniques to approximate the related partition. The resulting partition is used as the initial partition for the standard bisimulation technique in order to reduce its running time. The experimental results show that the approach can significantly decrease the running time compared to state-of-the-art tools.
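The role of the learned initial partition can be illustrated with a naive signature-based partition refinement. This is a simplified sketch for labeled discrete-time Markov chains only (no nondeterminism, unlike the MDPs the paper targets), and the optional `initial` argument merely stands in for a classifier's output; exact `Fraction` probabilities avoid floating-point equality problems.

```python
from fractions import Fraction
from typing import Dict, Hashable, List, Optional, Set

State = Hashable
Transitions = Dict[State, Dict[State, Fraction]]


def split_by_label(block: Set[State], labels: Dict[State, str]) -> List[Set[State]]:
    """Split a block so that all states in each part share a label."""
    groups: Dict[str, Set[State]] = {}
    for s in block:
        groups.setdefault(labels[s], set()).add(s)
    return list(groups.values())


def bisimulation_partition(
    trans: Transitions,
    labels: Dict[State, str],
    initial: Optional[List[Set[State]]] = None,
) -> List[Set[State]]:
    """Refine a partition until states in a block have equal block-probabilities.

    `initial` (e.g. a classifier's guess) must cover every state.
    """
    start = initial if initial is not None else [set(trans)]
    partition = [b for block in start for b in split_by_label(block, labels)]
    while True:
        block_of = {s: i for i, b in enumerate(partition) for s in b}
        refined: List[Set[State]] = []
        for block in partition:
            groups: Dict[frozenset, Set[State]] = {}
            for s in block:
                # Signature: total probability of moving into each current block.
                sig: Dict[int, Fraction] = {}
                for t, p in trans[s].items():
                    sig[block_of[t]] = sig.get(block_of[t], Fraction(0)) + p
                groups.setdefault(frozenset(sig.items()), set()).add(s)
            refined.extend(groups.values())
        if len(refined) == len(partition):  # no block was split: stable
            return refined
        partition = refined
```

In a chain where `a` moves to `c` or `d` with probability 1/2 each, `b` moves to `c` with probability 1, and `c`, `d` are absorbing goal states, `a` and `b` end up in the same class. Refinement only ever splits blocks, so a good starting partition saves iterations, while a starting partition that separates bisimilar states is never merged back.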
- Asia > Middle East > Oman (0.46)
- South America > Peru > Loreto Department (0.14)
- Asia > Middle East > Iran (0.04)
Towards A Visual Programming Tool to Create Deep Learning Models
Calò, Tommaso, De Russis, Luigi
Deep Learning (DL) developers come from different backgrounds, e.g., medicine, genomics, finance, and computer science. To create a DL model, they must learn and use high-level programming languages (e.g., Python), and therefore need to handle the related setup and fix programming errors. This paper presents DeepBlocks, a visual programming tool that allows DL developers to design, train, and evaluate models without relying on specific programming languages. DeepBlocks builds on the typical model structure: a sequence of learnable functions whose arrangement defines the specific characteristics of the model. We derived DeepBlocks' design goals from a formative interview with five participants, and we validated the first implementation of the tool through a typical use case. The results are promising and show that developers could visually design complex DL architectures.
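The "sequence of learnable functions" structure is what makes a block-based editor possible: each visual block maps to one function, and the model is their composition in order. The sketch below shows that mapping with a hypothetical block registry and spec format (not DeepBlocks' actual format); weights are passed in explicitly for illustration rather than learned.

```python
from typing import Callable, Dict, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def linear(weights: List[List[float]], bias: List[float]) -> Layer:
    """Fully connected block: y = W x + b."""
    def forward(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return forward


def relu() -> Layer:
    """Element-wise ReLU block."""
    return lambda x: [max(0.0, v) for v in x]


# Hypothetical registry: block type name -> factory that builds the function.
BLOCK_TYPES: Dict[str, Callable[..., Layer]] = {"linear": linear, "relu": relu}


def build_model(blocks: List[dict]) -> Layer:
    """Compose block specs, in the order they were arranged, into one model."""
    layers = [BLOCK_TYPES[spec["type"]](**spec.get("params", {})) for spec in blocks]

    def model(x: Vector) -> Vector:
        for layer in layers:
            x = layer(x)
        return x

    return model
```

A two-block arrangement such as `[{"type": "linear", "params": {...}}, {"type": "relu"}]` yields a model whose behavior changes simply by reordering or swapping entries, which is the property a visual editor exposes.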
- North America > United States > New York > New York County > New York City (0.05)
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- North America > United States > Georgia > Chatham County > Savannah (0.04)
- Research Report (0.50)
- Questionnaire & Opinion Survey (0.47)
Predicting the Performance of a Computing System with Deep Networks
Cengiz, Mehmet, Forshaw, Matthew, Atapour-Abarghouei, Amir, McGough, Andrew Stephen
Predicting the performance and energy consumption of computing hardware is critical for many modern applications, as it informs procurement decisions, deployment decisions, and autonomic scaling. Existing approaches to understanding hardware performance largely focus on benchmarking, i.e., leveraging standardised workloads that seek to be representative of an end-user's needs. Two key challenges arise: benchmark workloads may not be representative of an end-user's workload, and benchmark scores are not easily obtained for all hardware. In this paper, we demonstrate the potential of Deep Learning models to predict benchmark scores for unseen hardware. We carry out our evaluation with the openly available SPEC 2017 benchmark results. We evaluate three different networks, one fully-connected network and two Convolutional Neural Networks (one bespoke and one ResNet-inspired), and obtain $R^2$ scores of 0.96, 0.98 and 0.94, respectively.
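The reported $R^2$ scores use the coefficient of determination, $R^2 = 1 - \mathrm{SS}_{\text{res}} / \mathrm{SS}_{\text{tot}}$, where $\mathrm{SS}_{\text{res}}$ sums squared prediction errors and $\mathrm{SS}_{\text{tot}}$ sums squared deviations of the true scores from their mean. A minimal sketch of that computation (the function name is illustrative):

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot
```

A model that always predicts the mean benchmark score gets $R^2 = 0$, and perfect predictions get $R^2 = 1$, so values such as 0.98 mean the network explains most of the variance in scores across hardware.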
- Europe > Portugal > Coimbra > Coimbra (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- Europe > United Kingdom > England > Tyne and Wear > Newcastle (0.04)
AppStreamer: Reducing Storage Requirements of Mobile Games through Predictive Streaming
Theera-Ampornpunt, Nawanol, Suryavansh, Shikhar, Manchanda, Sameer, Panta, Rajesh, Joshi, Kaustubh, Ammar, Mostafa, Chiang, Mung, Bagchi, Saurabh
Storage has become a constrained resource on smartphones. Gaming is a popular activity on mobile devices, and the explosive growth in the number of games, coupled with their growing size, contributes to the storage crunch. Even where storage is plentiful, it takes a long time to download and install a heavy app before it can be launched. This paper presents AppStreamer, a novel technique for reducing the storage requirements or startup delay of mobile games, and of heavy mobile apps in general. AppStreamer is based on the intuition that most apps do not need all of their files (images, audio and video clips, etc.) at any one time. AppStreamer can therefore keep only a small part of the files on the device, akin to a "cache", and download the remainder from a cloud storage server or a nearby edge server when it predicts that the app will need them in the near future. As the user uses the app, AppStreamer continuously predicts which file blocks will be needed soon and fetches them from the storage server before the user sees a stall due to missing resources. We implement AppStreamer at the Android file system layer. This means that apps require no source code access or modification, and the approach generalizes across apps. We evaluate AppStreamer using two popular games: Dead Effect 2, a 3D first-person shooter, and Fire Emblem Heroes, a 2D turn-based strategy role-playing game. In a user study, 75% and 87% of users, respectively, found that AppStreamer provides the same quality of user experience as the baseline where all files are stored on the device. AppStreamer cuts the storage requirement by 87% for Dead Effect 2 and 86% for Fire Emblem Heroes.
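The abstract does not specify how AppStreamer predicts upcoming file blocks. As an illustration of predictive prefetching in general, the sketch below assumes a simple first-order Markov model over the sequence of block reads: it counts which block tends to follow which, and prefetches the likeliest successors that are not already cached on the device. The class and function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, List, Optional, Set


class BlockPredictor:
    """Hypothetical first-order Markov predictor over file-block IDs."""

    def __init__(self) -> None:
        # transitions[a][b] = how often block b was read right after block a
        self.transitions: Dict[Hashable, Counter] = defaultdict(Counter)
        self.prev: Optional[Hashable] = None

    def record(self, block: Hashable) -> None:
        """Update transition counts with one observed block read."""
        if self.prev is not None:
            self.transitions[self.prev][block] += 1
        self.prev = block

    def predict(self, block: Hashable, k: int = 2) -> List[Hashable]:
        """Up to k blocks most often read right after `block`."""
        return [b for b, _ in self.transitions.get(block, Counter()).most_common(k)]


def blocks_to_prefetch(
    predictor: BlockPredictor, current: Hashable, cached: Set[Hashable], k: int = 2
) -> List[Hashable]:
    """Predicted next blocks that are not already stored on the device."""
    return [b for b in predictor.predict(current, k) if b not in cached]
```

After observing the read trace `1, 2, 3, 1, 2, 4, 1, 2, 3`, block 2 is most often followed by 3 and then 4, so with block 3 already cached only block 4 would be fetched. A real system would also need to weigh prediction confidence against network cost and cache capacity.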
- Asia > Thailand > Phuket > Phuket (0.04)
- North America > United States > Illinois (0.04)
- North America > Mexico > Gulf of Mexico (0.04)
- Asia > India (0.04)
- Leisure & Entertainment > Games > Computer Games (1.00)
- Information Technology (1.00)
- Information Technology > Communications > Networks (1.00)
- Information Technology > Communications > Mobile (1.00)
- Information Technology > Cloud Computing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
Alphabet's smart neighborhood could have shape-shifting 'superblocks'
Too many cities are built around cars rather than people. Sidewalk Labs, an offshoot of Google's parent company Alphabet, wants its smart neighborhood in Toronto to be different. It's considering a so-called superblock concept, modeled after Barcelona's, that bundles smaller streets together and limits vehicles to the perimeter. The smaller lanes inside each superblock would then become safer, quieter spaces for pedestrians and cyclists. Sidewalk Labs wants to go a step further, though, with real-time traffic monitoring and movable street furniture.
- North America > Canada > Ontario > Toronto (0.27)
- North America > United States > California (0.05)
- Europe > United Kingdom > England > Dorset > Bournemouth (0.05)