Reverse Browser: Vector-Image-to-Code Generator

Toth-Czifra, Zoltan

arXiv.org Artificial Intelligence

Automating the conversion of user interface design into code (image-to-code or image-to-UI) is an active area of software engineering research. However, the state-of-the-art solutions do not achieve high fidelity to the original design, as evidenced by benchmarks. In this work, I approach the problem differently: I use vector images instead of bitmaps as model input. I create several large datasets for training machine learning models. I evaluate the available array of Image Quality Assessment (IQA) algorithms and introduce a new, multi-scale metric. I then train a large open-weights model and discuss its limitations.


Row-Column Hybrid Grouping for Fault-Resilient Multi-Bit Weight Representation on IMC Arrays

Jeon, Kang Eun, Yeon, Sangheum, Kim, Jinhee, Bang, Hyeonsu, Rhe, Johnny, Ko, Jong Hwan

arXiv.org Artificial Intelligence

This paper addresses two critical challenges in analog In-Memory Computing (IMC) systems that limit their scalability and deployability: the computational unreliability caused by stuck-at faults (SAFs) and the high compilation overhead of existing fault-mitigation algorithms, namely Fault-Free (FF). To overcome these limitations, we first propose a novel multi-bit weight representation technique, termed row-column hybrid grouping, which generalizes conventional column grouping by introducing redundancy across both rows and columns. This structural redundancy enhances fault tolerance and can be effectively combined with existing fault-mitigation solutions. Further acceleration is achieved through theoretical insights that identify fault patterns amenable to trivial solutions, significantly reducing computation. Experimental results on convolutional networks and small language models demonstrate the effectiveness of our approach, achieving up to an 8-percentage-point improvement in accuracy, 150× faster compilation, and a 2× energy efficiency gain compared to existing baselines. The In-Memory Computing (IMC) paradigm marks a transformative shift toward non-von Neumann architectures by allowing data processing to occur directly within the memory array [1]-[4], thereby minimizing the overhead associated with off-chip data movement [5]. Among various implementations, analog IMC systems based on Resistive Random Access Memory (ReRAM) crossbar arrays have emerged as a particularly promising solution. These systems perform energy-efficient matrix-vector multiplication (MVM) [3], [4], a core operation that forms the computational backbone of modern deep learning systems. As such, analog IMC has become a focal point in DNN acceleration and efficient AI research, spearheading cutting-edge investigations in approximate computing, heterogeneous computing, and alternative learning paradigms.
To perform MVM in the analog domain, the weights are stored as conductance values in ReRAM cells; input features are applied as voltages to the word lines, and the resulting bit-line currents naturally multiply-and-accumulate following Ohm's and Kirchhoff's laws [6], [7].
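The physics described above can be pictured with a minimal NumPy sketch of an ideal (fault-free, noiseless) crossbar: the bit-line currents are exactly the matrix-vector product of the conductance matrix and the input voltage vector. The numbers are illustrative only, not taken from the paper.

```python
import numpy as np

# Idealized ReRAM crossbar: weights are programmed as conductances G
# (siemens), inputs are word-line voltages v (volts). By Ohm's law each
# cell contributes a current G[i, j] * v[j]; Kirchhoff's current law
# sums a bit line's contributions into i_out[i].
G = np.array([[1e-6, 2e-6],
              [3e-6, 4e-6]])   # conductance matrix (the weights)
v = np.array([0.5, 1.0])       # input voltages

i_out = G @ v                  # bit-line currents = analog MVM

# The same result, written out cell by cell:
expected = np.array([sum(G[i, j] * v[j] for j in range(2)) for i in range(2)])
assert np.allclose(i_out, expected)
```

Stuck-at faults would pin individual entries of `G` to a fixed conductance regardless of the programmed weight, which is why the redundancy schemes above matter.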


GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning

Sim, Woochang, Ryu, Hyunseok, Choi, Kyungmin, Han, Sungwon, Kim, Sundong

arXiv.org Artificial Intelligence

The Abstraction and Reasoning Corpus (ARC) poses a stringent test of general AI capabilities, requiring solvers to infer abstract patterns from only a handful of examples. Despite substantial progress in deep learning, state-of-the-art models still achieve accuracy rates of merely 40-55% on the 2024 ARC Competition, indicative of a significant gap between their performance and human-level reasoning. In this work, we seek to bridge that gap by introducing an analogy-inspired ARC dataset, GIFARC. Leveraging large language models (LLMs) and vision-language models (VLMs), we synthesize new ARC-style tasks from a variety of GIF images that include analogies. Each new task is paired with a ground-truth analogy, providing an explicit mapping between visual transformations and everyday concepts. By embedding robust human-intuitive analogies into ARC-style tasks, GIFARC guides AI agents to evaluate a task analogically before engaging in brute-force pattern search, efficiently reducing problem complexity and yielding more concise, human-understandable solutions. We empirically validate that guiding LLMs with the analogical approach of GIFARC shifts their task-solving strategies toward the analogical reasoning of humans.


ARB-LLM: Alternating Refined Binarizations for Large Language Models

Li, Zhiteng, Yan, Xianglong, Zhang, Tianao, Qin, Haotong, Xie, Dong, Tian, Jiang, Shi, Zhongchao, Kong, Linghe, Zhang, Yulun, Yang, Xiaokang

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have greatly pushed forward advancements in natural language processing, yet their high memory and computational demands hinder practical deployment. Binarization, as an effective compression technique, can shrink model weights to just 1 bit, significantly reducing the high demands on computation and memory. However, current binarization methods struggle to narrow the distribution gap between binarized and full-precision weights, while also overlooking the column deviation in LLM weight distribution. To tackle these issues, we propose ARB-LLM, a novel 1-bit post-training quantization (PTQ) technique tailored for LLMs. To narrow the distribution shift between binarized and full-precision weights, we first design an alternating refined binarization (ARB) algorithm to progressively update the binarization parameters, which significantly reduces the quantization error. Moreover, considering the pivotal role of calibration data and the column deviation in LLM weights, we further extend ARB to ARB-X and ARB-RC. In addition, we refine the weight partition strategy with a column-group bitmap (CGB), which further enhances performance. Equipping ARB-X and ARB-RC with CGB, we obtain ARB-LLM$_\text{X}$ and ARB-LLM$_\text{RC}$ respectively, which significantly outperform state-of-the-art (SOTA) binarization methods for LLMs. As a binary PTQ method, our ARB-LLM$_\text{RC}$ is the first to surpass FP16 models of the same size. The code and models will be available at https://github.com/ZHITENGLI/ARB-LLM.
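For context, the baseline that such methods refine is the classical scalar binarization W ≈ α·sign(W), with a per-row scale α chosen to minimize the L2 error, which gives α = mean(|W|) over the row. A minimal sketch of that baseline (the function name is mine; this is not the ARB algorithm itself, which alternately refines these parameters and models column deviation):

```python
import numpy as np

def binarize_rowwise(W):
    """Baseline 1-bit binarization: approximate each row of W as
    alpha * sign(W) with the L2-optimal per-row scale alpha = mean(|row|)."""
    B = np.sign(W)
    B[B == 0] = 1.0                                  # map exact zeros to +1
    alpha = np.mean(np.abs(W), axis=1, keepdims=True)
    return alpha, B

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))                          # toy full-precision weights
alpha, B = binarize_rowwise(W)
W_hat = alpha * B                                    # dequantized 1-bit weights
err = np.linalg.norm(W - W_hat)                      # residual quantization error
```

The residual `err` is what ARB-style alternating updates then drive down further.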


One-Class Classification as GLRT for Jamming Detection in Private 5G Networks

Varotto, Matteo, Valentin, Stefan, Ardizzon, Francesco, Marzotto, Samuele, Tomasin, Stefano

arXiv.org Artificial Intelligence

5G mobile networks are vulnerable to jamming attacks that may jeopardize valuable applications such as industry automation. In this paper, we propose to analyze radio signals with a dedicated device to detect jamming attacks. We pursue a learning approach, with the detector being a CNN implementing a GLRT. To this end, the CNN is trained as a two-class classifier using two datasets: one of real legitimate signals and another generated artificially so that the resulting classifier implements the GLRT. The artificial dataset is generated mimicking different types of jamming signals. We evaluate the performance of this detector using experimental data obtained from a private 5G network and several jamming signals, showing the technique's effectiveness in detecting the attacks.


Localizing the conceptual difference of two scenes using deep learning for house keeping usages

Atghaei, Ali, Rahnama, Ehsan, Azimi, Kiavash

arXiv.org Artificial Intelligence

Finding the conceptual difference between two images of an industrial environment is especially important for HSE purposes, yet there is still no reliable method to find the major differences and alert the relevant controllers. Due to the abundance and variety of objects in different environments, supervised learning methods face a major problem in this field. Because lighting conditions can change sharply, or even slightly, between the two scenes, naively subtracting the two images cannot reveal these differences. The goal of this paper is to find and localize the conceptual differences between two frames of one scene captured at two different times, and to classify each difference as an addition, a removal, or a change in the field. We present a comprehensive solution for this application based on deep learning, using transfer learning, a structural modification of the error function, and a process for adding and synthesizing data. An appropriate dataset was collected and labeled, the model was evaluated on this dataset, and the feasibility of using it in real industrial applications is discussed.


Efficient Data-Plane Memory Scheduling for In-Network Aggregation

Wang, Hao, Qin, Yuxuan, Lao, ChonLam, Le, Yanfang, Wu, Wenfei, Chen, Kai

arXiv.org Artificial Intelligence

As the scale of distributed training grows, communication becomes a bottleneck. To accelerate communication, recent works introduce In-Network Aggregation (INA), which moves gradient summation into network middle-boxes, e.g., programmable switches, to reduce the traffic volume. However, switch memory is scarce compared to the volume of gradients transmitted in distributed training. Although the literature applies methods like pool-based streaming or dynamic sharing to tackle the mismatch, switch memory remains a potential performance bottleneck. Furthermore, we observe under-utilization of switch memory due to the synchronization requirement for aggregator deallocation in recent works. To improve switch memory utilization, we propose ESA, an $\underline{E}$fficient Switch Memory $\underline{S}$cheduler for In-Network $\underline{A}$ggregation. At its core, ESA enforces a preemptive aggregator allocation primitive and introduces priority scheduling in the data plane, which improves switch memory utilization and average job completion time (JCT). Experiments show that ESA can improve the average JCT by up to $1.35\times$.


S2 Reducer: High-Performance Sparse Communication to Accelerate Distributed Deep Learning

Ge, Keshi, Fu, Yongquan, Lai, Zhiquan, Deng, Xiaoge, Li, Dongsheng

arXiv.org Artificial Intelligence

The distributed stochastic gradient descent (SGD) approach has been widely used in large-scale deep learning, and the gradient collective method is vital to the training scalability of a distributed deep learning system. Collective communication such as AllReduce has been widely adopted for the distributed SGD process to reduce communication time. However, AllReduce consumes large bandwidth resources, while gradients are sparse in many cases: many gradient values are zeros and should be efficiently compressed to save bandwidth. To reduce the sparse gradient communication overhead, we propose Sparse-Sketch Reducer (S2 Reducer), a novel sketch-based sparse gradient aggregation method with convergence guarantees. S2 Reducer reduces the communication cost by compressing only the non-zero gradients with a count-sketch and a bitmap, and it enables efficient AllReduce operators for parallel SGD training. We perform extensive evaluations against four state-of-the-art methods over five training models. Our results show that S2 Reducer converges to the same accuracy, reduces sparse communication overhead by 81\%, and achieves a 1.8$\times$ speedup compared to state-of-the-art approaches.
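The bitmap half of such a scheme is easy to picture: ship one bit per gradient position plus only the non-zero values, and reconstruct on the receiver. A toy sketch of that idea (illustrative only; this is not the S2 Reducer wire format and omits its count-sketch component):

```python
import numpy as np

def compress(grad):
    """Encode a sparse gradient as (bitmap of non-zero positions, values)."""
    bitmap = grad != 0          # one bit per position
    values = grad[bitmap]       # only the non-zero entries are transmitted
    return bitmap, values

def decompress(bitmap, values):
    """Rebuild the dense gradient from the bitmap and the non-zero values."""
    grad = np.zeros(bitmap.shape, dtype=values.dtype)
    grad[bitmap] = values
    return grad

g = np.array([0.0, 0.3, 0.0, 0.0, -1.2, 0.0])
bitmap, values = compress(g)
restored = decompress(bitmap, values)
# 6 floats shrink to 6 bits + 2 floats: a win whenever the gradient is sparse.
```

The trade-off is that a plain bitmap still costs one bit per coordinate, which is why sketch-based summaries are combined with it for very large, very sparse gradients.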


Adobe adds AI-powered masking tools to Lightroom

Engadget

Adobe has revealed some new masking upgrades that are coming to Lightroom, Lightroom Classic and Adobe Camera Raw (or ACR, Photoshop's raw photo processing tool). The company calls it the "biggest change to providing control over selectively enhancing photos" since it released Lightroom 2 in 2008. The Adobe Research team wanted to bring AI-powered selection tools such as Select Subject and Sky Replacement from Photoshop into Lightroom and ACR, but the image processing engine used in the latter two was incompatible. The team had to make some big changes under the hood, which gave it a chance to change how selections are handled in Lightroom. Until now, ACR, Lightroom and Lightroom Classic have only supported vector-based selections (which are recorded as mathematical expressions), but the AI-powered masks need bitmap (or image-based) support. So, to bring the AI-based tools to those apps, Adobe had to make both approaches work together.


Behind the painstaking process of creating Chinese computer fonts

MIT Technology Review

Bruce Rosenblum switched on his Apple II, which rang out a high F note followed by the clatter of the floppy drive. After a string of thock thock keystrokes, the 12-inch Sanyo monitor began to phosphoresce. A green grid appeared, 16 units wide and 16 units tall. This was "Gridmaster," a program Bruce had cooked up in the programming language BASIC to build one of the world's first Chinese digital fonts. He was developing the font for an experimental machine called the Sinotype III, which was among the first personal computers to handle Chinese-language input and output.