current implementation
Open-Source Acceleration of Stable-Diffusion.cpp Deployable on All Devices
Ng, Jingxu, Lv, Cheng, Zhao, Pu, Niu, Wei, Lin, Juyi, Pan, Minzhou, Liang, Yun, Wang, Yanzhi
Stable diffusion plays a crucial role in generating high-quality images. However, image generation is time-consuming and memory-intensive. To address this, stable-diffusion.cpp (Sdcpp) emerges as an efficient inference framework to accelerate the diffusion models. Although it is lightweight, the current implementation of ggml_conv_2d operator in Sdcpp is suboptimal, exhibiting both high inference latency and massive memory usage. To address this, in this work, we present an optimized version of Sdcpp leveraging the Winograd algorithm to accelerate 2D convolution operations, which is the primary bottleneck in the pipeline. By analyzing both dependent and independent computation graphs, we exploit the device's locality and parallelism to achieve substantial performance improvements. Our framework delivers correct end-to-end results across various stable diffusion models, including SDv1.4, v1.5, v2.1, SDXL, and SDXL-Turbo. Our evaluation results demonstrate a speedup up to 2.76x for individual convolutional layers and an inference speedup up to 4.79x for the overall image generation process, compared with the original Sdcpp on M1 pro. Homepage: https://github.com/SealAILab/stable-diffusion-cpp
Branch and Bound for Semi-Supervised Support Vector Machines
Semi-supervised SVMs (S3 VM) attempt to learn low-density separators by maximizing the margin over labeled and unlabeled examples. The associated optimization problem is non-convex. To examine the full potential of S3 VMs modulo local minima problems in current implementations, we apply branch and bound techniques for obtaining exact, global ly optimal solutions. Empirical evidence suggests that the globally optimal solution can return excellent generalization performance in situations where other implementations fail completely. While our current implementation is only applicable to small datasets, we discuss variants that can potentially lead to practically useful algorithms.
Predictive Vision in a nutshell
I've elaborated in my previous post on why I think predictive capability is crucial for an intelligent agent and how we get fooled by getting 90% of motor commands right from a purely reactive system. This also relates to a way of thinking of the problem in terms of either statistics or dynamics. The current mainstream (statistical majority) is focused on statistics and that statistically works. However much like with guiding behavior, statistical majority may omit important outliers - important information is often hidden in the tail of the distribution. I've mentioned the Predictive Vision Model which is our (me and a few colleagues that think alike) way to introduce predictive paradigm into machine learning.
A Massively Parallel Self-Tuning Context-Free Parser
ABSTRACT The Parsing and Learning System(PALS) is a massively parallel self-tuning context-free parser. It is capable of parsing sentences of unbounded length mainly due to its parse-tree representation scheme. The system is capable of improving its parsing performance through the presentation of training examples. INTRODUCTION Recent PDP research[Rumelhart et al.- 1986; Feldman and Ballard, 1982; Lippmann, 1987] involving natural language processtng[Fanty, 1988; Selman, 1985; Waltz and Pollack, 1985] have unrealistically restricted sentences to a fixed length. A solution to this problem was presented in the system CONPARSE[Charniak and Santos.
A Massively Parallel Self-Tuning Context-Free Parser
ABSTRACT The Parsing and Learning System(PALS) is a massively parallel self-tuning context-free parser. It is capable of parsing sentences of unbounded length mainly due to its parse-tree representation scheme. The system is capable of improving its parsing performance through the presentation of training examples. INTRODUCTION Recent PDP research[Rumelhart et al.- 1986; Feldman and Ballard, 1982; Lippmann, 1987] involving natural language processtng[Fanty, 1988; Selman, 1985; Waltz and Pollack, 1985] have unrealistically restricted sentences to a fixed length. A solution to this problem was presented in the system CONPARSE[Charniak and Santos.
A Massively Parallel Self-Tuning Context-Free Parser
ABSTRACT The Parsing and Learning System(PALS) is a massively parallel self-tuning context-free parser. It is capable of parsing sentences of unbounded length mainly due to its parse-tree representation scheme. The system is capable of improving its parsing performance through the presentation of training examples. INTRODUCTION Recent PDP research[Rumelhart et al.- 1986; Feldman and Ballard, 1982; Lippmann, 1987] involving natural language processtng[Fanty, 1988; Selman, 1985; Waltz and Pollack, 1985] have unrealistically restricted sentences to a fixed length. A solution to this problem was presented in the system CONPARSE[Charniak and Santos.